CN109598229A - Monitoring system and its method based on action recognition - Google Patents
- Publication number
- CN109598229A (application number CN201811453471.2A)
- Authority
- CN
- China
- Prior art keywords
- posture
- action
- video
- skeleton
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
Abstract
This disclosure relates to a monitoring system and method based on action recognition. The method comprises: identifying the pose of each person in a monitored video frame using a pose-estimation method and building a 2D human-skeleton model; classifying the 2D skeleton models in the monitored video frames with a pre-trained posture classification model; storing the posture classification results of successive video frames in a pose vector and judging the action type with a pre-trained action recognition model; and, if the judged action type belongs to a monitored type, storing the video frame in which the specific action is marked and/or a video clip of the action to memory and triggering an alarm.
Description
Technical field
This disclosure relates to a monitoring system and method based on action recognition, and more particularly to a system and method that uses pose-prediction, posture-classification and action-recognition techniques to judge whether a person in a monitored video performs a specific action, raises an alarm automatically when such an action is detected, and saves the related video frames and video clips for later examination.
Background technique
An object's actions play a decisive role in judging its behavior. Whether the object is a person, an animal or a machine, any goal it pursues must be realized through corresponding actions.
Chinese patent application CN107992858A proposes a real-time three-dimensional gesture-estimation system and method based on a single RGB frame. It uses a hand detector to locate and crop the hand region, uses OpenPose to identify the 2D positions of the hand joints, and fits a 3D hand model to the 2D joint positions by non-linear least squares, thereby recovering the hand pose. That method uses OpenPose to model the hand and recognize gestures. However, it does not detect the other joints of the human body (such as wrist, elbow, shoulder, neck, hip, knee, ankle and fingers), nor is it equipped with an effective algorithm for classifying other limbs. Moreover, the method of CN107992858A mainly recognizes gestures in a single frame and is not suited to recognizing actions across a video of consecutive frames.
Chinese patent application CN108427331A proposes a human-robot collaboration safety protection method and system. It uses an RGB-D sensor to identify the robot's coordinates, uses the RGB-D sensor together with OpenPose to detect personnel coordinates, and controls the robot's speed by computing the distance between the person and the robot. This method also uses OpenPose to model the human body, but after modeling it only judges positions and distances in order to control the robot's speed. That prior art therefore does not perform action recognition on the body model to judge the type of action.
Chinese patent application CN108416795A proposes a video action-recognition method that fuses spatial features based on sequence pooling. It computes visual feature-vector sets for the video frames, constructs a two-dimensional spatial pyramid model over the frames, and judges the action category after processing and classifying the feature-vector sets of the subspaces. That method detects actions by multi-scale partitioning of the frame's two-dimensional space, so its classification is based on the visual features of the original frames. It cannot isolate individual human skeleton models or judge postures by classifying the skeleton model (rather than raw video features), and is therefore not suited to judging actions from a posture sequence.
Therefore, a system and method are needed that use pose-prediction, posture-classification and action-recognition techniques to judge whether a person in a monitored video performs a specific action, raise an alarm automatically when such an action is detected, and save the related video frames and video clips for later examination.
Summary of the invention
Accordingly, the purpose of the disclosure is to use pose-prediction, posture-classification and action-recognition techniques to judge whether a person in a monitored video performs a specific action, to raise an alarm automatically when such an action is detected, and to save the related video frames and video clips for later examination.
To achieve the above goals, according to one aspect of the disclosure, a monitoring method based on action recognition is provided, comprising the following steps: a) identifying the pose of each person in a monitored video frame using a pose-estimation method and building a 2D human-skeleton model; b) classifying the 2D skeleton models in the monitored video frames with a pre-trained posture classification model; c) storing the posture classification results of successive video frames in a pose vector and judging the action type with a pre-trained action recognition model; and d) if the judged action type belongs to a monitored type, storing the video frame in which the specific action is marked and/or a video clip of the action to memory and triggering an alarm.
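Steps a) through d) can be sketched as a minimal pipeline. Every function and data shape below is a hypothetical stand-in, not the patented implementation; a real system would call a pose estimator such as OpenPose and trained models where the stubs appear.

```python
# Minimal sketch of steps a)-d); all names are illustrative assumptions.

def estimate_skeleton_2d(frame):
    """Step a): return a 2D skeleton model for the frame."""
    # A real system would run a pose estimator (e.g., OpenPose) here.
    return frame.get("skeleton", {})

def classify_posture(skeleton, posture_model):
    """Step b): map a skeleton to a posture class via a trained model."""
    return posture_model(skeleton)

def monitor(frames, posture_model, action_model, monitored_types):
    """Steps c) and d): accumulate posture classes, judge the action."""
    pose_vector = []  # list of (posture_class, consecutive_count)
    for frame in frames:
        skeleton = estimate_skeleton_2d(frame)
        c = classify_posture(skeleton, posture_model)
        if pose_vector and pose_vector[-1][0] == c:
            pose_vector[-1] = (c, pose_vector[-1][1] + 1)  # same posture: count up
        else:
            pose_vector.append((c, 1))
    action = action_model(pose_vector)
    alarm = action in monitored_types  # step d): archive frames / trigger alarm
    return action, alarm
```

With toy stand-in models, a run over five frames yields one recognized action and an alarm decision.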
Preferably, step a) comprises: a1) determining the coordinates of the major joints of one or more persons in the video frame; and a2) building a 2D human-skeleton model for each person using the major-joint coordinates and the relationships between the joints.

Preferably, step a2) further comprises building 2D skeleton models of the hands and/or faces of each person in the video.
Preferably, step b) comprises: decomposing each continuous limb action to be recognized into discrete key postures; labeling the 2D skeleton-modeling results with those key postures; and training the posture classification model with a convolutional neural network algorithm on the labeled 2D skeleton-modeling results.
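The disclosure trains a convolutional neural network on labeled skeleton models. As a dependency-free illustration of the same labeling-and-training idea, the sketch below substitutes a nearest-centroid classifier over flattened joint coordinates; the joint names and labels are invented for the example.

```python
# Stand-in posture classifier: nearest centroid over flattened joint
# coordinates (the disclosure itself uses a CNN; names are illustrative).

def flatten(skeleton, joint_order):
    """Turn {joint: (x, y)} into a flat feature vector."""
    return [v for j in joint_order for v in skeleton[j]]

def train_centroids(labeled_skeletons, joint_order):
    """labeled_skeletons: list of (skeleton, key_posture_label) pairs."""
    sums, counts = {}, {}
    for skeleton, label in labeled_skeletons:
        x = flatten(skeleton, joint_order)
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in s] for lbl, s in sums.items()}

def classify(skeleton, centroids, joint_order):
    """Return the label whose centroid is nearest to the skeleton."""
    x = flatten(skeleton, joint_order)
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lbl: dist2(centroids[lbl]))
```

Any classifier trained on key-posture-labeled skeletons could take this slot; the interface (skeleton in, posture class out) is what the later steps rely on.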
Preferably, step c) comprises: labeling the pose-vector sets of known actions; and training the action recognition model with the labeled pose-vector sets as the training set.
Preferably, step d) comprises at least one of the following steps: marking the object performing the specific action in the original video and triggering an alarm; archiving the video frames in which the specific action is marked as evidence; and archiving the video clip in which the specific action is marked as evidence.
Preferably, step c) further comprises the step of using a hot-zone (ROI, Region of Interest) comparison method to judge the actions of a tracked object in a multi-person scene.

Preferably, the hot zone is a designated region of the monitored video; if no region is designated, the hot zone is the entire monitored picture area.
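The hot-zone comparison can be sketched as a simple membership test: a person is considered for action judgment only if enough of their skeleton joints fall inside the ROI. The rectangular ROI shape and the 0.5 coverage ratio are illustrative assumptions, not details from the disclosure.

```python
# Hot-zone (ROI) check sketch; rectangle ROI and ratio are assumptions.

def in_roi(point, roi):
    """roi = (x_min, y_min, x_max, y_max); None means the whole frame."""
    if roi is None:
        return True
    x, y = point
    x0, y0, x1, y1 = roi
    return x0 <= x <= x1 and y0 <= y <= y1

def person_in_hot_zone(skeleton, roi, min_ratio=0.5):
    """skeleton: {joint: (x, y)} as produced by the 2D skeleton modeling."""
    joints = list(skeleton.values())
    inside = sum(1 for p in joints if in_roi(p, roi))
    return inside >= min_ratio * len(joints)
```

Passing `roi=None` reproduces the fallback described above, where the hot zone is the whole monitored picture.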
Preferably, step c) further comprises: attaching a tracker to each person in the video to monitor their actions, and judging whether each tracked object needs to remain tracked; if continued tracking is not needed, the tracker is deleted.

Preferably, whether continued tracking is needed is determined by judging whether the detected object is in at least one of the following states: the detected object has reached a designated region; the detected object has left the designated region; the detected object has remained stationary for more than a certain time; or an instruction has been received to stop tracking the object in the monitored area.
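The tracker-retirement rule above can be sketched as one predicate. The four states are interpreted here as stop conditions; the field names and the stationary-frame limit are assumptions about one possible implementation.

```python
# Sketch of the tracker-retirement decision; all names are assumptions.

def keep_tracking(state, stationary_limit=30):
    """state: dict describing one tracked object.
    Returns False when the tracker should be deleted."""
    if state.get("reached_region"):          # object arrived at the region
        return False
    if state.get("left_region"):             # object left the region
        return False
    if state.get("stationary_frames", 0) > stationary_limit:
        return False                         # stationary for too long
    if state.get("stop_requested"):          # explicit stop instruction
        return False
    return True
```

In a multi-person scene, running this per tracker keeps the tracker pool bounded to people whose actions still need monitoring.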
According to another aspect of the disclosure, a monitoring system based on action recognition is provided, comprising: a pose-prediction section, which identifies the pose of each person in a monitored video frame using a pose-estimation method and builds a 2D human-skeleton model from the obtained pose; a posture classification section, which classifies the 2D skeleton models in the monitored video frames with a pre-trained posture classification model; a posture management section, which stores the posture classification results of successive video frames in a pose vector; an action recognition section, which judges the action type with a pre-trained action recognition model; and an output section, which, when the judged action type belongs to a monitored type, stores the video frame in which the specific action is marked and/or a video clip of the action to memory and triggers an alarm.

Preferably, the monitoring system further comprises: a posture classification training section, which labels the obtained 2D skeleton models with key postures and feeds the labeled models as a training set into a convolutional neural network to train a classification model, thereby obtaining the posture classification model; and an action recognition training section, which takes the pose vectors generated by the posture management section from known action videos as a training set and trains them with a multivariate classification algorithm to obtain the action recognition model used to classify pose vectors into actions.
Because the disclosure uses a human pose-prediction method to build 2D skeleton models of the persons in the monitored video, classifies the postures with a posture classifier, records posture sequences in pose vectors, and recognizes human actions with an action classification method, it achieves real-time human action recognition in automated production and thus enables unattended operation monitoring.

In addition, the disclosure uses the hot-zone comparison method to recognize the actions of key persons within a fixed area in multi-person scenes, and uses object tracking to recognize the actions of multiple operators in such scenes, so it can serve different application scenarios such as production supervision and environmental monitoring.
Detailed description of the invention
The drawings herein are incorporated into and form part of this specification; they show embodiments consistent with the disclosure and, together with the specification, serve to explain its principles.

Fig. 1 is a schematic block diagram illustrating a monitoring system based on action recognition according to one embodiment of the disclosure;
Fig. 2 is a schematic block diagram illustrating the posture recognition section according to one embodiment of the disclosure;
Fig. 3 is a flowchart illustrating how the posture management section updates the pose vector;
Fig. 4 is a detailed schematic block diagram of the action recognition section according to one embodiment of the disclosure;
Fig. 5 is an operational flowchart of the monitoring system based on action recognition according to one embodiment of the disclosure;
Fig. 6 is a view showing the major joints of the human body;
Fig. 7 is a view showing the associations between the joints;
Fig. 8 shows examples of several postures;
Figs. 9a and 9b each show examples of several postures;
Fig. 10 shows a designated hot zone in the video area; and
Fig. 11 is a schematic diagram of object tracking when the detected object is moving.
Specific embodiment
Example embodiments are described in detail here, with examples illustrated in the accompanying drawings. In the following description, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.

The terminology used in this disclosure is for describing particular embodiments only and is not intended to limit the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art. The singular forms "a", "said" and "the" used in this disclosure and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, such information should not be limited by these terms; the terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the disclosure, "first" could also be termed "second", and vice versa. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".
To enable those skilled in the art to better understand the disclosure, it is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic block diagram illustrating a monitoring system based on action recognition according to one embodiment of the disclosure. As shown in Fig. 1, the monitoring system comprises a video acquisition section 110, a posture recognition section 120, an action recognition section 130 and an output section 140.

The video acquisition section 110 collects video data through a video capture device such as a mobile phone, a camera or a network, converts the collected video data (video stream) into video frames, and supplies them to the posture recognition section 120.
The posture recognition section 120 detects whether the persons in the video frames are in predefined postures and, when a predefined posture is detected, builds a 2D skeleton model of each person from the detection result. If a decomposed posture of any predefined action is found during posture detection, the detection result is added to the pose vector, and the pose vector is sent to the action recognition section 130.

The action recognition section 130 judges whether the pose vector represents a monitored action. If it does, the action recognition section 130 outputs the video frames of the action's key postures and/or a video clip of the action to the output section 140.

The output section 140 outputs the key-posture frames and/or video clip corresponding to the action recognized by the action recognition section to a data storage device, a video display device and/or an audio playback device (not shown).
Fig. 2 is a schematic block diagram illustrating the posture recognition section 120 according to one embodiment of the disclosure. The posture recognition section 120 comprises a pose-prediction section 210, a posture classification section 220, a posture classification training section 230 and a posture management section 240.

The pose-prediction section 210 predicts the human poses in a video frame. According to one embodiment of the disclosure, the pose-prediction section 210 uses the OpenPose technique to determine the 2D coordinates of the key limb joints of every person in the frame, and then builds a 2D skeleton model for each detected person according to customized joint associations. Each predefined human action needs to be decomposed into the discrete key postures of its continuous limb movement (similar to the pose chart of broadcast calisthenics). Preferably, the pose-prediction section 210 can also build 2D skeleton models of the hands of each person in the video and/or 2D models of the face of each person in the video.

Although one embodiment of the disclosure uses the OpenPose technique for human pose prediction, it should be understood that any other similar technique can be used instead.
The posture recognition section 120 has two operating modes: a posture training mode and a posture recognition mode. In the posture training mode, the posture classification training section 230 labels the 2D skeleton models obtained by the pose-prediction section 210 with key postures. The labeled 2D skeleton models are fed as a training set into a convolutional neural network to train a classification model. The trained model then serves as the posture classification model in the recognition mode, classifying 2D skeleton models automatically.

In the posture recognition mode, the pose-prediction section 210 sends the 2D skeleton models to the posture classification section 220, which classifies the postures using the model trained in the training mode. For any application scenario, an initial posture is defined for the posture decomposition chart of each action.
The posture management section 240 maintains the pose vector P(c, s) of the current video, where c is a posture class — for example a common human posture such as a standard sitting posture, making a phone call, or resting head-down on the desk, or a specific business gesture such as an extended arm or a bent arm with a clenched fist — and s is the number of times posture class c has been detected consecutively.
Fig. 3 is a flowchart illustrating how the posture management section 240 updates the pose vector. As shown in Fig. 3, once the posture classification section 220 detects a predefined posture in a video frame (310), it sends the posture class to the posture management section 240. The posture management section 240 first checks whether the pose vector P(c, s) is empty (320), "empty" meaning that no posture record has yet been saved in the vector.

If the pose vector is empty (320), the posture management section 240 judges whether the current posture is an initial posture (330). An initial posture is the first posture in an action's decomposition chart. For example, when the monitored person is in a standard sitting posture and performs a bend-arm-and-clench-fist action, the first decomposed posture is the arm extended flat; the arm's position then differs clearly from an arm resting naturally on the chair arm in the standard sitting posture, so it can be regarded as the start of an action. If the posture is not an initial posture, judgment ends and the posture is discarded (370). For example, if the monitored posture — say the first decomposed posture of a raise-arm action — is not the first decomposed posture of any known monitored action, it can be concluded that the subsequent movement is not an action that needs monitoring, so the posture can be ignored. In other words, neither this posture nor the postures that may develop from it belong to any monitored posture set, so there is no need to keep monitoring them. Of course, if such a posture needs to be monitored in the future, it can be added to the monitored set and then treated as an initial posture instead of being ignored. If it is an initial posture, the posture management section 240 stores it in the pose vector as the current posture (340) and increments the current posture's counter (360). The current posture is the most recently recorded posture in the pose vector. Because video typically runs at 30 frames per second, monitoring consecutive frames one by one will identify the postures in many frames as the same posture (their differences being small), so a counter accumulates the number of occurrences of the current posture.

If the pose vector is not empty, the posture management section 240 judges whether the detected posture is the current posture (i.e., the last detected key posture) (350). In other words, the initial posture marks and records the start of an action, while the current posture is the most recently recorded key posture among the multiple decomposed postures (key postures) of the action being recorded. When recording of an action has just started, the initial posture and the current posture are identical; once a posture different from the initial posture is detected during recording, the new action posture becomes the current posture, and the initial and current postures then differ. If the detected posture is not the current posture, the posture management section 240 stores it in the pose vector as the new current posture (340) and starts counting it, i.e., increments the new current posture's counter (360). If it is the current posture — that is, the detected posture differs only slightly from the immediately preceding key posture and is judged identical to it — the posture management section 240 simply increments the current posture's counter (360).
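The Fig. 3 update rule can be sketched as one function over a pose vector held as a list of (posture class, count) pairs. The representation and names are illustrative; the numbered comments map back to the flowchart steps.

```python
# Sketch of the Fig. 3 pose-vector update rule; names are illustrative.

def update_pose_vector(pose_vector, posture, initial_postures):
    """Apply one detected posture class to the pose vector in place.
    Returns True if the posture was recorded, False if discarded."""
    if not pose_vector:                         # (320) vector is empty
        if posture not in initial_postures:     # (330) not an initial posture
            return False                        # (370) discard the posture
        pose_vector.append((posture, 1))        # (340)+(360) record and count
        return True
    current, count = pose_vector[-1]
    if posture == current:                      # (350) same as current posture
        pose_vector[-1] = (current, count + 1)  # (360) counter + 1
    else:
        pose_vector.append((posture, 1))        # (340) new current posture
    return True
```

Feeding it frame-by-frame posture classes yields the consecutive counts s for each class c described above.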
The posture management section 240 finishes updating a pose vector under any of the following conditions:

- The current posture is an end posture — the posture that, corresponding to the initial posture, is the last of an action's decomposed postures (key postures) — and its consecutive occurrence count exceeds a predefined threshold.
- The current posture's counter has not been updated for N frames, meaning that no action of the monitored person has been detected within the predetermined period.
- The current posture's state has not changed for N frames, meaning that postures are still being detected but remain the same posture.
- A system command terminates the posture update.
- The video ends or the video stream is interrupted.
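The finish-update conditions can be sketched as a single predicate. All parameter names and the bookkeeping fields (idle-frame counters, stop flags) are assumptions about one possible implementation.

```python
# Sketch of the pose-vector finish-update check; names are assumptions.

def should_finalize(pose_vector, end_postures, end_threshold,
                    frames_since_counter_update, frames_since_state_change,
                    max_idle_frames, system_stop=False, video_ended=False):
    """True when the current pose vector should be closed and sent on."""
    if system_stop or video_ended:          # command / stream interruption
        return True
    if pose_vector:
        current, count = pose_vector[-1]
        if current in end_postures and count > end_threshold:
            return True                     # end posture held long enough
    if frames_since_counter_update > max_idle_frames:
        return True                         # counter idle for N frames
    if frames_since_state_change > max_idle_frames:
        return True                         # state unchanged for N frames
    return False
```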
After the pose vector finishes updating, it is sent to subsequent modules for processing, and the pose vector inside the posture management section 240 is re-initialized in preparation for recording the next pose vector.

After the pose vector finishes updating, it is also normalized. Normalization includes filtering out postures whose consecutive detection count is below a predefined threshold, to prevent false judgments caused by occasional misdetections. Because computer-vision-based posture recognition can be affected by lighting, viewing angle, occlusion and similar factors, a certain degree of misjudgment is possible. Since such misjudgments are usually sporadic, a threshold is set to reduce their influence on the overall classification: a posture record is kept only if the same posture is detected in enough consecutive frames (above the threshold); otherwise the posture is deemed an occasional misdetection and is not recorded, which improves the accuracy of the overall posture detection. After filtering, postures that occur consecutively also need to be merged.
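The normalization step described above can be sketched as two passes over the pose vector: drop entries whose consecutive count is under the threshold (likely misdetections), then merge same-class entries that become adjacent after the filtering. The list-of-pairs representation is an assumption carried over from the update sketch.

```python
# Sketch of pose-vector normalization: threshold filter, then merge.

def normalize_pose_vector(pose_vector, min_count):
    """pose_vector: list of (posture_class, count) pairs."""
    kept = [(c, s) for c, s in pose_vector if s >= min_count]
    merged = []
    for c, s in kept:
        if merged and merged[-1][0] == c:
            merged[-1] = (c, merged[-1][1] + s)  # merge now-adjacent repeats
        else:
            merged.append((c, s))
    return merged
```

Note how removing a one-frame misdetection between two runs of the same posture is exactly what makes the subsequent merge necessary.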
Fig. 4 is a detailed schematic block diagram of the action recognition section 130 of Fig. 1 according to one embodiment of the disclosure. As shown in Fig. 4, the action recognition section 130 comprises an action recognizer 410 and an action recognition training section 420.

The action recognition section also has two operating modes: an action training mode and an action recognition mode. In the action training mode, the pose vectors generated from sample training action videos by the posture management section 240 are fed as a training set into the action recognition training section 420, which trains them with a multivariate classification algorithm to obtain a sample action recognition model. The trained sample action recognition model serves as the reference the action recognizer 410 uses to classify the pose vectors generated by the posture management section 240.
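The disclosure trains a multivariate classifier over pose vectors. As a dependency-free stand-in for that training-and-recognition loop, the sketch below matches the posture-class sequence of a pose vector against labeled template sequences; all names and labels are invented for illustration.

```python
# Stand-in action recognizer: template matching on posture-class sequences
# (the disclosure uses a trained multivariate classifier here).

def sequence(pose_vector):
    """Keep only the ordered posture classes, ignoring the counts."""
    return tuple(c for c, _ in pose_vector)

def train_action_model(labeled_pose_vectors):
    """labeled_pose_vectors: list of (pose_vector, action_label) pairs."""
    return {sequence(pv): label for pv, label in labeled_pose_vectors}

def recognize_action(pose_vector, model, default="unknown"):
    """Look up the pose vector's class sequence among the trained actions."""
    return model.get(sequence(pose_vector), default)
```

Ignoring the counts when matching mirrors the intent of the normalized pose vector: the order of key postures, not how long each was held, identifies the action.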
In the recognition mode, the pose vectors generated by the posture management section 240 are fed directly into the action recognizer 410, which judges the action type using the sample recognition model trained by the action recognition training section 420. Typically, the system designates certain action classes in advance as monitored action classes (also called specific action classes). If the judged action class is a monitored one, the output section 140 stores the video frame in which the specific action is marked and/or a video clip of the action into memory (not shown). Preferably, the output section 140 triggers an alarm at the same time.

Preferably, when an action's initial posture and end posture are identical (i.e., the action has only one key posture), whether the corresponding action occurred can be decided by checking whether the key posture's counter exceeds a predefined posture judgment threshold, without a pre-trained action classification model. For example, to judge whether a person is making a phone call, suppose the action judgment threshold of the posture counter is k; if the phone-call posture then appears in N (N ≥ k) consecutive frames, the person can be judged to be making a phone call.
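This single-key-posture shortcut can be sketched as a one-line check over the pose vector: the action is recognized once its sole key posture has been counted in at least k consecutive frames, with no trained action model involved. The representation follows the earlier sketches and is an assumption.

```python
# Sketch of the single-key-posture rule (e.g., "making a phone call").

def single_posture_action(pose_vector, posture_class, k):
    """True if posture_class was detected in at least k consecutive frames."""
    return any(c == posture_class and s >= k for c, s in pose_vector)
```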
Fig. 5 is the operational flowchart according to the monitoring system based on action recognition of one embodiment of the disclosure.Such as Fig. 5
It is shown, in step s 51, Attitude estimation method should be utilized to identify the people in monitored video frame based on the monitoring system of action recognition
The position of member.Adoptable estimation method can be existing any method, such as OpenPose method, carry out human body
Bone 2D modeling.Modeling object includes manpower, the face of face, wrist, elbow, shoulder, neck, hip, knee, ankle, refers to.Here it is possible to sharp
Skeleton is carried out to each personnel with the relationship between the major joint position coordinates of one or more personnel in video frame
2D modeling.
Next, if a posture classification model has been trained in advance, the pre-trained posture classification model is used in step S52 to classify the 2D models of the human skeletons in the monitored video frames. If no posture classification model has been trained in advance, such a model is first trained in step S52, and the trained model is then used to classify the 2D models of the human skeletons in the monitored video frames. The training process is: decompose the continuous limb actions to be recognized into discrete key postures, annotate the skeleton 2D modeling results with the key postures, and finally train the posture classification model with a convolutional neural network algorithm on the annotated skeleton 2D modeling results.
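The data-preparation half of this training process can be sketched as follows, assuming each skeleton 2D model is a flat list of joint coordinates and that the key-posture annotations map frame indices to labels (both layouts are assumptions, not specified by the disclosure).

```python
# Sketch of training-data preparation: a continuous action is decomposed into
# discrete key postures, and each annotated skeleton 2D model is paired with
# its key-posture label. Label names and data layout are illustrative.
def build_training_set(frames, annotations):
    """frames: list of skeleton 2D models (e.g. flattened joint coordinates).
    annotations: dict mapping frame index -> key-posture label.
    Returns (X, y) keeping only the annotated key-posture frames."""
    X, y = [], []
    for i, skeleton in enumerate(frames):
        if i in annotations:          # frame was marked as a key posture
            X.append(skeleton)
            y.append(annotations[i])
    return X, y
```

The resulting (X, y) pairs would then be fed to the convolutional neural network training step mentioned in the text.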
Then, in step S53, posture vectors are established, and the posture vector sets of known actions are annotated as a training set to train an action recognition model, so as to judge whether the action type belongs to a monitored type. Specifically, in this step, once the posture classification part 220 detects a trained sample posture in a video frame, it sends the class of this posture to the posture management part 240. The posture management part 240 first tests whether the posture vector is empty. If the posture vector is empty, the posture management part 240 judges whether the detected posture is an initial posture. If it is not an initial posture, the judgment is terminated and the posture is discarded. If it is an initial posture, the posture management part 240 stores the posture in the posture vector as the current posture and increments the counter of the current posture by 1. If the posture vector is not empty, the posture management part 240 judges whether the detected posture is the current posture (i.e. the last detected key posture). If it is not, the posture management part 240 stores the posture in the posture vector as the new current posture and increments its counter by 1. If it is, the posture management part 240 simply increments the counter of the current posture by 1.
The posture management part 240 terminates updating the vector of a group of postures under any of the following conditions:
the current posture is an ending posture and its number of consecutive occurrences exceeds a predefined threshold;
the state of the current posture has not been updated within N frames;
the state of the current posture has not changed within N frames;
the system commands the posture updating to end; or
the video ends or the video stream is interrupted.
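The posture management logic of step S53 can be sketched as a small state machine. The sets of initial/ending postures and the ending threshold are illustrative assumptions; the posture vector stores (class, consecutive-count) pairs as in the examples later in the disclosure.

```python
# Sketch of the posture management part's update logic. INITIAL_POSES,
# END_POSES and END_THRESHOLD are assumed values for illustration.
INITIAL_POSES = {1}
END_POSES = {2}
END_THRESHOLD = 2

def feed_posture(vector, pose):
    """Update the posture vector with one detected key posture.
    Returns (vector, finished); finished is True once an ending posture
    has occurred at least END_THRESHOLD consecutive times."""
    if not vector:                       # posture vector is empty
        if pose not in INITIAL_POSES:    # not an initial posture: discard
            return vector, False
        vector.append([pose, 1])
    elif vector[-1][0] == pose:          # same as the current posture
        vector[-1][1] += 1
    else:                                # a new key posture starts
        vector.append([pose, 1])
    last_pose, count = vector[-1]
    finished = last_pose in END_POSES and count >= END_THRESHOLD
    return vector, finished
```

Feeding the sequence 1, 1, 2, 2 yields the vector [[1, 2], [2, 2]] and marks the group finished, matching the worked example below. The frame-timeout and video-end conditions would be checked outside this per-posture update.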
Next, if the judged action type belongs to a monitored type, then in step S54 the labeled video frames of the specific action and/or the video clip of the action are stored into a memory and an alarm is triggered.
It should be appreciated that, in other embodiments of the disclosure, if there are multiple persons in the video, a hot-zone (ROI, Region of Interest) comparison method or an object tracking method may be used to distinguish and track the objects.
If the monitored object is located in a fixed area, the hot-zone comparison method is appropriate. A hot zone is first specified in the video area; then, for each established skeleton 2D model, its outline polygon (a rectangle by default) is drawn, and the overlap ratio between this outline area and the hot zone is compared. The object with the largest ratio is regarded as the detected object.
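Since the outline polygons default to rectangles, the overlap-ratio comparison can be sketched with axis-aligned boxes. The (x1, y1, x2, y2) box layout is an illustrative assumption.

```python
# Sketch of the hot-zone comparison: the fraction of each skeleton's outline
# rectangle that lies inside the hot zone, with the largest ratio winning.
def overlap_ratio(box, hot_zone):
    """Fraction of `box` (x1, y1, x2, y2) that lies inside `hot_zone`."""
    x1 = max(box[0], hot_zone[0]); y1 = max(box[1], hot_zone[1])
    x2 = min(box[2], hot_zone[2]); y2 = min(box[3], hot_zone[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area else 0.0

def pick_detected_object(boxes, hot_zone):
    """Return the index of the skeleton outline with the largest overlap ratio."""
    return max(range(len(boxes)), key=lambda i: overlap_ratio(boxes[i], hot_zone))
```

For a hot zone (0, 0, 10, 10), a box fully inside it has ratio 1.0 and is selected over one only partially overlapping.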
If the detected objects are moving, the object tracking method is used to track the objects, and the posture vector of each object is recorded separately. According to an embodiment of the disclosure, KCF (Kernelized Correlation Filters), BOOSTING, MIL (Multiple Instance Learning), TLD (Tracking, Learning and Detection), GOTURN, or other object tracking algorithms may be used to track the objects in the picture.
In other embodiments of the disclosure, a tracker may be added to each object in the video, and whether a tracked object needs to remain tracked may be judged. For example, whether tracking needs to continue may be determined by judging whether the detected object is in at least one of the following states: the detected object reaches a specified region; the detected object leaves a specified region; the detected object remains static for more than a certain time; or an instruction to stop continuous tracking of the object in the monitored area has been received. If tracking no longer needs to continue, the tracker is deleted.
Example
The purpose of this example is to perform action detection on the persons in a video captured by a camera. The detection process is as follows.
1) Perform skeleton 2D modeling in the video area.
Fig. 6 is a view showing the major joints of the human body. As shown in Fig. 6, an existing deep learning model for human key-point detection (such as OpenPose) is used to detect the major joints of the human body (such as the wrists, elbows, shoulders, neck, hips, knees, ankles, and fingers, shown as the white points on the human body in the figure).
Fig. 7 is a view showing the association relationships between the joints. As shown in Fig. 7, the human skeleton is drawn according to the predetermined association relationships between the joints (for example, the right elbow is associated with the right wrist), and the model is normalized so that its output size is uniform.
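One simple way to make the output size uniform, sketched here under the assumption that a skeleton is a list of (x, y) joint pairs, is to translate the skeleton to the origin and scale its bounding box to unit size. The disclosure does not specify the exact normalization, so this is illustrative.

```python
# Sketch of skeleton normalization: shift and scale joint coordinates so every
# skeleton has a uniform output size regardless of the person's position and
# scale in the frame.
def normalize_skeleton(joints):
    """Translate the skeleton to the origin and scale its bounding box to 1."""
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    min_x, min_y = min(xs), min(ys)
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / scale, (y - min_y) / scale) for x, y in joints]
```

After this step, skeletons of people at different distances from the camera become directly comparable inputs for the posture classifier.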
2) Then, posture prediction is performed on the skeleton 2D model established for each frame.
Fig. 8 shows examples of several postures. As shown in Fig. 8, posture prediction is performed on the human skeleton 2D model using the pre-trained posture classification model, and prediction results whose prediction confidence exceeds a predefined threshold (e.g. 50%) are written into the posture vector P(c, s), where c is the posture class and s is the number of times the posture class is detected consecutively. In other words, the acquired posture is compared against the posture classification model, and its similarity to the model is obtained. The prediction confidence is calculated by the posture recognition algorithm and quantifies how close the predicted posture is to the sample postures used to train the posture classification model. The prediction confidence threshold is an empirical value summarized from tests in the environment of the actual application scene, and its value can be configured according to the actual scene.
In the example shown in Fig. 8, the prediction confidence of one of the postures is 12%, which is below the predefined threshold, so this posture is not recorded in the posture vector. The posture vector of this group of postures is therefore:
P1 = [(1, 2), (2, 2)]
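Construction of the posture vector P(c, s) can be sketched as a confidence filter followed by run-length encoding of the per-frame predictions. The (class, confidence) input format is an assumption.

```python
# Sketch of building P(c, s): per-frame (class, confidence) predictions are
# filtered by the confidence threshold, and surviving classes are run-length
# encoded into (class, consecutive_count) pairs, as in the Fig. 8 example.
CONF_THRESHOLD = 0.5   # predefined prediction-confidence threshold (e.g. 50%)

def build_posture_vector(predictions, threshold=CONF_THRESHOLD):
    vector = []
    for cls, conf in predictions:
        if conf < threshold:             # low-confidence posture: not recorded
            continue
        if vector and vector[-1][0] == cls:
            vector[-1] = (cls, vector[-1][1] + 1)
        else:
            vector.append((cls, 1))
    return vector
```

With two confident detections of posture 1, one 12%-confidence detection that is dropped, and two confident detections of posture 2, this reproduces P1 = [(1, 2), (2, 2)].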
3) Finally, after the updating of the posture vector P(c, s) ends, the posture vector is classified using the pre-trained action recognition model to judge its action class. The updating of the vector of a group of postures ends under any of the following conditions:
the current posture is an ending posture and its number of consecutive occurrences exceeds a predefined threshold;
the state of the current posture has not been updated within N frames;
the state of the current posture has not changed within N frames;
the system commands the posture updating to end; or
the video ends or the video stream is interrupted.
For example, if in the posture vector P1 posture 1 occurs in 2 consecutive frames and posture 2 ends after occurring in 2 consecutive frames, and at the same time the action recognition model judges it to be action 1 with a confidence of 85%, then the action can be judged to be action 1. The detected postures 1 and 2 are both constituent postures of action 1.
P1 = [(1, 2), (2, 2)] => action 1 (0.85).
Fig. 9 a and 9b respectively illustrate the example of several postures.It is worth noting that, as shown in figure 9, in actual measurement,
It is also possible that the posture (such as posture 3) of certain error detections between posture 1 and posture 2.These information are to dynamic in order to prevent
The influence of judge then needs to carry out standardization processing to attitude vectors before carrying out motion detection.Standardization processing includes:
Set the threshold value (such as 2) of the continuous frequency of occurrence of posture, and filter out continuous frequency of occurrence lower than threshold value to
Magnitude.
As shown in Fig. 9a, for P1 = [(1, 2), (3, 1), (2, 2)], since posture 3 occurs only 1 time consecutively, it is removed from the posture vector, giving the revised vector P1' = [(1, 2), (2, 2)].
After the vector values whose counts are below the threshold are removed, the vector values with identical posture classes are merged.
As shown in Fig. 9b, for P1 = [(1, 2), (3, 1), (1, 2), (2, 2)], after the vector value that occurs only 1 time is filtered out, the vector becomes:
P1' = [(1, 2), (1, 2), (2, 2)],
It can be seen that posture 1 appears twice consecutively in P1', so the entries are merged into:
P1'' = [(1, 4), (2, 2)].
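The standardization just described (filter short runs, then merge adjacent identical classes) can be sketched directly:

```python
# Sketch of posture-vector standardization: entries whose consecutive-
# occurrence count is below the threshold are filtered out, and adjacent
# entries with the same posture class are then merged.
MIN_RUN = 2   # threshold for the number of consecutive occurrences

def standardize(vector, min_run=MIN_RUN):
    kept = [(c, s) for c, s in vector if s >= min_run]   # drop short runs
    merged = []
    for c, s in kept:                                    # merge equal neighbors
        if merged and merged[-1][0] == c:
            merged[-1] = (c, merged[-1][1] + s)
        else:
            merged.append((c, s))
    return merged
```

Applied to the Fig. 9b vector [(1, 2), (3, 1), (1, 2), (2, 2)], this yields [(1, 4), (2, 2)], i.e. P1''.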
If the action is found to be a predefined action after action detection, an alarm is triggered, and the video clip of the action is output, or the decomposed video frames of the action are output one by one.
Preferably, if there are multiple persons in the video, the hot-zone (ROI, Region of Interest) comparison method or the object tracking method may be used to distinguish and track the objects.
If the monitored object is located in a fixed area, the hot-zone comparison method is appropriate. Fig. 10 shows a hot zone specified in the video area. As shown in Fig. 10, a hot zone is first specified in the video area; then, for each established skeleton 2D model, its outline polygon (a rectangle by default) is drawn, and the overlap ratio between this outline area and the hot zone is compared. The object with the largest ratio is regarded as the detected object.
If the detected objects are moving, the object tracking method is used to track the objects, and the posture vector of each object is recorded separately. Fig. 11 shows a schematic diagram of object tracking in the case where the detected object is moving. As shown in Fig. 11, according to an embodiment of the disclosure, KCF (Kernelized Correlation Filters), BOOSTING, MIL (Multiple Instance Learning), TLD (Tracking, Learning and Detection), GOTURN, or other object tracking algorithms may be used to track the objects in the picture.
Since the disclosure performs skeleton 2D modeling of the persons in the monitored video using a human posture prediction method, classifies the postures using posture classification, records the posture sequence with posture vectors, and recognizes human actions using an action classification method, real-time human action recognition in automated production is achieved, so that unattended operation monitoring can be realized.
In addition, the disclosure uses the hot-zone comparison method to recognize the human actions of a key person in a fixed area in a multi-person scene, and uses the object tracking method to recognize the actions of multiple persons in a multi-person scene, so that it can be used in different application scenarios such as production command and environmental monitoring.
The present invention is not limited to the scope of the specific embodiments described herein; these embodiments are intended as exemplary embodiments. Functionally identical products and methods are obviously contained within the scope of the invention described herein.
The basic principles of the disclosure have been described above in conjunction with specific embodiments. However, it should be noted that those of ordinary skill in the art will understand that all or any of the steps or components of the disclosed method and device may be implemented in hardware, firmware, software, or a combination thereof, in any computing device (including processors, storage media, etc.) or in a network of computing devices, which those of ordinary skill in the art can achieve with their basic programming skills after having read the description of the disclosure.
Therefore, the purpose of the disclosure can also be achieved by running a program or a group of programs on any computing device. The computing device may be a well-known general-purpose device. Therefore, the purpose of the disclosure can also be achieved merely by providing a program product containing program code that implements the method or device. That is, such a program product also constitutes the disclosure, and a storage medium storing such a program product also constitutes the disclosure. Obviously, the storage medium may be any well-known storage medium or any storage medium developed in the future.
It should also be noted that, in the device and method of the disclosure, obviously, each component or each step can be decomposed and/or recombined. These decompositions and/or recombinations should be regarded as equivalent schemes of the disclosure. Moreover, the steps performing the above series of processing can naturally be executed in chronological order in the order described, but they do not necessarily need to be executed in chronological order; some steps can be executed in parallel or independently of one another.
The above specific embodiments do not constitute a limitation on the protection scope of the disclosure. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions may occur. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the disclosure shall be included within the protection scope of the disclosure.
Claims (12)
1. A monitoring method based on action recognition, comprising the following steps:
a) identifying the positions of the persons in monitored video frames using a posture estimation method, and performing skeleton 2D modeling;
b) classifying the 2D models of the human skeletons in the monitored video frames using a pre-trained posture classification model;
c) storing the posture classification results of successive video frames in a posture vector, and judging the action type according to a pre-trained action recognition model; and
d) if the judged action type belongs to a monitored type, storing the labeled video frames of the specific action and/or the video clip of the action into a memory and triggering an alarm.
2. The monitoring method according to claim 1, wherein step a) comprises:
a1) determining the major joint position coordinates of one or more persons in a video frame; and
a2) performing skeleton 2D modeling for each person using the major joint position coordinates and the relationships between the joints.
3. The monitoring method according to claim 2, wherein step a2) further comprises:
performing skeleton 2D modeling of the hands and/or face of each person in the video.
4. The monitoring method according to claim 1, wherein step b) comprises:
decomposing the continuous limb actions to be recognized into discrete key postures;
annotating the skeleton 2D modeling results with the key postures; and
training the posture classification model with a convolutional neural network algorithm on the annotated skeleton 2D modeling results.
5. The monitoring method according to claim 1, wherein step c) comprises:
annotating the posture vector sets of known actions; and
training the action recognition model with the annotated posture vector sets as the training set.
6. The monitoring method according to claim 1, wherein step d) comprises at least one of the following steps:
marking the object performing the specific action in the original video, and triggering an alarm;
archiving the labeled video frames of the specific action as evidence; and
archiving the labeled video clip of the specific action as evidence.
7. The monitoring method according to claim 1, wherein step c) further comprises judging the action of a tracked object in a multi-person scene using a hot-zone comparison method.
8. The monitoring method according to claim 7, wherein the hot zone is a specified region of the monitored video or the entire monitored picture area.
9. The monitoring method according to claim 1, wherein step c) further comprises:
adding a tracker to each object in the video to monitor its action, and
judging whether a tracked object needs to remain tracked, and deleting the tracker if tracking no longer needs to continue.
10. The monitoring method according to claim 9, wherein whether tracking needs to continue is determined by judging whether the detected object is in at least one of the following states:
the detected object reaches a specified region;
the detected object leaves a specified region;
the detected object remains static for more than a certain time; and
an instruction to stop continuous tracking of the object in the monitored area has been received.
11. A monitoring system based on action recognition, comprising:
a posture prediction part, which identifies the positions of the persons in monitored video frames using a posture estimation method, and performs skeleton 2D modeling according to the acquired positions;
a posture classification part, which classifies the 2D models of the human skeletons in the monitored video frames using a pre-trained posture classification model;
a posture management part, which stores the posture classification results of successive video frames in a posture vector;
an action recognition part, which judges the action type according to a pre-trained action recognition model; and
an output part, which, when the judged action type belongs to a monitored type, stores the labeled video frames of the specific action and/or the video clip of the action into a memory and triggers an alarm.
12. The monitoring system according to claim 11, further comprising:
a posture classification training part, which annotates the obtained skeleton 2D models with key postures and inputs the annotated skeleton 2D models as a training set into a convolutional neural network for training, so as to obtain the posture classification model; and
an action recognition training part, which takes the posture vectors generated from known action videos through the posture management part as a training set and trains on them using a multivariate classification algorithm, so as to obtain the action recognition model for performing action classification on posture vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811453471.2A CN109598229A (en) | 2018-11-30 | 2018-11-30 | Monitoring system and its method based on action recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109598229A true CN109598229A (en) | 2019-04-09 |
Family
ID=65960432
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811453471.2A Pending CN109598229A (en) | 2018-11-30 | 2018-11-30 | Monitoring system and its method based on action recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109598229A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110110560A1 (en) * | 2009-11-06 | 2011-05-12 | Suranjit Adhikari | Real Time Hand Tracking, Pose Classification and Interface Control |
EP2707834A2 (en) * | 2011-05-13 | 2014-03-19 | LiberoVision AG | Silhouette-based pose estimation |
CN104517097A (en) * | 2014-09-24 | 2015-04-15 | 浙江大学 | Kinect-based moving human body posture recognition method |
CN105809144A (en) * | 2016-03-24 | 2016-07-27 | 重庆邮电大学 | Gesture recognition system and method adopting action segmentation |
CN106611157A (en) * | 2016-11-17 | 2017-05-03 | 中国石油大学(华东) | Multi-people posture recognition method based on optical flow positioning and sliding window detection |
CN106897670A (en) * | 2017-01-19 | 2017-06-27 | 南京邮电大学 | A kind of express delivery violence sorting recognition methods based on computer vision |
CN108491754A (en) * | 2018-02-02 | 2018-09-04 | 泉州装备制造研究所 | A kind of dynamic representation based on skeleton character and matched Human bodys' response method |
2018-11-30: Application filed in China (CN201811453471.2A); status: Pending.
Non-Patent Citations (2)
Title |
---|
Wang Chengfeng; Chen Hong; Zhang Ruixuan; Zhu Dehai; Wang Qing; Mei Shuli: "Research on a DTW Action Recognition Algorithm with Joint Weights", Journal of Graphics, no. 04 *
Wang Jun; Xu Yongming; Wang Donghui; Guo Wenbo: "A Real-time Behavior Recognition Method Based on a Multi-fulcrum Skeleton Model", Journal of Huazhong University of Science and Technology (Natural Science Edition), no. 1 *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110166650A (en) * | 2019-04-29 | 2019-08-23 | 北京百度网讯科技有限公司 | Generation method and device, the computer equipment and readable medium of video set |
CN110580446A (en) * | 2019-07-16 | 2019-12-17 | 上海交通大学 | Behavior semantic subdivision understanding method, system, computer device and medium |
CN110399690A (en) * | 2019-07-31 | 2019-11-01 | 佳都新太科技股份有限公司 | Subway station pedestrian simulation method, apparatus, electronic equipment and storage medium |
CN110490109A (en) * | 2019-08-09 | 2019-11-22 | 郑州大学 | A kind of online human body recovery action identification method based on monocular vision |
CN110490109B (en) * | 2019-08-09 | 2022-03-25 | 郑州大学 | Monocular vision-based online human body rehabilitation action recognition method |
CN110852248A (en) * | 2019-11-07 | 2020-02-28 | 江苏弘冉智能科技有限公司 | Flammable and explosive area illegal equipment based on machine vision and action monitoring method |
CN110969101A (en) * | 2019-11-21 | 2020-04-07 | 浙江工业大学 | Face detection and tracking method based on HOG and feature descriptor |
CN111078093A (en) * | 2019-12-20 | 2020-04-28 | 深圳创维-Rgb电子有限公司 | Screen picture rotation control method and device, electronic product and storage medium |
CN113033252A (en) * | 2019-12-24 | 2021-06-25 | 株式会社理光 | Attitude detection method, attitude detection device and computer-readable storage medium |
CN113128298A (en) * | 2019-12-30 | 2021-07-16 | 上海际链网络科技有限公司 | Analysis method and monitoring system for loading and unloading behavior |
CN111476118A (en) * | 2020-03-26 | 2020-07-31 | 长江大学 | Animal behavior automatic identification method and device |
CN112488073A (en) * | 2020-12-21 | 2021-03-12 | 苏州科达特种视讯有限公司 | Target detection method, system, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109598229A (en) | Monitoring system and its method based on action recognition | |
CN110751022B (en) | Urban pet activity track monitoring method based on image recognition and related equipment | |
WO2021047232A1 (en) | Interaction behavior recognition method, apparatus, computer device, and storage medium | |
CN108256433B (en) | Motion attitude assessment method and system | |
US20060093185A1 (en) | Moving object recognition apparatus | |
CN108933925A (en) | Information processing unit, information processing method and storage medium | |
Patruno et al. | People re-identification using skeleton standard posture and color descriptors from RGB-D data | |
CN109298785A (en) | A kind of man-machine joint control system and method for monitoring device | |
US20210224752A1 (en) | Work support system and work support method | |
KR20150089482A (en) | Method and apparatus of recognizing facial expression using motion dictionary | |
US11048917B2 (en) | Method, electronic device, and computer readable medium for image identification | |
JPWO2018154709A1 (en) | Motion learning device, skill discrimination device and skill discrimination system | |
CN112732071A (en) | Calibration-free eye movement tracking system and application | |
CN109873979A (en) | Camera-based static image difference comparison method and device | |
CN110688980A (en) | Human body posture classification method based on computer vision | |
Dragan et al. | Human activity recognition in smart environments | |
RU2315352C2 (en) | Method and system for automatically finding three-dimensional images | |
Badave et al. | Evaluation of person recognition accuracy based on OpenPose parameters | |
Ali et al. | Deep Learning Algorithms for Human Fighting Action Recognition. | |
Foytik et al. | Tracking and recognizing multiple faces using Kalman filter and ModularPCA | |
Bevilacqua et al. | A new tool for gestural action recognition to support decisions in emotional framework | |
CN108647662A (en) | A kind of method and system of automatic detection face | |
CN115546825A (en) | Automatic monitoring method for safety inspection normalization | |
CN114898287A (en) | Method and device for dinner plate detection early warning, electronic equipment and storage medium | |
CN114663796A (en) | Target person continuous tracking method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||