CN109598229A - Monitoring system and its method based on action recognition - Google Patents

Monitoring system and its method based on action recognition

Info

Publication number
CN109598229A
CN109598229A (application CN201811453471.2A)
Authority
CN
China
Prior art keywords
posture
action
video
skeleton
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811453471.2A
Other languages
Chinese (zh)
Inventor
李刚毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201811453471.2A
Publication of CN109598229A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

This disclosure relates to a monitoring system and method based on action recognition. The method comprises: identifying the positions of persons in monitored video frames using a pose estimation method and building 2D human-skeleton models; classifying the 2D skeleton models in the monitored video frames with a pre-trained posture classification model; storing the posture classification results of successive video frames in a posture vector and judging the action type with a pre-trained action recognition model; and, if the judged action type belongs to a monitored type, storing the video frames in which the specific action is marked and/or a video clip of the action to memory and triggering an alarm.

Description

Monitoring system and its method based on action recognition
Technical field
This disclosure relates to a monitoring system and method based on action recognition, and more particularly to a system and method that uses pose prediction, posture recognition and action recognition techniques to judge whether the persons in a monitored video perform a specific action, raises an alarm automatically when a specific action is detected, and saves the related video frames and video clip files for later review.
Background art
An object's actions play a decisive role in judging its behavior. Whether the object is a person, an animal or a machine, reaching a set goal requires performing the corresponding actions.
Chinese invention patent application CN107992858A proposes a real-time three-dimensional gesture estimation system and method based on a single RGB frame. It uses a hand detector to locate and crop the hand region, uses OpenPose to identify the 2D positions of the hand joints, fits a 3D hand model to the 2D joint positions by nonlinear least squares, and thereby recovers the hand pose. That method uses OpenPose to model the hand and recognize gestures. However, it does not detect the other limb joints of the human body (such as wrist, elbow, shoulder, neck, hip, knee, ankle and fingers), nor is it equipped with an effective classification algorithm for classifying the other limbs. In addition, the method disclosed in CN107992858A mainly recognizes gestures in a single frame and is not suited to recognizing actions across video of consecutive frames.
Chinese invention patent application CN108427331A proposes a human-robot collaboration safety protection method and system. It uses an RGB-D sensor to identify the robot coordinates, uses the RGB-D sensor together with OpenPose to detect the coordinates of persons, and controls the robot's speed by computing the distance between the person and the robot. That method also uses OpenPose for human-body modeling, but after modeling it only judges positions and distances in order to control the robot's speed dynamically. It therefore does not perform action recognition on the human model to judge the type of action.
Chinese invention patent application CN108416795A proposes a video action recognition method that fuses spatial features based on sequence pooling. It computes a visual feature vector set for the video frames, constructs a two-dimensional spatial pyramid model over the frames, and judges the action category after processing and classifying the visual feature vectors in the subspaces. That method detects actions by multi-scale partitioning of the two-dimensional space of video frames, so its classification is based on the visual features of the original frames. It cannot distinguish individual human skeleton models, nor classify postures on a skeleton model (rather than raw video features), and is therefore not suited to judging actions from a posture sequence.
Therefore, there is a need for a system and method that can use pose prediction, posture recognition and action recognition techniques to judge whether the persons in a monitored video perform a specific action, raise an alarm automatically when a specific action is detected, and save the related video frames and video clip files for later review.
Summary of the invention
To this end, the purpose of the disclosure is to use pose prediction, posture recognition and action recognition techniques to judge whether the persons in a monitored video perform a specific action, to raise an alarm automatically when a specific action is detected, and to save the related video frames and video clip files for later review.
To achieve the above goal, according to one aspect of the disclosure, a monitoring method based on action recognition is provided, comprising the following steps: a) identifying the positions of persons in monitored video frames using a pose estimation method and building 2D human-skeleton models; b) classifying the 2D skeleton models in the monitored video frames with a pre-trained posture classification model; c) storing the posture classification results of successive video frames in a posture vector and judging the action type with a pre-trained action recognition model; and d) if the judged action type belongs to a monitored type, storing the video frames in which the specific action is marked and/or a video clip of the action to memory and triggering an alarm.
Preferably, step a) comprises: a1) determining the coordinates of the major joints of one or more persons in the video frame; and a2) building a 2D skeleton model for each person using the major joint coordinates and the relationships between the joints.
Preferably, step a2) further comprises building 2D skeleton models of the hands and/or 2D models of the faces of each person in the video.
Preferably, step b) comprises: decomposing the continuous limb actions to be recognized into discrete key postures; annotating the 2D skeleton modeling results with the key postures; and training the posture classification model with a convolutional neural network algorithm on the annotated 2D skeleton modeling results.
Preferably, step c) comprises: annotating the posture vector sets of known actions; and training the action recognition model with the annotated posture vector sets as the training set.
Preferably, step d) comprises at least one of the following steps: marking the object performing the specific action in the original video and triggering an alarm; archiving the video frames in which the specific action is marked as evidence; and archiving the video clip in which the specific action is marked as evidence.
Preferably, step c) further comprises the step of judging the action of the tracked object in a multi-person scene using a hot-zone (ROI, Region of Interest) comparison method.
Preferably, the hot-zone is a designated region of the monitored video; if no region is designated, the hot-zone is the entire monitored picture area.
Preferably, step c) further comprises: attaching a tracker to each person in the video to monitor his or her actions, and judging whether the tracked object needs to be tracked further; if no further tracking is needed, the tracker is deleted.
Preferably, whether further tracking is needed is determined by judging whether the detected object is in at least one of the following states: the detected object has reached a designated region; the detected object has left the designated region; the detected object has remained stationary for more than a certain time; and an instruction to stop further tracking of the object in the monitored region has been received.
According to another aspect of the disclosure, a monitoring system based on action recognition is provided, comprising: a pose prediction part that identifies the positions of persons in monitored video frames using a pose estimation method and builds 2D human-skeleton models from the obtained positions; a posture classification part that classifies the 2D skeleton models in the monitored video frames with a pre-trained posture classification model; a posture management part that stores the posture classification results of successive video frames in a posture vector; an action recognition part that judges the action type with a pre-trained action recognition model; and an output part that, when the judged action type belongs to a monitored type, stores the video frames in which the specific action is marked and/or a video clip of the action to memory and triggers an alarm.
Preferably, the monitoring system further comprises: a posture classification training part that annotates the obtained 2D skeleton models with key postures and feeds the annotated 2D skeleton models as a training set into a convolutional neural network to train a classification model, thereby obtaining the posture classification model; and an action recognition training part that takes the posture vectors generated from known-action videos by the posture management part as a training set and trains on them with a multivariate classification algorithm to obtain the action recognition model used to classify posture vectors into actions.
Because the disclosure uses a human pose prediction method to build 2D skeleton models of the persons in the monitored video, classifies postures with a posture classifier, records the posture sequence in a posture vector, and recognizes human actions with an action classification method, it achieves real-time human action recognition in automated production and thus enables unattended operation monitoring.
In addition, the disclosure uses the hot-zone comparison method to recognize the actions of a key person within a fixed area in a multi-person scene, and uses object tracking methods to recognize the actions of multiple participants in a multi-person scene, so it can serve different application scenarios such as production command and environmental monitoring.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification; they show embodiments consistent with the disclosure and, together with the specification, serve to explain the principles of the disclosure.
Fig. 1 is a schematic block diagram illustrating a monitoring system based on action recognition according to an embodiment of the disclosure;
Fig. 2 is a schematic block diagram illustrating the posture recognition part according to an embodiment of the disclosure;
Fig. 3 is a flowchart illustrating how the posture management part updates the posture vector;
Fig. 4 is a detailed schematic block diagram of the action recognition part according to an embodiment of the disclosure;
Fig. 5 is an operational flowchart of the monitoring system based on action recognition according to an embodiment of the disclosure;
Fig. 6 is a view showing the major joints of the human body;
Fig. 7 is a view showing the association relationships between the joints;
Fig. 8 shows examples of several postures;
Figs. 9a and 9b each show examples of several postures;
Fig. 10 shows a designated hot-zone within the video area; and
Fig. 11 is a schematic diagram of object tracking when the detected object is moving.
Specific embodiment
Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.
The terms used in this disclosure are for the purpose of describing particular embodiments only and are not intended to limit the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art. The singular forms "a", "said" and "the" used in the disclosure and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the disclosure, "first" may also be referred to as "second", and vice versa. Depending on the context, the word "if" as used herein may be construed as "when", "upon" or "in response to determining".
To help those skilled in the art better understand the disclosure, the disclosure is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic block diagram illustrating a monitoring system based on action recognition according to an embodiment of the disclosure. As shown in Fig. 1, the monitoring system comprises a video acquisition part 110, a posture recognition part 120, an action recognition part 130 and an output part 140.
The video acquisition part 110 collects video data through a video capture device such as a mobile phone, a camera or a network stream, converts the collected video data (the video stream) into video frames, and supplies the frames to the posture recognition part 120.
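As a minimal Python sketch of this frame-supply step (OpenCV capture; the generator interface and the stream URL are illustrative assumptions, not part of the disclosure):

    import cv2

    def frames(source=0):
        """Yield video frames from a camera index, file path or stream URL."""
        cap = cv2.VideoCapture(source)
        try:
            while True:
                ok, frame = cap.read()
                if not ok:          # end of file, or the video stream was interrupted
                    break
                yield frame
        finally:
            cap.release()

    # for frame in frames("rtsp://camera/stream"):   # hypothetical source URL
    #     ...hand each frame to the posture recognition part 120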
The posture recognition part 120 detects whether the persons in a video frame are in predefined postures and, when a predefined posture is detected, builds a 2D skeleton model of each person from the detection result. If a decomposed posture of any predefined action is found during posture detection, the posture detection result is appended to the posture vector, and the posture vector is sent to the action recognition part 130.
The action recognition part 130 judges whether the posture vector represents a monitored action. If it does, the action recognition part 130 outputs the video frames of the key postures of the action and/or a video clip of the action to the output part 140.
The output part 140 outputs the key posture frames and/or video clips corresponding to the actions recognized by the action recognition part to a data storage device, a video display device and/or an audio playback device (not shown).
Fig. 2 is a schematic block diagram illustrating the posture recognition part 120 according to an embodiment of the disclosure. The posture recognition part 120 comprises a pose prediction part 210, a posture classification part 220, a posture classification training part 230 and a posture management part 240.
The pose prediction part 210 predicts the human poses in the video frame. According to one embodiment of the disclosure, the pose prediction part 210 uses the OpenPose technique to determine the 2D coordinate positions of the key limb joints of all persons in the video frame, and then builds a 2D skeleton model for each detected person according to customized joint association relationships. For a predefined human action, the continuous limb movement must be decomposed into discrete key postures (similar to the step-by-step charts of broadcast calisthenics). Preferably, the pose prediction part 210 may also build 2D skeleton models of the hands of each person in the video and/or 2D face models of each person's face.
Although one embodiment of the disclosure uses the OpenPose technique for human pose prediction, it should be understood that any other comparable technique may be used instead.
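To illustrate the modeling step, a minimal Python sketch follows; the 18-joint layout and the edge list mirror the common OpenPose COCO convention, and the normalization scheme (centering plus uniform scaling) is an assumption:

    import numpy as np

    # Joint association relationships as index pairs into an OpenPose-style keypoint
    # array: neck-shoulders, shoulders-elbows, elbows-wrists, neck-hips, hips-knees,
    # knees-ankles (a subset of the COCO-18 limb pairs).
    SKELETON_EDGES = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7),
                      (1, 8), (8, 9), (9, 10), (1, 11), (11, 12), (12, 13)]

    def skeleton_2d_model(keypoints):
        """Turn one person's raw 2D keypoints (18 x 2) into a normalized skeleton model."""
        pts = np.asarray(keypoints, dtype=float)
        pts -= pts.mean(axis=0)                  # remove the position within the frame
        scale = float(np.abs(pts).max()) or 1.0
        return pts / scale                       # uniform output size for the classifier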
The posture recognition part 120 has two operating modes: posture training mode and posture recognition mode. In posture training mode, the posture classification training part 230 annotates the 2D skeleton models produced by the pose prediction part 210 with key postures. The annotated 2D skeleton models are fed as a training set into a convolutional neural network to train a classification model. The trained model is then used as the posture classification model in recognition mode to classify 2D skeleton models automatically.
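A minimal PyTorch sketch of such a posture classifier follows; the network shape, the 18-joint input and the class count are assumptions, since the disclosure specifies only that a convolutional neural network is trained on the annotated skeleton models:

    import torch.nn as nn

    class PostureClassifier(nn.Module):
        """Maps one 2D skeleton model (2 coords x 18 joints) to a key-posture class."""
        def __init__(self, n_classes):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(2, 32, kernel_size=3, padding=1),  # convolve along the joint axis
                nn.ReLU(),
                nn.Conv1d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 18, n_classes),
            )

        def forward(self, skeletons):            # skeletons: (batch, 2, 18)
            return self.net(skeletons)

    # Training uses cross-entropy loss against the annotated key postures; in
    # recognition mode the softmax confidence is compared with the prediction threshold.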
In posture recognition mode, the pose prediction part 210 sends the 2D skeleton models to the posture classification part 220, which classifies the postures with the posture classification model trained in training mode. For any application scenario, an initial posture is defined in the posture decomposition chart of each action.
The posture management part 240 maintains the posture vector P(c, s) of the current video, where c is a posture class (for example a common human posture such as standard sitting, making a phone call or slumping over the desk to rest, or a specific business posture such as arm extended or arm bent with fist clenched), and s is the number of times posture class c has been detected consecutively.
Fig. 3 is a flowchart illustrating how the posture management part 240 updates the posture vector. As shown in Fig. 3, once the posture classification part 220 detects a predefined posture in a video frame (310), it sends the posture class to the posture management part 240. The posture management part 240 first tests whether the posture vector P(c, s) is empty, i.e. whether the vector holds no posture record at all (320). If the posture vector is empty (320), the posture management part 240 judges whether the current posture is an initial posture (330). An initial posture is the first chart posture in the decomposition chart of some action. For example, when a person monitored in standard sitting posture makes an arm-bending, fist-clenching movement, the first decomposed posture is the arm stretched out flat; the arm position in that posture clearly differs from an arm resting naturally on the chair armrest in standard sitting posture, so it can be regarded as the start of an action. If the posture is not an initial posture, the judgment ends and the posture is discarded (370). For instance, if the monitored posture, say an arm raised overhead, is not the first decomposed posture of any monitored action, it can be concluded that the following action will not be one that needs monitoring, so the posture can be ignored. In other words, that posture and the postures that may develop from it are not part of the set of monitored postures and need not be watched further. Of course, if such a posture needs monitoring in the future, it can be brought into the monitored scope and then treated as an initial posture instead of being ignored. If it is an initial posture, the posture management part 240 stores it in the posture vector as the current posture (340) and increments the counter of the current posture (360). The current posture is the most recently recorded posture in the posture vector. Because video usually runs at 30 frames per second, frame-by-frame monitoring of continuous video will recognize the postures in many frames as the same posture, since the differences are small; the counter therefore accumulates the occurrence count of the current posture. If the posture vector is not empty, the posture management part 240 judges whether the posture is the current posture, i.e. the last detected key posture (350). In other words, the initial posture marks and records the start of an action, while the current posture is the most recently recorded of the multiple decomposed postures (i.e. key postures) of the action being recorded.
When recording of an action has just started, the initial posture and the current posture are identical. During recording, once a posture different from the initial posture is detected, the new action posture becomes the current posture, and from then on the initial posture and the current posture differ. If the detected posture is not the current posture, the posture management part 240 stores it in the posture vector as the new current posture (340) and begins counting it, incrementing its counter (360). If it is the current posture, that is, the detected posture differs only slightly from the immediately preceding key posture and is judged identical to it, the posture management part 240 simply increments the counter of the current posture (360).
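A minimal Python sketch of this update loop follows; the encoding of P(c, s) as a list of (class, count) pairs and the helper names are assumptions (the numbers in the comments refer to the steps of Fig. 3):

    def update_posture_vector(P, posture, initial_postures):
        """Apply one detected posture class to the posture vector P = [(class, count), ...]."""
        if not P:                               # (320) vector is empty
            if posture in initial_postures:     # (330) only an initial posture may start a record
                P.append((posture, 1))          # (340) store as current posture, (360) count it
            return P                            # (370) otherwise discard the posture
        current, count = P[-1]                  # (350) compare with the current posture
        if posture == current:
            P[-1] = (current, count + 1)        # (360) same key posture detected again
        else:
            P.append((posture, 1))              # (340) a new key posture becomes current
        return P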
The posture management part 240 ends the update of a group of postures, i.e. closes the vector, under any of the following conditions:
The current posture is an end posture and its consecutive occurrence count exceeds a predefined threshold. An end posture corresponds to an initial posture: it is the last of the multiple decomposed postures (i.e. key postures) of an action.
The counter of the current posture has not been updated for N frames, meaning no action of the monitored person was detected within the predetermined period.
The state of the current posture has not changed for N frames, meaning postures are still being detected but remain the same posture throughout.
A system command terminates the posture update.
The video ends or the video stream is interrupted.
After the posture vector finishes updating, it is sent to the subsequent module for processing, while the posture vector inside the posture management part 240 is re-initialized in preparation for recording the next group of postures.
After the posture vector finishes updating, it is normalized. Normalization includes filtering out postures whose consecutive detection count is below a predefined threshold, to prevent false judgments caused by occasional misdetections. Because computer-vision-based posture recognition can be affected by lighting, viewing angle, occlusion and similar factors, a certain degree of misjudgment is possible. Since such misjudgments are usually sporadic, a threshold is set to reduce their influence on the overall classification: a posture record is kept only when the same posture is detected in enough consecutive frames (above the threshold); otherwise the posture is treated as an occasional misdetection and not recorded, improving the accuracy of overall posture detection. After filtering, postures that occur consecutively multiple times are merged.
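A minimal Python sketch of this normalization (filtering, then merging), using the same (class, count) encoding assumed above:

    def normalize_posture_vector(P, min_count=2):
        """Drop sporadic misdetections, then merge adjacent repeats of the same class."""
        filtered = [(c, s) for (c, s) in P if s >= min_count]   # filtering step
        merged = []
        for c, s in filtered:                                   # merging step
            if merged and merged[-1][0] == c:
                merged[-1] = (c, merged[-1][1] + s)
            else:
                merged.append((c, s))
        return merged

    # normalize_posture_vector([(1, 2), (3, 1), (1, 2), (2, 2)])  ->  [(1, 4), (2, 2)]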
Fig. 4 is a detailed schematic block diagram of the action recognition part 130 shown in Fig. 1 according to an embodiment of the disclosure. As shown in Fig. 4, the action recognition part 130 comprises an action recognizer 410 and an action recognition training part 420.
The action recognition part also has two operating modes: action training mode and action recognition mode. In action training mode, the posture vectors generated from sample training action videos by the posture management part 240 are fed as a training set into the action recognition training part 420. The action recognition training part 420 trains on the posture vectors with a multivariate classification algorithm to obtain a sample action recognition model. The trained sample action recognition model serves as the reference used by the action recognizer 410 to classify the posture vectors generated by the posture management part 240.
In recognition mode, the posture vectors generated by the posture management part 240 are input directly into the action recognizer 410. The action recognizer 410 judges the action type using the sample recognition model trained by the action recognition training part 420. Generally, the system can designate certain action categories in advance as monitored action categories (also called specific action categories). If the judged action category is a monitored one, the output part 140 stores the video frames in which the specific action is marked and/or a video clip of the action to memory (not shown). Preferably, the output part 140 may simultaneously trigger an alarm.
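A minimal Python sketch of this training/recognition pair follows; the fixed-length encoding of posture vectors and the choice of a random forest are assumptions, since the disclosure names only "a multivariate classification algorithm":

    from sklearn.ensemble import RandomForestClassifier

    def encode(P, max_len=8):
        """Flatten a posture vector [(class, count), ...] into a fixed-length row."""
        row = [0] * (2 * max_len)
        for i, (c, s) in enumerate(P[:max_len]):
            row[2 * i], row[2 * i + 1] = c, s
        return row

    # Action training mode: posture vectors of sample action videos, with action labels.
    X = [encode(P) for P in [[(1, 2), (2, 2)], [(3, 5), (4, 3)]]]   # toy training set
    y = ["action_1", "action_2"]
    model = RandomForestClassifier(n_estimators=50).fit(X, y)

    # Action recognition mode: the predict_proba output plays the role of the action
    # confidence (compare the 0.85 in the worked example later in this description).
    probs = model.predict_proba([encode([(1, 2), (2, 2)])])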
Preferably, when the initial posture of an action is identical to its end posture (the action has only one key posture), whether the action occurred can be decided simply by checking whether the value of the key posture counter exceeds a predefined posture judgment threshold, without a pre-trained action classification model. For example, to judge whether a person is making a phone call, assume the action judgment threshold of the posture counter is k: if the phone-calling posture appears in N (N >= k) consecutive frames, the person can be judged to be making a phone call.
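A one-function sketch of this single-posture shortcut, under the same (class, count) encoding (the threshold k is whatever the scenario prescribes):

    def single_posture_action(P, posture, k):
        """True if the one-key-posture action was held for at least k consecutive frames."""
        return any(c == posture and s >= k for (c, s) in P)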
Fig. 5 is an operational flowchart of the monitoring system based on action recognition according to an embodiment of the disclosure. As shown in Fig. 5, in step S51 the monitoring system identifies the positions of the persons in the monitored video frames using a pose estimation method and carries out the 2D skeleton modeling; any existing estimation method, such as the OpenPose method, may be adopted. The modeled objects include the hands, the face, and the wrist, elbow, shoulder, neck, hip, knee, ankle and finger joints. Here, a 2D skeleton model can be built for each person using the major joint coordinates of one or more persons in the video frame and the relationships between the joints.
Next, if a posture classification model has been trained in advance, then in step S52 the 2D skeleton models in the monitored video frames are classified with the pre-trained posture classification model. If no posture classification model has been trained in advance, such a model is first trained in step S52, and the trained model is then used to classify the 2D skeleton models in the monitored video frames. The training process is: decompose the continuous limb actions to be recognized into discrete key postures, annotate the 2D skeleton modeling results with the key postures, and finally train the posture classification model with a convolutional neural network algorithm on the annotated 2D skeleton modeling results.
Then, in step S53, posture vectors are built, and the annotated posture vector sets of known actions are used as the training set to train the action recognition model, which judges whether the action type belongs to a monitored type. Specifically, in this step, once the posture classification part 220 detects a trained sample posture in a video frame, it sends the posture class to the posture management part 240. The posture management part 240 first tests whether the posture vector is empty. If it is empty, the posture management part 240 judges whether the current posture is an initial posture. If not, the judgment ends and the posture is discarded. If it is an initial posture, the posture management part 240 stores it in the posture vector as the current posture and increments the counter of the current posture. If the posture vector is not empty, the posture management part 240 judges whether the posture is the current posture (i.e. the last detected key posture). If not, the posture management part 240 stores it in the posture vector as the new current posture and increments its counter. If so, the posture management part 240 simply increments the counter of the current posture.
The posture management part 240 ends the update of the group of postures under any of the following conditions:
The current posture is an end posture and its consecutive occurrence count exceeds a predefined threshold;
The counter of the current posture has not been updated for N frames;
The state of the current posture has not changed for N frames;
A system command terminates the posture update; or
The video ends or the video stream is interrupted.
Next, if the judged action type belongs to a monitored type, then in step S54 the video frames in which the specific action is marked and/or a video clip of the action are stored to memory and an alarm is triggered.
It should be appreciated that, in other embodiments of the disclosure, if multiple persons appear in the video, the hot-zone (ROI, Region of Interest) comparison method or object tracking methods can be used to distinguish and track the objects.
If the monitored object stays within a fixed area, the hot-zone comparison method is appropriate. A hot-zone is first designated in the video area; then, for each 2D skeleton model that is built, its bounding polygon (a rectangle by default) is drawn, and the overlap ratio between that bounding region and the hot-zone is computed. The model with the largest ratio is taken as the detected object.
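A minimal Python sketch of this comparison using axis-aligned rectangles (the default bounding shape); the (x1, y1, x2, y2) box format is an assumption:

    def overlap_ratio(box, roi):
        """Fraction of the box area inside the hot-zone; boxes are (x1, y1, x2, y2)."""
        ix1, iy1 = max(box[0], roi[0]), max(box[1], roi[1])
        ix2, iy2 = min(box[2], roi[2]), min(box[3], roi[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = (box[2] - box[0]) * (box[3] - box[1])
        return inter / area if area else 0.0

    def detected_object(skeleton_boxes, roi):
        """Pick the skeleton whose bounding box overlaps the hot-zone the most."""
        return max(skeleton_boxes, key=lambda b: overlap_ratio(b, roi))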
If the detected object is moving, object tracking methods are used to track each object and to record each object's posture vector separately. According to one embodiment of the disclosure, KCF (Kernelized Correlation Filters), BOOSTING, MIL (Multiple Instance Learning), TLD (Tracking, Learning and Detection), GOTURN or other object tracking algorithms can be used to track the objects in the picture.
In other embodiments of the disclosure, a tracker can be attached to each person in the video, and whether each tracked object needs to be tracked further can be judged. For example, whether further tracking is needed can be determined by judging whether the detected object is in at least one of the following states: the detected object has reached a designated region; the detected object has left the designated region; the detected object has remained stationary for more than a certain time; or an instruction to stop further tracking of the object in the monitored region has been received. If no further tracking is needed, the tracker is deleted.
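A minimal Python sketch of this tracker lifecycle with OpenCV's KCF tracker; it reuses overlap_ratio from the hot-zone sketch above, and the stop conditions, thresholds and helper names are assumptions (only the "left the region" and "stationary" checks are shown):

    import cv2

    def xywh_to_xyxy(b):
        x, y, w, h = b
        return (x, y, x + w, y + h)

    def track_person(frames, first_frame, init_box, roi, max_still=90):
        """Follow one person until further tracking is no longer needed."""
        tracker = cv2.TrackerKCF_create()    # opencv-contrib; cv2.legacy.TrackerKCF_create on newer builds
        tracker.init(first_frame, init_box)  # init_box is (x, y, w, h)
        last_box, still = init_box, 0
        for frame in frames:
            ok, box = tracker.update(frame)
            if not ok or overlap_ratio(xywh_to_xyxy(box), roi) == 0.0:
                break                        # lost, or the object left the designated region
            moved = abs(box[0] - last_box[0]) + abs(box[1] - last_box[1]) > 2
            still = 0 if moved else still + 1
            if still > max_still:            # stationary for more than ~3 s at 30 fps
                break
            last_box = box
        del tracker                          # tracking no longer needed: the tracker is deleted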
Example
The purpose of this example is to detect the actions of the persons in a video shot with a camera. The detection process is as follows.
1) Build 2D skeleton models in the video area.
Fig. 6 is a view showing the major joints of the human body. As shown in Fig. 6, an existing deep learning model for human keypoint detection (such as OpenPose) detects the major joints of the human body (such as wrist, elbow, shoulder, neck, hip, knee, ankle and fingers, shown as the white dots on the bodies in the figure).
Fig. 7 is a view showing the association relationships between the joints. As shown in Fig. 7, the skeleton is drawn according to the predetermined association relationships between the joints (for example, the right elbow is associated with the right wrist), and the model is normalized so that its output size is uniform.
2) Posture prediction is then performed on the 2D skeleton model built from each frame.
Fig. 8 shows examples of several postures. As shown in Fig. 8, posture prediction is performed on the 2D human skeleton models with the pre-trained posture classification model, and predictions whose confidence exceeds a predefined threshold (e.g. 50%) are written into the posture vector P(c, s), where c is the posture class and s is the number of times the posture class has been detected consecutively. In other words, the acquired posture is compared with the posture classification model to obtain its similarity to the model. The prediction confidence is computed by the posture recognition algorithm and quantifies how close the predicted posture is to the sample postures used to train the posture classification model. The prediction confidence threshold is an empirical value summed up in the test environment for the practical application scenario, and its value can be set according to the actual scene.
In the example shown in Fig. 8, the prediction confidence of one of the postures is 12%, below the predefined threshold, so that posture is not recorded in the posture vector. The posture vector of this group of postures is therefore:
P1=[(1,2), (2,2)]
3) Finally, after the update of the posture vector P(c, s) ends, the posture vector is classified with the pre-trained action recognition model to judge its action category. The update of a group of postures ends under any of the following conditions:
The current posture is an end posture and its consecutive occurrence count exceeds a predefined threshold;
The counter of the current posture has not been updated for N frames;
The state of the current posture has not changed for N frames;
A system command terminates the posture update; or
The video ends or the video stream is interrupted.
For example, if in posture vector P1 posture 1 occurs in 2 consecutive frames and the update ends after posture 2 has occurred in 2 consecutive frames, and the action recognition model meanwhile judges with 85% confidence that this is action 1, then the action can be determined to be action 1. The detected postures 1 and 2 are both component postures of action 1.
P1 = [(1,2), (2,2)] => action 1 (0.85).
Figs. 9a and 9b each show examples of several postures. It is worth noting that, as shown in Fig. 9, in actual measurement some falsely detected postures (such as posture 3) may also appear between posture 1 and posture 2. To prevent this information from affecting the action judgment, the posture vector must be normalized before action detection. Normalization includes:
Setting a threshold for the consecutive occurrence count of a posture (e.g. 2) and filtering out the vector entries whose consecutive occurrence count is below the threshold.
As shown in Fig. 9a, in P1 = [(1,2), (3,1), (2,2)] posture 3 occurs only once consecutively, so it is removed from the posture vector, giving the revised vector P1' = [(1,2), (2,2)].
After removing the vector entries whose count is below the threshold, merging adjacent entries with the same posture class.
As shown in Fig. 9b, for P1 = [(1,2), (3,1), (1,2), (2,2)], after filtering out the entry that occurs only once, the vector becomes:
P1' = [(1,2), (1,2), (2,2)].
Since posture 1 now appears twice in a row in P1', the two entries are merged into:
P1'' = [(1,4), (2,2)].
If action detection finds that the action is a predefined action, an alarm is triggered and the video clip of the action is output, or the decomposed video frames of the action are output one by one.
Preferably, if multiple persons appear in the video, the hot-zone (ROI, Region of Interest) comparison method or object tracking methods can be used to distinguish and track the objects.
If the monitored object stays within a fixed area, the hot-zone comparison method is appropriate. Fig. 10 shows a designated hot-zone within the video area. As shown in Fig. 10, a hot-zone is first designated in the video area; then, for each 2D skeleton model that is built, its bounding polygon (a rectangle by default) is drawn, and the overlap ratio between that bounding region and the hot-zone is computed. The model with the largest ratio is taken as the detected object.
If the detected object is moving, object tracking methods are used to track each object and to record each object's posture vector separately. Fig. 11 is a schematic diagram of object tracking when the detected object is moving. As shown in Fig. 11, according to one embodiment of the disclosure, KCF (Kernelized Correlation Filters), BOOSTING, MIL (Multiple Instance Learning), TLD (Tracking, Learning and Detection), GOTURN or other object tracking algorithms can be used to track the objects in the picture.
The present invention is not limited to the scope of the specific embodiments described herein, which are intended as exemplary embodiments. Functionally equivalent products and methods are obviously within the scope of the invention described herein.
The basic principles of the disclosure have been described above in conjunction with specific embodiments. However, it should be noted that those of ordinary skill in the art will understand that all or any of the steps or components of the disclosed method and device can be implemented in hardware, firmware, software or a combination thereof, in any computing device (including processors, storage media, etc.) or in a network of computing devices, which those of ordinary skill in the art can achieve with their basic programming skills after reading the description of the disclosure.
Therefore, the purpose of the disclosure can also be achieved by running a program or a group of programs on any computing device. The computing device may be a well-known general-purpose device. The purpose of the disclosure can thus also be achieved merely by providing a program product containing program code that implements the method or device. That is, such a program product also constitutes the disclosure, and a storage medium storing such a program product also constitutes the disclosure. Obviously, the storage medium can be any well-known storage medium or any storage medium developed in the future.
It should also be noted that, in the device and method of the disclosure, each component or each step can obviously be decomposed and/or recombined. These decompositions and/or recombinations should be regarded as equivalent schemes of the disclosure. Moreover, the steps of the above series of processing can naturally be executed in the chronological order described, but need not necessarily be executed in that order; some steps can be executed in parallel or independently of one another.
The above specific embodiments do not limit the protection scope of the disclosure. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations and substitutions can occur. Any modifications, equivalent replacements, improvements and the like made within the spirit and principles of the disclosure shall be included within the protection scope of the disclosure.

Claims (12)

1. A monitoring method based on action recognition, comprising the following steps:
a) identifying the positions of persons in monitored video frames using a pose estimation method and building 2D human-skeleton models;
b) classifying the 2D skeleton models in the monitored video frames with a pre-trained posture classification model;
c) storing the posture classification results of successive video frames in a posture vector and judging the action type with a pre-trained action recognition model; and
d) if the judged action type belongs to a monitored type, storing the video frames in which the specific action is marked and/or a video clip of the action to memory and triggering an alarm.
2. The monitoring method according to claim 1, wherein step a) comprises:
a1) determining the coordinates of the major joints of one or more persons in the video frame; and
a2) building a 2D skeleton model for each person using the major joint coordinates and the relationships between the joints.
3. The monitoring method according to claim 2, wherein step a2) further comprises
building 2D skeleton models of the hands and/or 2D models of the faces of each person in the video.
4. The monitoring method according to claim 1, wherein step b) comprises:
decomposing the continuous limb actions to be recognized into discrete key postures;
annotating the 2D skeleton modeling results with the key postures; and
training the posture classification model with a convolutional neural network algorithm on the annotated 2D skeleton modeling results.
5. The monitoring method according to claim 1, wherein step c) comprises:
annotating the posture vector sets of known actions; and
training the action recognition model with the annotated posture vector sets as the training set.
6. The monitoring method according to claim 1, wherein step d) comprises at least one of the following steps:
marking the object performing the specific action in the original video and triggering an alarm;
archiving the video frames in which the specific action is marked as evidence; and
archiving the video clip in which the specific action is marked as evidence.
7. The monitoring method according to claim 1, wherein step c) further comprises the step of judging the action of the tracked object in a multi-person scene using a hot-zone comparison method.
8. The monitoring method according to claim 7, wherein the hot-zone is a designated region of the monitored video or the entire monitored picture area.
9. The monitoring method according to claim 1, wherein step c) further comprises:
attaching a tracker to each person in the video to monitor his or her actions, and
judging whether the tracked object needs to be tracked further, and deleting the tracker if no further tracking is needed.
10. The monitoring method according to claim 9, wherein whether further tracking is needed is determined by judging whether the detected object is in at least one of the following states:
the detected object has reached a designated region;
the detected object has left the designated region;
the detected object has remained stationary for more than a certain time; and
an instruction to stop further tracking of the object in the monitored region has been received.
11. A monitoring system based on action recognition, comprising:
a pose prediction part that identifies the positions of persons in monitored video frames using a pose estimation method and builds 2D human-skeleton models from the obtained positions;
a posture classification part that classifies the 2D skeleton models in the monitored video frames with a pre-trained posture classification model;
a posture management part that stores the posture classification results of successive video frames in a posture vector;
an action recognition part that judges the action type with a pre-trained action recognition model; and
an output part that, when the judged action type belongs to a monitored type, stores the video frames in which the specific action is marked and/or a video clip of the action to memory and triggers an alarm.
12. The monitoring system according to claim 11, further comprising:
a posture classification training part that annotates the obtained 2D skeleton models with key postures and feeds the annotated models as a training set into a convolutional neural network to train a classification model, thereby obtaining the posture classification model; and
an action recognition training part that takes the posture vectors generated from known-action videos by the posture management part as a training set and trains on them with a multivariate classification algorithm to obtain the action recognition model used to classify posture vectors into actions.
CN201811453471.2A 2018-11-30 2018-11-30 Monitoring system and its method based on action recognition Pending CN109598229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811453471.2A CN109598229A (en) 2018-11-30 2018-11-30 Monitoring system and its method based on action recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811453471.2A CN109598229A (en) 2018-11-30 2018-11-30 Monitoring system and its method based on action recognition

Publications (1)

Publication Number Publication Date
CN109598229A true CN109598229A (en) 2019-04-09

Family

ID=65960432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811453471.2A Pending CN109598229A (en) 2018-11-30 2018-11-30 Monitoring system and its method based on action recognition

Country Status (1)

Country Link
CN (1) CN109598229A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110110560A1 (en) * 2009-11-06 2011-05-12 Suranjit Adhikari Real Time Hand Tracking, Pose Classification and Interface Control
EP2707834A2 (en) * 2011-05-13 2014-03-19 LiberoVision AG Silhouette-based pose estimation
CN104517097A (en) * 2014-09-24 2015-04-15 浙江大学 Kinect-based moving human body posture recognition method
CN105809144A (en) * 2016-03-24 2016-07-27 重庆邮电大学 Gesture recognition system and method adopting action segmentation
CN106611157A (en) * 2016-11-17 2017-05-03 中国石油大学(华东) Multi-people posture recognition method based on optical flow positioning and sliding window detection
CN106897670A (en) * 2017-01-19 2017-06-27 南京邮电大学 A kind of express delivery violence sorting recognition methods based on computer vision
CN108491754A (en) * 2018-02-02 2018-09-04 泉州装备制造研究所 A kind of dynamic representation based on skeleton character and matched Human bodys' response method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
汪成峰; 陈洪; 张瑞萱; 朱德海; 王庆; 梅树立: "Research on a DTW action recognition algorithm with joint weights", Journal of Graphics (图学学报), no. 04 *
王军; 许永明; 王东辉; 郭文波: "A real-time behavior recognition method based on a multi-fulcrum skeleton model", Journal of Huazhong University of Science and Technology (Natural Science Edition), no. 1 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110166650A (en) * 2019-04-29 2019-08-23 北京百度网讯科技有限公司 Generation method and device, the computer equipment and readable medium of video set
CN110580446A (en) * 2019-07-16 2019-12-17 上海交通大学 Behavior semantic subdivision understanding method, system, computer device and medium
CN110399690A (en) * 2019-07-31 2019-11-01 佳都新太科技股份有限公司 Subway station pedestrian simulation method, apparatus, electronic equipment and storage medium
CN110490109A (en) * 2019-08-09 2019-11-22 郑州大学 A kind of online human body recovery action identification method based on monocular vision
CN110490109B (en) * 2019-08-09 2022-03-25 郑州大学 Monocular vision-based online human body rehabilitation action recognition method
CN110852248A (en) * 2019-11-07 2020-02-28 江苏弘冉智能科技有限公司 Flammable and explosive area illegal equipment based on machine vision and action monitoring method
CN110969101A (en) * 2019-11-21 2020-04-07 浙江工业大学 Face detection and tracking method based on HOG and feature descriptor
CN111078093A (en) * 2019-12-20 2020-04-28 深圳创维-Rgb电子有限公司 Screen picture rotation control method and device, electronic product and storage medium
CN113033252A (en) * 2019-12-24 2021-06-25 株式会社理光 Attitude detection method, attitude detection device and computer-readable storage medium
CN113128298A (en) * 2019-12-30 2021-07-16 上海际链网络科技有限公司 Analysis method and monitoring system for loading and unloading behavior
CN111476118A (en) * 2020-03-26 2020-07-31 长江大学 Animal behavior automatic identification method and device
CN112488073A (en) * 2020-12-21 2021-03-12 苏州科达特种视讯有限公司 Target detection method, system, device and storage medium

Similar Documents

Publication Publication Date Title
CN109598229A (en) Monitoring system and its method based on action recognition
CN110751022B (en) Urban pet activity track monitoring method based on image recognition and related equipment
WO2021047232A1 (en) Interaction behavior recognition method, apparatus, computer device, and storage medium
CN108256433B (en) Motion attitude assessment method and system
US20060093185A1 (en) Moving object recognition apparatus
CN108933925A (en) Information processing unit, information processing method and storage medium
Patruno et al. People re-identification using skeleton standard posture and color descriptors from RGB-D data
CN109298785A (en) A kind of man-machine joint control system and method for monitoring device
US20210224752A1 (en) Work support system and work support method
KR20150089482A (en) Method and apparatus of recognizing facial expression using motion dictionary
US11048917B2 (en) Method, electronic device, and computer readable medium for image identification
JPWO2018154709A1 (en) Motion learning device, skill discrimination device and skill discrimination system
CN112732071A (en) Calibration-free eye movement tracking system and application
CN109873979A (en) Camera-based static image difference comparison method and device
CN110688980A (en) Human body posture classification method based on computer vision
Dragan et al. Human activity recognition in smart environments
RU2315352C2 (en) Method and system for automatically finding three-dimensional images
Badave et al. Evaluation of person recognition accuracy based on OpenPose parameters
Ali et al. Deep Learning Algorithms for Human Fighting Action Recognition.
Foytik et al. Tracking and recognizing multiple faces using Kalman filter and ModularPCA
Bevilacqua et al. A new tool for gestural action recognition to support decisions in emotional framework
CN108647662A (en) A kind of method and system of automatic detection face
CN115546825A (en) Automatic monitoring method for safety inspection normalization
CN114898287A (en) Method and device for dinner plate detection early warning, electronic equipment and storage medium
CN114663796A (en) Target person continuous tracking method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination