CN111191511B - Dynamic real-time behavior recognition method and system for prison - Google Patents

Dynamic real-time behavior recognition method and system for prison

Info

Publication number
CN111191511B
CN111191511B (application CN201911222201.5A)
Authority
CN
China
Prior art keywords
prison
dynamic real
early warning
matrix
behavior recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911222201.5A
Other languages
Chinese (zh)
Other versions
CN111191511A (en)
Inventor
张洋 (Zhang Yang)
姚登峰 (Yao Dengfeng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Union University
Original Assignee
Beijing Union University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Union University filed Critical Beijing Union University
Priority to CN201911222201.5A
Publication of CN111191511A
Application granted
Publication of CN111191511B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a prison dynamic real-time behavior recognition method and system. The method comprises obtaining a data input stream of a model and further comprises the following steps: processing the data input stream; establishing a Cartesian space coordinate system with the connection point of the human spine and the pelvic bone as the origin; modeling the overall body shape with a shape recognition module and saving the human body feature matrix; extracting the feature points of other body parts with an image extraction module, linking them into segments, establishing the segments in the Cartesian space coordinate system, and saving the human body feature matrix of the corresponding part; in the Cartesian space coordinate system, after the coordinates of the person's body shape are determined, identifying the relative position of the body shape in the coordinate system and recording a feature matrix of the relative position; and combining the feature matrices to raise the traditional extraction of low-level physical features to the extraction of high-level semantic features. The invention uses a video analysis algorithm to compute and analyze video content so as to extract specific events or monitoring targets occurring in the video scene.

Description

Dynamic real-time behavior recognition method and system for prison
Technical Field
The invention relates to the field of prison security, in particular to a method and a system for identifying dynamic real-time behaviors of prisons.
Background
The work of a prison security system is complicated and carries great responsibility. It mainly covers preventing prisoners from escaping, fighting, self-injury and self-mutilation, harassment, riots, injury to prison officers and staff, information leakage, illegal entry of outsiders, carrying prohibited items in or out, post-incident handling, and the storage and recording of historical data. Prisons are an important component of national judicial organs and shoulder the vital mission of executing punishment and carrying out educational reform. Modern prison security management has already developed considerably in terms of intelligence. Technology is progressing rapidly, and digitization and networking are prominent features of the 21st century. A modern prison security management system mainly comprises two parts: a security system and an information management system. The security system is central and mainly applies modern security products, for example: television monitoring systems, access control systems, perimeter guard systems, illegal-access alarm systems, and so on. The information management system is mainly used for statistics, management and recording of prison personnel information. The common prison video linkage alarm platform relies on wireless communication technology: various wireless alarm detectors (such as infrared detectors, smoke alarms, door magnets, patrol buttons, cameras, current sensors, and the like) actively collect monitoring-site information, the alarm information is sent to an application platform through telephone lines, broadband and GPRS/CDMA networks, and the platform rapidly notifies the user of the alarm by telephone, short message and broadcast, so that the user can know the condition of the monitored site in real time. Such a system has a complex composition and generally consists of various subsystems; however, the linkage among the subsystems is quite poor, each system is an information island, and the defects are usually compensated by a large amount of manual monitoring and regulation. For example, when prisoners are climbing, fighting and so on, such actions violating prison management can only be found by prison officers constantly watching the video, or can only be detected passively by various physical detectors after the event. Moreover, the management and control of prisoners can only be established through physical devices, so prisoners often have to wear various devices, for example tamper-resistant bracelets, chest cards and work cards, or are controlled by infrared devices in certain prohibited areas. Therefore, the existing prison security system has obvious defects and shortcomings in personnel utilization, equipment weight reduction, real-time monitoring and system intelligence.
The invention application with application number CN108647582A discloses a target behavior recognition and prediction method in a complex dynamic environment. A human body behavior database and an iFLYTEK voice recognition database are first established; then, while the security inspection robot patrols, the surrounding environment is acquired in real time through a visual sensor and a sound sensor; the voice information and the visual video image information are then fused, the target to be locked is judged from the whole down to the region through target identification and identified down to the individual, and finally the locked target's human body action data are matched in real time against the human body action database to recognize and predict the target's actions; when the danger level of the action exceeds a certain threshold, the robot raises an early warning. The defects of this method are that the required physical equipment is complex, the surrounding environment can only be acquired in real time but not analyzed in real time, and the judgment is made only from ordinary images and sounds, so the level of behavior semantics is not reached and the recognition rate may be low.
The invention application with application number CN108629946A discloses a human body fall detection method based on an RGBD sensor, comprising the following steps: calibrating and correcting the intrinsic and extrinsic camera parameters of the RGBD sensor; applying the calibration to the RGBD video sequence and extracting the three-dimensional structural information of the activity space; extracting joint points with a multi-stage convolutional neural network to obtain a set of three-dimensional coordinates of the human joint points; and extracting the three-dimensional structural information, static information and dynamic information of the human joint points as features describing human action behavior, so as to comprehensively analyze whether the human body has fallen abnormally. The defects of this method are that the depth camera is expensive (even the cheapest costs thousands), the bone information is collected first and analyzed afterwards so the method cannot run in real time, the recognized action is single, the action semantics are not combined or explained, and only the static three-dimensional coordinates of the actions before and after are simply analyzed.
Disclosure of Invention
In order to solve the above technical problems, the prison dynamic real-time behavior recognition method and system provided by the invention use the currently most popular deep neural network models and a video analysis algorithm to compute and analyze video content, so as to extract specific events or monitoring targets occurring in the video scene.
The first object of the present invention is to provide a prison dynamic real-time behavior recognition method, which comprises the steps of obtaining a data input stream of a model, and further comprises the following steps:
step 1: performing processing of the data input stream;
step 2: a Cartesian space coordinate system is established by taking the connection point of the human vertebra and the pelvic bone as an origin;
step 3: modeling the whole figure by using a figure recognition module, and storing a human body characteristic matrix;
step 4: extracting and linking the characteristic points of other body parts into segments by using an image extraction module, establishing the segments in a Cartesian space coordinate system, and storing a human body characteristic matrix of the corresponding part;
step 5: in a Cartesian space coordinate system, after the coordinates of the figure of the person are determined, the relative position of the figure in the coordinate system is identified, and a feature matrix of the relative position is recorded;
step 6: combining the feature matrix in the step 3 to the step 5 with a behavior semantic library, and improving the traditional extraction of the bottom-layer physical features to the extraction of the high-layer semantic features.
Preferably, the data input stream is video and/or image.
In any of the above aspects, preferably, the step 1 includes a preliminary process and a frame-separating extraction process.
In any of the above schemes, preferably, the preliminary processing method is to use BM3D to filter noise in the input video stream.
In any of the above aspects, it is preferable that the video processing module performs the preliminary processing and then performs the frame-separated extraction processing on the video.
In any of the above schemes, preferably, the method of the frame-separating extraction processing is to extract N frames of pictures extracted by the camera into a picture stream of N/2 frames, where N is a positive integer.
In any of the above schemes, preferably, the step 2 includes the following substeps:
step 21: setting the person's horizontal right as the negative half-axis of the Y axis and the left as the positive half-axis of the Y axis; the forward direction as the positive half-axis of the X axis and the backward direction as the negative half-axis of the X axis; and the vertical upward direction as the positive half-axis of the Z axis and the vertical downward direction as the negative half-axis of the Z axis;
step 22: extracting the coordinate values (x, y, z) of the skeleton-diagram key points from a plurality of sequential video frames, and arranging the coordinate values in time order to obtain three matrices whose horizontal axis carries the timing information and whose vertical axis carries the spatial information, serving as pixel-value matrices;
step 23: flipping and/or cropping the matrices to increase the data quantity.
In any of the above schemes, preferably, the step 3 includes the following substeps:
step 31: the identification module extracts the whole body type in a frame in the video, and extracts M characteristic points of the whole body, wherein M is a number threshold;
step 32: training the M characteristic points in the step 31 under a convolutional neural network;
step 33: and calculating to obtain the optimal matching of the weighted bipartite graph by using a KM algorithm.
Preferably, in any of the above schemes, the KM algorithm includes assigning a value to each vertex, called a top label, assigning a value to the left vertex as the maximum weight of the edge connected thereto, assigning a value to the right vertex as 0, and then performing matching.
In any of the above schemes, it is preferable that the matching principle is to match only the edges having the same weight as the left score (top label).
In any of the above schemes, it is preferable that if no edge match is found, d is subtracted from the top labels of all left vertices of the path and d is added to the top labels of all right vertices, where d is a threshold constant.
In any of the above aspects, preferably, the other body part includes at least one of a left hand, a right hand, a left hand arm, a right hand arm, a left foot, a left leg, a right foot, a right leg, and a head.
In any of the above schemes, preferably, the step 4 includes the following substeps:
step 41: calculating the included angle between the right arm and the X, Y, Z axes;
step 42: calculating the distance between the right arm and the X, Y, Z axes;
step 43: combining the included angle and the distance to judge the hand position, and saving the feature matrix of the right hand;
step 44: calculating the motion of the right hand arm by using a motion tracking module, tracking the motion of the right hand arm, establishing a motion track in a Cartesian space coordinate system, and establishing a motion vector;
step 45: and judging the motion mode of the right hand by integrating the motion vector fusion characteristics, and storing a motion characteristic matrix of the right hand.
In any of the above embodiments, preferably, the step 4 further includes obtaining the motion feature matrices of the left hand, the left foot, the right foot, and the head using the methods of the steps 41 to 45.
In any of the above aspects, preferably, the step 6 includes performing real-time status representation of the action in combination with a behavior semantic library.
A second object of the present invention is to provide a prison dynamic real-time behavior recognition system, comprising an input module for obtaining a model data input stream, further comprising the following modules:
a processing module: for performing the processing of said data input stream;
the figure recognition module: for modeling the overall body shape and saving the human body feature matrix;
An image extraction module: for extracting feature points of other body parts;
motion tracking module: for calculating movements of the extremities and the head using a target tracking algorithm of opencv;
the system identifies dynamic real-time prison behavior in accordance with the method as described for the first purpose.
The invention provides a prison dynamic real-time behavior recognition method and system that adopt a quasi-3D simulation algorithm and acquire 70% more spatial data than an ordinary two-dimensional picture, so that the accuracy of the algorithm is greatly improved, the false alarm rate is effectively reduced, and the result is far above other behavior recognition levels. The bottom-layer algorithm is customized for the actual spatial volumes encountered in the monitoring industry; by adopting directional modeling and limb energy modeling, the video analysis accuracy can be improved by 30%, and the intelligent video analysis algorithm achieves zero missed reports, few false alarms and high accuracy. Front-end analysis keeps the bandwidth low, the system responds quickly and stably, and the data is safe and virus-free, while the advantages of the platform are retained: flexible compatibility with multiple brands, low cost of intelligent migration, convenient operation, easy upgrading, and the double security of safe backup.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a prison dynamic real-time behavior recognition method according to the present invention.
FIG. 1A is a flow chart of a method for establishing a Cartesian space coordinate system of the embodiment shown in FIG. 1 of a prison dynamic real-time behavior recognition method according to the present invention.
FIG. 1B is a flow chart of a modeling method of the embodiment shown in FIG. 1 of a prison dynamic real-time behavior recognition method according to the present invention.
FIG. 1C is a flow chart of a right hand movement feature matrix establishment method of the embodiment shown in FIG. 1 of a prison dynamic real-time behavior recognition method according to the present invention.
FIG. 2 is a block diagram of a preferred embodiment of a prison dynamic real-time behavior recognition system in accordance with the present invention.
Fig. 3 is a flow chart of another preferred embodiment of a prison dynamic real-time behavior recognition method according to the present invention.
FIG. 4 is a behavior semantic bottom graph of the embodiment shown in FIG. 3 of a prison dynamic real-time behavior recognition method according to the present invention.
FIG. 5 is a schematic diagram of one embodiment of running early warning of a prison dynamic real-time behavior recognition method according to the present invention.
Fig. 6 is a schematic diagram of an embodiment of fall early warning according to the prison dynamic real-time behavior recognition method of the present invention.
Fig. 7 is a schematic diagram of another embodiment of fall early warning of a prison dynamic real-time behavior recognition method according to the present invention.
Fig. 8 is a schematic diagram of an embodiment of loitering early warning of a prison dynamic real-time behavior recognition method according to the present invention.
FIG. 9 is a schematic diagram of one embodiment of a climbing warning for a prison dynamic real-time behavior recognition method in accordance with the present invention.
FIG. 10 is a schematic diagram of one embodiment of out-of-range early warning of a prison dynamic real-time behavior recognition method according to the present invention.
Detailed Description
The invention is further illustrated by the following figures and specific examples.
Example 1
As shown in fig. 1 and 2, step 100 is performed to obtain a data input stream of a model, the data input stream being a video and/or an image.
Step 110 is performed: using the input module 200, the processing of the data input stream is carried out, including preliminary processing and frame-separated extraction processing. The preliminary processing uses BM3D to filter noise in the input video stream; after the processing module 210 performs this preliminary processing on the video, the frame-separated extraction processing extracts the N (N=30 in the present embodiment) frames captured by the camera into a picture stream of N/2 (=15) frames.
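By way of non-limiting illustration, the following Python sketch shows one way the frame-separation extraction of step 110 could be realized; the use of OpenCV's VideoCapture and of fastNlMeansDenoisingColored as a stand-in for the BM3D filter (which would come from an external BM3D implementation) are assumptions of this sketch, not part of the claimed method.

    # Illustrative sketch: keep every second frame (30 fps -> 15 fps picture stream)
    # and denoise each kept frame. The denoiser used here is only a placeholder for BM3D.
    import cv2

    def extract_half_frames(video_path, denoise=True):
        cap = cv2.VideoCapture(video_path)
        frames = []
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % 2 == 0:                      # frame-separation: drop every other frame
                if denoise:
                    # placeholder for BM3D filtering of noise in the input stream
                    frame = cv2.fastNlMeansDenoisingColored(frame, None, 5, 5, 7, 21)
                frames.append(frame)
            index += 1
        cap.release()
        return frames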
Step 120 is performed to establish a Cartesian space coordinate system with the connection point of the human spine and the pelvic bone as the origin. As shown in fig. 1A, step 121 is performed to set the person's horizontal right as the negative half-axis of the Y axis and the left as the positive half-axis; the forward direction as the positive half-axis of the X axis and the backward direction as the negative half-axis; and the vertical upward direction as the positive half-axis of the Z axis and the vertical downward direction as the negative half-axis. Step 122 is executed to extract the coordinate values (x, y, z) of the skeleton-map key points from a number of sequential video frames and arrange them in time order, obtaining three matrices whose horizontal axis carries the timing information and whose vertical axis carries the spatial information, analogous to pixel-value matrices. Step 123 is executed to flip and/or crop the matrices, increasing the data quantity.
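As a non-limiting sketch of steps 121 to 123, the following Python code arranges the per-frame key-point coordinates into the tensor described above and applies the flip/crop augmentation; the array shapes and the mirroring caveat are assumptions of the sketch.

    # Illustrative sketch: (x, y, z) key-point coordinates, taken in the body-centred
    # coordinate system, are stacked so that the horizontal axis carries timing
    # information, the vertical axis carries spatial information, and x/y/z play the
    # role of the R/G/B channels of an image.
    import numpy as np

    def build_skeleton_tensor(keypoint_seq):
        """keypoint_seq: list of (J, 3) arrays, one per frame -> (3, T, J) tensor."""
        data = np.stack(keypoint_seq, axis=0)        # (T, J, 3): time x joints x (x, y, z)
        return np.transpose(data, (2, 0, 1))         # (3, T, J): channels like an RGB image

    def augment(tensor, crop_len=None):
        """Increase the data quantity by flipping and temporal cropping (step 123)."""
        samples = [tensor]
        flipped = tensor.copy()
        flipped[1] = -flipped[1]                     # negate the Y channel (left/right of the body);
                                                     # a strict mirror would also swap left/right joint indices
        samples.append(flipped)
        if crop_len is not None and tensor.shape[1] > crop_len:
            start = (tensor.shape[1] - crop_len) // 2
            samples.append(tensor[:, start:start + crop_len, :])
        return samples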
Step 130 is performed to model the overall body shape with the shape recognition module 220 and save the human body feature matrix. As shown in fig. 1B, step 131 is performed: the recognition module extracts the overall body shape from one frame of the video and extracts M (M=18 in this embodiment) feature points of the whole body, where M is a number threshold. Step 132 is executed to train the M feature points from step 131 under a convolutional neural network. Step 133 is performed to compute the best matching of the weighted bipartite graph using the KM algorithm. The KM algorithm first assigns a value to each vertex, called the top label: the left vertices are assigned the maximum weight of the edges connected to them and the right vertices are assigned 0, and matching then begins. The matching principle is to match only along edges whose weight equals the left vertex's score (top label); if no matching edge is found, d is subtracted from the top labels of all left vertices on the path and d is added to the top labels of all right vertices, where d is a threshold constant.
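For illustration only, a classical Kuhn-Munkres (KM) maximum-weight matching corresponding to step 133 is sketched below in Python. Note that the classical algorithm lowers the top labels by the minimum slack, whereas the embodiment described later fixes d = 0.1; the sketch follows the classical rule and only notes the fixed-d variant in a comment.

    # Illustrative KM sketch: weight is an n x n matrix (pad with zeros if the two
    # point sets differ in size). Left top labels start at the maximum incident edge
    # weight, right top labels at 0; only "tight" edges are used for matching.
    import numpy as np

    def km_match(weight):
        n = weight.shape[0]
        lx = weight.max(axis=1).astype(float)   # left top labels = max weight of incident edges
        ly = np.zeros(n)                        # right top labels = 0
        match_y = -np.ones(n, dtype=int)        # right vertex -> matched left vertex

        def dfs(u, vis_x, vis_y, slack):
            vis_x[u] = True
            for v in range(n):
                if vis_y[v]:
                    continue
                gap = lx[u] + ly[v] - weight[u, v]
                if abs(gap) < 1e-9:              # edge weight equals the labels: tight edge
                    vis_y[v] = True
                    if match_y[v] == -1 or dfs(match_y[v], vis_x, vis_y, slack):
                        match_y[v] = u
                        return True
                else:
                    slack[v] = min(slack[v], gap)
            return False

        for u in range(n):
            slack = np.full(n, np.inf)
            while True:
                vis_x = np.zeros(n, bool)
                vis_y = np.zeros(n, bool)
                if dfs(u, vis_x, vis_y, slack):
                    break
                d = slack[~vis_y].min()          # classical rule: minimum slack
                                                 # (the embodiment below instead fixes d = 0.1)
                lx[vis_x] -= d                   # subtract d from the visited left top labels
                ly[vis_y] += d                   # add d to the visited right top labels
                slack[~vis_y] -= d
        return match_y                           # match_y[v] is the left vertex matched to v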
Step 140 is executed: the image extraction module 230 extracts the feature points of the other body parts and links them into segments, establishes the segments in the Cartesian space coordinate system, and saves the human body feature matrix of the corresponding part, where the other body parts include at least one of the left hand, right hand, left arm, right arm, left foot, left leg, right foot, right leg and head. As shown in fig. 1C, step 141 is performed to calculate the included angle between the right arm and the X, Y, Z axes. Step 142 is performed to calculate the distance between the right arm and the X, Y, Z axes. Step 143 is executed to combine the included angle and the distance to judge the hand position, and to save the feature matrix of the right hand. Step 144 is performed to calculate the motion of the right arm using the motion tracking module 240, track the motion of the right arm, establish a motion trajectory in the Cartesian space coordinate system, and establish a motion vector. Step 145 is executed to judge the motion mode of the right hand by fusing the motion-vector features, and to save the motion feature matrix of the right hand. The motion feature matrices for the left hand, left foot, right foot and head are obtained using the methods of steps 141-145. The motion tracking module 240 calculates the motion of the extremities and the head using an opencv target tracking algorithm.
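The angle, distance and motion-vector computations of steps 141 to 145 could look like the following sketch; the key-point names (shoulder, wrist) and the use of NumPy are assumptions, and the axes are those of the body-centred coordinate system of step 120.

    # Illustrative sketch: angle between a body segment and the X/Y/Z axes, distance of
    # a key point from each axis, and frame-to-frame motion vectors of a tracked key point.
    import numpy as np

    AXES = {"X": np.array([1.0, 0.0, 0.0]),
            "Y": np.array([0.0, 1.0, 0.0]),
            "Z": np.array([0.0, 0.0, 1.0])}

    def segment_axis_angles(p_start, p_end):
        """Angle (degrees) between a segment (e.g. shoulder -> wrist) and each axis."""
        seg = np.asarray(p_end, float) - np.asarray(p_start, float)
        seg = seg / np.linalg.norm(seg)
        return {name: float(np.degrees(np.arccos(np.clip(seg @ axis, -1.0, 1.0))))
                for name, axis in AXES.items()}

    def point_axis_distances(point):
        """Distance of a key point from each axis (origin at the spine/pelvis joint)."""
        p = np.asarray(point, float)
        return {name: float(np.linalg.norm(p - (p @ axis) * axis))
                for name, axis in AXES.items()}

    def motion_vectors(track):
        """track: (T, 3) positions of one key point -> (T-1, 3) motion vectors."""
        return np.diff(np.asarray(track, float), axis=0)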
Step 150 is executed to identify the relative position of the figure in the coordinate system after the coordinates of the figure of the person are determined in the cartesian space coordinate system, and record the feature matrix of the relative position.
Step 160 is executed, wherein the feature matrix in step 130 to step 150 is combined with a behavior semantic library, the traditional extraction of the bottom-layer physical features is improved to the extraction of the high-layer semantic features, and the behavior semantic library is combined to perform real-time representation of the state of the action.
Example two
The invention provides a prison dynamic real-time behavior recognition system, which uses the currently most popular deep neural network models and a video analysis algorithm to compute and analyze video content, so as to extract specific events or monitoring targets occurring in the video scene. A quasi-3D simulation algorithm is adopted, acquiring 70% more spatial data than an ordinary two-dimensional picture, so that the accuracy of the algorithm is greatly improved, the false alarm rate is effectively reduced, and the result is far above other behavior recognition levels. The bottom-layer algorithm is customized for the actual spatial volumes encountered in the monitoring industry; by adopting directional modeling and limb energy modeling, the video analysis accuracy can be improved by 30%. The intelligent video analysis algorithm has zero missed reports, few false alarms and high accuracy. Front-end analysis keeps the bandwidth low, the system responds quickly and stably, and the data is safe and virus-free, while the advantages of the platform are retained: flexible compatibility with multiple brands, low cost of intelligent migration, convenient operation, easy upgrading, and the double security of safe backup.
The invention adopts the following technical scheme:
system overall framework: the prison personnel behavior recognition systems on the market require complex physical connection devices, the systems are linked by small signal towers, and each prison personnel needs to wear a prescribed tag. The innovation of the system of the invention is that behavior recognition can be performed only by the image acquisition equipment and the model optimized by the user. As shown in the figure 3 of the drawings,
step 1: input stream of models (video or image);
step 2: the processing of the video pictures filters noise from the input video stream. Systems on the market use spatial filtering, frequency-domain filtering, wavelet-domain filtering and the like, which can only denoise a single image and are favoured mainly because the operation is more convenient and the principle and presentation are more intuitive. The BM3D denoising method is designed specifically for image denoising: it makes full use of the self-similarity and redundant information of the whole image, its denoising effect is very good, and it is acknowledged as one of the best denoising methods currently available. This, however, is built on enormous complexity, because the self-similarity and redundant information of the entire image are exploited rather than a single neighborhood only.
Step 2-1: after the video is preliminarily processed by the processing module, it is subjected to frame-separated extraction: the 30 frames of pictures captured by the camera are extracted into a 15-frame picture stream, which reduces the computational load and improves operating efficiency;
step 3: a Cartesian space coordinate system is established with the connection point of the person's vertebra and pelvic bone as the origin; the person's horizontal right is set as the negative half-axis of the Y axis and the left as the positive half-axis of the Y axis; the forward direction as the positive half-axis of the X axis and the backward direction as the negative half-axis; and the vertical upward direction as the positive half-axis of the Z axis and the vertical downward direction as the negative half-axis. The coordinate values (x, y, z) of the skeleton-diagram key points are extracted from a number of sequential video frames and arranged in time order to obtain three matrices, in which the horizontal axis carries the timing information and the vertical axis carries the spatial information. These pixel-value matrices are analogous to an RGB image: the three-dimensional coordinate values x, y and z act as the three channels R, G and B, so the matrices can be fed to a CNN for feature extraction. Operations such as flipping and cropping are then adopted to increase the data quantity.
Step 4: the shape of the whole person is modeled first with the shape recognition module: the recognition module extracts the overall body shape from one frame of the video and extracts 18 feature points of the whole body, the identified feature points are trained under a convolutional neural network, and the KM algorithm is then used instead of the Hungarian algorithm to solve the best matching problem of the weighted bipartite graph. The first step is to assign a value to each vertex, called the top label: the left vertices are assigned the maximum weight of the edges connected to them, and the right vertices are assigned 0. The second step is to start matching, matching only along edges whose weight equals the left vertex's score (top label); if no matching edge is found, d is subtracted from the top labels of all left vertices on the path and d is added to the top labels of all right vertices. Here the parameter d takes the value 0.1. After KM introduces the weights, the matching success rate is greatly improved. The body type is then judged from the relative relations of the feature points, and the feature matrix is saved;
step 5: the image extraction module extracts the feature points of the right arm and links them into a segment, which is established in the Cartesian space coordinate system; the included angle between the arm and the X, Y, Z axes is calculated first, then the distance between the arm and the X, Y, Z axes, and finally the angle and the distance are combined to judge the hand position, and the feature matrix of the hand position is saved;
step 6: the motion of the right arm is calculated; the motion tracking module tracks the motion of the arm, establishes a motion trajectory in the Cartesian space coordinate system and establishes motion vectors, for example recording the motion vectors of the palm feature points and of the arm features; finally the preceding vectors and the fused features are combined to judge the motion mode of the hand, and the motion feature matrix is saved;
step 7: steps 5 to 6 all concern right-hand recognition; the motion feature matrices of the left hand, left foot, right foot and head can be recognized on the same principle;
step 8: in a Cartesian space coordinate system, after the coordinates of the figure of the person are determined, the relative position of the figure in the coordinate system is identified, and a feature matrix of the relative position is recorded;
step 9: by integrating the feature matrices, the traditional extraction of low-level physical features is raised to the extraction of high-level semantic features, as shown in fig. 4, and the real-time state representation of the action is then performed in combination with the behavior semantic library.
In this scheme, all the characteristic parameters undergo pooling dimension-reduction processing in the neural network and are fused, judged and output at the final output layer.
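A minimal sketch of such a network is given below; PyTorch, the layer sizes and the auxiliary-feature dimension are assumptions, since the description only specifies a convolutional network with pooling dimension reduction and fusion at the final output layer.

    # Illustrative sketch: a CNN over the (3, T, J) skeleton tensor with pooling
    # dimension reduction, fused with the auxiliary feature matrices at the output layer.
    import torch
    import torch.nn as nn

    class SkeletonBehaviorNet(nn.Module):
        def __init__(self, num_classes=8, aux_dim=32):   # 8 illegal-action classes (assumed)
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),                          # pooling dimension reduction
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),             # further pooling to a fixed size
            )
            self.fuse = nn.Linear(64 * 4 * 4 + aux_dim, num_classes)  # fusion output layer

        def forward(self, skeleton, aux_features):
            x = self.conv(skeleton)                       # skeleton: (B, 3, T, num_joints)
            x = torch.flatten(x, 1)
            x = torch.cat([x, aux_features], dim=1)       # fuse limb/position feature matrices
            return self.fuse(x)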
The coordinate system of step 3 is also the basis for judging the positions and relative motions of the arms, feet and head, so it plays an important role in the judgment result. It is not a fixed coordinate system established with respect to the camera; instead, the human torso is taken as the reference frame, and the relative positions are transformed according to the turning of the human body.
In the motion determination of step 6, after a moving image is captured in the spatial coordinate system, a displacement from the negative half-axis to the positive half-axis of the Y axis can be set to be judged as a leftward motion in the coordinate system, and the other directions can be set analogously.
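A sketch of this direction rule, with the non-Y directions filled in as analogous assumptions, might read:

    # Illustrative sketch: classify a displacement by its dominant axis in the
    # body-centred coordinate system; negative-to-positive Y is leftward motion.
    import numpy as np

    def classify_direction(motion_vector, eps=1e-6):
        v = np.asarray(motion_vector, float)
        axis = int(np.argmax(np.abs(v)))                 # dominant axis of the displacement
        if np.abs(v[axis]) < eps:
            return "still"
        if axis == 1:                                    # Y axis: negative -> positive means left
            return "left" if v[1] > 0 else "right"
        if axis == 0:                                    # X axis: forward / backward (assumed labels)
            return "forward" if v[0] > 0 else "backward"
        return "up" if v[2] > 0 else "down"              # Z axis: vertical motion (assumed labels)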
The invention provides a prison dynamic real-time behavior recognition method, which offers a new way to track and recognize prison personnel in the field of prison security; it improves on the currently most advanced deep learning algorithms and combines them with a behavior recognition semantic library so as to realize multi-action recognition with a simple system. Eight kinds of illegal actions are recognized using a minimum of physical equipment resources, and alarms can be analyzed and raised in real time, which can greatly reduce the human workload. The structure is relatively simple, the functions are highly integrated, and operation is convenient; the equipment cost is low, the detection precision is high, and the system performance is good.
Example III
The whole behavior semantics are recognized in the context of a simulated prison. By means of the video stream acquisition device, actions can be captured in real time.
Extracting human skeleton points: a series of operations are performed in the Cartesian coordinate system to obtain the skeleton points of the whole person, for the behavior semantic analysis carried out later in combination with the behavior semantic library.
The following actions are all recognized after the behavior semantic analysis.
1. Running early warning: as shown in fig. 5, the sudden-stop and sudden-start actions of prisoners can be identified; smooth walking produces no running warning, and only a sudden run triggers it. This matters for prison management, because prisoners are not permitted to break into a sudden run.
The differences between walking and running are:
1. Their definitions differ.
(1) Walking is moving the feet forward alternately, generally at a slower speed;
(2) running is a periodic movement that propels the body forward rapidly through a coordinated combination of push-off and leg swing.
2. Their manner differs.
(1) When walking, one foot is always on the ground, and the step frequency and stride length are usually smaller, as in slow walking, brisk walking, heel-to-toe walking and the like;
(2) when running, there are moments when both feet are off the ground, and the step frequency and stride length are often larger, as in jogging, sprinting, constant-speed running and the like.
3. Their postures differ.
(1) When walking, the body does not lean forward, or leans forward only slightly, and the heel always lands first;
(2) when running, the body leans forward, and the forefoot often lands first when the feet touch down. So we combine the definitions of running and walking, use the relative positions of the foot key points, and then combine them with the behavior semantic library to determine whether the person is running (a sketch of this rule follows below).
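A possible sketch of this decision rule follows; the ground height, speed threshold and frame rate are assumptions to be calibrated, and in the described system the result would still be checked against the behavior semantic library.

    # Illustrative sketch: flag running when both foot key points leave the ground
    # at the same moment and the horizontal speed of the hip key point is high.
    import numpy as np

    def is_running(left_foot_z, right_foot_z, hip_xy, fps=15,
                   ground_z=0.05, speed_thresh=2.0):
        """left_foot_z, right_foot_z: (T,) heights; hip_xy: (T, 2) horizontal hip positions."""
        left_foot_z = np.asarray(left_foot_z, float)
        right_foot_z = np.asarray(right_foot_z, float)
        hip_xy = np.asarray(hip_xy, float)
        # flight phase: frames where neither foot touches the ground plane
        airborne = (left_foot_z > ground_z) & (right_foot_z > ground_z)
        # horizontal body speed estimated from the hip key point
        speed = np.linalg.norm(np.diff(hip_xy, axis=0), axis=1) * fps
        return bool(airborne.any() and speed.mean() > speed_thresh)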
2. Fall early warning: as shown in fig. 6 and 7, this is mainly used for rescue: since prisoners may faint, suffer a sudden heart attack and so on, the event must be discovered and an early warning raised at the earliest moment. A fall is a sudden, involuntary, unintended change of position, dropping to the ground or onto a lower plane. Falls include the following two classes: (1) a fall from one plane to another; (2) a fall on the same plane.
So we determine whether a fall has occurred from the instantaneous speed of the whole-body key points, especially the hips, and their displacement, combined with the relative position between the key point and the surrounding plane.
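A sketch of this fall rule, with all thresholds as assumptions, might be:

    # Illustrative sketch: a fall is flagged when the hip key point shows a sudden
    # high-speed, downward displacement and ends up close to the ground plane.
    import numpy as np

    def is_fall(hip_track, fps=15, speed_thresh=1.5, drop_thresh=0.5, floor_z=0.3):
        """hip_track: (T, 3) positions of the hip key point over a short window."""
        hip = np.asarray(hip_track, float)
        velocity = np.diff(hip, axis=0) * fps
        peak_speed = np.linalg.norm(velocity, axis=1).max()   # sudden, involuntary change
        drop = hip[0, 2] - hip[-1, 2]                          # net downward displacement
        on_floor = hip[-1, 2] < floor_z                        # ends on (or near) a lower plane
        return bool(peak_speed > speed_thresh and drop > drop_thresh and on_floor)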
3. Loitering early warning: as shown in fig. 8, this is mainly used for key prison areas where staying and wandering are not allowed. Loitering is mainly computed from the cyclic trajectory of the key-point positions: when a key point recurs within a certain space, the loitering warning is triggered.
4. Climbing early warning: as shown in fig. 9, climbing is one of the main dangerous actions in prison security, and climbing is forbidden in most areas. We combine a series of key points such as the hands, elbows, knees, hips and feet, and use their relative positions and key-point combination patterns (for example, raising the hands and gripping something with the fingers while lifting a foot) together with the behavior semantic library to determine whether the person is climbing.
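The loitering rule could be sketched as follows; the cell size and revisit count are assumptions, and the climbing decision would additionally match the hand/elbow/knee/hip/foot combination pattern against the behavior semantic library.

    # Illustrative sketch: quantise the ground-plane trajectory of a key point into
    # cells and fire the loitering warning when the same cell is re-entered often.
    import numpy as np
    from collections import Counter

    def is_loitering(track_xy, cell=0.5, revisit_thresh=3):
        """track_xy: (T, 2) ground-plane positions of a key point inside the key area."""
        cells = [tuple(np.floor(np.asarray(p, float) / cell).astype(int)) for p in track_xy]
        visits = Counter()
        last = None
        for c in cells:
            if c != last:               # count re-entries into a cell, not dwell frames
                visits[c] += 1
                last = c
        return max(visits.values(), default=0) >= revisit_thresh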
5. Boundary-crossing early warning: as shown in fig. 10, out-of-range detection is typically used for areas where access is prohibited. The area where access is forbidden is demarcated, and then it is detected whether the key-point positions of prison personnel overlap the area or show a tendency to intersect it, in order to decide whether to trigger the out-of-range warning.
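A sketch of this out-of-range test is given below; the ray-casting polygon test and the look-ahead margin used to capture the tendency to intersect are assumptions.

    # Illustrative sketch: the forbidden area is a polygon on the ground plane; the
    # warning fires when a key point is inside it, or when its motion vector carries
    # it inside within a small look-ahead margin.
    import numpy as np

    def point_in_polygon(point, polygon):
        """Ray-casting test; polygon is a list of (x, y) vertices."""
        x, y = point
        inside = False
        n = len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            crosses = (y1 > y) != (y2 > y)
            if crosses and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
        return inside

    def out_of_range_alarm(keypoints_xy, motion_vectors_xy, polygon, margin=0.5):
        """Trigger if any key point is inside the forbidden area, or heading toward it."""
        for p, v in zip(keypoints_xy, motion_vectors_xy):
            if point_in_polygon(p, polygon):
                return True
            ahead = np.asarray(p, float) + margin * np.asarray(v, float)  # intersection tendency
            if point_in_polygon(tuple(ahead), polygon):
                return True
        return False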
The foregoing description of the invention has been presented for purposes of illustration and description, but is not intended to be limiting. Any simple modification of the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention. In this specification, each embodiment is described mainly in terms of its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the system embodiments essentially correspond to the method embodiments, their description is relatively brief, and the relevant points can be found in the description of the method embodiments.

Claims (15)

1. The prison dynamic real-time behavior recognition method comprises the steps of obtaining a data input stream of a model, and is characterized by further comprising the following steps:
step 1: performing processing of the data input stream;
step 2: a Cartesian space coordinate system is established by taking the connection point of the human vertebra and the pelvic bone as an origin;
step 3: modeling the whole figure by using a figure recognition module, and storing a human body characteristic matrix;
step 4: extracting and linking the characteristic points of other body parts into segments by using an image extraction module, establishing the segments in a Cartesian space coordinate system, and storing a human body characteristic matrix of the corresponding part; the other body parts include at least one of a left hand, a right hand, a left hand arm, a right hand arm, a left foot, a left leg, a right foot, a right leg, and a head;
step 5: in a Cartesian space coordinate system, after the coordinates of the figure of the person are determined, the relative position of the figure in the coordinate system is identified, and a feature matrix of the relative position is recorded;
step 6: combining the feature matrix in the step 3 to the step 5 with a behavior semantic library, and improving the traditional extraction of the bottom physical features to the extraction of the high-level semantic features;
the method is used for at least one of running early warning, falling early warning, loitering early warning and out-of-range early warning;
the method for judging the loitering early warning is to judge whether climbing or not by combining the relative positions of key points, the key point combination mode and the behavior semantic library;
the judgment method of the out-of-range early warning is to demarcate the area which is forbidden to be accessed, and then detect whether the positions of key points of prison staff overlap with the positions of the area or whether the positions of the key points have a tendency of intersecting with the area;
wherein, the key points include: hands, elbows, knees, buttocks and feet.
2. The prison dynamic real-time behavior recognition method of claim 1, wherein the data input stream is a video and/or an image.
3. The method for identifying dynamic real-time behaviors of prisons according to claim 2, wherein the step 1 comprises a preliminary process and a frame-isolated extraction process.
4. A prison dynamic real-time behavior recognition method according to claim 3, wherein the preliminary processing method is to use BM3D to filter noise in the input video stream.
5. The method for identifying dynamic real-time behaviors of prisons according to claim 4, wherein after the preliminary processing is performed on the video by using a processing module, the frame-separation extraction processing is performed on the video.
6. The method for identifying dynamic real-time behavior of prison according to claim 5, wherein the method for extracting frames is to extract N frames of pictures extracted by a camera into a picture stream of N/2 frames, wherein N is a positive integer.
7. The method for dynamic real-time behavior recognition of prison according to claim 6, wherein said step 2 comprises the sub-steps of:
step 21: setting the person's horizontal right as the negative half-axis of the Y axis and the left as the positive half-axis of the Y axis; the forward direction as the positive half-axis of the X axis and the backward direction as the negative half-axis of the X axis; and the vertical upward direction as the positive half-axis of the Z axis and the vertical downward direction as the negative half-axis of the Z axis;
step 22: extracting the coordinate values (x, y, z) of the skeleton-diagram key points from a plurality of sequential video frames, and arranging the coordinate values in time order to obtain three matrices whose horizontal axis carries the timing information and whose vertical axis carries the spatial information, serving as pixel-value matrices;
step 23: flipping and/or cropping the matrices to increase the data quantity.
8. The method for dynamic real-time behavior recognition of prison according to claim 7, wherein said step 3 comprises the sub-steps of:
step 31: the identification module extracts the whole body type in a frame in the video, and extracts M characteristic points of the whole body, wherein M is a number threshold;
step 32: training the M characteristic points in the step 31 under a convolutional neural network;
step 33: and calculating to obtain the optimal matching of the weighted bipartite graph by using a KM algorithm.
9. The method for identifying dynamic real-time behavior of prison according to claim 8, wherein the KM algorithm comprises assigning a value to each vertex, called a top label, assigning a value to the left vertex as the maximum weight of the edge connected thereto, assigning a value to the right vertex as 0, and then performing matching.
10. The method for identifying dynamic real-time behaviors of prison according to claim 9, wherein the matching principle is to match only along edges whose weight equals the left vertex's score, i.e. the top label.
11. The method of claim 10, wherein if no edge match is found, d is subtracted from the top labels of all left vertices and d is added to the top labels of all right vertices of the path, where d is a threshold constant.
12. The method for dynamic real-time behavior recognition of prisons according to claim 11, characterized in that said step 4 comprises the sub-steps of:
step 41: calculating the included angle between the right arm and the X, Y, Z axes;
step 42: calculating the distance between the right arm and the X, Y, Z axes;
step 43: combining the included angle and the distance to judge the hand position, and saving the feature matrix of the right hand;
step 44: calculating the motion of the right hand arm by using a motion tracking module, tracking the motion of the right hand arm, establishing a motion track in a Cartesian space coordinate system, and establishing a motion vector;
step 45: and judging the motion mode of the right hand by integrating the motion vector fusion characteristics, and storing a motion characteristic matrix of the right hand.
13. The method of dynamic real-time prison behavior recognition according to claim 12, wherein said step 4 further comprises obtaining said motion profile matrices for left hand, left foot, right foot and head using the method of steps 41-45.
14. The method for dynamic real-time behavior recognition in prison according to claim 13, wherein said step 6 comprises real-time representation of the state of an action in combination with a semantic library of behaviors.
15. A prison dynamic real-time behavior recognition system comprising an input module for obtaining a model data input stream, characterized by the following modules:
a processing module: for performing the processing of said data input stream;
the figure recognition module: for modeling the overall body shape and saving the human body feature matrix;
an image extraction module: for extracting the feature points of other body parts; the other body parts include at least one of a left hand, a right hand, a left arm, a right arm, a left foot, a left leg, a right foot, a right leg, and a head;
motion tracking module: for calculating movements of the extremities and the head using a target tracking algorithm of opencv;
the system identifies dynamic real-time prison behavior according to the method of claim 1;
the system is used for performing at least one of running early warning, falling early warning, loitering early warning and out-of-range early warning;
the method for judging the loitering early warning is to judge whether climbing occurs by combining the relative positions of the key points, the key-point combination pattern and the behavior semantic library;
the judgment method of the out-of-range early warning is to demarcate the area which is forbidden to be accessed, and then detect whether the positions of key points of prison staff overlap with the positions of the area or whether the positions of the key points have a tendency of intersecting with the area;
wherein, the key points include: hands, elbows, knees, buttocks and feet.
CN201911222201.5A 2019-12-03 2019-12-03 Dynamic real-time behavior recognition method and system for prison Active CN111191511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911222201.5A CN111191511B (en) 2019-12-03 2019-12-03 Dynamic real-time behavior recognition method and system for prison

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911222201.5A CN111191511B (en) 2019-12-03 2019-12-03 Dynamic real-time behavior recognition method and system for prison

Publications (2)

Publication Number Publication Date
CN111191511A CN111191511A (en) 2020-05-22
CN111191511B (en) 2023-08-18

Family

ID=70709495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911222201.5A Active CN111191511B (en) 2019-12-03 2019-12-03 Dynamic real-time behavior recognition method and system for prison

Country Status (1)

Country Link
CN (1) CN111191511B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115917589A (en) 2021-07-22 2023-04-04 京东方科技集团股份有限公司 Climbing behavior early warning method and device, electronic equipment and storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975956A (en) * 2016-05-30 2016-09-28 重庆大学 Infrared-panorama-pick-up-head-based abnormal behavior identification method of elderly people living alone
WO2018191401A1 (en) * 2017-04-14 2018-10-18 Global Tel*Link Corporation Inmate tracking system in a controlled environment
CN108647582A (en) * 2018-04-19 2018-10-12 河南科技学院 Goal behavior identification and prediction technique under a kind of complex dynamic environment
CN109508736A (en) * 2018-10-30 2019-03-22 航天信息股份有限公司 A kind of prison abnormal conditions monitoring method and monitoring system based on deep learning
CN109508698A (en) * 2018-12-19 2019-03-22 中山大学 A kind of Human bodys' response method based on binary tree
CN109740513A (en) * 2018-12-29 2019-05-10 青岛小鸟看看科技有限公司 A kind of analysis of operative action method and apparatus
CN109919132A (en) * 2019-03-22 2019-06-21 广东省智能制造研究所 A kind of pedestrian's tumble recognition methods based on skeleton detection
CN110069133A (en) * 2019-03-29 2019-07-30 湖北民族大学 Demo system control method and control system based on gesture identification
CN110110710A (en) * 2019-06-03 2019-08-09 北京启瞳智能科技有限公司 A kind of scene abnormality recognition methods, system and intelligent terminal
CN110472473A (en) * 2019-06-03 2019-11-19 浙江新再灵科技股份有限公司 The method fallen based on people on Attitude estimation detection staircase

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Human Abnormal Behavior Detection Algorithms Based on Surveillance Video; An Yu; China Master's Theses Full-text Database (Information Science and Technology), No. 1; I138-2271 *

Also Published As

Publication number Publication date
CN111191511A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
Datta et al. Person-on-person violence detection in video data
CN111724414B (en) Basketball motion analysis method based on 3D gesture estimation
CN106571014A (en) Method for identifying abnormal motion in video and system thereof
CN106384093B (en) A kind of human motion recognition method based on noise reduction autocoder and particle filter
Xiong et al. S3D-CNN: skeleton-based 3D consecutive-low-pooling neural network for fall detection
CN111444890A (en) Sports data analysis system and method based on machine learning
CN114067358A (en) Human body posture recognition method and system based on key point detection technology
CN105426827A (en) Living body verification method, device and system
CN109670380A (en) Action recognition, the method and device of pose estimation
KR102106135B1 (en) Apparatus and method for providing application service by using action recognition
CN106056035A (en) Motion-sensing technology based kindergarten intelligent monitoring method
CN111753747B (en) Violent motion detection method based on monocular camera and three-dimensional attitude estimation
KR102174695B1 (en) Apparatus and method for recognizing movement of object
CN105531995A (en) System and method for object and event identification using multiple cameras
CN104038738A (en) Intelligent monitoring system and intelligent monitoring method for extracting coordinates of human body joint
CN104353240A (en) Running machine system based on Kinect
CN113378649A (en) Identity, position and action recognition method, system, electronic equipment and storage medium
CN115116127A (en) Fall detection method based on computer vision and artificial intelligence
Yan et al. Human-object interaction recognition using multitask neural network
CN109831634A (en) The density information of target object determines method and device
CN111191511B (en) Dynamic real-time behavior recognition method and system for prison
CN116189305A (en) Personnel dangerous action recognition method based on neural network model embedding
CN115100744A (en) Badminton game human body posture estimation and ball path tracking method
Belbachir et al. Event-driven feature analysis in a 4D spatiotemporal representation for ambient assisted living
CN114373225A (en) Behavior recognition method and system based on human skeleton

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant