CN114494427A - Method, system and terminal for detecting illegal behavior of person standing under suspension arm


Info

Publication number
CN114494427A
Authority
CN
China
Prior art keywords
crane
key point
coordinate system
human body
camera
Prior art date
Legal status
Pending
Application number
CN202111550551.1A
Other languages
Chinese (zh)
Inventor
薛念明
王军建
陆顺
刘立强
李超
李勋
徐崇豪
Current Assignee
Shandong Luruan Digital Technology Co Ltd
Original Assignee
Shandong Luruan Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shandong Luruan Digital Technology Co Ltd filed Critical Shandong Luruan Digital Technology Co Ltd
Priority to CN202111550551.1A
Publication of CN114494427A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30232 Surveillance

Abstract

The invention provides a method, a system and a terminal for detecting the violation of a person standing under a crane boom, based on binocular vision and artificial intelligence technology. Two cameras are arranged according to the position of the crane; a construction site video is shot and decoded to obtain the images to be detected; the two images are input into a trained crane key point detection model to obtain the pixel coordinate values of the crane key points in both images; the left and right images are also input into a trained human body key point detection model, which detects a plurality of key points of the human body in the images; the crane key point coordinates are input into a binocular vision positioning module to obtain their coordinate values and establish a crane coordinate system; the pixel coordinate values of the human body key points are input into the binocular vision positioning module to obtain their mean coordinate; and compliance is judged according to the coordinate relation. The invention monitors the standardization of crane operation on the construction site and reduces illegal operation.

Description

Method, system and terminal for detecting illegal behavior of person standing under suspension arm
Technical Field
The invention relates to the technical field of image processing and deep learning, in particular to a method, a system and a terminal for detecting violation behaviors of a person standing under a boom based on binocular vision and artificial intelligence technology.
Background
Cranes are widely used in electric power construction, and safety accidents caused by illegal crane operation occur from time to time every year; a person standing under the crane boom is one of the leading causes of crane accidents.
In the existing supervision of construction scenes, supervisors are mainly arranged on site and correct violations when they are found. For crane equipment, however, the boom moves continuously during use, so on-site supervisors cannot track its position in real time and blind spots in the field of view occur. A person under the boom in such a blind spot is in danger but cannot be identified and warned, which brings potential safety hazards to construction work.
Disclosure of Invention
In order to monitor the standardization of crane operation on a construction site and reduce the loss of life and property caused by illegal operation, the invention uses binocular vision and artificial intelligence technology to constantly monitor the position of the crane boom on the construction site, calculate the positions of workers on the site, and judge whether the violation of a person standing under the boom exists by comparing the position of the crane boom with the positions of the workers.
The method comprises the following steps:
step one: two cameras are arranged according to the position of the crane, so that the crane is positioned in the common visual field of the two cameras, and a binocular vision model is formed by the intersection of the visual fields of the two cameras;
step two: shooting a construction site video by using a binocular camera, decoding the video stream, and respectively obtaining to-be-detected images decoded by the two cameras;
step three: inputting the left image and the right image shot by the binocular camera into a trained crane key point detection model, detecting the two images, and respectively obtaining coordinate values of a pixel coordinate system of the crane key point on the two images;
step four: inputting a left image and a right image shot by a binocular camera into a trained human body key point detection model, wherein the human body key point detection model detects a plurality of key points of a human body in the images by taking the human body as a detection target;
step five: inputting the coordinates of the crane key points obtained in step three into the binocular vision positioning module obtained in step one to obtain the coordinate values of the 8 crane key points in the world coordinate system; fitting the minimum approximation plane P1 of the 8 coordinate points by the least squares principle, and then establishing a crane coordinate system according to the minimum approximation plane P1;
step six: inputting the coordinate values of the human body key points in the pixel coordinate systems of the pictures shot by the left and right cameras, obtained in step four, into the binocular vision positioning module obtained in step one to obtain the coordinate values of the human body key points in the world coordinate system, and calculating the mean coordinate P(x1, y1, z1) of these key points;
step seven: calculating the distance d1 from the point P(x1, y1, z1) obtained in step six to the approximation plane P1 obtained in step five; the distance d1 is calculated as:
d1 = |A1·x1 + B1·y1 + C1·z1 + D1| / sqrt(A1^2 + B1^2 + C1^2)
step eight: calculating the coordinate position of the point obtained in the step six in the crane coordinate system according to the following formula;
(xcra, ycra, zcra)^T = R·(x1, y1, z1)^T + t
where R is a 3 × 3 orthogonal rotation matrix, the translation vector t is a 3 × 1 vector, P(x1, y1, z1) is the mean coordinate of the human body key points obtained in step six, and (xcra, ycra, zcra) is the value obtained by mapping this mean coordinate into the crane coordinate system;
step nine: performing compliance judgment according to the relation between the coordinate position in the crane coordinate system and the mean coordinate of the human body key points, and outputting the compliance judgment result; a high-level sketch of this processing flow is given below.
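The nine steps above can be pictured as a single processing flow per synchronized image pair. The following Python sketch is only an illustration of that flow under stated assumptions: the detector, triangulation, plane-fitting and coordinate-transform helpers are hypothetical callables standing in for the modules described in this disclosure, and the threshold value is an arbitrary example.

```python
import numpy as np

def check_frame_pair(img_left, img_right,
                     detect_crane_kpts, detect_human_kpts,    # hypothetical detector callables
                     triangulate, fit_plane, to_crane_frame,  # hypothetical geometry helpers
                     threshold_a=2.0):                        # assumed example threshold a
    """Illustrative sketch of steps three to nine for one synchronized image pair."""
    crane_l, crane_r = detect_crane_kpts(img_left), detect_crane_kpts(img_right)
    human_l, human_r = detect_human_kpts(img_left), detect_human_kpts(img_right)

    if len(crane_l) == 0 or len(crane_r) == 0:
        return "normal"                                   # no crane key points detected
    if len(human_l) == 0 or len(human_r) == 0:
        return "normal"                                   # crane present, no person detected

    crane_world = triangulate(crane_l, crane_r)           # (8, 3) world coordinates, step five
    human_world = triangulate(human_l, human_r)           # (17, 3) world coordinates, step six
    p_mean = human_world.mean(axis=0)                     # mean coordinate P(x1, y1, z1)

    (a1, b1, c1, d0), R, t = fit_plane(crane_world)       # plane A1x+B1y+C1z+D1=0 and crane frame pose
    d1 = abs(a1 * p_mean[0] + b1 * p_mean[1] + c1 * p_mean[2] + d0) / np.sqrt(a1**2 + b1**2 + c1**2)
    z_cra = to_crane_frame(p_mean, R, t)[2]               # step eight

    if d1 > threshold_a or z_cra < 0:                     # step nine decision rules
        return "normal"
    return "violation"
```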
It should be further noted that the first step further includes:
establishing a coordinate system;
configuring a three-dimensional coordinate system conversion mode which is mapped to a camera coordinate system from a world coordinate system; the expression of the conversion mode is:
(Xc, Yc, Zc)^T = R·(Xw, Yw, Zw)^T + t
the corresponding homogeneous expression is:
(Xc, Yc, Zc, 1)^T = [R t; 0 1]·(Xw, Yw, Zw, 1)^T
where R is a 3 × 3 orthogonal rotation matrix and the translation vector t is a 3 × 1 vector; R and t do not depend on the camera's internal structure, so these two parameters are the camera extrinsic parameters; (Xw, Yw, Zw) is a point in the world coordinate system and (Xc, Yc, Zc) is the corresponding point mapped into the camera coordinate system;
projecting a camera coordinate system to an image coordinate system to obtain a conversion mode from a three-dimensional coordinate to a two-dimensional coordinate;
the conversion relation between the camera coordinate system and the image coordinate system satisfies the pinhole imaging model of the camera, which scales spatial points in a certain proportion and projects them through the pinhole Oc onto a two-dimensional imaging plane; the plane π is called the image plane of the camera, the point Oc is called the camera center, and f is the focal length of the camera;
the homogeneous coordinates of the spatial point P in the camera coordinate system are:
Xc = [Xc Yc Zc 1]^T
the homogeneous coordinate of the image point m in the image coordinate system is recorded as:
m = [x y 1]^T
in matrix form the projection is:
Zc·(x, y, 1)^T = [f 0 0 0; 0 f 0 0; 0 0 1 0]·(Xc, Yc, Zc, 1)^T
it should be further noted that the image coordinate system is transformed to the final pixel coordinate system to obtain a transformation mode between two-dimensional coordinates;
establishing a rectangular coordinate system u-v in units of pixels with the upper left corner of the image as the origin, wherein the abscissa u and the ordinate v of a pixel are, respectively, its column number and row number in the image array;
establishing an image coordinate system with a physical unit as an index;
u = x/dx + u0
v = y/dy + v0
(u, v, 1)^T = [1/dx 0 u0; 0 1/dy v0; 0 0 1]·(x, y, 1)^T
where dx and dy are the physical sizes of a pixel in the x and y directions and (u0, v0) are the pixel coordinates of the principal point;
the above formulas are combined into matrix form to obtain the relation between the world coordinate system and the pixel coordinate system:
Zc·(u, v, 1)^T = [1/dx 0 u0; 0 1/dy v0; 0 0 1]·[f 0 0 0; 0 f 0 0; 0 0 1 0]·[R t; 0 1]·(Xw, Yw, Zw, 1)^T
Let fx = f/dx and fy = f/dy.
where fx and fy are called the effective focal lengths, in pixels; the expression can then be written as:
Zc·(u, v, 1)^T = [fx 0 u0 0; 0 fy v0 0; 0 0 1 0]·[R t; 0 1]·(Xw, Yw, Zw, 1)^T = M1·M2·(Xw, Yw, Zw, 1)^T
wherein the matrix M1 represents the camera intrinsic parameters; the matrix M2 represents the camera extrinsic parameters, which are related to the placement of the camera and the choice of world coordinates, include the rotation matrix R and the translation matrix t, and describe the way in which the pose of the camera changes.
In the third step of the invention, the concrete training process of the crane key point detection model is as follows:
preparing a training set: shooting a working scene of the crane by using a binocular camera to obtain images shot by a left camera and a right camera, then marking four key points of a crane boom and four key points of a pulley block in a training set image by using a Labelme marking tool to obtain image labels, and correspondingly using the shot images and the manufactured labels one by one as a data set for detecting the crane key points;
setting a training set and a verification set of the data set according to the ratio of 8: 2;
inputting the training set into an OpenPose key point detection algorithm, and training in an NVIDIA GPU server according to default parameters of the algorithm;
and (4) evaluating the performance of the model by taking OKS as an evaluation standard until the model meets the use requirement to obtain the crane key point detection model.
In the fourth step of the invention, 17 key points of the human body are detected in the image; the specific training process of the human body key point detection model is as follows:
preparing a training set: shooting a working scene of the crane by using a binocular camera to obtain images shot by a left camera and a right camera, then labeling 17 key points of a human body in the images of the training set by using a Labelme labeling tool to obtain image labels, and taking the shot images and the manufactured labels in one-to-one correspondence as a data set for detecting the key points of the human body;
and setting the training set and the verification set of the data set according to the ratio of 8: 2.
And inputting the training set into an OpenPose key point detection algorithm, and training in an NVIDIA GPU server by using default parameters of the algorithm.
And (4) evaluating the performance of the model by taking OKS as an evaluation standard until the model meets the use requirement to obtain the human body key point detection model.
In the ninth step of the invention, when the crane key point detection model does not detect the crane key point, the output result of the compliance judgment is normal;
when the crane key point detection model detects the crane key point and the human body key point detection model does not detect the human body key point, the output result of the compliance judgment is normal.
Compliance is further judged by a set threshold a;
when the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, and the distance d1 from the mean point P of the human body key points in the world coordinate system to the approximation plane P1 of the crane key points is greater than the set threshold a, the output result of the compliance judgment is "normal";
when the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, the distance d1 from the mean point P of the human body key points to the approximation plane P1 of the crane key points is less than the set threshold a, and zcra in step eight is less than 0, the compliance judgment output result is "normal".
When the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, the distance d1 from the mean point P of the human body key points to the approximation plane P1 of the crane key points is less than the set threshold a, and zcra in step eight is greater than 0, the compliance judgment output result is "violation".
The invention also provides a system for detecting the violation of a person standing under the suspension arm, which comprises the following steps: the system comprises two cameras, an image acquisition and decoding module, a coordinate value analysis module, a human body key point detection module, a crane coordinate system establishing module, a key point mean coordinate calculation module, a distance calculation module, a key point position calculation module and a compliance judgment module;
two cameras are arranged according to the position of the crane, so that the crane is positioned in a common visual field of the two cameras, and a binocular visual model is formed by the intersection of the visual fields of the two cameras;
the image acquisition and decoding module is used for shooting a construction site video by using a binocular camera, decoding the video stream and respectively acquiring the to-be-detected images decoded by the two cameras;
the coordinate value analysis module is used for inputting the left image and the right image shot by the binocular camera into the trained crane key point detection model, detecting the two images and respectively obtaining the coordinate values of the crane key point in the pixel coordinate systems of the two images;
the human body key point detection module is used for inputting a left image and a right image shot by the binocular camera into a trained human body key point detection model, and the human body key point detection model detects a plurality of key points of a human body in the images by taking the human body as a detection target;
the crane coordinate system establishing module is used for inputting the coordinates of the crane key points into the binocular vision positioning module to obtain the coordinate values of the 8 crane key points in the world coordinate system, fitting the minimum approximation plane P1 of the 8 coordinate points by the least squares principle, and then establishing a crane coordinate system according to the minimum approximation plane P1;
the key point mean coordinate calculation module is used for inputting the coordinate values of the human body key points in the pixel coordinate systems of the pictures shot by the left and right cameras, obtained in step four, into the binocular vision positioning module to obtain the coordinate values of the human body key points in the world coordinate system and calculate the mean coordinate P(x1, y1, z1) of the key points;
the distance calculation module is used for calculating the distance d1 from the point P(x1, y1, z1) to the approximation plane P1; the distance d1 is calculated as:
d1 = |A1·x1 + B1·y1 + C1·z1 + D1| / sqrt(A1^2 + B1^2 + C1^2)
the key point position calculating module is used for calculating the coordinate position of the key point in the crane coordinate system according to the following formula;
(xcra, ycra, zcra)^T = R·(x1, y1, z1)^T + t
where R is a 3 × 3 orthogonal rotation matrix, the translation vector t is a 3 × 1 vector, P(x1, y1, z1) is the obtained mean coordinate of the human body key points, and (xcra, ycra, zcra) is the value obtained by mapping this mean coordinate into the crane coordinate system;
and the compliance judgment module is used for judging compliance according to the relation between the coordinate position in the crane coordinate system and the mean value coordinate of the key point of the human body and outputting a compliance judgment result.
The invention also provides a terminal for realizing the method for detecting the illegal behavior of the person standing under the suspension arm, which comprises the following steps:
the memory is used for storing a computer program and the method for detecting the illegal behavior of a person standing under the boom;
and the processor is used for executing the computer program and the illegal behavior detection method for the person standing under the boom so as to realize the steps of the illegal behavior detection method for the person standing under the boom.
According to the technical scheme, the invention has the following advantages:
The coordinate values of the human body key points in the pixel coordinate system are input into the binocular vision positioning module to obtain the mean coordinate, and compliance is judged according to the coordinate relation. The invention can achieve real-time detection and can store images of violation scenes for manual review, which greatly improves efficiency. The system automatically acquires images without on-site manual supervision and management, performs compliance judgment, outputs the compliance judgment result, and can raise an alarm when a violation state occurs.
The binocular camera provided by the invention consists of a left camera and a right camera that are horizontally arranged and have essentially the same intrinsic parameters, and the imaging process of both cameras satisfies the classical pinhole camera model. Using two cameras avoids blind angles in the photographs, realizes panoramic coverage of the scene, and supports the compliance judgment.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the description will be briefly introduced below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a flow chart of a method for detecting an illegal action of a person standing under a boom;
FIG. 2 is a schematic diagram of a world coordinate system and a camera coordinate system;
FIG. 3 is a schematic diagram of the transformation between the camera coordinate system and the image coordinate system;
FIG. 4 is a schematic diagram of the transformation between an image coordinate system and a pixel coordinate system;
FIG. 5 is a schematic view of a binocular vision model;
FIG. 6 is a schematic view of the principle of binocular vision;
FIG. 7 is a schematic diagram of key points of a crane;
FIG. 8 is a schematic view of a crane coordinate system;
FIG. 9 is a schematic diagram of a transformation between a crane coordinate system and a world coordinate system;
fig. 10 is a diagram showing a relationship between the crane and the human body.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The units and algorithm steps of the examples described in connection with the embodiments disclosed for the method and system for detecting the violation of a person standing under the boom can be implemented in electronic hardware, computer software, or a combination of the two; in the above description, the components and steps of the examples have been described generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The block diagram shown in the drawings of the present invention providing a method and system for detecting violations of a person standing under a boom is merely a functional entity and does not necessarily correspond to a physically separate entity. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
In providing a method and system for detecting violations of a person standing under a boom, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The invention provides a method and a system for detecting illegal behaviors of a person standing under a suspension arm, which are used for monitoring the standardization of crane operation on a construction site, reducing life and property losses caused by illegal operation and improving the safety of a construction process. The invention adopts binocular vision and artificial intelligence technology to monitor the position of the crane jib on a construction site at any time, calculate the position of workers on the construction site, and judge whether the violation of the crane jib-off-station person exists or not by comparing the position of the crane jib with the position of the workers.
As shown in fig. 1, step one: two cameras are arranged according to the position of the crane, so that the crane is positioned in a common visual field of the two cameras, and a binocular visual model is formed by the intersection of the visual fields of the two cameras;
in the invention, the binocular camera generally consists of a left camera and a right camera which are horizontally arranged and have basically the same internal parameters, and the imaging processes of the two cameras meet the pinhole imaging model of a classical camera. In the process of pinhole imaging, the establishment of a coordinate system is a crucial part. Generally, real-world object formation images we see at a digital terminal are subjected to the following processes in total: firstly, mapping from a world coordinate system to a camera coordinate system, belonging to three-dimensional coordinate system conversion; secondly, projecting the image coordinate system from a camera coordinate system, and belonging to the mapping from three-dimensional coordinates to two-dimensional coordinates; finally, the transformation from the image coordinate system to the final pixel coordinate system is a transformation between two-dimensional coordinates. The steps in the camera optical imaging process based on the pinhole imaging model will be described in detail below.
Specifically, the conversion from the world coordinate system to the camera coordinate system is as follows:
the world coordinate system and the camera coordinate system are both the transformation of a three-dimensional space coordinate system, and are essentially a rigid transformation process. In a three-dimensional space, when an object is not deformed, a geometric object is subjected to rotational and translational motion, which is called rigid body change. The transformation relationship is shown in FIG. 2:
the rigid body transformation between the two is expressed as:
(Xc, Yc, Zc)^T = R·(Xw, Yw, Zw)^T + t
The corresponding homogeneous expression is:
(Xc, Yc, Zc, 1)^T = [R t; 0 1]·(Xw, Yw, Zw, 1)^T
where R is a 3 × 3 orthogonal matrix (i.e., a rotation matrix) and the translation vector t is a 3 × 1 vector; R and t do not depend on the camera's internal structure, so these two parameters are referred to as the camera extrinsic parameters; (Xw, Yw, Zw) is a point in the world coordinate system and (Xc, Yc, Zc) is the corresponding point mapped into the camera coordinate system.
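As an illustration of this rigid transformation, the short NumPy sketch below maps one world point into the camera frame; the rotation and translation values are made-up examples, not calibration results from the invention.

```python
import numpy as np

# Assumed example extrinsics: a 30-degree rotation about the Z axis plus a translation.
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])   # 3x3 orthogonal rotation matrix
t = np.array([0.5, -1.2, 3.0])                         # 3x1 translation vector

Xw = np.array([2.0, 1.0, 10.0])                        # a point in the world coordinate system
Xc = R @ Xw + t                                        # the same point in the camera coordinate system
print(Xc)
```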
Projecting a camera coordinate system to an image coordinate system to obtain a conversion mode from a three-dimensional coordinate to a two-dimensional coordinate;
specifically, the transformation relationship between the camera coordinate system and the image coordinate system satisfies the pinhole imaging model of the camera, which scales spatial points in a certain proportion and projects them through the pinhole Oc onto a two-dimensional imaging plane; the depth information represented by the Z axis is lost in this conversion, since the process maps a three-dimensional coordinate system onto a two-dimensional one. The basic model of a pinhole camera is shown in fig. 3. The plane π is called the image plane of the camera, the point Oc is referred to as the center (or optical center) of the camera, and f is the focal length of the camera.
As shown in fig. 3, the homogeneous coordinates of the space point P in the camera coordinate system are:
Xc = [Xc Yc Zc 1]^T
its homogeneous coordinate of the image point m in the image coordinate system is recorded as:
m = [x y 1]^T
in matrix form the projection is:
Zc·(x, y, 1)^T = [f 0 0 0; 0 f 0 0; 0 0 1 0]·(Xc, Yc, Zc, 1)^T
in the invention, an image coordinate system is transformed to a final pixel coordinate system to obtain a conversion mode between two-dimensional coordinates;
as shown in fig. 4, a rectangular coordinate system u-v with pixels as units is established with the upper left corner of the image as the origin, and the abscissa u and the Z coordinate v of the pixel are the number of columns and the number of rows in the image array, respectively.
Since (u, v) only represents the column number and the row number of the pixels, and the coordinate positions of the pixels in the image are not expressed by physical units, an image coordinate system taking the physical units as indexes is established.
u = x/dx + u0
v = y/dy + v0
(u, v, 1)^T = [1/dx 0 u0; 0 1/dy v0; 0 0 1]·(x, y, 1)^T
where dx and dy are the physical sizes of a pixel in the x and y directions and (u0, v0) are the pixel coordinates of the principal point.
Combining the above formulas into matrix form gives the relation between the world coordinate system and the pixel coordinate system:
Zc·(u, v, 1)^T = [1/dx 0 u0; 0 1/dy v0; 0 0 1]·[f 0 0 0; 0 f 0 0; 0 0 1 0]·[R t; 0 1]·(Xw, Yw, Zw, 1)^T
Let fx = f/dx and fy = f/dy.
where fx and fy are called the effective focal lengths, in pixels. The expression can then be written as:
Zc·(u, v, 1)^T = [fx 0 u0 0; 0 fy v0 0; 0 0 1 0]·[R t; 0 1]·(Xw, Yw, Zw, 1)^T = M1·M2·(Xw, Yw, Zw, 1)^T
Here the matrix M1 represents the camera intrinsic parameters; it is related to the camera's internal characteristics such as the principal point coordinates, the focal length and the size of the CCD sensor, and usually does not change with changes in the external environment. The matrix M2 represents the camera extrinsic parameters, which are related to the placement of the camera and the choice of world coordinates; it includes the rotation matrix R and the translation matrix t and describes the way in which the pose of the camera changes. The process of camera calibration is to solve for the matrices M1 and M2, and thereby obtain the pixel coordinates (u, v) corresponding to any point P(x, y, z) in space.
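To make the chain of mappings concrete, the following sketch composes an assumed intrinsic matrix M1 and extrinsic matrix M2 and projects a world point to pixel coordinates (u, v); every numeric value here is a placeholder, not a calibrated parameter of the invention.

```python
import numpy as np

fx, fy = 1200.0, 1200.0                       # assumed effective focal lengths in pixels
u0, v0 = 960.0, 540.0                         # assumed principal point
M1 = np.array([[fx, 0.0, u0, 0.0],
               [0.0, fy, v0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])         # intrinsic projection matrix (3x4)

R = np.eye(3)                                 # assumed extrinsics: camera axes aligned with the world frame
t = np.array([[0.0], [0.0], [5.0]])
M2 = np.vstack([np.hstack([R, t]),
                [0.0, 0.0, 0.0, 1.0]])        # extrinsic matrix (4x4)

Pw = np.array([1.0, 0.5, 20.0, 1.0])          # homogeneous point in the world coordinate system
uvw = M1 @ M2 @ Pw                            # equals Zc * (u, v, 1)^T
u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
print(u, v)                                   # pixel coordinates of the projected point
```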
With the parameters of a single (monocular) camera known, only two linear equations about the point P(x, y, z) can be obtained: a projection point corresponds to a whole ray, and the specific position of the spatial point cannot be determined from a single pixel. This is because every point on the line from the camera optical center through the normalized plane projects onto the same pixel, so the corresponding three-dimensional coordinates in the real world cannot be uniquely determined. A second camera therefore needs to be added to obtain the depth information of the point P; only after the depth of P is determined can its spatial position be known exactly and its three-dimensional coordinates in the real world be uniquely determined.
As shown in figure 5, the invention arranges two cameras according to the position of the crane, so that the crane is positioned in the common visual field of the two cameras and a binocular camera system is formed; the intersection of the visual fields of the two cameras forms the binocular vision model, and the measured object is measured within this common visual field.
The principle of the binocular vision model is shown in fig. 6: in the convergent stereo model, let P1 and P2 be the image points of the same spatial point P in the left and right images, respectively. From the above, the two cameras (C1 and C2) have their respective projection matrices M1 and M2, and the relationship between the spatial point and its image points in the left and right images is:
Zc1·(u1, v1, 1)^T = M1·(X, Y, Z, 1)^T
Zc2·(u2, v2, 1)^T = M2·(X, Y, Z, 1)^T
from the above, if the pixel positions of the point P in the two cameras and the related internal and external parameters of the cameras are known, the coordinates (X, Y, Z) of the feature point P in the world coordinate system can be obtained.
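A hedged sketch of this triangulation step using OpenCV is given below; the projection matrices and pixel coordinates are placeholders, and in practice they would come from stereo calibration of the two site cameras and from the key point detection models.

```python
import numpy as np
import cv2

# Assumed 3x4 projection matrices of the left and right cameras (placeholders for calibration results).
K = np.array([[1200.0, 0.0, 960.0],
              [0.0, 1200.0, 540.0],
              [0.0, 0.0, 1.0]])
P_left = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_right = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])   # assumed 0.5 m baseline

# Pixel coordinates of the same key point in the left and right images (2xN arrays).
pts_left = np.array([[1010.0], [560.0]])
pts_right = np.array([[980.0], [560.0]])

pts_h = cv2.triangulatePoints(P_left, P_right, pts_left, pts_right)      # 4xN homogeneous coordinates
pts_world = (pts_h[:3] / pts_h[3]).T                                     # Nx3 coordinates (X, Y, Z)
print(pts_world)
```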
Step two: shooting a construction site video by using a binocular camera, decoding the video stream, and respectively obtaining to-be-detected images decoded by the two cameras;
the method comprises the steps of shooting construction site videos by using a binocular camera, decoding the videos by using tools such as OpenCv and the like, and respectively obtaining to-be-detected images of left and right cameras after decoding.
The image to be detected consists of a plurality of groups of images, and each group of images comprises two images shot by a left camera and a right camera at the same time.
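A minimal sketch of this acquisition and decoding step with OpenCV is shown below; the stream addresses are hypothetical placeholders for the two site cameras.

```python
import cv2

# Hypothetical stream addresses of the left and right site cameras.
cap_left = cv2.VideoCapture("rtsp://left-camera/stream")
cap_right = cv2.VideoCapture("rtsp://right-camera/stream")

while True:
    ok_l, frame_left = cap_left.read()     # decoded image to be detected, left camera
    ok_r, frame_right = cap_right.read()   # decoded image to be detected, right camera
    if not (ok_l and ok_r):
        break
    # frame_left and frame_right form one group: two images shot at (approximately) the same time.
    # ... pass the pair to the crane and human body key point detection models (steps three and four) ...

cap_left.release()
cap_right.release()
```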
Step three: inputting the left and right images shot by the binocular camera into a trained crane key point detection model, and detecting the two images to respectively obtain coordinate values of the crane key point in pixel coordinate systems of the two images;
according to the method, a left image and a right image shot by a binocular camera are input into a trained crane key point detection model, and the crane key point detection model respectively detects the two images shot by the left camera and the right camera to obtain coordinates of a crane key point.
The key points of the crane to be detected by the invention are shown in fig. 7, and the number of the key points is 8.
In the third step, the trained crane key point detection model is trained based on an OpenPose key point detection network.
The invention adopts the crane key point detection model to detect the key points of the crane jib and the pulley block on the construction site, thereby obtaining the key point positions of the crane jib and the pulley block in the image.
The specific training process of the crane key point detection model comprises the following steps:
preparing a training set: the method comprises the steps of shooting a working scene of the crane by using a binocular camera to obtain images shot by a left camera and a right camera, then marking four key points of a crane boom and four key points of a pulley block in a training set image by using a Labelme marking tool to obtain image labels, and enabling the shot images and the manufactured labels to be in one-to-one correspondence to be used as a data set for crane key point detection.
And setting the training set and the verification set of the data set according to the ratio of 8: 2.
And inputting the training set into an OpenPose key point detection algorithm, and training in an NVIDIA GPU server by using default parameters of the algorithm.
And (4) evaluating the performance of the model by taking OKS as an evaluation standard until the model meets the use requirement to obtain the crane key point detection model.
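As an illustration of the OKS evaluation criterion used above, the sketch below computes object keypoint similarity between predicted and labeled key points; the per-keypoint falloff constants k and the example coordinates are assumptions, since the invention does not list concrete values.

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity between predicted and ground-truth key points.

    pred, gt : (N, 2) arrays of pixel coordinates
    visible  : (N,) boolean mask of labeled key points
    area     : object area in pixels squared (scale s = sqrt(area))
    k        : (N,) per-keypoint falloff constants (assumed values)
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)          # squared pixel distances
    e = np.exp(-d2 / (2.0 * float(area) * k ** 2))
    return float(e[visible].mean()) if visible.any() else 0.0

# Example with the four boom key points and an assumed k = 0.05 for each.
pred = np.array([[100.0, 50.0], [200.0, 52.0], [300.0, 48.0], [400.0, 55.0]])
gt = np.array([[102.0, 49.0], [198.0, 50.0], [305.0, 50.0], [397.0, 53.0]])
print(oks(pred, gt, np.ones(4, dtype=bool), area=90000.0, k=np.full(4, 0.05)))
```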
Step four: inputting a left image and a right image shot by a binocular camera into a trained human body key point detection model, wherein the human body key point detection model detects a plurality of key points of a human body in the images by taking the human body as a detection target;
detecting 17 key points of the human body in the image; the specific training process of the human body key point detection model is as follows:
the trained crane key point detection model is trained based on an OpenPose key point detection network.
The invention adopts a human body key point detection model to detect key points of workers on a construction site, thereby obtaining the positions of the human body key points.
The specific training process of the human body key point detection model comprises the following steps:
preparing a training set: the method comprises the steps of shooting a working scene of a crane by using a binocular camera to obtain images shot by a left camera and a right camera, then marking 17 key points of a human body in images of a training set by using a Labelme marking tool to obtain image labels, and enabling the shot images and the manufactured labels to be in one-to-one correspondence to be used as a data set for detecting the key points of the human body.
The data sets were scaled to 8:2 for the training set and the validation set.
And inputting the training set into an OpenPose key point detection algorithm, and training in an NVIDIA GPU server by using default parameters of the algorithm.
And (4) evaluating the performance of the model by taking OKS as an evaluation standard until the model meets the use requirement to obtain the human body key point detection model.
Step five: inputting the coordinates of the key points of the crane obtained in the third step into the binocular vision positioning module obtained in the first step to obtain coordinate values of 8 key points of the crane in a world coordinate system; fitting the minimum approximation plane P of the 8 coordinate points by the least square method principle1Then according to the minimum approximation plane P1Establishing a crane coordinate system;
the expression formula of the minimum approximation plane of the invention is
A1x+B1y+C1z+D1=0
There are 8 crane key points, and their specific distribution is shown in figures 8 and 9. The crane coordinate system takes a point C among the crane key points as the origin of coordinates; the direction horizontally to the left through point C is the Zcra axis of the crane coordinate system, the other horizontal direction through point C is the Ycra axis, and the Xcra axis is oriented perpendicular to the plane of the paper.
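A least-squares plane fit of this kind can be sketched as follows: it fits A1x + B1y + C1z + D1 = 0 to the eight triangulated crane key points via a singular value decomposition of the centered points, which minimizes the sum of squared point-to-plane distances. This is a minimal illustration, not the patented implementation, and the example coordinates are made up.

```python
import numpy as np

def fit_plane(points):
    """Fit A1*x + B1*y + C1*z + D1 = 0 to an (N, 3) point array by least squares (SVD)."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                             # direction of smallest variance = plane normal
    d = -float(normal @ centroid)
    return normal[0], normal[1], normal[2], d   # A1, B1, C1, D1

# Example: eight made-up triangulated crane key points in the world coordinate system.
crane_pts = np.array([
    [0.0, 0.0, 5.0], [10.0, 0.1, 5.1], [10.0, 2.0, 5.0], [0.0, 2.1, 4.9],
    [4.8, 0.9, 5.0], [5.2, 1.1, 5.1], [5.0, 0.8, 4.9], [5.0, 1.2, 5.0],
])
A1, B1, C1, D1 = fit_plane(crane_pts)
print(A1, B1, C1, D1)
```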
Step six: inputting the coordinate values of the key points of the human body in the pixel coordinate system of the pictures shot by the left camera and the right camera obtained in the fourth step into the binocular vision positioning module to obtain the coordinate values of the key points of the human body in the world coordinate system, and calculating the mean coordinate P (x) of the key points1,y1,z1);
step seven: calculating the distance d1 from the point P(x1, y1, z1) obtained in step six to the approximation plane P1 obtained in step five; the distance d1 is calculated as:
d1 = |A1·x1 + B1·y1 + C1·z1 + D1| / sqrt(A1^2 + B1^2 + C1^2)
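The distance of step seven follows directly from the fitted plane coefficients; a minimal sketch, with made-up example values, is:

```python
import numpy as np

def point_plane_distance(p, A1, B1, C1, D1):
    """Distance d1 from the point p = (x1, y1, z1) to the plane A1*x + B1*y + C1*z + D1 = 0."""
    x1, y1, z1 = p
    return abs(A1 * x1 + B1 * y1 + C1 * z1 + D1) / np.sqrt(A1**2 + B1**2 + C1**2)

# Example: for the plane z = 5 (A1=0, B1=0, C1=1, D1=-5) and the point (5, 1, 3), d1 = 2.
print(point_plane_distance((5.0, 1.0, 3.0), A1=0.0, B1=0.0, C1=1.0, D1=-5.0))
```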
step eight: calculating the coordinate position of the point obtained in the step six in the crane coordinate system according to the following formula;
(xcra, ycra, zcra)^T = R·(x1, y1, z1)^T + t
where R is a 3 × 3 orthogonal rotation matrix, the translation vector t is a 3 × 1 vector, P(x1, y1, z1) is the mean coordinate of the human body key points obtained in step six, and (xcra, ycra, zcra) is the value obtained by mapping this mean coordinate into the crane coordinate system;
step nine: performing compliance judgment according to the relation between the coordinate position in the crane coordinate system and the mean coordinate of the human body key points, and outputting the compliance judgment result.
In the ninth step, compliance judgment is performed as shown in fig. 10, and a compliance judgment result is output.
When the crane key point detection model does not detect the crane key point, the output result of the compliance judgment is normal;
when the crane key point detection model detects the crane key point and the human body key point detection model does not detect the human body key point, the output result of the compliance judgment is normal.
In the ninth step, compliance is further judged by a set threshold a;
when the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, and the distance d1 from the mean point P of the human body key points in the world coordinate system to the approximation plane P1 of the crane key points is greater than the set threshold a, the output result of the compliance judgment is "normal";
when the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, the distance d1 from the mean point P of the human body key points to the approximation plane P1 of the crane key points is less than the set threshold a, and zcra in step eight is less than 0, the compliance judgment output result is "normal".
When the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, the distance d1 from the mean point P of the human body key points to the approximation plane P1 of the crane key points is less than the set threshold a, and zcra in step eight is greater than 0, the compliance judgment output result is "violation".
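These decision rules can be written compactly as in the sketch below; the threshold a is a site-specific assumption rather than a value fixed by the invention, and the inputs are the quantities produced in steps five to eight.

```python
def judge_compliance(crane_detected, human_detected, d1=None, z_cra=None, a=2.0):
    """Compliance judgment of step nine; the threshold a is an assumed example value."""
    if not crane_detected:
        return "normal"          # no crane key points detected
    if not human_detected:
        return "normal"          # crane present but no person detected
    if d1 > a:
        return "normal"          # person far from the boom approximation plane
    if z_cra < 0:
        return "normal"          # person on the non-violating side of the crane coordinate system
    return "violation"           # close to the boom plane and z_cra greater than 0

print(judge_compliance(True, True, d1=0.8, z_cra=1.5))   # -> "violation"
```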
The method is based on a deep learning method for modeling, a complex scene of a person standing under a suspension arm is decomposed into two targets of a key point of a crane and a key point of a human body, and positions of the human body and the crane in a world coordinate system are extracted by using a binocular vision technology.
The invention can achieve the effect of real-time detection, and can store the illegal site images for manual review, thereby greatly improving the efficiency and reducing the working intensity of the supervision personnel.
The invention provides a method for detecting violation behaviors of a person standing under a suspension arm, which mainly relates to an artificial intelligence computer vision technology and an artificial intelligence cloud service in the cloud technology.
Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like. Computer Vision technology (CV) is a science for researching how to make a machine see, and further means that a camera and a Computer are used for replacing human eyes to perform machine Vision such as identification, tracking and measurement on a target, and further image processing is performed, so that the Computer processing becomes an image more suitable for human eyes to observe or is transmitted to an instrument to detect. As a scientific discipline, computer vision research-related theories and techniques attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, synchronous positioning, map construction, and other technologies, and also include common biometric technologies such as face recognition and fingerprint recognition.
Based on the method, the invention also provides a system for detecting the illegal behavior of a person standing under the suspension arm, and the system comprises: the system comprises two cameras, an image acquisition and decoding module, a coordinate value analysis module, a human body key point detection module, a crane coordinate system establishing module, a key point mean coordinate calculation module, a distance calculation module, a key point position calculation module and a compliance judgment module;
two cameras are arranged according to the position of the crane, so that the crane is positioned in a common visual field of the two cameras, and a binocular visual model is formed by the intersection of the visual fields of the two cameras;
the image acquisition and decoding module is used for shooting a construction site video by using a binocular camera, decoding the video stream and respectively acquiring the to-be-detected images decoded by the two cameras;
the coordinate value analysis module is used for inputting the left image and the right image shot by the binocular camera into a trained crane key point detection model, detecting the two images and respectively obtaining coordinate values of a pixel coordinate system of the crane key point on the two images;
the human body key point detection module is used for inputting a left image and a right image shot by the binocular camera into a trained human body key point detection model, and the human body key point detection model detects a plurality of key points of a human body in the images by taking the human body as a detection target;
the crane coordinate system establishing module is used for inputting the coordinates of the crane key points into the binocular vision positioning module to obtain the coordinate values of the 8 crane key points in the world coordinate system, fitting the minimum approximation plane P1 of the 8 coordinate points by the least squares principle, and then establishing a crane coordinate system according to the minimum approximation plane P1;
the key point mean coordinate calculation module is used for inputting the coordinate values of the human body key points in the pixel coordinate systems of the pictures shot by the left and right cameras, obtained in step four, into the binocular vision positioning module to obtain the coordinate values of the human body key points in the world coordinate system and calculate the mean coordinate P(x1, y1, z1) of the key points;
the distance calculation module is used for calculating the distance d1 from the point P(x1, y1, z1) to the approximation plane P1; the distance d1 is calculated as:
d1 = |A1·x1 + B1·y1 + C1·z1 + D1| / sqrt(A1^2 + B1^2 + C1^2)
the key point position calculating module is used for calculating the coordinate position of the key point in the crane coordinate system according to the following formula;
(xcra, ycra, zcra)^T = R·(x1, y1, z1)^T + t
where R is a 3 × 3 orthogonal rotation matrix, the translation vector t is a 3 × 1 vector, P(x1, y1, z1) is the obtained mean coordinate of the human body key points, and (xcra, ycra, zcra) is the value obtained by mapping this mean coordinate into the crane coordinate system;
and the compliance judgment module is used for judging compliance according to the relation between the coordinate position in the crane coordinate system and the mean value coordinate of the key point of the human body and outputting a compliance judgment result.
The invention also provides a terminal for realizing the method for detecting the illegal behavior of a person standing under the suspension arm, which comprises: a memory, used for storing a computer program and the method for detecting the illegal behavior of a person standing under the boom; and a processor, used for executing the computer program and the method so as to realize the steps of the method for detecting the illegal behavior of a person standing under the boom.
The terminal may be implemented in various forms. For example, the terminal described in the embodiments of the present invention may include a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a Digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), and the like, and a fixed terminal such as a Digital TV, a desktop computer, and the like. In the following, it is assumed that the terminal is a mobile terminal. However, it will be understood by those skilled in the art that the configuration according to the embodiment of the present invention can be applied to a fixed type terminal in addition to elements particularly used for moving purposes.
The method for detecting an illegal action of a person standing under a boom is implemented by combining the units and algorithm steps of each example described in the embodiments disclosed herein, and can be realized by electronic hardware, computer software or a combination of the two. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for detecting violation behaviors of people standing under a suspension arm is characterized by comprising the following steps:
step one: two cameras are arranged according to the position of the crane, so that the crane is positioned in the common visual field of the two cameras, and a binocular vision model is formed by the intersection of the visual fields of the two cameras;
step two: shooting a construction site video by using a binocular camera, decoding the video stream, and respectively obtaining to-be-detected images decoded by the two cameras;
step three: inputting the left image and the right image shot by the binocular camera into a trained crane key point detection model, detecting the two images, and respectively obtaining coordinate values of a pixel coordinate system of the crane key point on the two images;
step four: inputting a left image and a right image shot by a binocular camera into a trained human body key point detection model, wherein the human body key point detection model detects a plurality of key points of a human body in the images by taking the human body as a detection target;
step five: inputting the coordinates of the crane key points obtained in step three into the binocular vision positioning module obtained in step one to obtain the coordinate values of the 8 crane key points in the world coordinate system; fitting the minimum approximation plane P1 of the 8 coordinate points by the least squares principle, and then establishing a crane coordinate system according to the minimum approximation plane P1;
step six: inputting the coordinate values of the human body key points in the pixel coordinate systems of the pictures shot by the left and right cameras, obtained in step four, into the binocular vision positioning module obtained in step one to obtain the coordinate values of the human body key points in the world coordinate system, and calculating the mean coordinate P(x1, y1, z1) of these key points;
step seven: calculating the distance d1 from the point P(x1, y1, z1) obtained in step six to the approximation plane P1 obtained in step five; the distance d1 is calculated as:
d1 = |A1·x1 + B1·y1 + C1·z1 + D1| / sqrt(A1^2 + B1^2 + C1^2)
step eight: calculating the coordinate position of the point obtained in the step six in the crane coordinate system according to the following formula;
(xcra, ycra, zcra)^T = R·(x1, y1, z1)^T + t
where R is a 3 × 3 orthogonal rotation matrix, the translation vector t is a 3 × 1 vector, P(x1, y1, z1) is the mean coordinate of the human body key points obtained in step six, and (xcra, ycra, zcra) is the value obtained by mapping this mean coordinate into the crane coordinate system;
step nine: and (4) according to the relation between the coordinate position in the crane coordinate system and the mean value coordinate of the key point of the human body, performing compliance judgment and outputting a compliance judgment result.
2. The method for detecting violation behaviors of a person standing off a boom according to claim 1,
the first step further comprises the following steps:
establishing a coordinate system;
configuring a three-dimensional coordinate system conversion mode which is mapped to a camera coordinate system from a world coordinate system; the expression of the conversion mode is:
(Xc, Yc, Zc)^T = R·(Xw, Yw, Zw)^T + t
the corresponding homogeneous expression is:
(Xc, Yc, Zc, 1)^T = [R t; 0 1]·(Xw, Yw, Zw, 1)^T
where R is a 3 × 3 orthogonal rotation matrix and the translation vector t is a 3 × 1 vector; R and t do not depend on the camera's internal structure, so these two parameters are the camera extrinsic parameters; (Xw, Yw, Zw) is a point in the world coordinate system and (Xc, Yc, Zc) is the corresponding point mapped into the camera coordinate system;
projecting a camera coordinate system to an image coordinate system to obtain a conversion mode from a three-dimensional coordinate to a two-dimensional coordinate;
the conversion relation between the camera coordinate system and the image coordinate system satisfies the pinhole imaging model of the camera, which scales spatial points in a certain proportion and projects them through the pinhole Oc onto a two-dimensional imaging plane; the plane π is called the image plane of the camera, the point Oc is called the camera center, and f is the focal length of the camera;
the homogeneous coordinates of the spatial point P in the camera coordinate system are:
Xc = [Xc Yc Zc 1]^T
the homogeneous coordinate of the image point m in the image coordinate system is recorded as:
m = [x y 1]^T
in matrix form the projection is:
Zc·(x, y, 1)^T = [f 0 0 0; 0 f 0 0; 0 0 1 0]·(Xc, Yc, Zc, 1)^T
3. the method for detecting violation of a person standing under a boom of claim 1 or 2, wherein,
transforming the image coordinate system to a final pixel coordinate system to obtain a conversion mode between two-dimensional coordinates;
establishing a rectangular coordinate system u-v in units of pixels with the upper left corner of the image as the origin, wherein the abscissa u and the ordinate v of a pixel are, respectively, its column number and row number in the image array;
establishing an image coordinate system with a physical unit as an index;
u = x/dx + u0
v = y/dy + v0
(u, v, 1)^T = [1/dx 0 u0; 0 1/dy v0; 0 0 1]·(x, y, 1)^T
where dx and dy are the physical sizes of a pixel in the x and y directions and (u0, v0) are the pixel coordinates of the principal point;
the above formulas are combined in matrix form to obtain the relationship between the world coordinate system and the pixel coordinate system:
Zc · [u, v, 1]^T = [[1/dx, 0, u0], [0, 1/dy, v0], [0, 0, 1]] · [[f, 0, 0, 0], [0, f, 0, 0], [0, 0, 1, 0]] · [[R, t], [0^T, 1]] · [Xw, Yw, Zw, 1]^T
Let fx = f/dx and fy = f/dy.
where fx and fy are called the effective focal lengths, in pixels; the relationship can then be expressed as:
Zc · [u, v, 1]^T = [[fx, 0, u0, 0], [0, fy, v0, 0], [0, 0, 1, 0]] · [[R, t], [0^T, 1]] · [Xw, Yw, Zw, 1]^T = M1 · M2 · Xw
wherein the matrix M1 represents the camera intrinsic parameters; the matrix M2 represents the camera extrinsic parameters, which are related to the position of the camera and the choice of world coordinates and comprise the rotation matrix R and the translation vector t describing how the camera pose changes (a numeric projection sketch follows below).
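For illustration only, the sketch below evaluates the combined projection with assumed intrinsic values (fx, fy, u0, v0) and extrinsics (R, t); it reproduces Zc·[u, v, 1]^T = M1·M2·Xw for one example point.

    import numpy as np

    fx, fy, u0, v0 = 1200.0, 1200.0, 960.0, 540.0       # assumed intrinsics, in pixels
    M1 = np.array([[fx, 0.0, u0, 0.0],
                   [0.0, fy, v0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])               # intrinsic matrix M1

    R = np.eye(3)                                       # assumed extrinsic rotation
    t = np.array([[0.0], [0.0], [2.0]])                 # assumed extrinsic translation
    M2 = np.vstack([np.hstack([R, t]), [0.0, 0.0, 0.0, 1.0]])  # extrinsic matrix M2

    Xw = np.array([0.5, 0.3, 10.0, 1.0])                # homogeneous world point
    uvw = M1 @ M2 @ Xw                                  # equals Zc * [u, v, 1]^T
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]             # pixel coordinates
    print(u, v)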
4. The method for detecting violation of a person standing under a boom of claim 1, wherein,
in the third step, the concrete training process of the crane key point detection model is as follows:
preparing a training set: shooting a working scene of the crane by using a binocular camera to obtain images shot by a left camera and a right camera, then marking four key points of the crane boom and four key points of the pulley block in the training set images by using the Labelme marking tool to obtain image labels, and pairing the shot images with the produced labels one to one as the data set for crane key point detection;
splitting the data set into a training set and a validation set at a ratio of 8:2;
inputting the training set into an OpenPose key point detection algorithm, and training in an NVIDIA GPU server according to default parameters of the algorithm;
evaluating the performance of the model with OKS as the evaluation standard until the model meets the use requirement, to obtain the crane key point detection model.
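A compact sketch of the OKS (Object Keypoint Similarity) score named above as the evaluation standard, following its COCO-style definition; the per-keypoint constants k, the object scale s, and the visibility flags are assumed inputs and are not taken from the claim.

    import numpy as np

    def oks(pred, gt, vis, s, k):
        # pred, gt: Nx2 arrays of predicted / ground-truth key points (pixels)
        # vis: N visibility flags (>0 means labelled), s: object scale, k: N constants
        d2 = np.sum((pred - gt) ** 2, axis=1)          # squared pixel distances
        sim = np.exp(-d2 / (2.0 * s**2 * k**2))        # per-key-point similarity
        mask = vis > 0                                 # only labelled key points count
        return float(sim[mask].sum() / max(mask.sum(), 1))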
5. The method for detecting the illegal behavior of the person standing under the suspension arm according to claim 1, wherein in the fourth step, the number of key points of the human body in the detected image is 17; the specific training process of the human body key point detection model is as follows:
preparing a training set: shooting a working scene of the crane by using a binocular camera to obtain images shot by a left camera and a right camera, then labeling 17 key points of a human body in the images of the training set by using a Labelme labeling tool to obtain image labels, and taking the shot images and the manufactured labels in one-to-one correspondence as a data set for detecting the key points of the human body;
splitting the data set into a training set and a validation set at a ratio of 8:2;
inputting the training set into an OpenPose key point detection algorithm, and training in an NVIDIA GPU server according to default parameters of the algorithm;
evaluating the performance of the model with OKS as the evaluation standard until the model meets the use requirement, to obtain the human body key point detection model.
6. The method for detecting the illegal behavior of the person standing under the suspension arm as claimed in claim 1, wherein in the ninth step, when the crane key point is not detected by the crane key point detection model, the output result of the compliance judgment is normal;
when the crane key point detection model detects the crane key point and the human body key point detection model does not detect the human body key point, the output result of the compliance judgment is normal.
7. The method for detecting the illegal behavior of the person standing under the suspension arm as claimed in claim 1, wherein in the ninth step, compliance is judged using a set threshold a;
when the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, and the distance d1 from the mean value point P of the human body key points in the world coordinate system to the approximation plane P1 of the crane key points is greater than the set threshold a, the output result of the compliance judgment is "normal";
when the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, the distance d1 from the mean value point P of the human body key points to the approximation plane P1 of the crane key points is less than the set threshold a, and z_cra in step eight is less than 0, the compliance judgment output result is "normal".
8. The method for detecting the violation behaviors of the person standing under the boom according to claim 7, wherein in the ninth step,
when the crane key point detection model detects crane key points, the human body key point detection model detects human body key points, the distance d1 from the mean value point P of the human body key points to the approximation plane P1 of the crane key points is less than the set threshold a, and z_cra in step eight is greater than 0, the compliance judgment output result is "violation" (a decision-logic sketch follows below).
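The decision rules of claims 6 to 8 can be read as the following small table; this is an interpretation for illustration, and the argument names are hypothetical.

    def compliance_result(crane_detected, person_detected, d1, z_cra, a):
        # Claims 6-8 (sketch): combine detection availability, the plane distance d1,
        # the crane-frame height z_cra, and the threshold a into a single output.
        if not crane_detected:
            return "normal"        # claim 6: no crane key points detected
        if not person_detected:
            return "normal"        # claim 6: crane detected but no person detected
        if d1 > a:
            return "normal"        # claim 7: person far from the boom plane
        if z_cra < 0:
            return "normal"        # claim 7: close to the plane but on the safe side
        return "violation"         # claim 8: close to the plane and z_cra > 0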
9. A system for detecting violation behaviors of a person standing under a suspension arm is characterized in that the system adopts the violation behavior detection method of the person standing under the suspension arm as claimed in any one of claims 1 to 8;
the system comprises: the system comprises two cameras, an image acquisition and decoding module, a coordinate value analysis module, a human body key point detection module, a crane coordinate system establishing module, a key point mean coordinate calculation module, a distance calculation module, a key point position calculation module and a compliance judgment module;
two cameras are arranged according to the position of the crane, so that the crane is positioned in a common visual field of the two cameras, and a binocular visual model is formed by the intersection of the visual fields of the two cameras;
the image acquisition and decoding module is used for shooting a construction site video by using a binocular camera, decoding the video stream and respectively acquiring the to-be-detected images decoded by the two cameras;
the coordinate value analysis module is used for inputting the left image and the right image shot by the binocular camera into a trained crane key point detection model, detecting the two images and respectively obtaining coordinate values of a pixel coordinate system of the crane key point on the two images;
the human body key point detection module is used for inputting a left image and a right image shot by the binocular camera into a trained human body key point detection model, and the human body key point detection model detects a plurality of key points of a human body in the images by taking the human body as a detection target;
the crane coordinate system establishing module is used for inputting the coordinates of the crane key points into the binocular vision positioning module to obtain the coordinate values of the 8 crane key points in the world coordinate system, fitting the minimum approximation plane P1 of the 8 coordinate points by the least-squares principle, and then establishing a crane coordinate system according to the minimum approximation plane P1 (a plane-fitting sketch follows this claim);
the key point mean coordinate calculation module is used for inputting the coordinate values, in the pixel coordinate systems of the pictures shot by the left camera and the right camera obtained in the fourth step, of the human body key points into the binocular vision positioning module, obtaining the coordinate values of the human body key points in the world coordinate system, and calculating the mean value coordinate P(x1, y1, z1) of the key points;
the distance calculation module is used for calculating the distance d1 from the point P(x1, y1, z1) to the approximation plane P1; the distance d1 is calculated as follows:
d1 = |A·x1 + B·y1 + C·z1 + D| / √(A² + B² + C²), where A·x + B·y + C·z + D = 0 is the equation of the approximation plane P1;
the key point position calculating module is used for calculating the coordinate position of the key point in the crane coordinate system according to the following formula;
[x_cra, y_cra, z_cra, 1]^T = [[R, t], [0^T, 1]] · [x1, y1, z1, 1]^T
where R is a 3 × 3 orthonormal rotation matrix, the translation vector t is a 3 × 1 vector, P(x1, y1, z1) is the obtained mean value coordinate of the human body key points, and (x_cra, y_cra, z_cra) is the value of that mean coordinate mapped into the crane coordinate system;
and the compliance judgment module is used for judging compliance according to the relation between the coordinate position in the crane coordinate system and the mean value coordinate of the key point of the human body and outputting a compliance judgment result.
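One way to realise the least-squares plane fit used by the crane coordinate system establishing module is sketched below. This is an assumption, not the claimed implementation: it fits the plane through the 8 triangulated crane key points via SVD of the centred points, which minimises the squared orthogonal residuals.

    import numpy as np

    def fit_plane(points):
        # points: Nx3 array of the triangulated crane key points (here N = 8).
        # Returns (A, B, C, D) such that A*x + B*y + C*z + D = 0 is the fitted plane P1.
        centroid = points.mean(axis=0)
        _, _, vt = np.linalg.svd(points - centroid)   # last right-singular vector = plane normal
        A, B, C = vt[-1]
        D = -vt[-1] @ centroid
        return A, B, C, float(D)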
10. A terminal for realizing the method for detecting illegal behaviors of a person standing under a suspension arm, characterized by comprising:
a memory for storing a computer program and the method for detecting illegal behaviors of a person standing under the boom;
a processor for executing the computer program and the method for detecting an illegal behavior of a person standing under the boom to realize the steps of the method for detecting an illegal behavior of a person standing under the boom according to any one of claims 1 to 8.
CN202111550551.1A 2021-12-17 2021-12-17 Method, system and terminal for detecting illegal behavior of person standing under suspension arm Pending CN114494427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111550551.1A CN114494427A (en) 2021-12-17 2021-12-17 Method, system and terminal for detecting illegal behavior of person standing under suspension arm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111550551.1A CN114494427A (en) 2021-12-17 2021-12-17 Method, system and terminal for detecting illegal behavior of person standing under suspension arm

Publications (1)

Publication Number Publication Date
CN114494427A true CN114494427A (en) 2022-05-13

Family

ID=81493459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111550551.1A Pending CN114494427A (en) 2021-12-17 2021-12-17 Method, system and terminal for detecting illegal behavior of person standing under suspension arm

Country Status (1)

Country Link
CN (1) CN114494427A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024060978A1 (en) * 2022-09-20 2024-03-28 广州市百果园信息技术有限公司 Key point detection model training method and apparatus and virtual character driving method and apparatus

Similar Documents

Publication Publication Date Title
CN108154550B (en) RGBD camera-based real-time three-dimensional face reconstruction method
CN112216049B (en) Construction warning area monitoring and early warning system and method based on image recognition
CN110458895B (en) Image coordinate system conversion method, device, equipment and storage medium
CN104036488B (en) Binocular vision-based human body posture and action research method
CN111710036B (en) Method, device, equipment and storage medium for constructing three-dimensional face model
CN104599284A (en) Three-dimensional facial reconstruction method based on multi-view cellphone selfie pictures
CN111192321B (en) Target three-dimensional positioning method and device
CN111768336A (en) Face image processing method and device, computer equipment and storage medium
Jin et al. DWCA-YOLOv5: An improve single shot detector for safety helmet detection
CN106023307A (en) Three-dimensional model rapid reconstruction method and system based on field environment
CN115082254A (en) Lean control digital twin system of transformer substation
CN104243970A (en) 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity
CN114494427A (en) Method, system and terminal for detecting illegal behavior of person standing under suspension arm
CN114170686A (en) Elbow bending behavior detection method based on human body key points
CN111881841B (en) Face detection and recognition method based on binocular vision
CN108564043B (en) Human body behavior recognition method based on space-time distribution diagram
WO2021248564A1 (en) Panoramic big data application monitoring and control system
CN107274449B (en) Space positioning system and method for object by optical photo
CN113743380B (en) Active tracking method based on video image dynamic monitoring
CN115880643A (en) Social distance monitoring method and device based on target detection algorithm
CN113887384A (en) Pedestrian trajectory analysis method, device, equipment and medium based on multi-trajectory fusion
CN115482285A (en) Image alignment method, device, equipment and storage medium
CN112488076A (en) Face image acquisition method, system and equipment
CN112102504A (en) Three-dimensional scene and two-dimensional image mixing method based on mixed reality
CN105955058A (en) Wireless intelligent household system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination