CN111798516B - Method for detecting running state quantity and analyzing errors of bridge crane equipment - Google Patents
- Publication number
- CN111798516B CN111798516B CN202010618093.XA CN202010618093A CN111798516B CN 111798516 B CN111798516 B CN 111798516B CN 202010618093 A CN202010618093 A CN 202010618093A CN 111798516 B CN111798516 B CN 111798516B
- Authority
- CN
- China
- Prior art keywords
- coordinates
- error
- camera
- mask
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for detecting the operating state quantities of bridge crane equipment and analyzing their errors, comprising: using the deep learning model Mask R-CNN to predict, in each camera frame, the bounding boxes of crane components, the instance segmentation of component regions, and the key points of components and of natural calibration objects; establishing a world coordinate system from the key points of natural calibration objects on the crane, and taking these together with the pixel coordinates of the corresponding positions identified by Mask R-CNN as inputs to a PnP problem, whose solution yields the homography matrix of the coordinate transformation, i.e. the transformation matrix from the camera coordinates of the current frame to world coordinates; combining the transformation matrix with the key-point pixel coordinates predicted by Mask R-CNN and using the camera projection model to compute the key-point world coordinates corresponding to the pixel coordinates; computing the crane operating state quantities from the obtained key-point world coordinates; and analyzing, by the method of differential error approximation, the computational error of the key-point world coordinates, obtaining an analytical expression for that error.
Description
Technical field
The present invention relates to the technical field of pattern recognition, and in particular to a monocular-vision-based method for identifying the operating state quantities of an overhead crane.
Background art
Existing techniques for measuring the operating state quantities of bridge cranes mostly rely on direct measurement of the state quantities by various sensors. Such sensors must be installed at key positions on the crane for direct measurement, which makes installation and removal inconvenient, adds installation cost, and reduces the reusability of the sensors on other cranes. The detection method for operating state quantities of bridge crane equipment proposed by this invention combines a deep learning algorithm with the monocular-vision PnP camera-pose solution, making full use of the high-level semantic image understanding, high robustness, and high generalization of deep learning, so that multiple operating state quantities of the bridge crane can be obtained from monocular vision alone. Compared with traditional measurement schemes that install a large number of sensors, this method requires only a monocular camera, which saves equipment cost; it also exploits the high generalization of deep learning, so no specific requirements are imposed on camera placement, improving the usability of the measurement setup.
Summary of the invention
To solve the above technical problems, the object of the present invention is to provide a method for measuring the operating state quantities of an overhead crane and analyzing their errors.
The object of the present invention is achieved through the following technical solutions:
A method for detecting the operating state quantities of bridge crane equipment and analyzing errors, comprising:
A. using the deep learning model Mask R-CNN to predict, in each camera frame, the bounding boxes of crane components, the instance segmentation of component regions, and the key points of components and of natural calibration objects;
B. establishing a world coordinate system from the key points of natural calibration objects on the crane, taking these together with the pixel coordinates of the corresponding positions identified by Mask R-CNN as inputs to the PnP problem, and solving it to obtain the homography matrix of the coordinate transformation, i.e. the transformation matrix from the camera coordinates of the current frame to world coordinates;
C. combining the transformation matrix with the key-point pixel coordinates predicted by Mask R-CNN and using the camera projection model to compute the key-point world coordinates corresponding to the pixel coordinates;
D. computing the crane operating state quantities from the obtained key-point world coordinates;
E. analyzing, by the method of differential error approximation, the computational error of the key-point world coordinates, and obtaining an analytical expression for the error.
Compared with the prior art, one or more embodiments of the present invention may have the following advantages:
Compared with traditional measurement schemes that install a large number of sensors, this method requires only a monocular camera, which saves equipment cost; it also exploits the high generalization of deep learning, so no specific requirements are imposed on camera placement, improving the usability of the measurement setup.
Description of the drawings
Figure 1 is a flow chart of the method for detecting the operating state quantities of bridge crane equipment and analyzing errors.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the embodiments and the accompanying drawings.
As shown in Figure 1, the method for detecting the operating state quantities of bridge crane equipment and analyzing errors comprises a sample-set preparation and algorithm training stage, a real-time camera-pose computation stage, a key-point world-coordinate computation stage, an operating-state-quantity computation stage, and an error computation stage. It specifically includes the following steps:
Step 10: use the deep learning model Mask R-CNN to predict, in each camera frame, the bounding boxes of crane components, the instance segmentation of component regions, and the key points of components and of natural calibration objects.
Step 20: establish a world coordinate system from the key points of natural calibration objects on the crane, take these together with the pixel coordinates of the corresponding positions identified by Mask R-CNN as inputs to the PnP problem, and solve it to obtain the homography matrix of the coordinate transformation, i.e. the transformation matrix from the camera coordinates of the current frame to world coordinates.
Step 30: combine the transformation matrix with the key-point pixel coordinates predicted by Mask R-CNN and use the camera projection model to compute the world coordinates corresponding to the pixel coordinates.
Step 40: compute the crane operating state quantities from the obtained key-point world coordinates, such as the inclination angle of the spreader, the inclination angle of the sling, the trolley position, and the gantry position.
Step 50: analyze, by the method of differential error approximation, the computational error of the key-point world coordinates, and obtain an analytical expression for the error.
The above step 10 specifically includes:
Prepare a small-sample data set of key points at important crane positions, component bounding boxes, and component-region instance segmentations. The key points include three or more easily identifiable points on the crane whose relative positions are constant; the world coordinates of these points are measured so that they serve as natural calibration objects. Transfer learning is then performed on Mask R-CNN with this data set. The backbone of Mask R-CNN is a ResNet-50 pre-trained on ImageNet, combined with a Feature Pyramid Network (FPN) structure to fully extract image features at different resolutions. The feature maps output by the ResNet-50-FPN backbone, together with the anchor boxes on them, form the input of the Region Proposal Network (RPN) layer; this layer uses softmax to classify each anchor as positive (the region contains a target object) or negative (it does not), and then regresses the vertex-coordinate errors of the anchor boxes to obtain accurate proposal boxes. The loss function of the anchor-box vertex regression is given below, where φ(A_i) is the feature vector formed from the feature map of the anchor-box region, W_*^T are the parameters to be learned, and t_* are the ground-truth anchor-box vertex coordinates:

L_reg = Σ_i (t_*^i − W_*^T φ(A_i))^2
The feature map inside each proposal box is pooled by RoIAlign into a fixed 7×7 feature map, which is used as the input of the bounding-box regression and classification branch, the instance-segmentation branch, and the key-point prediction branch in the subsequent steps. The backward formula of the RoIAlign layer is:

∂L/∂x_i = Σ_r Σ_j [d(i, i*(r, j)) < 1] (1 − Δh)(1 − Δw) ∂L/∂y_rj

where x_i denotes a pixel of the feature map before pooling, y_rj the j-th point of the r-th candidate region after pooling, d(., .) the distance between two points, i*(r, j) the coordinates of the point whose maximum pixel value is selected during max pooling, and Δh, Δw the differences of the horizontal and vertical coordinates between x_i and i*(r, j).
The bounding-box regression and classification branch consists of two fully connected paths: one outputs an 8-dimensional vector giving the offsets of the vertex-coordinate components of the bounding box; the other outputs a vector whose dimension equals the number of classes, used to predict the class of the object in the box. The loss L_box of the offset-regression sub-branch is the least-squares error between the ground truth and the prediction, of the same form as the anchor-box vertex regression loss; the classification sub-branch uses the cross-entropy loss L_cls:

L_cls = −(1/N_ap) Σ_n p_n log p_out

where p_out, a function of the network weights, is the output of the classification sub-branch; N_ap is the number of samples; and p_n is the ground-truth class label.
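As a concrete illustration, the cross-entropy loss of the classification sub-branch can be sketched in plain Python; the probabilities and labels below are made-up example values, not real network outputs:

```python
import math

def cls_cross_entropy(p_out, p_n):
    """Cross-entropy over N_ap samples: p_out are predicted positive-class
    probabilities, p_n the ground-truth labels in {0, 1}."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for q, y in zip(p_out, p_n):
        q = min(max(q, eps), 1.0 - eps)
        total += -(y * math.log(q) + (1.0 - y) * math.log(1.0 - q))
    return total / len(p_out)

# A confident correct prediction yields a smaller loss than a wrong one.
loss_good = cls_cross_entropy([0.9, 0.1], [1, 0])
loss_bad = cls_cross_entropy([0.1, 0.9], [1, 0])
```

During RPN training this loss would be summed with the regression loss over the sampled anchors.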
The instance-segmentation branch and the key-point prediction branch both consist of deconvolution layers; each outputs a spatial probability map, corresponding to the original image at a scaled resolution, that gives the probability of each position belonging to a given class. Their loss functions L_mask and L_point are likewise cross-entropy losses, of the same form as the classification sub-branch loss.
The total loss of the whole Mask R-CNN network is L = L_box + L_cls + L_mask + L_point. On the labeled small-sample bridge-crane data set, this total loss is minimized with a gradient-descent-based optimizer such as Adam or SGD, giving the optimal Mask R-CNN model parameters and completing the training.
The above step 20 specifically includes:
The pixel coordinates of the key points predicted by Mask R-CNN to serve as natural calibration objects are related to the corresponding world coordinates by:

z_c [w, h, 1]^T = [[f/d_x, 0, w_0], [0, f/d_y, h_0], [0, 0, 1]] [R | T] [x_w, y_w, z_w, 1]^T

where (w, h, 1) are the homogeneous pixel coordinates of the predicted point, d_x and d_y are the pixel pitches of the camera sensor along the x and y axes, f is the image-side focal length of the camera, (w_0, h_0) is the pixel coordinate of the imaging centre, [R | T] is the transformation matrix of the camera coordinate system with respect to the world coordinate system, and z_c is the depth of the point along the z axis of the camera coordinate system.
According to this relation, once n ≥ 3 such key points are predicted, solving for the homography matrix on the right-hand side yields the transformation matrix of the calibrated camera relative to world coordinates at that instant; this completes one solution of the PnP problem. To solve the equation we use the native OpenCV API solvePnPRansac; when the number of selected natural-calibration coordinate points is much greater than 3, this solver is strongly robust to noise.
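The projection relation that the PnP solver inverts can be illustrated with a small numerical sketch. The intrinsics, pose, and point below are invented example values, not values from the patent; the sketch only shows the forward model from which solvePnPRansac recovers R and T given n ≥ 3 correspondences:

```python
import numpy as np

# Hypothetical intrinsics: f/d_x = f/d_y = 800 px, image centre (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical pose: identity rotation, camera 5 m from the world origin.
R = np.eye(3)
T = np.array([0.0, 0.0, 5.0])

def project(P_w):
    """Apply z_c * (w, h, 1)^T = K [R | T] (x_w, y_w, z_w, 1)^T."""
    p_c = R @ P_w + T        # world -> camera coordinates
    uvw = K @ p_c            # homogeneous pixel coordinates
    return uvw[:2] / uvw[2]  # divide by the depth z_c

# A PnP solver recovers R and T from such (world point, pixel) pairs.
px = project(np.array([1.0, 0.0, 0.0]))
```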
The above step 30 specifically includes:
On the basis of the camera transformation matrix computed for the current instant, the projection model of the camera allows recovering the world coordinates of any key point whose motion surface in the world coordinate system has known parameters; such a key point has only two degrees of freedom. The pixel coordinates of any two-degree-of-freedom key point are related to its world coordinates as follows (for brevity, the motion surface is assumed to be a plane):

z_c p = K [R | T] [P_w; 1],   n^T P_w + h = 0

where p denotes the homogeneous pixel coordinates, P_w the corresponding world coordinates, n the normal vector of the key point's motion plane in the world coordinate system, and h the intercept of that plane equation; together the two conditions determine P_w from p. The crane key points predicted by Mask R-CNN include the upper and lower end points and the midpoint of the sling, the upper and lower end points of the spreader, and the vertices of the trolley contour polygon. Substituting the predicted pixel coordinates of these key points, together with the world-frame parameters of the planes on which they move, into the above equations yields their world coordinates at any instant.
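The back-projection step described above can be sketched numerically as a ray-plane intersection. The intrinsics, pose, and plane here are invented example values; the sketch assumes a standard pinhole model and a motion plane n . P + h = 0:

```python
import numpy as np

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.array([0.0, 0.0, 5.0])

def pixel_to_plane(w, h_px, n, h_plane):
    """Back-project pixel (w, h_px) onto the world plane n . P + h_plane = 0.

    The pixel fixes a viewing ray; intersecting it with the known motion
    plane resolves the depth, leaving the two-degree-of-freedom point."""
    ray = R.T @ np.linalg.inv(K) @ np.array([w, h_px, 1.0])  # ray in world frame
    C = -R.T @ T                                             # camera centre
    s = -(n @ C + h_plane) / (n @ ray)                       # ray scale at plane
    return C + s * ray

# The pixel of a point on the plane z_w = 0 maps back to that point.
P = pixel_to_plane(480.0, 240.0, np.array([0.0, 0.0, 1.0]), 0.0)
```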
The above step 40 specifically includes:
The operating state quantities of the bridge crane are computed from the obtained key-point world coordinates: trolley position, trolley speed, gantry position, gantry speed, spreader inclination, and sling inclination.
The trolley position is represented by the y-coordinate of the mean of its contour vertices, and the trolley speed is obtained by differencing that coordinate: v_sc = FPS × Δy, where FPS is the frame rate, determined by the camera sampling rate and the computation speed of the algorithm. When contour-vertex predictions are missing, or the prediction error is large, the y-component y_m of the world coordinates of the centroid of the trolley's Mask R-CNN instance-segmentation region is used instead.
The gantry position is represented by the z-component of the translation vector T, and its speed is computed as v_bc = FPS × Δz.
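The speed calculations v_sc = FPS × Δy and v_bc = FPS × Δz are simple frame differences; a minimal sketch with made-up coordinates and frame rate:

```python
FPS = 25.0  # frame rate, set by camera sampling rate and algorithm speed

def speed(prev_coord, curr_coord, fps=FPS):
    """v = FPS * delta: world-coordinate difference between two frames."""
    return fps * (curr_coord - prev_coord)

v_sc = speed(3.00, 3.02)    # trolley: y-coordinates of consecutive frames
v_bc = speed(12.00, 12.04)  # gantry: z-components of consecutive T vectors
```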
The inclination angles of the spreader and the sling are computed from the predicted key points as

θ = arccos( |(p_1 − p_N)_y| / ||p_1 − p_N|| )

where p_1, ..., p_N are a series of predicted key points on the line of the sling, or on the centre line of the spreader; ||.|| denotes the vector norm, and (.)_x, (.)_y denote the x and y components of a vector.
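One way to realize the inclination computation is to take the angle between the segment through the first and last predicted key points and the vertical axis, using the y component and the vector norm described above. This is a sketch under that assumption, with invented key-point coordinates:

```python
import math

def inclination(points):
    """Angle (rad) between the line through the end key points and the
    vertical (y) axis: arccos(|(p1 - pN)_y| / ||p1 - pN||)."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    vx, vy = x1 - x0, y1 - y0
    return math.acos(abs(vy) / math.hypot(vx, vy))

theta_vertical = inclination([(0.0, 0.0), (0.0, 2.0)])  # hanging straight down
theta_swing = inclination([(0.0, 0.0), (1.0, 1.0)])     # swung out 45 degrees
```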
The above step 50 specifically includes:
We now analyze the error in computing, from pixel coordinates, the world coordinates of a two-degree-of-freedom key point. When the input errors are small, the output error can be approximated by the differential of the mapping:

dP_w ≈ (∂P_w/∂n)dn + (∂P_w/∂h)dh + (∂P_w/∂R)dR + (∂P_w/∂T)dT + (∂P_w/∂K)dK + (∂P_w/∂p)dp

where dn and dh are the measurement errors of the key point's motion-plane parameters; dR and dT are the camera-pose computation errors (solvers such as solvePnPRansac report an algorithmic solution error that can be used for them); dK is the calibration error of the camera intrinsics; and dp is the key-point prediction error, whose value can be taken from Mask R-CNN's prediction probability for the key point.
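The differential error approximation can be checked numerically: for a small input perturbation, the first-order term (partial derivative times input error) predicts the output error. A minimal sketch with a toy function standing in for the pixel-to-world mapping:

```python
def partial(f, x, h=1e-6):
    """Central-difference approximation of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def differential_error(f, x, dx):
    """First-order output error df ~ (df/dx) * dx for a small input error dx."""
    return partial(f, x) * dx

# Toy stand-in mapping: the true error of f at x=3 for dx=0.01 is ~0.0601,
# and the differential predicts 6 * 0.01 = 0.06.
f = lambda x: x * x
df = differential_error(f, 3.0, 0.01)
```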
Although the embodiments of the present invention are disclosed as above, the described content is only an embodiment adopted to facilitate understanding of the present invention and is not intended to limit it. Any person skilled in the art to which the present invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the scope of patent protection of the present invention shall still be subject to the scope defined by the appended claims.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010618093.XA CN111798516B (en) | 2020-07-01 | 2020-07-01 | Method for detecting running state quantity and analyzing errors of bridge crane equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010618093.XA CN111798516B (en) | 2020-07-01 | 2020-07-01 | Method for detecting running state quantity and analyzing errors of bridge crane equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111798516A CN111798516A (en) | 2020-10-20 |
CN111798516B true CN111798516B (en) | 2023-12-22 |
Family
ID=72810767
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010618093.XA Active CN111798516B (en) | 2020-07-01 | 2020-07-01 | Method for detecting running state quantity and analyzing errors of bridge crane equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111798516B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232279B (en) * | 2020-11-04 | 2023-09-05 | 杭州海康威视数字技术股份有限公司 | Personnel interval detection method and device |
CN112816496B (en) * | 2021-01-05 | 2022-09-23 | 广州市华颉电子科技有限公司 | Automatic optical detection method and device for interface assembly quality of automobile domain controller |
CN114897084A (en) * | 2022-05-24 | 2022-08-12 | 河南工学院 | Tower crane structure safety monitoring method based on graph convolution neural network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN110930454A (en) * | 2019-11-01 | 2020-03-27 | 北京航空航天大学 | Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning |
-
2020
- 2020-07-01 CN CN202010618093.XA patent/CN111798516B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN110930454A (en) * | 2019-11-01 | 2020-03-27 | 北京航空航天大学 | Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning |
Non-Patent Citations (2)
Title |
---|
Vision-based pose measurement of an excavator working device; Ma Wei; Gong Le; Feng Hao; Yin Chenbo; Zhou Junjing; Cao Donghui; Machine Design & Research (No. 05); full text *
Artificial-landmark-assisted pose estimation method for autonomous flight of underground UAVs; Shan Chunyan; Yang Wei; Geng Cuibo; Journal of China Coal Society (No. S1); full text *
Also Published As
Publication number | Publication date |
---|---|
CN111798516A (en) | 2020-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112380952B (en) | Real-time detection and recognition method of infrared image of power equipment based on artificial intelligence | |
CN112199993B (en) | Method for identifying transformer substation insulator infrared image detection model in any direction based on artificial intelligence | |
CN111798516B (en) | Method for detecting running state quantity and analyzing errors of bridge crane equipment | |
CN110598736B (en) | Power equipment infrared image fault positioning, identifying and predicting method | |
CN109685152B (en) | An Image Object Detection Method Based on DC-SPP-YOLO | |
CN113920107A (en) | A method of insulator damage detection based on improved yolov5 algorithm | |
CN113435282B (en) | Recognition method of wheat ears in UAV images based on deep learning | |
CN113807464A (en) | Target detection method of UAV aerial imagery based on improved YOLO V5 | |
CN113139453A (en) | Orthoimage high-rise building base vector extraction method based on deep learning | |
CN112288758B (en) | Infrared and visible light image registration method for power equipment | |
CN112241950A (en) | A method for detecting crack images of tower cranes | |
CN114693661A (en) | Rapid sorting method based on deep learning | |
WO2020093631A1 (en) | Antenna downtilt angle measurement method based on depth instance segmentation network | |
CN107153819A (en) | A kind of queue length automatic testing method and queue length control method | |
CN115578315A (en) | A close-range photogrammetry method of bridge strain based on UAV images | |
CN112258490A (en) | Low-emissivity coating intelligent damage detection method based on optical and infrared image fusion | |
CN114170527A (en) | A Remote Sensing Target Detection Method Using Rotation Box Representation | |
CN111598172A (en) | Fast detection method of dynamic target grasping pose based on heterogeneous deep network fusion | |
CN112381747A (en) | Terahertz and visible light image registration method and device based on contour feature points | |
CN118776479A (en) | A bridge health monitoring method integrating machine vision and edge computing | |
CN111340765B (en) | A thermal infrared image reflection detection method based on background separation | |
CN115908276A (en) | Bridge apparent damage binocular vision intelligent detection method and system integrating deep learning | |
CN118644542B (en) | Scaleless water level monitoring method based on improved bilateral network | |
CN110516527A (en) | An improved method for visual SLAM loop closure detection based on instance segmentation | |
Wang et al. | Detection of cotter pins missing of connection fittings on transmission lines of power system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |