CN116385480B - Detection method and system for moving object below tower crane - Google Patents
- Publication number
- CN116385480B CN116385480B CN202310053361.1A CN202310053361A CN116385480B CN 116385480 B CN116385480 B CN 116385480B CN 202310053361 A CN202310053361 A CN 202310053361A CN 116385480 B CN116385480 B CN 116385480B
- Authority
- CN
- China
- Prior art keywords
- image
- small
- moving target
- obtaining
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 24
- 230000009466 transformation Effects 0.000 claims abstract description 61
- 238000000034 method Methods 0.000 claims abstract description 42
- 238000013528 artificial neural network Methods 0.000 claims abstract description 19
- 230000000007 visual effect Effects 0.000 claims abstract description 9
- 238000000605 extraction Methods 0.000 claims description 19
- 238000012216 screening Methods 0.000 claims description 12
- 238000001914 filtration Methods 0.000 claims description 5
- 238000010586 diagram Methods 0.000 claims description 4
- 239000011159 matrix material Substances 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 description 4
- 230000004927 fusion Effects 0.000 description 4
- 238000012300 Sequence Analysis Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 239000000725 suspension Substances 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 230000004931 aggregating effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000007500 overflow downdraw method Methods 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method and system for detecting moving objects below a tower crane. The method comprises the following steps: extracting feature points of the image sequence from the acquired moving target images to obtain feature-extracted moving target images; establishing motion estimation based on a two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frame images, and from it a similarity measure; retaining the moving target pixels whose similarity is lower than a threshold value as candidate moving target pixels, performing Euclidean space clustering on them to obtain a plurality of clustering results, and obtaining a corresponding outer boundary box for each clustering result; according to the affine transformation relation of the two adjacent frame images, finding the feature point pairs which do not satisfy that relation and taking them as the matching outlier set; obtaining the number of outliers in each outer boundary box from the matching outlier set, and using the number of outliers to obtain the outer boundary boxes meeting the confidence requirement; and classifying the outer boundary boxes with a trained visual neural network to realize the detection of the moving target.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a detection method and a detection system for a moving target below a tower crane.
Background
During the autonomous operation of an intelligent tower crane, detecting the moving objects below the crane is of great importance. Such objects mainly include people, trucks, bicycles, tricycles and the like. Current detection methods are mainly based on deep learning, with cameras and lidar as the main data sources.
Image-only detection methods mainly rely on deep neural networks; common detectors include the two-stage Faster R-CNN and the single-stage SSD and YOLO. The backbone network is usually chosen from the Darknet and MobileNet families. Because the construction scene below a tower crane is highly complex, and because the camera shoots top-down, part of the target information is easily occluded. The expressive power of the whole network therefore has to be guaranteed by adopting a backbone of high complexity. However, such methods demand high computing power and generally require a processor with dedicated AI acceleration hardware.
Methods based on lidar and images mainly fuse the two kinds of data. There are loose fusion schemes, such as performing deep-neural-network target detection on the image, finding the corresponding lidar point cloud through the projection frustum of the image, and re-judging the target class from the point cloud. There are also tight fusion schemes, in which the combination of lidar point cloud and image is fed directly into the neural network, a deep network compatible with both inputs is trained, and end-to-end target detection is completed.
However, both kinds of methods rely heavily on deep neural networks, whose inference requires powerful computing hardware; this becomes a problem in practical large-scale deployment. The operating system of an intelligent tower crane is a large system comprising perception, decision, planning, control, safety and other links, with numerous sensors, a large computational load and a complex architecture. Heterogeneous computing support is usually required: the computation for each sensor is completed at the terminal as far as possible, instead of aggregating all data to a central service node. In this setting, if the single task of moving-object detection relies entirely on a high-complexity neural network, the cost and power consumption of the edge computing devices become high.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method and system for detecting moving objects below a tower crane, aimed at the technical problem that existing detection methods incur high cost and power consumption of the computing equipment, so as to detect moving objects below the tower crane accurately while reducing the cost and power consumption of the computing equipment.
In order to solve the problems, the technical scheme adopted by the invention is as follows:
the detection method for the moving object below the tower crane comprises the following steps:
image acquisition is carried out on a moving object below the tower crane by utilizing an acquisition device arranged below the tower crane trolley;
extracting feature points of an image sequence from the acquired moving target image to obtain a moving target image after the feature points are extracted;
establishing motion estimation for the motion target image after the feature points are extracted based on two-dimensional affine transformation to obtain affine transformation relation of two adjacent frames of images;
obtaining the similarity of adjacent-frame moving target images by using the affine transformation relation of the two adjacent frames, and retaining the moving target pixels whose similarity is lower than a threshold value as candidate moving target pixels;
performing Euclidean space clustering on the candidate moving target pixels to obtain a plurality of clustering results, and obtaining a corresponding outer boundary box for each clustering result;
according to the affine transformation relation of the two adjacent frame images, finding, among all feature point matching results, the feature point pairs which do not satisfy that relation, and taking them as the matching outlier set;
obtaining the number of outliers in each outer boundary box from the matching outlier set, obtaining the moving target confidence of each outer boundary box from the number of outliers, and screening out the outer boundary boxes meeting the confidence requirement;
and classifying the outer boundary boxes meeting the confidence requirement with a classification network trained as a visual neural network, determining the category and confidence of the moving target.
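As an illustrative sketch (not part of the patent disclosure), the Euclidean space clustering step with its outer boundary boxes can be written as a simple region-growing pass; the `radius` parameter and the greedy growing strategy are assumptions, since the text does not fix the clustering algorithm's internals:

```python
import numpy as np
from collections import deque

def euclidean_cluster(points, radius=2.0):
    """Region-growing clustering in Euclidean space: points closer than
    `radius` join the same cluster; each cluster yields an outer boundary
    box as a (min corner, max corner) pair."""
    points = np.asarray(points, dtype=float)
    unvisited = set(range(len(points)))
    boxes = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if np.linalg.norm(points[i] - points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        cluster = points[members]
        boxes.append((cluster.min(axis=0), cluster.max(axis=0)))
    return boxes
```

For the candidate moving target pixels this would be called on their (row, column) coordinates; each returned box is then screened by the confidence step that follows.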
In a preferred embodiment of the present invention, when obtaining a moving object image from which feature points are extracted, the method includes:
establishing an image pyramid of the moving target image by Gaussian filtering;
extracting a small number of stable feature points from the top layer of the image pyramid by using a feature extraction algorithm to obtain a feature map;
searching local maximum points in the feature map, and reserving all the local maximum points obtained by searching according to a set percentage to serve as feature points;
acquiring a local direction gradient histogram of each feature point, and setting the maximum value of the local direction gradient histogram as the direction of the feature point;
extracting a large number of small-scale ORB characteristic points at the bottom layer of the image pyramid by using an ORB algorithm;
and after the feature point extraction and the small-scale ORB feature point extraction are completed on the moving target image, obtaining the moving target image after the feature point extraction.
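The local-maximum search with retention by a "set percentage" can be illustrated as follows. This is an explanatory sketch only: the 3x3 neighbourhood and the function name are assumptions, not the patent's exact procedure:

```python
import numpy as np

def local_maxima_points(feature_map, keep_ratio=0.2):
    """Search the feature map for strict local maxima over a 3x3
    neighbourhood, then keep only the strongest `keep_ratio` fraction
    (the 'set percentage')."""
    h, w = feature_map.shape
    pts = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = feature_map[y, x]
            patch = feature_map[y - 1:y + 2, x - 1:x + 2]
            # Strict maximum: the centre value occurs exactly once in the patch.
            if v == patch.max() and np.count_nonzero(patch == v) == 1:
                pts.append((y, x, float(v)))
    pts.sort(key=lambda p: p[2], reverse=True)
    return pts[:max(1, int(len(pts) * keep_ratio))] if pts else []
```

With `keep_ratio=0.2` this corresponds to the "first twenty percent" retention mentioned later in the description.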
As a preferred embodiment of the present invention, when extracting a large number of small-scale ORB feature points using an ORB algorithm, it includes:
setting an image small window, obtaining the accumulated quantity of the difference between the central point and the surrounding pixel points of the image small window, and constructing a small-scale ORB characteristic diagram according to the accumulated quantity of the difference;
searching a local maximum point in the small-scale ORB feature map as a small-scale ORB feature candidate point, and screening the small-scale ORB feature candidate point to obtain the small-scale ORB feature point;
and acquiring a centroid point of the image small window, connecting the centroid point with the center point, and taking the connecting line direction as the direction of the small-scale ORB characteristic point.
As a preferred embodiment of the present invention, when obtaining affine transformation relations of two adjacent frame images, it includes:
performing a brute-force search based on the stable feature points, and taking the point pairs with the minimum Euclidean distance between features as coarse matching point pairs;
obtaining the discrepancy of each small-scale ORB feature point from the coarse matching point pairs, and eliminating the small-scale ORB feature points with excessive discrepancy;
and performing a brute-force search again based on the retained small-scale ORB feature points, taking the point pairs with the minimum Euclidean distance between features as the final matching result, and obtaining the affine transformation relation of the two adjacent frame images from the final matching result.
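The brute-force search for the pair with the minimum Euclidean distance between feature descriptors amounts to an exhaustive nearest-neighbour query; a minimal NumPy sketch, where the (N, D) descriptor-array shape and the function name are illustrative assumptions:

```python
import numpy as np

def brute_force_match(desc_a, desc_b):
    """For every descriptor row of desc_a, exhaustively find the desc_b row
    at minimum Euclidean distance; returns index pairs (i, j)."""
    # Full (len_a, len_b) distance table: no indexing structure, pure brute force.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return list(zip(range(len(desc_a)), nearest.tolist()))
```

The same routine serves both the coarse match on the stable feature points and the re-match on the retained small-scale ORB points.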
As a preferred embodiment of the present invention, when eliminating the feature points of the small-scale ORB with excessive variability, the method includes:
according to the coarse matching point pairs, obtaining a two-dimensional affine transformation model of the feature-extracted moving target images, and projecting the small-scale ORB feature points of one image onto the other image with the model to obtain the corresponding projection result;
judging from the projection result whether corresponding small-scale ORB feature points exist around the projection; if not, the small-scale ORB feature point has no corresponding (homonymous) point, its discrepancy is considered too large, and it is eliminated;
when obtaining the affine transformation relation of the two adjacent frames of images, the method comprises the following steps:
and carrying out least square solving on the two-dimensional affine transformation model of the moving image after the feature points are extracted by utilizing the final matching result, and obtaining the affine transformation relation of the two adjacent frames of images.
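The least-squares solution of the two-dimensional affine model from the final matching result can be written directly with `numpy.linalg.lstsq`; a sketch under the assumption that the matched points are given as (N, 2) arrays:

```python
import numpy as np

def solve_affine(src, dst):
    """Least-squares fit of a 2D affine transform dst ≈ src @ A.T + t from
    matched point pairs; src and dst are (N, 2) arrays with N >= 3."""
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])             # (N, 3) design matrix
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2) solution
    A = params[:2].T   # 2x2 linear part (rotation/scale/shear)
    t = params[2]      # 2-vector translation
    return A, t
```

With at least three non-collinear matches the system is determined; extra matches are averaged out in the least-squares sense, which is what makes the outliers of the next step detectable as pairs that disagree with the fit.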
As a preferred embodiment of the present invention, when obtaining the similarity of moving object images of adjacent frames, it comprises:
acquiring all adjacent frame moving target images in the moving target images after the feature points are extracted, and taking a small image window for each pixel of one frame of moving target image in each group of adjacent frame moving target images; obtaining a small image window corresponding to another corresponding frame of moving target image according to the affine transformation relation of the two adjacent frames of images;
according to the small image window and the corresponding small image window, window image groups of each group of adjacent frame moving target images are obtained;
and acquiring the average value of the normalized correlation products of the three channels of the window image group R, G, B, and taking the average value of the normalized correlation products as the similarity.
As a preferred embodiment of the present invention, when obtaining a small image window corresponding to another frame of moving object image, it includes:
multiplying the center point of the small image window by an affine transformation matrix to obtain the center point of the small image window corresponding to another frame of moving target image;
and obtaining the corresponding small image window according to the same window size according to the center point of the corresponding small image window.
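Transferring a window centre through the affine transformation and cropping a same-size window in the other frame might look like the sketch below; the (x, y) ordering, rounding to the nearest pixel, and the window half-size are assumptions:

```python
import numpy as np

def corresponding_window(img_b, center_xy, A, t, half=3):
    """Map a window centre (x, y) from frame a into frame b through the
    affine transform (A, t), then crop a window of the same size around
    the mapped centre."""
    cx, cy = (A @ np.asarray(center_xy, dtype=float)) + t
    cx, cy = int(round(cx)), int(round(cy))
    return img_b[cy - half:cy + half + 1, cx - half:cx + half + 1]
```

A production version would additionally clip the crop at the image border; that handling is omitted here for brevity.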
As a preferred embodiment of the present invention, when obtaining the average value of the normalized correlation product, it includes:
acquiring a gray average value of the window image group, and subtracting the gray average value corresponding to each pixel in the window image group from the gray average value of each pixel in the window image group to obtain a de-centralized window image group;
multiplying and summing the corresponding pixels of the de-centred window image group to obtain a correlation product, and dividing the correlation product by the product of the two-norms of the two de-centred windows to obtain the normalized correlation product;
and obtaining the average value of the normalized correlation products of the three channels of the window image group R, G, B according to the normalized correlation products.
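The de-centred, normalized correlation product and its mean over the R, G, B channels can be sketched as below; dividing by the product of the two windows' two-norms is the usual normalized cross-correlation convention and is assumed here:

```python
import numpy as np

def ncc(win_a, win_b):
    """De-centre both windows (subtract each window's mean), multiply and
    sum corresponding pixels, and divide by the product of the two-norms."""
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def rgb_similarity(win_a, win_b):
    """Mean of the normalized correlation products over the R, G, B channels."""
    return float(np.mean([ncc(win_a[..., c], win_b[..., c]) for c in range(3)]))
```

The result lies in [-1, 1]; pixels whose value falls below the 0.5 threshold given later are retained as candidate moving target pixels.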
As a preferred embodiment of the present invention, when obtaining the moving object confidence of each of the outer bounding boxes, it includes:
the confidence that the outer bounding box belongs to a moving object is weighted by dividing the number of the outer points in the outer bounding box by a set value as a weight, so that the confidence of the outer point proportion is obtained;
acquiring the proportion of pixel points in the outer boundary frame to the small image window as similarity confidence;
and multiplying the outlier proportional confidence by the similarity confidence to obtain the moving target confidence of each external boundary box.
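Combining the outlier-proportion confidence with the similarity confidence reduces to a product of two ratios; in the sketch below the `weight` divisor of 10.0 and the cap at 1.0 are assumed placeholders, since the patent leaves the "set value" unspecified:

```python
def box_confidence(n_outliers, n_box_pixels, n_window_pixels, weight=10.0):
    """Moving target confidence of an outer boundary box: the outlier count
    divided by a set value (capped at 1) times the ratio of box pixels to
    window pixels. `weight=10.0` is an assumed placeholder value."""
    outlier_conf = min(n_outliers / weight, 1.0)
    similarity_conf = n_box_pixels / n_window_pixels
    return outlier_conf * similarity_conf
```

Boxes whose product falls below a chosen threshold are discarded before classification, which is what keeps the visual neural network's workload small.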
A detection system for a moving object below a tower crane, comprising:
feature extraction unit: the device is used for acquiring images of a moving object below the tower crane by using an acquisition device arranged below the tower crane trolley; extracting feature points of an image sequence from the acquired moving target image to obtain a moving target image after the feature points are extracted;
an outer bounding box acquisition unit: used for establishing motion estimation for the feature-extracted moving target images based on a two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frame images; obtaining the similarity of adjacent-frame moving target images by using that relation, and retaining the moving target pixels whose similarity is lower than a threshold value as candidate moving target pixels; and performing Euclidean space clustering on the candidate moving target pixels to obtain a plurality of clustering results, and obtaining a corresponding outer boundary box for each clustering result;
an outer bounding box screening unit: the method comprises the steps of finding out characteristic point pairs which do not meet affine transformation relation of two adjacent frame images in all characteristic point matching results according to affine transformation relation of the two adjacent frame images, and taking the characteristic point pairs as a matching outer point set; obtaining the number of outliers in each outer boundary frame according to the matched outlier set, obtaining the confidence coefficient of the moving target of each outer boundary frame by using the number of outliers, and screening out the outer boundary frames meeting the confidence coefficient requirement;
classification unit: and the classification network is used for training the target by utilizing the visual neural network, classifying the outer boundary box meeting the confidence coefficient requirement and determining the type and the confidence coefficient of the moving target.
Compared with the prior art, the invention has the following beneficial effects:
(1) During operation of the tower crane, different feature point extraction methods are used at different scales, which effectively improves the algorithm's speed while ensuring the precision of the subsequent matching;
(2) When detecting moving objects below the tower crane, the outer boundary boxes meeting the confidence requirement are obtained by fusing image sequence analysis with feature point matching, and are then classified with a trained visual neural network. The moving target is thus detected accurately; unlike existing detection methods, which rely entirely on a neural network for the whole computation, the cost and power consumption of the computing equipment are effectively reduced.
The invention is described in further detail below with reference to the drawings and the detailed description.
Drawings
FIG. 1 is a schematic diagram of a deployment location of an acquisition device according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating steps of a method for detecting a moving object under a tower crane according to an embodiment of the present invention.
Reference numerals illustrate: 1. a suspension arm; 2. a tower body; 3. a tower crane trolley; 4. and a collecting device.
Detailed Description
The method for detecting the moving object below the tower crane provided by the invention, as shown in fig. 2, comprises the following steps:
Step S1: acquiring images of the moving objects below the tower crane with an acquisition device 4 arranged below the tower crane trolley 3;
Step S2: extracting feature points of the image sequence from the acquired moving target images to obtain feature-extracted moving target images;
Step S3: establishing motion estimation on the feature-extracted moving target images based on a two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frame images;
Step S4: obtaining the similarity of adjacent-frame moving target images by using the affine transformation relation of the two adjacent frames, and retaining the moving target pixels whose similarity is lower than a threshold value as candidate moving target pixels;
Step S5: performing Euclidean space clustering on the candidate moving target pixels to obtain a plurality of clustering results, and obtaining a corresponding outer boundary box for each clustering result;
Step S6: according to the affine transformation relation of the two adjacent frame images, finding, among all feature point matching results, the feature point pairs which do not satisfy that relation, and taking them as the matching outlier set;
Step S7: obtaining the number of outliers in each outer boundary box from the matching outlier set, obtaining the moving target confidence of each outer boundary box from the number of outliers, and screening out the outer boundary boxes meeting the confidence requirement;
Step S8: classifying the outer boundary boxes meeting the confidence requirement with a classification network trained as a visual neural network, and determining the category and confidence of the moving target.
Specifically, the acquisition device 4 adopted by the invention is a downward-looking camera, and as shown in fig. 1, the main components of the tower crane comprise a suspension arm 1, a tower body 2 and a tower crane trolley 3. The downward-looking camera is arranged below the crane trolley 3 and is perpendicular to the ground to shoot a moving object, so that the moving object below the crane is effectively subjected to image acquisition.
In step S2, when obtaining a moving object image from which feature points are extracted, the method includes:
establishing an image pyramid of the moving target image by Gaussian filtering;
extracting a small amount of stable feature points on the top layer of the image pyramid by using a feature extraction algorithm to obtain a feature map;
searching local maximum points in the feature map, and reserving all the local maximum points obtained by searching according to a set percentage to serve as feature points;
acquiring a local direction gradient histogram of each feature point, and setting the maximum value of the local direction gradient histogram as the direction of the feature point;
extracting a large number of small-scale ORB characteristic points at the bottom layer of the image pyramid by using an ORB algorithm;
and after feature point extraction and small-scale ORB feature point extraction are completed on the moving target image, obtaining the moving target image after feature point extraction.
Further, the feature extraction algorithm adopted in the present invention is a SIFT algorithm.
Further, the percentage set in the present invention is the first twenty percent.
Specifically, the image pyramid is constructed with a Gaussian filter. The base value of the standard deviation of the Gaussian function is 1.6, and 6 filterings are computed in total; in the i-th filtering, the standard deviation is 1.6 to the power of i. A small number of stable feature points are extracted from the top layer of the image pyramid with the SIFT algorithm; that is, the feature map is obtained from the difference between the top and second-to-top layers of the pyramid.
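The pyramid construction described above (base standard deviation 1.6, the i-th filtering using 1.6 to the power of i, and the feature map taken as the difference of the two top layers) can be sketched in NumPy; the separable-convolution helper and the level count are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_filter(img, sigma):
    """Separable Gaussian blur: convolve rows, then columns ('same' padding)."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, tmp)

def build_pyramid(img, levels=6, base_sigma=1.6):
    """The i-th filtering uses standard deviation base_sigma ** i; the feature
    map is the difference between the top and second-to-top layers (DoG)."""
    layers = [img.astype(float)]
    for i in range(1, levels + 1):
        layers.append(gaussian_filter(img.astype(float), base_sigma ** i))
    feature_map = layers[-1] - layers[-2]
    return layers, feature_map
```

The returned feature map is the one searched for local maxima in the SIFT step.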
Further, when a large number of small-scale ORB feature points are extracted using the ORB algorithm, it includes:
setting an image small window, acquiring the accumulated quantity of differences between the central point and surrounding pixel points of the image small window, and constructing a small-scale ORB characteristic diagram according to the accumulated quantity of the differences;
searching local maximum points in the small-scale ORB feature map to serve as small-scale ORB feature candidate points, and screening the small-scale ORB feature candidate points to obtain small-scale ORB feature points;
and acquiring a centroid point of the small window of the image, connecting the centroid point with the center point, and taking the connecting line direction as the direction of the small-scale ORB characteristic point.
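The orientation assigned from the line between the window centre and the intensity centroid is the standard intensity-centroid idea used by ORB; a minimal sketch over a grayscale window, with the coordinate conventions assumed:

```python
import numpy as np

def intensity_centroid_orientation(window):
    """Direction (radians) of the line from the window's centre point to its
    intensity centroid, used as the small-scale ORB feature orientation."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    mass = window.sum()
    cy = (ys * window).sum() / mass   # centroid row
    cx = (xs * window).sum() / mass   # centroid column
    return np.arctan2(cy - (h - 1) / 2.0, cx - (w - 1) / 2.0)
```

A window whose brightness is shifted to the right of centre, for example, yields an orientation near zero radians.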
In the above step S3, when obtaining the affine transformation relationship of the adjacent two frame images, it includes:
performing a brute-force search based on the stable feature points, and taking the point pairs with the minimum Euclidean distance between features as coarse matching point pairs;
obtaining the discrepancy of each small-scale ORB feature point from the coarse matching point pairs, and eliminating the small-scale ORB feature points with excessive discrepancy;
and performing a brute-force search again based on the retained small-scale ORB feature points, taking the point pairs with the minimum Euclidean distance between features as the final matching result, and obtaining the affine transformation relation of the two adjacent frame images from the final matching result.
Further, eliminating the small-scale ORB feature points with excessive difference includes:
obtaining a two-dimensional affine transformation model of the feature-extracted moving images according to the coarse matching point pairs, and projecting the small-scale ORB feature points of one image in the feature-extracted moving target images onto the other image by using the two-dimensional affine transformation model to obtain the corresponding projection result;
and judging from the projection result whether a corresponding small-scale ORB feature point exists near each projection; if not, the small-scale ORB feature point is judged to have no corresponding homonymous point, its difference is considered excessive, and it is eliminated.
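A hedged sketch of this rejection step: image-1 feature points are projected with the coarse two-dimensional affine model, and only those with a candidate homonymous point in image 2 within a small radius are kept (the search radius is an illustrative assumption, not a value from the patent):

```python
import numpy as np

def reject_by_projection(pts1, pts2, affine, radius=3.0):
    # affine is a 2x3 matrix [A | t]; append 1 to each point and project.
    ones = np.ones((len(pts1), 1))
    proj = np.hstack([pts1, ones]) @ affine.T
    keep = []
    for i, p in enumerate(proj):
        # A projected point survives only if some image-2 feature point
        # lies within `radius` pixels of it (i.e. a homonymous point exists).
        if np.min(np.linalg.norm(pts2 - p, axis=1)) <= radius:
            keep.append(i)
    return keep
```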
Further, obtaining the affine transformation relation of two adjacent frames of images includes:
performing least-squares solving on the two-dimensional affine transformation model of the adjacent feature-extracted moving images by using the final matching result, thereby obtaining the affine transformation relation of the two adjacent frames of images.
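With the final matches in hand, the six-parameter two-dimensional affine model x' = A x + t can be solved directly with `numpy.linalg.lstsq`; a sketch (the stacked-equation layout is the standard formulation, not patent-specific):

```python
import numpy as np

def solve_affine_lstsq(src, dst):
    # Stack the equations x' = a11*x + a12*y + tx and y' = a21*x + a22*y + ty
    # for every matched point pair, then solve for the six unknowns.
    n = len(src)
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2], M[0::2, 2] = src, 1.0
    M[1::2, 3:5], M[1::2, 5] = src, 1.0
    b = np.asarray(dst, dtype=float).reshape(-1)
    params, *_ = np.linalg.lstsq(M, b, rcond=None)
    return params.reshape(2, 3)   # 2x3 affine matrix [A | t]
```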
In the above step S4, the moving target pixels with a similarity lower than 0.5 are retained as candidate moving target pixels.
In the above step S4, obtaining the similarity of the moving target images of adjacent frames includes:
acquiring all adjacent-frame moving target images among the feature-extracted moving target images, and taking a small image window for each pixel of one frame in each group of adjacent-frame moving target images; obtaining the corresponding small image window in the other frame according to the affine transformation relation of the two adjacent frames of images;
obtaining a window image group for each group of adjacent-frame moving target images from the small image window and its corresponding window;
and obtaining the mean of the normalized correlation products over the R, G and B channels of the window image group, taking this mean as the similarity.
Further, obtaining the small image window corresponding to the other frame of the moving target image includes:
multiplying the center point of the small image window by the affine transformation matrix to obtain the center point of the corresponding window in the other frame of the moving target image;
and obtaining the corresponding small image window from this center point with the same window size.
Further, obtaining the mean of the normalized correlation products includes:
acquiring the gray mean of the window image group, and subtracting this mean from the gray value of each pixel in the group to obtain a de-centered window image group;
multiplying and summing the corresponding pixels of the de-centered window image group to obtain the correlation product, and dividing the correlation product by the product of the two-norms to obtain the normalized correlation product;
and obtaining from the normalized correlation products their mean over the R, G and B channels of the window image group.
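The three steps above (de-centering, correlation product, normalization, channel averaging) can be sketched as one function. Reading the patent's "two-norm" normalizer as the product of the two de-centered windows' norms is our interpretation:

```python
import numpy as np

def window_similarity(win_a, win_b):
    sims = []
    for c in range(3):                       # R, G, B channels
        a = win_a[..., c].astype(float)
        b = win_b[..., c].astype(float)
        a -= a.mean()                        # de-center each window
        b -= b.mean()
        num = (a * b).sum()                  # correlation product
        den = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        sims.append(num / den)               # normalized correlation product
    return float(np.mean(sims))              # mean over the three channels
```

Identical windows score close to 1; a window over a moving object scores low against its affine-warped counterpart, which is why pixels below 0.5 become candidates.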
In the above step S7, obtaining the moving target confidence of each outer bounding box includes:
weighting the confidence that the outer bounding box belongs to a moving target, with the number of outliers in the outer bounding box divided by a set value as the weight, to obtain the outlier-proportion confidence;
acquiring the proportion of the pixel points in the outer bounding box to the small image window as the similarity confidence;
and multiplying the outlier-proportion confidence by the similarity confidence to obtain the moving target confidence of each outer bounding box.
Further, the number of outliers in the outer bounding box divided by 15 is used as the weight.
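The confidence fusion then reduces to simple arithmetic; a sketch using the set value of 15 from the text as the default (the argument names are illustrative):

```python
def moving_target_confidence(n_outliers, n_candidate_px, window_px, set_value=15):
    # Outlier-proportion confidence: outlier count divided by the set value.
    outlier_conf = n_outliers / set_value
    # Similarity confidence: proportion of pixels relative to the window.
    similarity_conf = n_candidate_px / window_px
    # The moving target confidence is the product of the two.
    return outlier_conf * similarity_conf
```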
In the above step S5, the specific process of performing Euclidean spatial clustering to obtain a plurality of clustering results is as follows:
Euclidean spatial clustering is performed on the candidate moving target pixels: all candidate pixels are traversed and sorted by the number of candidate pixels in their neighborhood, where more neighbors means a higher rank. Region growing starts from the highest-ranked candidate pixel, i.e. all candidate pixels whose spatial distance to the pixel is less than 3 are classified into the region. The same growth step is repeated from the newly classified pixels until no new candidate pixel can be added. The process is then repeated for the remaining candidate pixels, so that all candidate moving target pixels are classified into different connected components and the clustering is complete. For each connected component, the minimum bounding box is computed as its outer bounding box.
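The clustering procedure just described can be sketched as plain region growing over 2-D pixel coordinates. This is an O(n²) toy version: the distance threshold of 3 follows the text, while the seeding-by-neighbor-count is a direct, unoptimized transcription:

```python
import numpy as np

def euclidean_cluster(pixels, radius=3.0):
    pixels = np.asarray(pixels, dtype=float)
    # Rank candidate pixels: more neighbors within the radius -> earlier seed.
    counts = [(np.linalg.norm(pixels - p, axis=1) < radius).sum() for p in pixels]
    order = np.argsort(counts)[::-1]
    unassigned = set(range(len(pixels)))
    clusters = []
    for seed in order:
        if int(seed) not in unassigned:
            continue
        region, frontier = {int(seed)}, [int(seed)]
        unassigned.discard(int(seed))
        while frontier:                      # region growing
            q = frontier.pop()
            near = [i for i in unassigned
                    if np.linalg.norm(pixels[i] - pixels[q]) < radius]
            for i in near:                   # absorb every close candidate
                unassigned.discard(i)
                region.add(i)
                frontier.append(i)
        pts = pixels[sorted(region)]
        # Minimum axis-aligned bounding box of the connected component.
        bbox = (pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max())
        clusters.append((sorted(region), bbox))
    return clusters
```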
In the above step S7, the outer bounding boxes with a confidence lower than 2 are removed.
In the above step S8, the classification network of the target is trained using the MobileNetV2 neural network.
The invention further provides a detection system for a moving object below a tower crane, comprising:
a feature extraction unit: used for acquiring images of a moving object below the tower crane with an acquisition device 4 arranged below the tower crane trolley 3, and for extracting feature points of the image sequence from the acquired moving target images to obtain the feature-extracted moving target images;
an outer bounding box acquisition unit: used for establishing motion estimation for the feature-extracted moving target images based on the two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frames of images; obtaining the similarity of the moving target images of adjacent frames from this relation, and retaining the moving target pixels with similarity lower than a threshold as candidate moving target pixels; and performing Euclidean spatial clustering on the candidate pixels to obtain a plurality of clustering results, obtaining a corresponding outer bounding box for each clustering result;
an outer bounding box screening unit: used for finding, among all feature point matching results and according to the affine transformation relation of the two adjacent frames, the feature point pairs that do not satisfy this relation, taking them as a matching outlier set; obtaining the number of outliers in each outer bounding box from the matching outlier set, obtaining the moving target confidence of each outer bounding box from the number of outliers, and screening out the outer bounding boxes that satisfy the confidence requirement;
a classification unit: used for training the classification network of the target with the visual neural network, classifying the outer bounding boxes that satisfy the confidence requirement, and determining the type and confidence of the moving target.
Compared with the prior art, the invention has the following beneficial effects:
(1) during the hoisting process of the tower crane, different feature point extraction methods are used at different scales, which effectively improves the speed of the algorithm while preserving the subsequent matching accuracy;
(2) when detecting a moving object below the tower crane, the outer bounding boxes that satisfy the confidence requirement are obtained by fusing image sequence analysis with feature point matching, and the trained visual neural network is then used only for classification. The moving object is thus detected accurately, and, unlike existing detection methods that rely entirely on a neural network for the whole computation, the cost and power consumption of the computing equipment are effectively reduced.
The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; any insubstantial change or substitution made by those skilled in the art on the basis of the present invention falls within the scope claimed by the present invention.
Claims (10)
1. A detection method for a moving object below a tower crane, characterized by comprising the following steps:
image acquisition is carried out on a moving object below the tower crane by utilizing an acquisition device arranged below the tower crane trolley;
extracting feature points of an image sequence from the acquired moving target image to obtain a moving target image after the feature points are extracted;
establishing motion estimation for the motion target image after the feature points are extracted based on two-dimensional affine transformation to obtain affine transformation relation of two adjacent frames of images;
obtaining the similarity of the moving target images of the adjacent frames by utilizing the affine transformation relation of the two adjacent frames, and reserving the moving target pixels with the similarity lower than a threshold value as candidate moving target pixels;
performing Euclidean spatial clustering on the candidate moving target pixels to obtain a plurality of clustering results, and obtaining a corresponding outer bounding box for each clustering result;
according to the affine transformation relation of the two adjacent frames of images, in all feature point matching results, finding out feature point pairs which do not meet the affine transformation relation of the two adjacent frames of images, and taking the feature point pairs as a matching outer point set;
obtaining the number of outliers in each outer boundary frame according to the matched outlier set, obtaining the confidence coefficient of the moving target of each outer boundary frame by using the number of outliers, and screening out the outer boundary frames meeting the confidence coefficient requirement;
and classifying the outer boundary box meeting the confidence coefficient requirement by utilizing a classification network of the visual neural network training target, and determining the type and the confidence coefficient of the moving target.
2. The method for detecting a moving object under a tower crane according to claim 1, wherein when a moving object image after feature point extraction is obtained, comprising:
establishing an image pyramid of the moving target image by Gaussian filtering;
extracting a small number of stable feature points from the top layer of the image pyramid by using a feature extraction algorithm to obtain a feature map;
searching local maximum points in the feature map, and reserving all the local maximum points obtained by searching according to a set percentage to serve as feature points;
acquiring a local direction gradient histogram of each feature point, and setting the maximum value of the local direction gradient histogram as the direction of the feature point;
extracting a large number of small-scale ORB characteristic points at the bottom layer of the image pyramid by using an ORB algorithm;
and after the feature point extraction and the small-scale ORB feature point extraction are completed on the moving target image, obtaining the moving target image after the feature point extraction.
3. The method for detecting a moving object under a tower crane according to claim 2, wherein when a large number of small-scale ORB feature points are extracted by an ORB algorithm, the method comprises:
setting an image small window, obtaining the accumulated quantity of the difference between the central point and the surrounding pixel points of the image small window, and constructing a small-scale ORB characteristic diagram according to the accumulated quantity of the difference;
searching a local maximum point in the small-scale ORB feature map as a small-scale ORB feature candidate point, and screening the small-scale ORB feature candidate point to obtain the small-scale ORB feature point;
and acquiring a centroid point of the image small window, connecting the centroid point with the center point, and taking the connecting line direction as the direction of the small-scale ORB characteristic point.
4. The method for detecting a moving object under a tower crane according to claim 2, wherein when obtaining affine transformation relations of two adjacent frames of images, comprising:
performing a brute-force search based on the stable feature points to obtain the point pairs with the smallest feature Euclidean distance as coarse matching point pairs;
obtaining the difference of each small-scale ORB feature point according to the coarse matching point pairs, and eliminating the small-scale ORB feature points with excessive difference;
and performing a brute-force search again based on the retained small-scale ORB feature points to obtain the point pairs with the smallest feature Euclidean distance as the final matching result, and obtaining the affine transformation relation of the two adjacent frames of images by using the final matching result.
5. The method for detecting a moving object under a tower crane according to claim 4, wherein eliminating the small-scale ORB feature points with excessive difference comprises:
according to the rough matching point pairs, a two-dimensional affine transformation model of the moving image after the feature points are extracted is obtained, and small-scale ORB feature points of one image in the moving object image after the feature points are extracted are projected onto the other image by utilizing the two-dimensional affine transformation model, so that a corresponding projection result is obtained;
judging whether corresponding small-scale ORB feature points exist around projection according to the projection result, if not, judging that the small-scale ORB feature points do not have corresponding homonymous points, considering that the difference of the small-scale ORB feature points is too large, and eliminating the small-scale ORB feature points;
when obtaining the affine transformation relation of the two adjacent frames of images, the method comprises the following steps:
and performing least-squares solving on the two-dimensional affine transformation model of the feature-extracted moving images by using the final matching result, thereby obtaining the affine transformation relation of the two adjacent frames of images.
6. The method for detecting a moving object under a tower crane according to claim 1, wherein when obtaining the similarity of moving object images of adjacent frames, comprising:
acquiring all adjacent frame moving target images in the moving target images after the feature points are extracted, and taking a small image window for each pixel of one frame of moving target image in each group of adjacent frame moving target images; obtaining a small image window corresponding to another corresponding frame of moving target image according to the affine transformation relation of the two adjacent frames of images;
according to the small image window and the corresponding small image window, window image groups of each group of adjacent frame moving target images are obtained;
and acquiring the average value of the normalized correlation products of the three channels of the window image group R, G, B, and taking the average value of the normalized correlation products as the similarity.
7. The method for detecting a moving object under a tower crane according to claim 6, wherein when a small image window corresponding to another frame of moving object image is obtained, comprising:
multiplying the center point of the small image window by an affine transformation matrix to obtain the center point of the small image window corresponding to another frame of moving target image;
and obtaining the corresponding small image window according to the same window size according to the center point of the corresponding small image window.
8. The method for detecting a moving object under a tower crane according to claim 6, wherein when obtaining the average value of normalized correlation products, comprising:
acquiring the gray mean of the window image group, and subtracting this mean from the gray value of each pixel in the group to obtain a de-centered window image group;
multiplying and summing the corresponding pixels of the de-centered window image group to obtain the correlation product, and dividing the correlation product by the product of the two-norms to obtain the normalized correlation product;
and obtaining the average value of the normalized correlation products of the three channels of the window image group R, G, B according to the normalized correlation products.
9. The method for detecting a moving object under a tower crane according to claim 6, wherein obtaining the moving object confidence of each outer bounding box comprises:
weighting the confidence that the outer bounding box belongs to a moving object, with the number of outliers in the outer bounding box divided by a set value as the weight, to obtain the outlier-proportion confidence;
acquiring the proportion of pixel points in the outer boundary frame to the small image window as similarity confidence;
and multiplying the outlier proportional confidence by the similarity confidence to obtain the moving target confidence of each external boundary box.
10. A detection system for a moving object below a tower crane, comprising:
feature extraction unit: the device is used for acquiring images of a moving object below the tower crane by using an acquisition device arranged below the tower crane trolley; extracting feature points of an image sequence from the acquired moving target image to obtain a moving target image after the feature points are extracted;
an outer bounding box acquisition unit: used for establishing motion estimation for the feature-extracted moving target images based on the two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frames of images; obtaining the similarity of the moving target images of adjacent frames by using the affine transformation relation of the two adjacent frames, and retaining the moving target pixels with similarity lower than a threshold as candidate moving target pixels; and performing Euclidean spatial clustering on the candidate moving target pixels to obtain a plurality of clustering results, obtaining a corresponding outer bounding box for each clustering result;
an outer bounding box screening unit: the method comprises the steps of finding out characteristic point pairs which do not meet affine transformation relation of two adjacent frame images in all characteristic point matching results according to affine transformation relation of the two adjacent frame images, and taking the characteristic point pairs as a matching outer point set; obtaining the number of outliers in each outer boundary frame according to the matched outlier set, obtaining the confidence coefficient of the moving target of each outer boundary frame by using the number of outliers, and screening out the outer boundary frames meeting the confidence coefficient requirement;
classification unit: and the classification network is used for training the target by utilizing the visual neural network, classifying the outer boundary box meeting the confidence coefficient requirement and determining the type and the confidence coefficient of the moving target.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310053361.1A CN116385480B (en) | 2023-02-03 | 2023-02-03 | Detection method and system for moving object below tower crane |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116385480A CN116385480A (en) | 2023-07-04 |
CN116385480B true CN116385480B (en) | 2023-10-20 |
Family
ID=86977618
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310053361.1A Active CN116385480B (en) | 2023-02-03 | 2023-02-03 | Detection method and system for moving object below tower crane |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116385480B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014092550A2 (en) * | 2012-12-10 | 2014-06-19 | Mimos Berhad | Method for camera motion estimation with presence of moving object |
CN110245671A (en) * | 2019-06-17 | 2019-09-17 | 艾瑞迈迪科技石家庄有限公司 | A kind of endoscopic images characteristic point matching method and system |
CN112418251A (en) * | 2020-12-10 | 2021-02-26 | 研祥智能科技股份有限公司 | Infrared body temperature detection method and system |
CN114358166A (en) * | 2021-12-29 | 2022-04-15 | 青岛星科瑞升信息科技有限公司 | Multi-target positioning method based on self-adaptive k-means clustering |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4492036B2 (en) * | 2003-04-28 | 2010-06-30 | ソニー株式会社 | Image recognition apparatus and method, and robot apparatus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110175576B (en) | Driving vehicle visual detection method combining laser point cloud data | |
CN111680542B (en) | Steel coil point cloud identification and classification method based on multi-scale feature extraction and Pointnet neural network | |
WO2019228063A1 (en) | Product inspection terminal, method and system, computer apparatus and readable medium | |
CN106886216B (en) | Robot automatic tracking method and system based on RGBD face detection | |
CN108986148B (en) | Method for realizing multi-intelligent-trolley collaborative search, identification and tracking of specific target group | |
CN109935080B (en) | Monitoring system and method for real-time calculation of traffic flow on traffic line | |
CN107993488A (en) | A kind of parking stall recognition methods, system and medium based on fisheye camera | |
CN103246896A (en) | Robust real-time vehicle detection and tracking method | |
CN111753682B (en) | Hoisting area dynamic monitoring method based on target detection algorithm | |
CN111223129A (en) | Detection method, detection device, monitoring equipment and computer readable storage medium | |
CN105069451B (en) | A kind of Car license recognition and localization method based on binocular camera | |
CN115272652A (en) | Dense object image detection method based on multiple regression and adaptive focus loss | |
CN111915583B (en) | Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene | |
CN110852179B (en) | Suspicious personnel invasion detection method based on video monitoring platform | |
CN104915642A (en) | Method and apparatus for measurement of distance to vehicle ahead | |
CN113688797A (en) | Abnormal behavior identification method and system based on skeleton extraction | |
CN112560619A (en) | Multi-focus image fusion-based multi-distance bird accurate identification method | |
CN111091057A (en) | Information processing method and device and computer readable storage medium | |
CN115100741A (en) | Point cloud pedestrian distance risk detection method, system, equipment and medium | |
CN109215059B (en) | Local data association method for tracking moving vehicle in aerial video | |
CN111259736A (en) | Real-time pedestrian detection method based on deep learning in complex environment | |
CN114022837A (en) | Station left article detection method and device, electronic equipment and storage medium | |
CN116385480B (en) | Detection method and system for moving object below tower crane | |
CN109815887B (en) | Multi-agent cooperation-based face image classification method under complex illumination | |
CN112465854A (en) | Unmanned aerial vehicle tracking method based on anchor-free detection algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||