CN116385480B - Detection method and system for moving object below tower crane - Google Patents
- Publication number
- CN116385480B CN116385480B CN202310053361.1A CN202310053361A CN116385480B CN 116385480 B CN116385480 B CN 116385480B CN 202310053361 A CN202310053361 A CN 202310053361A CN 116385480 B CN116385480 B CN 116385480B
- Authority
- CN
- China
- Prior art keywords
- image
- small
- moving target
- obtaining
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 24
- 230000009466 transformation Effects 0.000 claims abstract description 61
- 238000000034 method Methods 0.000 claims abstract description 42
- 238000013528 artificial neural network Methods 0.000 claims abstract description 19
- 230000000007 visual effect Effects 0.000 claims abstract description 9
- 238000000605 extraction Methods 0.000 claims description 19
- 238000012216 screening Methods 0.000 claims description 12
- 238000001914 filtration Methods 0.000 claims description 5
- 238000010586 diagram Methods 0.000 claims description 4
- 239000011159 matrix material Substances 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 description 4
- 230000004927 fusion Effects 0.000 description 4
- 238000012300 Sequence Analysis Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 239000000725 suspension Substances 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 230000004931 aggregating effect Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000007500 overflow downdraw method Methods 0.000 description 1
- 238000012163 sequencing technique Methods 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/765—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method and system for detecting moving objects below a tower crane. The method comprises the following steps: extracting feature points of the image sequence from the acquired moving target images to obtain feature-extracted moving target images; establishing motion estimation based on a two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frame images, and from it a similarity measure; retaining the moving target pixels whose similarity is lower than a threshold value as candidate moving target pixels, performing Euclidean space clustering on them to obtain a plurality of clustering results, and obtaining a corresponding outer boundary box for each clustering result; according to the affine transformation relation of the two adjacent frame images, finding the feature point pairs which do not satisfy that relation and taking them as the matching outlier set; obtaining the number of outliers in each outer boundary box from the matching outlier set, and using the number of outliers to obtain the outer boundary boxes meeting the confidence requirement; and classifying the outer boundary boxes with a trained visual neural network to realize the detection of the moving target.
Description
Technical Field
The invention relates to the technical field of target detection, in particular to a detection method and a detection system for a moving target below a tower crane.
Background
During the autonomous operation of an intelligent tower crane, detecting the moving objects below the crane is of great importance. Such objects mainly include people, trucks, bicycles, tricycles and the like. Current detection methods are mainly based on deep learning, with cameras and lidar as the main data sources.
Image-only detection methods mainly rely on deep neural networks; common detectors include the two-stage Faster R-CNN and the single-stage SSD and YOLO. The backbone network is usually chosen from the Darknet and MobileNet families. Because the construction scene below a tower crane is highly complex, and because the camera shoots top-down, part of the target information is easily occluded. The expressive power of the whole network therefore has to be guaranteed by adopting a backbone of high complexity. However, such methods demand high computing power and generally require a processor with dedicated AI acceleration hardware.
Methods based on lidar and images mainly fuse the two kinds of data. There are loose fusion schemes, such as performing deep-neural-network target detection on the image, finding the corresponding lidar point cloud through the projection frustum of the image, and re-judging the target class from the point cloud. There are also tight fusion schemes, in which the combination of lidar point cloud and image is fed directly into the neural network, a deep network compatible with both inputs is trained, and end-to-end target detection is completed.
However, both kinds of methods rely heavily on deep neural networks, whose inference requires powerful computing hardware; this becomes a problem in practical large-scale deployment. The operating system of an intelligent tower crane is a large system comprising perception, decision, planning, control, safety and other links, with numerous sensors, a large computational load and a complex architecture. Heterogeneous computing support is usually required: the computation for each sensor is completed at the terminal as far as possible, instead of aggregating all data to a central service node. In this setting, if the single task of moving-object detection relies entirely on a high-complexity neural network, the cost and power consumption of the edge computing devices become high.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method and system for detecting moving objects below a tower crane, aimed at the technical problem that existing detection methods incur high cost and power consumption of the computing equipment, so as to detect moving objects below the tower crane accurately while reducing the cost and power consumption of the computing equipment.
In order to solve the problems, the technical scheme adopted by the invention is as follows:
the detection method for the moving object below the tower crane comprises the following steps:
image acquisition is carried out on a moving object below the tower crane by utilizing an acquisition device arranged below the tower crane trolley;
extracting feature points of an image sequence from the acquired moving target image to obtain a moving target image after the feature points are extracted;
establishing motion estimation for the motion target image after the feature points are extracted based on two-dimensional affine transformation to obtain affine transformation relation of two adjacent frames of images;
obtaining the similarity of adjacent-frame moving target images by using the affine transformation relation of the two adjacent frames, and retaining the moving target pixels whose similarity is lower than a threshold value as candidate moving target pixels;
performing Euclidean space clustering on the candidate moving target pixels to obtain a plurality of clustering results, and obtaining a corresponding outer boundary box for each clustering result;
according to the affine transformation relation of the two adjacent frame images, finding, among all feature point matching results, the feature point pairs which do not satisfy that relation, and taking them as the matching outlier set;
obtaining the number of outliers in each outer boundary box from the matching outlier set, obtaining the moving target confidence of each outer boundary box from the number of outliers, and screening out the outer boundary boxes meeting the confidence requirement;
and classifying the outer boundary boxes meeting the confidence requirement with a classification network trained as a visual neural network, determining the category and confidence of the moving target.
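As an illustrative sketch (not part of the patent disclosure), the Euclidean space clustering step with its outer boundary boxes can be written as a simple region-growing pass; the `radius` parameter and the greedy growing strategy are assumptions, since the text does not fix the clustering algorithm's internals:

```python
import numpy as np
from collections import deque

def euclidean_cluster(points, radius=2.0):
    """Region-growing clustering in Euclidean space: points closer than
    `radius` join the same cluster; each cluster yields an outer boundary
    box as a (min corner, max corner) pair."""
    points = np.asarray(points, dtype=float)
    unvisited = set(range(len(points)))
    boxes = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if np.linalg.norm(points[i] - points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        cluster = points[members]
        boxes.append((cluster.min(axis=0), cluster.max(axis=0)))
    return boxes
```

For the candidate moving target pixels this would be called on their (row, column) coordinates; each returned box is then screened by the confidence step that follows.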
In a preferred embodiment of the present invention, when obtaining a moving object image from which feature points are extracted, the method includes:
establishing an image pyramid of the moving target image by Gaussian filtering;
extracting a small number of stable feature points from the top layer of the image pyramid by using a feature extraction algorithm to obtain a feature map;
searching local maximum points in the feature map, and reserving all the local maximum points obtained by searching according to a set percentage to serve as feature points;
acquiring a local direction gradient histogram of each feature point, and setting the maximum value of the local direction gradient histogram as the direction of the feature point;
extracting a large number of small-scale ORB characteristic points at the bottom layer of the image pyramid by using an ORB algorithm;
and after the feature point extraction and the small-scale ORB feature point extraction are completed on the moving target image, obtaining the moving target image after the feature point extraction.
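The local-maximum search with retention by a "set percentage" can be illustrated as follows. This is an explanatory sketch only: the 3x3 neighbourhood and the function name are assumptions, not the patent's exact procedure:

```python
import numpy as np

def local_maxima_points(feature_map, keep_ratio=0.2):
    """Search the feature map for strict local maxima over a 3x3
    neighbourhood, then keep only the strongest `keep_ratio` fraction
    (the 'set percentage')."""
    h, w = feature_map.shape
    pts = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = feature_map[y, x]
            patch = feature_map[y - 1:y + 2, x - 1:x + 2]
            # Strict maximum: the centre value occurs exactly once in the patch.
            if v == patch.max() and np.count_nonzero(patch == v) == 1:
                pts.append((y, x, float(v)))
    pts.sort(key=lambda p: p[2], reverse=True)
    return pts[:max(1, int(len(pts) * keep_ratio))] if pts else []
```

With `keep_ratio=0.2` this corresponds to the "first twenty percent" retention mentioned later in the description.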
As a preferred embodiment of the present invention, when extracting a large number of small-scale ORB feature points using an ORB algorithm, it includes:
setting an image small window, obtaining the accumulated quantity of the difference between the central point and the surrounding pixel points of the image small window, and constructing a small-scale ORB characteristic diagram according to the accumulated quantity of the difference;
searching a local maximum point in the small-scale ORB feature map as a small-scale ORB feature candidate point, and screening the small-scale ORB feature candidate point to obtain the small-scale ORB feature point;
and acquiring a centroid point of the image small window, connecting the centroid point with the center point, and taking the connecting line direction as the direction of the small-scale ORB characteristic point.
As a preferred embodiment of the present invention, when obtaining affine transformation relations of two adjacent frame images, it includes:
performing a brute-force search based on the stable feature points, and taking the point pairs with the minimum Euclidean distance between features as coarse matching point pairs;
obtaining the discrepancy of each small-scale ORB feature point from the coarse matching point pairs, and eliminating the small-scale ORB feature points with excessive discrepancy;
and performing a brute-force search again based on the retained small-scale ORB feature points, taking the point pairs with the minimum Euclidean distance between features as the final matching result, and obtaining the affine transformation relation of the two adjacent frame images from the final matching result.
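The brute-force search for the pair with the minimum Euclidean distance between feature descriptors amounts to an exhaustive nearest-neighbour query; a minimal NumPy sketch, where the (N, D) descriptor-array shape and the function name are illustrative assumptions:

```python
import numpy as np

def brute_force_match(desc_a, desc_b):
    """For every descriptor row of desc_a, exhaustively find the desc_b row
    at minimum Euclidean distance; returns index pairs (i, j)."""
    # Full (len_a, len_b) distance table: no indexing structure, pure brute force.
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return list(zip(range(len(desc_a)), nearest.tolist()))
```

The same routine serves both the coarse match on the stable feature points and the re-match on the retained small-scale ORB points.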
As a preferred embodiment of the present invention, when eliminating the feature points of the small-scale ORB with excessive variability, the method includes:
according to the coarse matching point pairs, obtaining a two-dimensional affine transformation model of the feature-extracted moving target images, and projecting the small-scale ORB feature points of one image onto the other image with the model to obtain the corresponding projection result;
judging from the projection result whether corresponding small-scale ORB feature points exist around the projection; if not, the small-scale ORB feature point has no corresponding (homonymous) point, its discrepancy is considered too large, and it is eliminated;
when obtaining the affine transformation relation of the two adjacent frames of images, the method comprises the following steps:
and carrying out least square solving on the two-dimensional affine transformation model of the moving image after the feature points are extracted by utilizing the final matching result, and obtaining the affine transformation relation of the two adjacent frames of images.
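The least-squares solution of the two-dimensional affine model from the final matching result can be written directly with `numpy.linalg.lstsq`; a sketch under the assumption that the matched points are given as (N, 2) arrays:

```python
import numpy as np

def solve_affine(src, dst):
    """Least-squares fit of a 2D affine transform dst ≈ src @ A.T + t from
    matched point pairs; src and dst are (N, 2) arrays with N >= 3."""
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])             # (N, 3) design matrix
    params, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2) solution
    A = params[:2].T   # 2x2 linear part (rotation/scale/shear)
    t = params[2]      # 2-vector translation
    return A, t
```

With at least three non-collinear matches the system is determined; extra matches are averaged out in the least-squares sense, which is what makes the outliers of the next step detectable as pairs that disagree with the fit.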
As a preferred embodiment of the present invention, when obtaining the similarity of moving object images of adjacent frames, it comprises:
acquiring all adjacent frame moving target images in the moving target images after the feature points are extracted, and taking a small image window for each pixel of one frame of moving target image in each group of adjacent frame moving target images; obtaining a small image window corresponding to another corresponding frame of moving target image according to the affine transformation relation of the two adjacent frames of images;
according to the small image window and the corresponding small image window, window image groups of each group of adjacent frame moving target images are obtained;
and acquiring the average value of the normalized correlation products of the three channels of the window image group R, G, B, and taking the average value of the normalized correlation products as the similarity.
As a preferred embodiment of the present invention, when obtaining a small image window corresponding to another frame of moving object image, it includes:
multiplying the center point of the small image window by an affine transformation matrix to obtain the center point of the small image window corresponding to another frame of moving target image;
and obtaining the corresponding small image window according to the same window size according to the center point of the corresponding small image window.
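Transferring a window centre through the affine transformation and cropping a same-size window in the other frame might look like the sketch below; the (x, y) ordering, rounding to the nearest pixel, and the window half-size are assumptions:

```python
import numpy as np

def corresponding_window(img_b, center_xy, A, t, half=3):
    """Map a window centre (x, y) from frame a into frame b through the
    affine transform (A, t), then crop a window of the same size around
    the mapped centre."""
    cx, cy = (A @ np.asarray(center_xy, dtype=float)) + t
    cx, cy = int(round(cx)), int(round(cy))
    return img_b[cy - half:cy + half + 1, cx - half:cx + half + 1]
```

A production version would additionally clip the crop at the image border; that handling is omitted here for brevity.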
As a preferred embodiment of the present invention, when obtaining the average value of the normalized correlation product, it includes:
acquiring a gray average value of the window image group, and subtracting the gray average value corresponding to each pixel in the window image group from the gray average value of each pixel in the window image group to obtain a de-centralized window image group;
multiplying and summing the corresponding pixels of the de-centred window image group to obtain a correlation product, and dividing the correlation product by the product of the two-norms of the two de-centred windows to obtain the normalized correlation product;
and obtaining the average value of the normalized correlation products of the three channels of the window image group R, G, B according to the normalized correlation products.
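The de-centred, normalized correlation product and its mean over the R, G, B channels can be sketched as below; dividing by the product of the two windows' two-norms is the usual normalized cross-correlation convention and is assumed here:

```python
import numpy as np

def ncc(win_a, win_b):
    """De-centre both windows (subtract each window's mean), multiply and
    sum corresponding pixels, and divide by the product of the two-norms."""
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def rgb_similarity(win_a, win_b):
    """Mean of the normalized correlation products over the R, G, B channels."""
    return float(np.mean([ncc(win_a[..., c], win_b[..., c]) for c in range(3)]))
```

The result lies in [-1, 1]; pixels whose value falls below the 0.5 threshold given later are retained as candidate moving target pixels.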
As a preferred embodiment of the present invention, when obtaining the moving object confidence of each of the outer bounding boxes, it includes:
the confidence that the outer bounding box belongs to a moving object is weighted by dividing the number of the outer points in the outer bounding box by a set value as a weight, so that the confidence of the outer point proportion is obtained;
acquiring the proportion of pixel points in the outer boundary frame to the small image window as similarity confidence;
and multiplying the outlier proportional confidence by the similarity confidence to obtain the moving target confidence of each external boundary box.
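Combining the outlier-proportion confidence with the similarity confidence reduces to a product of two ratios; in the sketch below the `weight` divisor of 10.0 and the cap at 1.0 are assumed placeholders, since the patent leaves the "set value" unspecified:

```python
def box_confidence(n_outliers, n_box_pixels, n_window_pixels, weight=10.0):
    """Moving target confidence of an outer boundary box: the outlier count
    divided by a set value (capped at 1) times the ratio of box pixels to
    window pixels. `weight=10.0` is an assumed placeholder value."""
    outlier_conf = min(n_outliers / weight, 1.0)
    similarity_conf = n_box_pixels / n_window_pixels
    return outlier_conf * similarity_conf
```

Boxes whose product falls below a chosen threshold are discarded before classification, which is what keeps the visual neural network's workload small.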
A detection system for a moving object below a tower crane, comprising:
feature extraction unit: the device is used for acquiring images of a moving object below the tower crane by using an acquisition device arranged below the tower crane trolley; extracting feature points of an image sequence from the acquired moving target image to obtain a moving target image after the feature points are extracted;
an outer bounding box acquisition unit: used for establishing motion estimation for the feature-extracted moving target images based on a two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frame images; obtaining the similarity of adjacent-frame moving target images by using that relation, and retaining the moving target pixels whose similarity is lower than a threshold value as candidate moving target pixels; and performing Euclidean space clustering on the candidate moving target pixels to obtain a plurality of clustering results, and obtaining a corresponding outer boundary box for each clustering result;
an outer bounding box screening unit: the method comprises the steps of finding out characteristic point pairs which do not meet affine transformation relation of two adjacent frame images in all characteristic point matching results according to affine transformation relation of the two adjacent frame images, and taking the characteristic point pairs as a matching outer point set; obtaining the number of outliers in each outer boundary frame according to the matched outlier set, obtaining the confidence coefficient of the moving target of each outer boundary frame by using the number of outliers, and screening out the outer boundary frames meeting the confidence coefficient requirement;
classification unit: and the classification network is used for training the target by utilizing the visual neural network, classifying the outer boundary box meeting the confidence coefficient requirement and determining the type and the confidence coefficient of the moving target.
Compared with the prior art, the invention has the following beneficial effects:
(1) During operation of the tower crane, different feature point extraction methods are used at different scales, which effectively improves the algorithm's speed while ensuring the precision of the subsequent matching;
(2) When detecting moving objects below the tower crane, the outer boundary boxes meeting the confidence requirement are obtained by fusing image sequence analysis with feature point matching, and are then classified with a trained visual neural network. The moving target is thus detected accurately; unlike existing detection methods, which rely entirely on a neural network for the whole computation, the cost and power consumption of the computing equipment are effectively reduced.
The invention is described in further detail below with reference to the drawings and the detailed description.
Drawings
FIG. 1 is a schematic diagram of a deployment location of an acquisition device according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating steps of a method for detecting a moving object under a tower crane according to an embodiment of the present invention.
Reference numerals illustrate: 1. a suspension arm; 2. a tower body; 3. a tower crane trolley; 4. and a collecting device.
Detailed Description
The method for detecting the moving object below the tower crane provided by the invention, as shown in fig. 2, comprises the following steps:
Step S1: acquiring images of the moving objects below the tower crane with an acquisition device 4 arranged below the tower crane trolley 3;
Step S2: extracting feature points of the image sequence from the acquired moving target images to obtain feature-extracted moving target images;
Step S3: establishing motion estimation on the feature-extracted moving target images based on a two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frame images;
Step S4: obtaining the similarity of adjacent-frame moving target images by using the affine transformation relation of the two adjacent frames, and retaining the moving target pixels whose similarity is lower than a threshold value as candidate moving target pixels;
Step S5: performing Euclidean space clustering on the candidate moving target pixels to obtain a plurality of clustering results, and obtaining a corresponding outer boundary box for each clustering result;
Step S6: according to the affine transformation relation of the two adjacent frame images, finding, among all feature point matching results, the feature point pairs which do not satisfy that relation, and taking them as the matching outlier set;
Step S7: obtaining the number of outliers in each outer boundary box from the matching outlier set, obtaining the moving target confidence of each outer boundary box from the number of outliers, and screening out the outer boundary boxes meeting the confidence requirement;
Step S8: classifying the outer boundary boxes meeting the confidence requirement with a classification network trained as a visual neural network, and determining the category and confidence of the moving target.
Specifically, the acquisition device 4 adopted by the invention is a downward-looking camera, and as shown in fig. 1, the main components of the tower crane comprise a suspension arm 1, a tower body 2 and a tower crane trolley 3. The downward-looking camera is arranged below the crane trolley 3 and is perpendicular to the ground to shoot a moving object, so that the moving object below the crane is effectively subjected to image acquisition.
In step S2, when obtaining a moving object image from which feature points are extracted, the method includes:
establishing an image pyramid of the moving target image by Gaussian filtering;
extracting a small amount of stable feature points on the top layer of the image pyramid by using a feature extraction algorithm to obtain a feature map;
searching local maximum points in the feature map, and reserving all the local maximum points obtained by searching according to a set percentage to serve as feature points;
acquiring a local direction gradient histogram of each feature point, and setting the maximum value of the local direction gradient histogram as the direction of the feature point;
extracting a large number of small-scale ORB characteristic points at the bottom layer of the image pyramid by using an ORB algorithm;
and after feature point extraction and small-scale ORB feature point extraction are completed on the moving target image, obtaining the moving target image after feature point extraction.
Further, the feature extraction algorithm adopted in the present invention is a SIFT algorithm.
Further, the percentage set in the present invention is the first twenty percent.
Specifically, the image pyramid is constructed with a Gaussian filter. The base value of the standard deviation of the Gaussian function is 1.6, and 6 filterings are computed in total; in the i-th filtering, the standard deviation is 1.6 to the power of i. A small number of stable feature points are extracted from the top layer of the image pyramid with the SIFT algorithm; that is, the feature map is obtained from the difference between the top and second-to-top layers of the pyramid.
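The pyramid construction described above (base standard deviation 1.6, the i-th filtering using 1.6 to the power of i, and the feature map taken as the difference of the two top layers) can be sketched in NumPy; the separable-convolution helper and the level count are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_filter(img, sigma):
    """Separable Gaussian blur: convolve rows, then columns ('same' padding)."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, tmp)

def build_pyramid(img, levels=6, base_sigma=1.6):
    """The i-th filtering uses standard deviation base_sigma ** i; the feature
    map is the difference between the top and second-to-top layers (DoG)."""
    layers = [img.astype(float)]
    for i in range(1, levels + 1):
        layers.append(gaussian_filter(img.astype(float), base_sigma ** i))
    feature_map = layers[-1] - layers[-2]
    return layers, feature_map
```

The returned feature map is the one searched for local maxima in the SIFT step.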
Further, when a large number of small-scale ORB feature points are extracted using the ORB algorithm, it includes:
setting an image small window, acquiring the accumulated quantity of differences between the central point and surrounding pixel points of the image small window, and constructing a small-scale ORB characteristic diagram according to the accumulated quantity of the differences;
searching local maximum points in the small-scale ORB feature map to serve as small-scale ORB feature candidate points, and screening the small-scale ORB feature candidate points to obtain small-scale ORB feature points;
and acquiring a centroid point of the small window of the image, connecting the centroid point with the center point, and taking the connecting line direction as the direction of the small-scale ORB characteristic point.
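The orientation assigned from the line between the window centre and the intensity centroid is the standard intensity-centroid idea used by ORB; a minimal sketch over a grayscale window, with the coordinate conventions assumed:

```python
import numpy as np

def intensity_centroid_orientation(window):
    """Direction (radians) of the line from the window's centre point to its
    intensity centroid, used as the small-scale ORB feature orientation."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    mass = window.sum()
    cy = (ys * window).sum() / mass   # centroid row
    cx = (xs * window).sum() / mass   # centroid column
    return np.arctan2(cy - (h - 1) / 2.0, cx - (w - 1) / 2.0)
```

A window whose brightness is shifted to the right of centre, for example, yields an orientation near zero radians.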
In the above step S3, when obtaining the affine transformation relationship of the adjacent two frame images, it includes:
performing a brute-force search based on the stable feature points, and taking the point pairs with the minimum Euclidean distance between features as coarse matching point pairs;
obtaining the discrepancy of each small-scale ORB feature point from the coarse matching point pairs, and eliminating the small-scale ORB feature points with excessive discrepancy;
and performing a brute-force search again based on the retained small-scale ORB feature points, taking the point pairs with the minimum Euclidean distance between features as the final matching result, and obtaining the affine transformation relation of the two adjacent frame images from the final matching result.
Further, eliminating the small-scale ORB feature points with excessive difference includes:
obtaining a two-dimensional affine transformation model of the feature-extracted moving images according to the coarse matching point pairs, and projecting the small-scale ORB feature points of one image in the feature-extracted moving target images onto the other image by using the two-dimensional affine transformation model to obtain the corresponding projection result;
and judging from the projection result whether a corresponding small-scale ORB feature point exists near each projection; if not, the small-scale ORB feature point is judged to have no corresponding homonymous point, its difference is considered excessive, and it is eliminated.
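A hedged sketch of this rejection step: image-1 feature points are projected with the coarse two-dimensional affine model, and only those with a candidate homonymous point in image 2 within a small radius are kept (the search radius is an illustrative assumption, not a value from the patent):

```python
import numpy as np

def reject_by_projection(pts1, pts2, affine, radius=3.0):
    # affine is a 2x3 matrix [A | t]; append 1 to each point and project.
    ones = np.ones((len(pts1), 1))
    proj = np.hstack([pts1, ones]) @ affine.T
    keep = []
    for i, p in enumerate(proj):
        # A projected point survives only if some image-2 feature point
        # lies within `radius` pixels of it (i.e. a homonymous point exists).
        if np.min(np.linalg.norm(pts2 - p, axis=1)) <= radius:
            keep.append(i)
    return keep
```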
Further, obtaining the affine transformation relation of two adjacent frames of images includes:
performing least-squares solving on the two-dimensional affine transformation model of the adjacent feature-extracted moving images by using the final matching result, thereby obtaining the affine transformation relation of the two adjacent frames of images.
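With the final matches in hand, the six-parameter two-dimensional affine model x' = A x + t can be solved directly with `numpy.linalg.lstsq`; a sketch (the stacked-equation layout is the standard formulation, not patent-specific):

```python
import numpy as np

def solve_affine_lstsq(src, dst):
    # Stack the equations x' = a11*x + a12*y + tx and y' = a21*x + a22*y + ty
    # for every matched point pair, then solve for the six unknowns.
    n = len(src)
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2], M[0::2, 2] = src, 1.0
    M[1::2, 3:5], M[1::2, 5] = src, 1.0
    b = np.asarray(dst, dtype=float).reshape(-1)
    params, *_ = np.linalg.lstsq(M, b, rcond=None)
    return params.reshape(2, 3)   # 2x3 affine matrix [A | t]
```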
In the above step S4, the moving target pixels with a similarity lower than 0.5 are retained as candidate moving target pixels.
In the above step S4, obtaining the similarity of the moving target images of adjacent frames includes:
acquiring all adjacent-frame moving target images among the feature-extracted moving target images, and taking a small image window for each pixel of one frame in each group of adjacent-frame moving target images; obtaining the corresponding small image window in the other frame according to the affine transformation relation of the two adjacent frames of images;
obtaining a window image group for each group of adjacent-frame moving target images from the small image window and its corresponding window;
and obtaining the mean of the normalized correlation products over the R, G and B channels of the window image group, taking this mean as the similarity.
Further, obtaining the small image window corresponding to the other frame of the moving target image includes:
multiplying the center point of the small image window by the affine transformation matrix to obtain the center point of the corresponding window in the other frame of the moving target image;
and obtaining the corresponding small image window from this center point with the same window size.
Further, obtaining the mean of the normalized correlation products includes:
acquiring the gray mean of the window image group, and subtracting this mean from the gray value of each pixel in the group to obtain a de-centered window image group;
multiplying and summing the corresponding pixels of the de-centered window image group to obtain the correlation product, and dividing the correlation product by the product of the two-norms to obtain the normalized correlation product;
and obtaining from the normalized correlation products their mean over the R, G and B channels of the window image group.
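The three steps above (de-centering, correlation product, normalization, channel averaging) can be sketched as one function. Reading the patent's "two-norm" normalizer as the product of the two de-centered windows' norms is our interpretation:

```python
import numpy as np

def window_similarity(win_a, win_b):
    sims = []
    for c in range(3):                       # R, G, B channels
        a = win_a[..., c].astype(float)
        b = win_b[..., c].astype(float)
        a -= a.mean()                        # de-center each window
        b -= b.mean()
        num = (a * b).sum()                  # correlation product
        den = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        sims.append(num / den)               # normalized correlation product
    return float(np.mean(sims))              # mean over the three channels
```

Identical windows score close to 1; a window over a moving object scores low against its affine-warped counterpart, which is why pixels below 0.5 become candidates.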
In the above step S7, obtaining the moving target confidence of each outer bounding box includes:
weighting the confidence that the outer bounding box belongs to a moving target, with the number of outliers in the outer bounding box divided by a set value as the weight, to obtain the outlier-proportion confidence;
acquiring the proportion of the pixel points in the outer bounding box to the small image window as the similarity confidence;
and multiplying the outlier-proportion confidence by the similarity confidence to obtain the moving target confidence of each outer bounding box.
Further, the number of outliers in the outer bounding box divided by 15 is used as the weight.
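The confidence fusion then reduces to simple arithmetic; a sketch using the set value of 15 from the text as the default (the argument names are illustrative):

```python
def moving_target_confidence(n_outliers, n_candidate_px, window_px, set_value=15):
    # Outlier-proportion confidence: outlier count divided by the set value.
    outlier_conf = n_outliers / set_value
    # Similarity confidence: proportion of pixels relative to the window.
    similarity_conf = n_candidate_px / window_px
    # The moving target confidence is the product of the two.
    return outlier_conf * similarity_conf
```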
In the above step S5, the specific process of performing Euclidean spatial clustering to obtain a plurality of clustering results is as follows:
Euclidean spatial clustering is performed on the candidate moving target pixels: all candidate pixels are traversed and sorted by the number of candidate pixels in their neighborhood, where more neighbors means a higher rank. Region growing starts from the highest-ranked candidate pixel, i.e. all candidate pixels whose spatial distance to the pixel is less than 3 are classified into the region. The same growth step is repeated from the newly classified pixels until no new candidate pixel can be added. The process is then repeated for the remaining candidate pixels, so that all candidate moving target pixels are classified into different connected components and the clustering is complete. For each connected component, the minimum bounding box is computed as its outer bounding box.
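The clustering procedure just described can be sketched as plain region growing over 2-D pixel coordinates. This is an O(n²) toy version: the distance threshold of 3 follows the text, while the seeding-by-neighbor-count is a direct, unoptimized transcription:

```python
import numpy as np

def euclidean_cluster(pixels, radius=3.0):
    pixels = np.asarray(pixels, dtype=float)
    # Rank candidate pixels: more neighbors within the radius -> earlier seed.
    counts = [(np.linalg.norm(pixels - p, axis=1) < radius).sum() for p in pixels]
    order = np.argsort(counts)[::-1]
    unassigned = set(range(len(pixels)))
    clusters = []
    for seed in order:
        if int(seed) not in unassigned:
            continue
        region, frontier = {int(seed)}, [int(seed)]
        unassigned.discard(int(seed))
        while frontier:                      # region growing
            q = frontier.pop()
            near = [i for i in unassigned
                    if np.linalg.norm(pixels[i] - pixels[q]) < radius]
            for i in near:                   # absorb every close candidate
                unassigned.discard(i)
                region.add(i)
                frontier.append(i)
        pts = pixels[sorted(region)]
        # Minimum axis-aligned bounding box of the connected component.
        bbox = (pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max())
        clusters.append((sorted(region), bbox))
    return clusters
```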
In the above step S7, the outer bounding boxes with a confidence lower than 2 are removed.
In the above step S8, the classification network of the target is trained using the MobileNetV2 neural network.
The invention further provides a detection system for a moving object below a tower crane, comprising:
a feature extraction unit: used for acquiring images of a moving object below the tower crane with an acquisition device 4 arranged below the tower crane trolley 3, and for extracting feature points of the image sequence from the acquired moving target images to obtain the feature-extracted moving target images;
an outer bounding box acquisition unit: used for establishing motion estimation for the feature-extracted moving target images based on the two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frames of images; obtaining the similarity of the moving target images of adjacent frames from this relation, and retaining the moving target pixels with similarity lower than a threshold as candidate moving target pixels; and performing Euclidean spatial clustering on the candidate pixels to obtain a plurality of clustering results, obtaining a corresponding outer bounding box for each clustering result;
an outer bounding box screening unit: used for finding, among all feature point matching results and according to the affine transformation relation of the two adjacent frames, the feature point pairs that do not satisfy this relation, taking them as a matching outlier set; obtaining the number of outliers in each outer bounding box from the matching outlier set, obtaining the moving target confidence of each outer bounding box from the number of outliers, and screening out the outer bounding boxes that satisfy the confidence requirement;
a classification unit: used for training the classification network of the target with the visual neural network, classifying the outer bounding boxes that satisfy the confidence requirement, and determining the type and confidence of the moving target.
Compared with the prior art, the invention has the following beneficial effects:
(1) during the hoisting process of the tower crane, different feature point extraction methods are used at different scales, which effectively improves the speed of the algorithm while preserving the subsequent matching accuracy;
(2) when detecting a moving object below the tower crane, the outer bounding boxes that satisfy the confidence requirement are obtained by fusing image sequence analysis with feature point matching, and the trained visual neural network is then used only for classification. The moving object is thus detected accurately, and, unlike existing detection methods that rely entirely on a neural network for the whole computation, the cost and power consumption of the computing equipment are effectively reduced.
The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; any insubstantial change or substitution made by those skilled in the art on the basis of the present invention falls within the scope claimed by the present invention.
Claims (10)
1. A detection method for a moving object below a tower crane, characterized by comprising the following steps:
image acquisition is carried out on a moving object below the tower crane by utilizing an acquisition device arranged below the tower crane trolley;
extracting feature points of an image sequence from the acquired moving target image to obtain a moving target image after the feature points are extracted;
establishing motion estimation for the motion target image after the feature points are extracted based on two-dimensional affine transformation to obtain affine transformation relation of two adjacent frames of images;
obtaining the similarity of the moving target images of the adjacent frames by utilizing the affine transformation relation of the two adjacent frames, and reserving the moving target pixels with the similarity lower than a threshold value as candidate moving target pixels;
performing Euclidean spatial clustering on the candidate moving target pixels to obtain a plurality of clustering results, and obtaining a corresponding outer bounding box for each clustering result;
according to the affine transformation relation of the two adjacent frames of images, in all feature point matching results, finding out feature point pairs which do not meet the affine transformation relation of the two adjacent frames of images, and taking the feature point pairs as a matching outer point set;
obtaining the number of outliers in each outer boundary frame according to the matched outlier set, obtaining the confidence coefficient of the moving target of each outer boundary frame by using the number of outliers, and screening out the outer boundary frames meeting the confidence coefficient requirement;
and classifying the outer boundary box meeting the confidence coefficient requirement by utilizing a classification network of the visual neural network training target, and determining the type and the confidence coefficient of the moving target.
2. The method for detecting a moving object under a tower crane according to claim 1, wherein when a moving object image after feature point extraction is obtained, comprising:
establishing an image pyramid of the moving target image by Gaussian filtering;
extracting a small number of stable feature points from the top layer of the image pyramid by using a feature extraction algorithm to obtain a feature map;
searching local maximum points in the feature map, and reserving all the local maximum points obtained by searching according to a set percentage to serve as feature points;
acquiring a local direction gradient histogram of each feature point, and setting the maximum value of the local direction gradient histogram as the direction of the feature point;
extracting a large number of small-scale ORB characteristic points at the bottom layer of the image pyramid by using an ORB algorithm;
and after the feature point extraction and the small-scale ORB feature point extraction are completed on the moving target image, obtaining the moving target image after the feature point extraction.
3. The method for detecting a moving object under a tower crane according to claim 2, wherein when a large number of small-scale ORB feature points are extracted by an ORB algorithm, the method comprises:
setting an image small window, obtaining the accumulated quantity of the difference between the central point and the surrounding pixel points of the image small window, and constructing a small-scale ORB characteristic diagram according to the accumulated quantity of the difference;
searching a local maximum point in the small-scale ORB feature map as a small-scale ORB feature candidate point, and screening the small-scale ORB feature candidate point to obtain the small-scale ORB feature point;
and acquiring a centroid point of the image small window, connecting the centroid point with the center point, and taking the connecting line direction as the direction of the small-scale ORB characteristic point.
4. The method for detecting a moving object under a tower crane according to claim 2, wherein when obtaining affine transformation relations of two adjacent frames of images, comprising:
performing a brute-force search based on the stable feature points to obtain the point pairs with the smallest feature Euclidean distance as coarse matching point pairs;
obtaining the difference of each small-scale ORB feature point according to the coarse matching point pairs, and eliminating the small-scale ORB feature points with excessive difference;
and performing a brute-force search again based on the retained small-scale ORB feature points to obtain the point pairs with the smallest feature Euclidean distance as the final matching result, and obtaining the affine transformation relation of the two adjacent frames of images by using the final matching result.
5. The method for detecting a moving object under a tower crane according to claim 4, wherein eliminating the small-scale ORB feature points with excessive difference comprises:
according to the rough matching point pairs, a two-dimensional affine transformation model of the moving image after the feature points are extracted is obtained, and small-scale ORB feature points of one image in the moving object image after the feature points are extracted are projected onto the other image by utilizing the two-dimensional affine transformation model, so that a corresponding projection result is obtained;
judging whether corresponding small-scale ORB feature points exist around projection according to the projection result, if not, judging that the small-scale ORB feature points do not have corresponding homonymous points, considering that the difference of the small-scale ORB feature points is too large, and eliminating the small-scale ORB feature points;
when obtaining the affine transformation relation of the two adjacent frames of images, the method comprises the following steps:
and performing least-squares solving on the two-dimensional affine transformation model of the feature-extracted moving images by using the final matching result, thereby obtaining the affine transformation relation of the two adjacent frames of images.
6. The method for detecting a moving object under a tower crane according to claim 1, wherein when obtaining the similarity of moving object images of adjacent frames, comprising:
acquiring all adjacent frame moving target images in the moving target images after the feature points are extracted, and taking a small image window for each pixel of one frame of moving target image in each group of adjacent frame moving target images; obtaining a small image window corresponding to another corresponding frame of moving target image according to the affine transformation relation of the two adjacent frames of images;
according to the small image window and the corresponding small image window, window image groups of each group of adjacent frame moving target images are obtained;
and acquiring the average value of the normalized correlation products of the three channels of the window image group R, G, B, and taking the average value of the normalized correlation products as the similarity.
7. The method for detecting a moving object under a tower crane according to claim 6, wherein when a small image window corresponding to another frame of moving object image is obtained, comprising:
multiplying the center point of the small image window by an affine transformation matrix to obtain the center point of the small image window corresponding to another frame of moving target image;
and obtaining the corresponding small image window according to the same window size according to the center point of the corresponding small image window.
8. The method for detecting a moving object under a tower crane according to claim 6, wherein when obtaining the average value of normalized correlation products, comprising:
acquiring the gray mean of the window image group, and subtracting this mean from the gray value of each pixel in the group to obtain a de-centered window image group;
multiplying and summing the corresponding pixels of the de-centered window image group to obtain the correlation product, and dividing the correlation product by the product of the two-norms to obtain the normalized correlation product;
and obtaining the average value of the normalized correlation products of the three channels of the window image group R, G, B according to the normalized correlation products.
9. The method for detecting a moving object under a tower crane according to claim 6, wherein obtaining the moving object confidence of each outer bounding box comprises:
weighting the confidence that the outer bounding box belongs to a moving object, with the number of outliers in the outer bounding box divided by a set value as the weight, to obtain the outlier-proportion confidence;
acquiring the proportion of pixel points in the outer boundary frame to the small image window as similarity confidence;
and multiplying the outlier proportional confidence by the similarity confidence to obtain the moving target confidence of each external boundary box.
10. A detection system for a moving object below a tower crane, comprising:
feature extraction unit: the device is used for acquiring images of a moving object below the tower crane by using an acquisition device arranged below the tower crane trolley; extracting feature points of an image sequence from the acquired moving target image to obtain a moving target image after the feature points are extracted;
an outer bounding box acquisition unit: used for establishing motion estimation for the feature-extracted moving target images based on the two-dimensional affine transformation to obtain the affine transformation relation of two adjacent frames of images; obtaining the similarity of the moving target images of adjacent frames by using the affine transformation relation of the two adjacent frames, and retaining the moving target pixels with similarity lower than a threshold as candidate moving target pixels; and performing Euclidean spatial clustering on the candidate moving target pixels to obtain a plurality of clustering results, obtaining a corresponding outer bounding box for each clustering result;
an outer bounding box screening unit: the method comprises the steps of finding out characteristic point pairs which do not meet affine transformation relation of two adjacent frame images in all characteristic point matching results according to affine transformation relation of the two adjacent frame images, and taking the characteristic point pairs as a matching outer point set; obtaining the number of outliers in each outer boundary frame according to the matched outlier set, obtaining the confidence coefficient of the moving target of each outer boundary frame by using the number of outliers, and screening out the outer boundary frames meeting the confidence coefficient requirement;
classification unit: and the classification network is used for training the target by utilizing the visual neural network, classifying the outer boundary box meeting the confidence coefficient requirement and determining the type and the confidence coefficient of the moving target.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310053361.1A CN116385480B (en) | 2023-02-03 | 2023-02-03 | Detection method and system for moving object below tower crane |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116385480A CN116385480A (en) | 2023-07-04 |
CN116385480B true CN116385480B (en) | 2023-10-20 |
Family
ID=86977618
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310053361.1A Active CN116385480B (en) | 2023-02-03 | 2023-02-03 | Detection method and system for moving object below tower crane |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116385480B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014092550A2 (en) * | 2012-12-10 | 2014-06-19 | Mimos Berhad | Method for camera motion estimation with presence of moving object |
CN110245671A (en) * | 2019-06-17 | 2019-09-17 | 艾瑞迈迪科技石家庄有限公司 | A kind of endoscopic images characteristic point matching method and system |
CN112418251A (en) * | 2020-12-10 | 2021-02-26 | 研祥智能科技股份有限公司 | Infrared body temperature detection method and system |
CN114358166A (en) * | 2021-12-29 | 2022-04-15 | 青岛星科瑞升信息科技有限公司 | Multi-target positioning method based on self-adaptive k-means clustering |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4492036B2 (en) * | 2003-04-28 | 2010-06-30 | ソニー株式会社 | Image recognition apparatus and method, and robot apparatus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110175576B (en) | Driving vehicle visual detection method combining laser point cloud data | |
CN111680542B (en) | Steel coil point cloud identification and classification method based on multi-scale feature extraction and Pointnet neural network | |
WO2019228063A1 (en) | Product inspection terminal, method and system, computer apparatus and readable medium | |
CN106886216B (en) | Robot automatic tracking method and system based on RGBD face detection | |
CN108986148B (en) | Method for realizing multi-intelligent-trolley collaborative search, identification and tracking of specific target group | |
CN109935080B (en) | Monitoring system and method for real-time calculation of traffic flow on traffic line | |
CN107993488A (en) | A kind of parking stall recognition methods, system and medium based on fisheye camera | |
CN103246896A (en) | Robust real-time vehicle detection and tracking method | |
CN111753682B (en) | Hoisting area dynamic monitoring method based on target detection algorithm | |
CN111223129A (en) | Detection method, detection device, monitoring equipment and computer readable storage medium | |
CN105069451B (en) | A kind of Car license recognition and localization method based on binocular camera | |
CN115272652A (en) | Dense object image detection method based on multiple regression and adaptive focus loss | |
CN111915583B (en) | Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene | |
CN110852179B (en) | Suspicious personnel invasion detection method based on video monitoring platform | |
CN104915642A (en) | Method and apparatus for measurement of distance to vehicle ahead | |
CN113688797A (en) | Abnormal behavior identification method and system based on skeleton extraction | |
CN112560619A (en) | Multi-focus image fusion-based multi-distance bird accurate identification method | |
CN111091057A (en) | Information processing method and device and computer readable storage medium | |
CN115100741A (en) | Point cloud pedestrian distance risk detection method, system, equipment and medium | |
CN109215059B (en) | Local data association method for tracking moving vehicle in aerial video | |
CN111259736A (en) | Real-time pedestrian detection method based on deep learning in complex environment | |
CN114022837A (en) | Station left article detection method and device, electronic equipment and storage medium | |
CN116385480B (en) | Detection method and system for moving object below tower crane | |
CN109815887B (en) | Multi-agent cooperation-based face image classification method under complex illumination | |
CN112465854A (en) | Unmanned aerial vehicle tracking method based on anchor-free detection algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||