CN117726793A - Port gantry crane detection method - Google Patents

Port gantry crane detection method

Info

Publication number
CN117726793A
CN117726793A (application CN202311708848.5A)
Authority
CN
China
Prior art keywords
target
frame
point
gantry crane
grounding point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311708848.5A
Other languages
Chinese (zh)
Inventor
赵欣
李鹏程
张显宏
衡量
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Youdao Zhitu Technology Co Ltd
Original Assignee
Shanghai Youdao Zhitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Youdao Zhitu Technology Co Ltd filed Critical Shanghai Youdao Zhitu Technology Co Ltd
Priority to CN202311708848.5A
Publication of CN117726793A
Legal status: Pending


Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for detecting a port gantry crane. Visual texture features of the gantry crane are observed and a target pseudo-3D frame is manually annotated on each pixel frame; a deep convolutional neural network model is designed and the annotated pixel frames are fed into it for learning. The trained model generates both a 2D bounding box and a pseudo-3D bounding box of the target on the image. The predicted coordinate information is used to measure the distance between the host vehicle and key targets, and the predicted semantic attributes are used to judge the exposure state of the target on the pixel frame and the exposure state of the target grounding points, which in turn helps to plan a driving route and avoid collisions. The method has strong universality and is suitable for a wide range of quay scenes.

Description

Port gantry crane detection method
Technical Field
The invention belongs to the technical field of intelligent driving, relates to detection of port gantry cranes, and in particular relates to a detection method of port gantry cranes.
Background
Autonomous driving, which aims to let a vehicle intelligently perceive its surroundings and travel safely with little or no human intervention, has developed rapidly in recent years. The unmanned container terminal is one of the most likely places for intelligent driving technology to land: replacing human-driven transfer vehicles with unmanned ones both saves manpower and improves operating efficiency. The gantry crane is a gantry-type crane used for loading and unloading on the quay; it is the main workhorse of the terminal, and its handling capacity determines the cargo throughput of a berth. Its metal structure resembles a door-shaped frame, with two supporting legs below the load-bearing main beam that run directly on rails on the ground. The perception system of an autonomous vehicle helps the vehicle understand its surroundings through various sensor inputs (image data from cameras, point-cloud data from lidar, high-precision maps, and so on), and high-quality perception results are an important basis for downstream trajectory prediction and path planning. At present, the common perception targets in the industry are pedestrians, motor vehicles, bicycles, traffic cones and the like, while detection of port-specific targets such as gantry cranes remains a blank.
Existing target-detection techniques for autonomous driving can be divided into traditional detection methods and deep-learning-based detection methods. Traditional detection models rely on a series of hand-crafted feature extractors (such as SIFT, Viola-Jones, HOG and DPM) that are slow and of limited accuracy. Deep learning has changed the landscape of computer vision and has increasingly proven to be the more robust approach to target detection: pixel frames are acquired by cameras mounted on the vehicle, rectangular target boxes are manually annotated on the frames, a neural network model is designed, the annotated images are fed into the model for learning, and the trained model automatically recognizes and outputs the object classes in an image and marks their positions with 2D boxes.
Existing target-detection models can be broadly divided into one-stage and two-stage algorithms. A one-stage algorithm needs no region proposals and directly produces class scores and position coordinates in a single stage; typical algorithms are YOLO, SSD and CenterNet. A two-stage algorithm first generates candidate regions in the first stage and then classifies and refines them in the second stage; typical algorithms are R-CNN, Fast R-CNN and Faster R-CNN. In general, a one-stage network is faster and more portable than a two-stage network and is the current choice. The pioneering CornerNet converts detection of the target bounding box into detection of its top-left and bottom-right corner points, so no prior (anchor) boxes need to be designed. The later CenterNet analyzed CornerNet's predictions and found a very high proportion of false-positive boxes (boxes predicted as targets that are not), possibly because CornerNet focuses on the box boundary and ignores the content at the center of the object. CenterNet therefore adds a central key point, adds a corresponding branch to the network structure, and predicts the feature map and the offsets of the center point, thereby improving detection accuracy.
However, these methods only generate a 2D bounding box of the target on the image, which makes it difficult to obtain the lateral and longitudinal distances between the target and the vehicle. For a target such as the gantry crane that interacts heavily with downstream modules, the information provided by a 2D bounding box is insufficient and cannot effectively help the vehicle plan a driving route and avoid collisions.
Disclosure of Invention
In view of these problems, the main purpose of the invention is to design a detection method for port gantry cranes that can be applied to the development of unmanned-driving technology for port vehicles, and to solve the technical problem that a 2D bounding box provides insufficient information and cannot effectively help a vehicle plan a driving route and avoid collisions.
The invention adopts the following technical scheme for realizing the purposes:
the detection method of the portal crane of the port obtains pixel frame data of the port, and marks a target pseudo 3D frame on the pixel frame data based on visual texture characteristics of the portal crane;
designing a neural network model to obtain a target detection model, inputting the labeled pixel frames into the neural network model for learning, and predicting the position information and the attribute of the gantry crane;
and judging the exposure state of the gantry crane on the pixel frame and the exposure state of the gantry crane grounding point based on the position information and the attribute of the gantry crane.
As a further description of the invention, the training of the target detection model in the method comprises the following specific steps:
step 1: acquiring pixel frame data acquired by a port real vehicle, and marking a pseudo 3D contour and attribute of a gantry crane to form a target detection training set;
step 2: randomly cutting an original pixel frame in a target detection training set, downsampling to a preset size, and normalizing to obtain a feature map; traversing each target on a pixel frame, and establishing a truth hash table about the clipped pixel frame;
step 3: constructing a deep convolutional neural network model, and adding regression branches, truncated attribute branches and left and right visible line branches of a grounding point into the deep convolutional neural network model;
step 4: training by using the deep convolutional neural network model to obtain a trained target detection model.
As a further description of the present invention, in the step 1, the bottom frame of the gantry crane is framed in a manner of labeling a pseudo 3D contour with a rectangle+trapezoid; the rectangular frame selects the short side of the bottom leg of the gantry crane, and the trapezoidal frame selects the long side of the bottom leg of the gantry crane.
As a further description of the present invention, in the step 1, the attribute label includes a target direction label and a target left-right visible line label; the target direction comprises a head exposure, a tail exposure, a left side vehicle body exposure, a right side vehicle body exposure, a head and left side vehicle body exposure, a tail and left side vehicle body exposure, a head and right side vehicle body exposure, a tail and right side vehicle body exposure, and a target left and right visible line is a left and right boundary line of a target non-shielded part.
As a further description of the present invention, in the step 2, each object on the pixel frame is traversed, and a truth hash table is created for the clipped pixel frame, and the method further includes the following steps:
s21: traversing each target on the pixel frame, and mapping the original coordinate value of each target according to the downsampling proportion to obtain a center point coordinate, a grounding point coordinate and left and right visible line coordinates of the target after cutting downsampling;
s22: the grounding point of the gantry crane is encoded into a coordinate characteristic value and a semantic characteristic value, whether the grounding point is in a cutting area or not is judged, and the coordinate characteristic value and the semantic characteristic value of the grounding point which are not in the cutting area are updated;
s23: the coordinate characteristic value and the semantic characteristic value of the grounding point in the cutting area are reserved, the left visible line coordinate and the right visible line coordinate are updated according to the coordinate characteristic value after the grounding point is updated, the maximum circumscribed rectangle of the pseudo 3D labeling rectangular frame and the trapezoid frame is calculated to be used as a 2D boundary frame, and the left upper corner point and the right lower corner point coordinates of the 2D boundary frame are obtained;
s24: coding the truncation attribute of the target according to a truncation classification algorithm;
s25: establishing a true value hash table for the clipped pixel frame, wherein the true value hash table comprises information: target center point coordinates, 2D bounding box corner coordinates, pseudo 3D ground point coordinates observability, left and right visible line coordinates, truncation properties.
As a further description of the present invention, the grounding point is a corner point where the object contacts with the ground, and includes a left front grounding point, a left rear grounding point, a right front grounding point, and a right rear grounding point;
in the step S22, the coordinate feature value of a grounding point is the pixel coordinate of the point in the image coordinate system, and the semantic feature value of a grounding point indicates whether the point can be observed from the camera viewpoint, assuming it is not occluded by other objects; an observable point is encoded as 1 and an unobservable point as 0.
As a further description of the present invention, the truncation classification algorithm first judges whether the target is exposed in one direction or in two directions, where single-direction exposure means the target is represented by the rectangular frame only and double-direction exposure means it is represented by both the rectangular and trapezoidal frames, and then judges whether the grounding point lies inside the clipped pixel frame before encoding the truncation attribute.
As a further description of the present invention, in the step S24, the truncation attribute is the truncation state of the target on the pixel frame, encoded as five semantic feature values 0, 1, 2, 3 and 4, which respectively represent the following truncation states:
0) the target is complete and is not cut off by the edge of the pixel frame;
1) the target is truncated and only the front face is exposed (only the rectangle is visible);
2) the target is truncated and only the side face is exposed (only the trapezoid is visible);
3) the target is truncated, both the front face and the side face are exposed, and the grounding point on the edge where they meet lies inside the clipped pixel frame (both the rectangle and the trapezoid are visible);
4) the target is truncated, both the front face and the side face are exposed, and the grounding point on the edge where they meet lies outside the clipped pixel frame (both the rectangle and the trapezoid are visible).
As a further description of the present invention, in the step 3, a deep convolutional neural network model is constructed, including the following steps:
s31: the backbone network adopts an improved VGG network comprising five modules from bottom to top; each module uses a 2x2 pooling kernel, stacked 3x3 convolution kernels are used instead of 5x5 and 7x7 kernels, and the output of each convolution is batch-normalized and then passed through a ReLU activation function for non-linear processing;
s32: the neck network adopts a pyramid network structure of the FPN characteristic map, and the fine characteristics of the high layer and the coarse characteristics of the low layer are fused in an up-sampling and element addition mode;
s33: the head network is provided with a classifier and a regressor; the classifier uses a Sigmoid layer to output whether a target belongs to the foreground or background class, and the regressor is provided with several branch layers used for different training tasks;
s34: branches for classification among the branches of the head network use cross entropy loss functions, branches for regression coordinates use L1 norm loss functions, and the branch loss functions are added up to a total loss function.
As a further description of the present invention, in the step 4, training is performed by using a deep convolutional neural network to obtain a trained target detection model, which includes the following steps:
s41: the feature map extracted from the pixel frame is sent into a network, the weight and bias of a convolution kernel are initialized, a final prediction result is obtained from an input layer through a series of convolution layers, an activation function and a pooling layer, and a loss function of the current network is calculated according to the difference between the prediction result and a true value;
s42: calculating the gradient of the loss function on each weight of the output layer, sequentially calculating the gradient of each layer forwards to obtain the gradient of each weight in the whole network, and updating the weights according to the gradient of the weights by using a gradient descent method or other optimization algorithms; and continuously adjusting the weight through multiple iterations to obtain a trained target detection model.
Compared with the prior art, the invention has the technical effects that:
the invention provides a detection method of a port gantry crane, which comprises the steps of observing visual texture characteristics of the gantry crane, manually marking a target pseudo-3D frame on a pixel frame, designing a deep convolutional neural network model, sending the marked pixel frame into model learning, generating a 2D boundary frame of the target on an image by the trained model, generating a pseudo-3D boundary frame on the image, measuring the distance between a host vehicle and a key target by predicted coordinate information, judging the exposure state of the target on the pixel frame and the exposure state of a target grounding point by predicted semantic attributes, further helping to plan a driving route and avoiding collision, and being strong in universality and suitable for a wider wharf scene.
Drawings
FIG. 1 is a schematic flow chart of the overall method of the present invention;
FIG. 2 is a schematic diagram of a pseudo 3D frame marker in the present invention;
FIG. 3 is a schematic view of the left and right visible lines of a target in the present invention;
FIG. 4 is a schematic diagram of a pixel frame processing and true value encoding flow of the present invention;
FIG. 5 is a schematic diagram of a pseudo 3D frame labeled grounding point according to the present invention;
FIG. 6 is a schematic diagram of the observability mapping relationship between direction and ground point according to the present invention;
FIG. 7 is a schematic diagram of truncated property encoding according to the present invention;
fig. 8 is a schematic diagram of a deep learning neural network according to the present invention.
Description of the embodiments
The invention is described in detail below with reference to the attached drawing figures:
in one embodiment of the invention, a detection method of a portal crane of a port is disclosed, referring to fig. 1-8, firstly, pixel frame data of the port is obtained, and a target pseudo 3D frame on the pixel frame data is marked based on visual texture characteristics of the portal crane; secondly, designing a neural network model to obtain a target detection model, inputting the marked pixel frames into the neural network model for learning, and predicting the position information and the attribute of the gantry crane; and finally, judging the exposure state of the gantry crane on the pixel frame and the exposure state of the grounding point of the gantry crane based on the position information and the attribute of the gantry crane.
Specifically, in this embodiment, as shown in fig. 1, the method includes application of a target detection model and training of the target detection model;
the application of the target detection model is as follows: the method comprises the steps of obtaining a pixel frame to be detected of a port, inputting the pixel frame to be detected into a target detection model, and obtaining a target detection result corresponding to the pixel frame to be detected, wherein the target detection result is used for representing whether a gantry crane exists in the pixel frame or not, and outputting the specific position and attribute of the gantry crane.
Training of the target detection model, comprising the following steps:
step 1: acquiring pixel frame data acquired by a port real vehicle, and manually marking a pseudo 3D contour and related attributes of a gantry crane to form a target detection training set;
step 2: randomly cutting an original pixel frame in a target detection training set, downsampling to a preset size, and normalizing to obtain a feature map; traversing each target on a pixel frame, and establishing a truth hash table about the clipped pixel frame;
step 3: constructing a deep convolutional neural network model, and adding regression branches, truncated attribute branches and left and right visible line branches of a grounding point into the deep convolutional neural network model;
step 4: training by using the deep convolutional neural network model to obtain a trained target detection model.
More specifically, in this embodiment, the focus of gantry-crane identification is to identify the "feet" of the gantry crane, that is, the bottom tires and the rigid structure above them, which this embodiment simply calls the feet. In the step 1, the embodiment therefore creatively proposes a pseudo-3D annotation for the gantry crane: imagining a cuboid box that just encloses each "foot", the "feet" are manually annotated on the real-vehicle pixel frame data using a rectangle plus a trapezoid. That is, the pseudo-3D contour of this embodiment marks the bottom frame of the gantry crane in a rectangle-plus-trapezoid manner; the rectangle frames the short side of the bottom leg, and the trapezoid frames the long side of the bottom leg, which can be roughly understood as the "vehicle side" of a car, as shown by the solid lines in fig. 2.
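As a minimal sketch of how one such pseudo-3D annotation might be stored, the field names below are illustrative assumptions rather than the format used by the inventors:

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (u, v) pixel coordinates in the image

@dataclass
class Pseudo3DLabel:
    """Hypothetical container for one gantry-crane leg annotation."""
    rect: Tuple[Point, Point, Point, Point]       # short (front/rear) face of the bottom leg
    trapezoid: Tuple[Point, Point, Point, Point]  # long (side) face of the bottom leg
    direction: str                                # e.g. "head_and_left_body_exposed"
    visible_left_x: float                         # abscissa of the left visible line
    visible_right_x: float                        # abscissa of the right visible line
```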
In this embodiment, the relevant attribute labels include a target direction label and a target left-right visible line label;
target direction: marking eight directions, including a head exposure, a tail exposure, a left side body exposure, a right side body exposure, a head and left side body exposure, a tail and left side body exposure, a head and right side body exposure, a tail and right side body exposure;
visible lines on the left and right of the target: imagine a line moving from each of the left and right sides of the target toward the middle and stopping as soon as it reaches a part of the target that is not occluded; these left and right boundaries are the visible lines, shown as the thick lines in fig. 3 (the thin lines in fig. 3 are the pseudo-3D frame).
In this embodiment, at least 10,000 pictures containing gantry cranes form a picture dataset; the gantry cranes in each picture are annotated, and the coordinates and attributes of their pseudo-3D contours are determined to form the target detection training set.
In this embodiment, in the step 2, each target on the pixel frame is traversed and a truth hash table is created for the clipped pixel frame, as shown in fig. 4; the procedure comprises the following steps (a simplified code sketch follows the list):
s21: traversing each target on the pixel frame, and mapping the original coordinate value of each target according to the downsampling proportion to obtain a center point coordinate, a grounding point coordinate and left and right visible line coordinates of the target after cutting downsampling;
s22: the grounding point of the gantry crane is encoded into a coordinate characteristic value and a semantic characteristic value, whether the grounding point is in a cutting area or not is judged, and the coordinate characteristic value and the semantic characteristic value of the grounding point which are not in the cutting area are updated;
s23: the coordinate characteristic value and the semantic characteristic value of the grounding point in the cutting area are reserved, the left visible line coordinate and the right visible line coordinate are updated according to the coordinate characteristic value after the grounding point is updated, the maximum circumscribed rectangle of the pseudo 3D labeling rectangular frame and the trapezoid frame is calculated to be used as a 2D boundary frame, and the coordinates of the left upper corner point and the right lower corner point of the 2D boundary frame are obtained as a dotted line frame shown in fig. 2;
s24: coding the truncation attribute of the target according to a truncation classification algorithm;
s25: establishing a true value hash table for the clipped pixel frame, wherein the true value hash table comprises information: target center point coordinates, 2D bounding box corner coordinates, pseudo 3D ground point coordinates observability, left and right visible line coordinates, truncation properties.
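The following is an illustrative sketch, not the patented implementation, of steps S21-S25: annotated coordinates are mapped into the cropped, downsampled frame and one truth entry per target is collected in a hash table (dictionary). All key names (`center`, `ground_points`, `pseudo3d_corners`, `observable`, ...) are assumptions made for this sketch.

```python
def build_truth_table(targets, crop_box, down_ratio):
    """Build a truth hash table for one cropped, downsampled pixel frame."""
    x0, y0, x1, y1 = crop_box
    w, h = (x1 - x0) / down_ratio, (y1 - y0) / down_ratio
    truth = {}
    for idx, t in enumerate(targets):
        # S21: map original pixel coordinates into the cropped/downsampled frame
        def remap(pt):
            return ((pt[0] - x0) / down_ratio, (pt[1] - y0) / down_ratio)

        center = remap(t["center"])
        ground_pts = {name: remap(p) for name, p in t["ground_points"].items()}
        vis_left = (t["visible_left_x"] - x0) / down_ratio
        vis_right = (t["visible_right_x"] - x0) / down_ratio

        # S22: mark ground points that fall outside the crop as unobservable
        observable = {
            name: int(t["observable"][name] and 0 <= p[0] < w and 0 <= p[1] < h)
            for name, p in ground_pts.items()
        }

        # S23: 2D box = tight axis-aligned rectangle around all rectangle + trapezoid corners
        corners = [remap(p) for p in t["pseudo3d_corners"]]
        xs, ys = [p[0] for p in corners], [p[1] for p in corners]
        box2d = (min(xs), min(ys), max(xs), max(ys))

        # S24: truncation attribute (see the truncation-encoding sketch later on)
        truncation = t.get("truncation", 0)

        # S25: assemble the truth entry for this target
        truth[idx] = {
            "center": center,
            "box2d": box2d,
            "ground_points": ground_pts,
            "observable": observable,
            "visible_lines": (vis_left, vis_right),
            "truncation": truncation,
        }
    return truth
```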
Specifically, in this embodiment, the above-mentioned ground point coding algorithm is as follows:
the "ground point" in this embodiment refers to the corner point where the object contacts the ground, and is denoted by A, B, C, D, and represents a front left ground point, a rear left ground point, a front right ground point, and a rear right ground point, respectively, as shown in fig. 5; in the step S22, the coordinate feature value of the grounding point is the pixel coordinate of the point in the image coordinate system, and the semantic feature value of the grounding point is whether the point can be observed or not when the point is observed from the camera view angle under the condition that the point is not blocked; the observable code is 1 and the unobservable code is 0.
In general, of the four corner points of the surface where the target touches the ground, either two or three are observable at the same time; which points are observable is determined from the annotated target direction and then encoded. For example, in fig. 5 the rectangular face of the gantry crane facing the viewer is the head, the direction is labeled "head and left body exposed", and the corresponding grounding-point semantic feature values are: points A, B and C observable (encoded as 1) and point D not observable (encoded as 0). As shown in fig. 6, the target direction and the observability of the target grounding points are in a one-to-one mapping relationship.
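A direct way to express this one-to-one mapping is a lookup table. Only the "head and left body exposed" row is stated explicitly in the description (fig. 6's full table is not reproduced here), so the remaining rows below are geometric assumptions for illustration, not a claim about the patent's exact encoding:

```python
# A = front-left, B = rear-left, C = front-right, D = rear-right grounding point.
DIRECTION_TO_OBSERVABILITY = {
    "head_exposed":                {"A": 1, "B": 0, "C": 1, "D": 0},
    "tail_exposed":                {"A": 0, "B": 1, "C": 0, "D": 1},
    "left_body_exposed":           {"A": 1, "B": 1, "C": 0, "D": 0},
    "right_body_exposed":          {"A": 0, "B": 0, "C": 1, "D": 1},
    "head_and_left_body_exposed":  {"A": 1, "B": 1, "C": 1, "D": 0},  # example given in the text
    "tail_and_left_body_exposed":  {"A": 1, "B": 1, "C": 0, "D": 1},
    "head_and_right_body_exposed": {"A": 1, "B": 0, "C": 1, "D": 1},
    "tail_and_right_body_exposed": {"A": 0, "B": 1, "C": 1, "D": 1},
}
```

Because the mapping is one-to-one, the same table can be inverted at inference time to recover the target direction from the predicted observability of the four grounding points.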
In this embodiment, the above truncation classification algorithm is as follows:
the truncation attribute refers to the truncation state of the target on the pixel frame. It is determined mainly from whether the target is exposed in one direction or in two directions, where single-direction exposure means the target is represented by the rectangular frame only and double-direction exposure means it is represented by both the rectangular and trapezoidal frames, and then from whether the grounding point lies inside the clipped pixel frame, as shown in fig. 7. Specifically, in the step S24, the truncation attribute is the truncation state of the target on the pixel frame, encoded as the five semantic feature values 0, 1, 2, 3 and 4, which represent the following truncation states (a simple encoding sketch follows the list):
0) the target is complete and is not cut off by the edge of the pixel frame;
1) the target is truncated and only the front face is exposed (only the rectangle is visible);
2) the target is truncated and only the side face is exposed (only the trapezoid is visible);
3) the target is truncated, both the front face and the side face are exposed, and the grounding point on the edge where they meet lies inside the clipped pixel frame (both the rectangle and the trapezoid are visible);
4) the target is truncated, both the front face and the side face are exposed, and the grounding point on the edge where they meet lies outside the clipped pixel frame (both the rectangle and the trapezoid are visible).
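A minimal sketch of this five-way encoding, assuming the inputs have already been derived from the annotation (the argument names are hypothetical):

```python
def encode_truncation(truncated: bool, front_exposed: bool, side_exposed: bool,
                      boundary_point_in_frame: bool) -> int:
    """Encode the five truncation states described above."""
    if not truncated:
        return 0                      # target fully inside the cropped pixel frame
    if front_exposed and not side_exposed:
        return 1                      # only the rectangle (front face) remains visible
    if side_exposed and not front_exposed:
        return 2                      # only the trapezoid (side face) remains visible
    # both faces exposed: distinguish by the ground point on their shared edge
    return 3 if boundary_point_in_frame else 4
```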
In this embodiment, in the step 3, the deep convolutional neural network model is based on the idea of the CenterNet network: it performs detection with a one-stage, anchor-free network and processes the input and produces the output using a backbone network, neck network and head network architecture. The backbone network is the main body of a convolutional neural network, generally composed of several convolutional and pooling layers, and is used to extract high-level features from the original image; the neck network is an intermediate layer connecting the backbone and the head, adjusting the features from the backbone and passing them to the head so that they better fit the task; the head network maps the features extracted by the backbone to the final output. The deep-learning neural network structure of this embodiment is shown in fig. 8, and it is constructed by the following steps:
s31: the backbone network adopts an improved VGG network comprising five modules from bottom to top; each module uses a 2x2 pooling kernel to halve the width and height of the feature map and stacks 3x3 convolution kernels in place of 5x5 and 7x7 kernels while increasing the number of feature-map channels; the output of each convolution undergoes batch normalization and then non-linear processing by a ReLU activation function, and the resulting feature map is used as the input of the next module;
for the same receptive field, this VGG-style structure reduces the amount of computation, deepens the network and improves the accuracy of the convolutional neural network.
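A minimal PyTorch sketch of one such backbone module, assuming illustrative channel widths and two 3x3 convolutions per module (the exact depths are not specified in the text):

```python
import torch.nn as nn

def vgg_block(in_ch: int, out_ch: int, n_convs: int = 2) -> nn.Sequential:
    """Stacked 3x3 convolutions, each with batch normalization and ReLU,
    followed by a 2x2 max-pool that halves the feature-map width and height."""
    layers = []
    ch = in_ch
    for _ in range(n_convs):
        layers += [nn.Conv2d(ch, out_ch, kernel_size=3, padding=1, bias=False),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
        ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Five modules from bottom to top; channel widths are assumptions.
backbone = nn.Sequential(
    vgg_block(3, 64), vgg_block(64, 128), vgg_block(128, 256),
    vgg_block(256, 512), vgg_block(512, 512),
)
```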
S32: the neck network is constructed using the feature extraction structure of the FPN (Feature Pyramid Network, feature map pyramid network). Consists of a bottom-up path and a top-down path. The bottom-up path is the backbone network in step S31, which can extract the features of different layers; the top-down path is to fuse the fine features of the high layer with the coarse features of the low layer by means of up-sampling and element addition;
specifically, the output feature map of the N-th module is upsampled by a transposed convolution and added element-wise to the output feature map of the (N-1)-th module, and so on from top to bottom through all modules.
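A sketch of one top-down fusion step as described above; channel counts are assumptions for illustration:

```python
import torch
import torch.nn as nn

class TopDownFusion(nn.Module):
    """Upsample the higher module's feature map with a transposed convolution
    and add it element-wise to the lower module's feature map."""
    def __init__(self, high_ch: int, low_ch: int):
        super().__init__()
        # doubles spatial size and matches the lower module's channel count
        self.up = nn.ConvTranspose2d(high_ch, low_ch, kernel_size=2, stride=2)

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        return self.up(high) + low  # element-wise addition of fine and coarse features
```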
S33: the head network contains a classifier and a regressor. The classifier uses a Sigmoid layer to map the input vector to a probability in the range 0 to 1, indicating whether the target belongs to the foreground or the background class. The regressor is provided with several branch layers for different training tasks. The branches are: a center-point coordinate branch, which regresses the coordinates of the target center point; a 2D bounding-box corner branch, which regresses the coordinates of the top-left and bottom-right corners of the target's 2D bounding box; a pseudo-3D grounding-point coordinate branch, which regresses the coordinates of the target's four pseudo-3D grounding points; a pseudo-3D grounding-point observability branch, which classifies each of the four grounding points as observable or not observable; a left/right visible-line branch, which regresses the abscissas of the target's left and right visible lines; and a truncation-attribute branch, which classifies the target's truncation state on the pixel frame into one of the five truncation attributes.
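A PyTorch sketch of such a multi-branch head; the per-branch layer layout and channel counts below are assumptions (the output widths simply follow the branch descriptions: 4 box-corner values, 4 grounding points, 2 visible lines, 5 truncation classes):

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Illustrative multi-branch head; each branch is a small conv stack."""
    def __init__(self, in_ch: int = 64):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(in_ch, out_ch, 1))
        self.center_heatmap   = branch(1)   # foreground/background score per location
        self.box2d_corners    = branch(4)   # top-left and bottom-right 2D box corners
        self.ground_points    = branch(8)   # (x, y) of the four pseudo-3D grounding points
        self.gp_observability = branch(4)   # observable / not observable per grounding point
        self.visible_lines    = branch(2)   # abscissas of the left and right visible lines
        self.truncation       = branch(5)   # five truncation classes

    def forward(self, feat: torch.Tensor) -> dict:
        return {name: m(feat) for name, m in self.named_children()}
```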
S34: among the branches of the head network, the branch for classification uses a cross entropy loss function, and the branch for regression coordinates uses an L1 norm loss function. The branch penalty functions are summed to a total penalty function.
The L1 norm loss function, also known as least absolute deviations (LAD), minimizes the sum of absolute differences between the target values $y_i$ and the estimated values $\hat{y}_i$:

$L_{1} = \sum_{i} \left| y_i - \hat{y}_i \right|$

The cross-entropy loss function approaches 0 as the predicted probability $\hat{p}_{y}$ of the true label class approaches 100%:

$L_{CE} = -\log \hat{p}_{y}$
in the embodiment, in the step 4, training is performed by using a deep convolutional neural network to obtain a trained target detection model, which includes the following steps:
s41: the feature map extracted from the pixel frame is fed into the network, the weights w and biases b of the convolution kernels are initialized, and the final prediction result is obtained from the input layer through a series of convolutional layers, activation functions and pooling layers; the loss function of the current network is calculated from the difference between the prediction result and the true value;
s42: the gradient of the loss function with respect to each weight of the output layer is calculated, and the gradients of the preceding layers are then calculated in turn to obtain the gradient of every weight in the network. The weights are updated according to the magnitude of their gradients using gradient descent or another optimization algorithm. Through repeated iterations the weights are continuously adjusted to reduce the loss function, so that the neural network fits the training data better, yielding a trained target detection model.
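A compact sketch of this training loop, reusing the `total_loss` sketch above; Adam stands in here for "gradient descent or another optimization algorithm", and the hyperparameters are assumptions:

```python
import torch

def train(model, loader, epochs: int = 10, lr: float = 1e-3):
    """Forward pass, loss, back-propagation and weight update (steps S41-S42)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for feature_map, truth in loader:
            pred = model(feature_map)       # S41: conv / activation / pooling layers
            loss = total_loss(pred, truth)  # S41: loss from prediction-vs-truth difference
            optimizer.zero_grad()
            loss.backward()                 # S42: gradients for every weight, output layer backwards
            optimizer.step()                # S42: update weights by their gradients
    return model
```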
With the above disclosed, when the target detection model is applied for inference, a pixel frame to be detected is first obtained and downsampled to the preset size to obtain its feature map; it is then input into the designed target detection network and the target detection model performs inference. A non-maximum suppression (NMS) step extracts the peaks of the heat map, a top-K algorithm extracts the confidence scores, indices, classes and center-point coordinates of the K highest-confidence center points, and the target information corresponding to these top-K center points is extracted: 2D bounding-box corner coordinates, pseudo-3D grounding-point coordinates and observability, left and right visible-line coordinates, and the truncation attribute. All of this information is combined and returned by the model. The target detection result indicates whether a gantry crane is present in the pixel frame and outputs its specific position and attributes.
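The exact post-processing is not spelled out in the text; the sketch below assumes the max-pooling form of NMS commonly used in anchor-free center-point detectors, followed by a top-K selection:

```python
import torch
import torch.nn.functional as F

def extract_centers(heatmap: torch.Tensor, k: int = 100):
    """Keep local maxima of the centre-point heat map and take the top-K of them.
    heatmap: tensor of shape (1, 1, H, W) with raw logits."""
    scores = torch.sigmoid(heatmap)
    peaks = F.max_pool2d(scores, kernel_size=3, stride=1, padding=1)
    scores = scores * (peaks == scores)            # suppress non-maximal locations
    flat = scores.flatten()
    conf, idx = flat.topk(k)                       # confidence and flat index of top-K centres
    w = heatmap.shape[-1]
    ys = torch.div(idx, w, rounding_mode="floor")  # centre-point row
    xs = idx % w                                   # centre-point column
    return conf, idx, xs, ys
```

The per-centre target information (box corners, grounding points, visible lines, truncation) would then be gathered from the other branch outputs at these (xs, ys) locations.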
According to the one-to-one mapping between the target direction and the observability of the target grounding points, the direction of a target can be recovered from the predicted observability of its pseudo-3D grounding points; combined with the predicted pseudo-3D grounding-point coordinates, this can be used to measure the lateral and longitudinal distance between the host vehicle and a key target. The predicted truncation attribute can be used to judge the exposure state of the target on the pixel frame; for example, a truncation attribute of 4 means the target is not completely exposed in the pixel frame and is close to the camera, so that the grounding point on the edge where the front and side faces meet lies outside the picture, and different strategies can be executed according to this truncation state when planning the driving route, so as to avoid collisions. The predicted left and right visible-line coordinates can be used to judge the visibility of the target's grounding points on the pixel frame: a grounding point whose abscissa lies between the abscissas of the left and right visible lines is visible; otherwise it may be occluded by a nearby vehicle, pedestrian or obstacle, which helps downstream modules judge the target state and plan the route.
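The visibility check on a predicted grounding point reduces to an interval test, as in this small sketch (argument names are hypothetical):

```python
def ground_point_visible(gp_x: float, visible_left_x: float, visible_right_x: float) -> bool:
    """A grounding point is taken as visible when its abscissa lies between the
    left and right visible lines; otherwise it may be occluded by a nearby
    vehicle, pedestrian or obstacle."""
    return visible_left_x <= gp_x <= visible_right_x
```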
Compared with the prior art, the detection method provided by the invention has the following advantages:
1. the detection method can judge whether the gantry crane exists in the detection image and output the specific position and attribute of the gantry crane, and has strong universality;
2. the detection method provided by the invention learns the coordinates and observable states of the front-left, rear-left, front-right and rear-right grounding points of the gantry crane, also feeds the truncation attribute of the gantry crane into network learning, and outputs the left and right visible lines of the gantry crane, providing a brand-new approach to gantry-crane detection in port scenes.
The above embodiments are only for illustrating the technical solution of the present invention, but not for limiting, and other modifications and equivalents thereof by those skilled in the art should be included in the scope of the claims of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (10)

1. A detection method of a port gantry crane is characterized by comprising the following steps of: acquiring pixel frame data of a port, and marking a target pseudo 3D frame on the pixel frame data based on visual texture characteristics of a gantry crane;
designing a neural network model to obtain a target detection model, inputting the labeled pixel frames into the neural network model for learning, and predicting the position information and the attribute of the gantry crane;
and judging the exposure state of the gantry crane on the pixel frame and the exposure state of the gantry crane grounding point based on the position information and the attribute of the gantry crane.
2. The method for detecting the portal crane of the port according to claim 1, wherein: training a target detection model in the method comprises the following specific steps:
step 1: acquiring pixel frame data acquired by a port real vehicle, and marking a pseudo 3D contour and attribute of a gantry crane to form a target detection training set;
step 2: randomly cutting an original pixel frame in a target detection training set, downsampling to a preset size, and normalizing to obtain a feature map; traversing each target on a pixel frame, and establishing a truth hash table about the clipped pixel frame;
step 3: constructing a deep convolutional neural network model, and adding regression branches, truncated attribute branches and left and right visible line branches of a grounding point into the deep convolutional neural network model;
step 4: training by using the deep convolutional neural network model to obtain a trained target detection model.
3. The method for detecting the portal crane of the port according to claim 2, wherein: in the step 1, the bottom frame of the gantry crane is marked by using a rectangular and trapezoidal mode of the pseudo 3D outline; the rectangular frame selects the short side of the bottom leg of the gantry crane, and the trapezoidal frame selects the long side of the bottom leg of the gantry crane.
4. The method for detecting the portal crane of the port according to claim 2, wherein: in the step 1, the attribute labels comprise a target direction label and a target left-right visible line label; the target direction comprises a head exposure, a tail exposure, a left side vehicle body exposure, a right side vehicle body exposure, a head and left side vehicle body exposure, a tail and left side vehicle body exposure, a head and right side vehicle body exposure, a tail and right side vehicle body exposure, and a target left and right visible line is a left and right boundary line of a target non-shielded part.
5. A method for detecting a port gantry crane according to claim 3, wherein: in the step 2, each object on the pixel frame is traversed, and a truth hash table is established for the clipped pixel frame, and the method further includes the following steps:
s21: traversing each target on the pixel frame, and mapping the original coordinate value of each target according to the downsampling proportion to obtain a center point coordinate, a grounding point coordinate and left and right visible line coordinates of the target after cutting downsampling;
s22: the grounding point of the gantry crane is encoded into a coordinate characteristic value and a semantic characteristic value, whether the grounding point is in a cutting area or not is judged, and the coordinate characteristic value and the semantic characteristic value of the grounding point which are not in the cutting area are updated;
s23: the coordinate characteristic value and the semantic characteristic value of the grounding point in the cutting area are reserved, the left visible line coordinate and the right visible line coordinate are updated according to the coordinate characteristic value after the grounding point is updated, the maximum circumscribed rectangle of the pseudo 3D labeling rectangular frame and the trapezoid frame is calculated to be used as a 2D boundary frame, and the left upper corner point and the right lower corner point coordinates of the 2D boundary frame are obtained;
s24: coding the truncation attribute of the target according to a truncation classification algorithm;
s25: establishing a true value hash table for the clipped pixel frame, wherein the true value hash table comprises information: target center point coordinates, 2D bounding box corner coordinates, pseudo 3D ground point coordinates observability, left and right visible line coordinates, truncation properties.
6. The method for detecting the portal crane of the port according to claim 5, wherein: the grounding point is an angular point of which the target contacts with the ground and comprises a left front grounding point, a left rear grounding point, a right front grounding point and a right rear grounding point;
in the step S22, the coordinate feature value of the grounding point is the pixel coordinate of the point in the image coordinate system, the semantic feature value of the grounding point is whether the point can be observed or not when the point is observed from the view angle of the camera under the condition that the point is not shielded; the observable code is 1 and the unobservable code is 0.
7. The method for detecting the portal crane of the port according to claim 5, wherein: the cut-off classification algorithm judges whether the targets are exposed in one direction or two directions, wherein the targets are represented by rectangular frames in one direction, the targets are represented by rectangular and trapezoidal frames in two directions, and then whether the grounding points are in the cut-off pixel frames or not is judged.
8. The method for detecting the portal crane of the port according to claim 5, wherein: in the step S24, the truncated attribute is a truncated state of the target on the pixel frame, and the truncated states are respectively encoded into five semantic features of 0, 1, 2, 3 and 4, where the truncated states of the five semantic features are respectively:
0) the target is complete and is not cut off by the edge of the pixel frame;
1) the target is truncated and only the front face is exposed (only the rectangle is visible);
2) the target is truncated and only the side face is exposed (only the trapezoid is visible);
3) the target is truncated, both the front face and the side face are exposed, and the grounding point on the edge where they meet lies inside the clipped pixel frame (both the rectangle and the trapezoid are visible);
4) the target is truncated, both the front face and the side face are exposed, and the grounding point on the edge where they meet lies outside the clipped pixel frame (both the rectangle and the trapezoid are visible).
9. The method for detecting the portal crane of the port according to claim 2, wherein: in the step 3, a deep convolutional neural network model is constructed, which comprises the following steps:
s31: the backbone network adopts an improved VGG network comprising five modules from bottom to top; each module uses a 2x2 pooling kernel, stacked 3x3 convolution kernels are used instead of 5x5 and 7x7 kernels, and the output of each convolution is batch-normalized and then passed through a ReLU activation function for non-linear processing;
s32: the neck network adopts a pyramid network structure of the FPN characteristic map, and the fine characteristics of the high layer and the coarse characteristics of the low layer are fused in an up-sampling and element addition mode;
s33: the head network is provided with a classifier and a regressor; the classifier uses a Sigmoid layer to output whether a target belongs to the foreground or background class, and the regressor is provided with several branch layers used for different training tasks;
s34: branches for classification among the branches of the head network use cross entropy loss functions, branches for regression coordinates use L1 norm loss functions, and the branch loss functions are added up to a total loss function.
10. The method for detecting the portal crane of the port according to claim 2, wherein: in the step 4, training is performed by using a deep convolutional neural network to obtain a trained target detection model, and the method comprises the following steps:
s41: the feature map extracted from the pixel frame is sent into a network, the weight and bias of a convolution kernel are initialized, a final prediction result is obtained from an input layer through a series of convolution layers, an activation function and a pooling layer, and a loss function of the current network is calculated according to the difference between the prediction result and a true value;
s42: calculating the gradient of the loss function on each weight of the output layer, sequentially calculating the gradient of each layer forwards to obtain the gradient of each weight in the whole network, and updating the weights according to the gradient of the weights by using a gradient descent method or other optimization algorithms; and continuously adjusting the weight through multiple iterations to obtain a trained target detection model.
CN202311708848.5A 2023-12-13 2023-12-13 Port gantry crane detection method Pending CN117726793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311708848.5A CN117726793A (en) 2023-12-13 2023-12-13 Port gantry crane detection method


Publications (1)

Publication Number Publication Date
CN117726793A true CN117726793A (en) 2024-03-19

Family

ID=90199268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311708848.5A Pending CN117726793A (en) 2023-12-13 2023-12-13 Port gantry crane detection method

Country Status (1)

Country Link
CN (1) CN117726793A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination