CN116109047A - Intelligent scheduling method based on three-dimensional intelligent detection

Intelligent scheduling method based on three-dimensional intelligent detection

Info

Publication number
CN116109047A
CN116109047A
Authority
CN
China
Prior art keywords
target
detection
frame
network
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211153269.4A
Other languages
Chinese (zh)
Inventor
赵可昕
秦奕
梁俊玮
陈泽明
梁华岳
董博雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202211153269.4A priority Critical patent/CN116109047A/en
Publication of CN116109047A publication Critical patent/CN116109047A/en
Pending legal-status Critical Current

Classifications

    • G06Q 10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q 10/083 Shipping
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y02P 90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention discloses an intelligent scheduling method based on three-dimensional intelligent detection. Camera frames are fed into a target detection algorithm to detect target objects; the detection results are matched against existing tracks and added to a target tracking list; the tracked objects are then sent to a target classification network for classification while their three-dimensional positions are reconstructed from a depth map; finally, the classification results, position information and prior information are fed into a decision module for intelligent decision-making, and the results are displayed on a client and used to control the lower computer. Built on deep neural networks and a laser radar, the intelligent scheduling platform can effectively exploit the positions, categories and environmental information within the monitored range without adding extra environmental equipment, so that controlled objects in the range are scheduled accurately, efficiently and in real time.

Description

Intelligent scheduling method based on three-dimensional intelligent detection
Technical Field
The invention belongs to the field of intelligent scheduling, and relates to an intelligent scheduling method based on three-dimensional intelligent detection.
Background
With the continuous growth of the logistics, transportation and service industries, how to effectively control and dispatch multiple unmanned devices in a scene, such as logistics vehicles and service robots, and to build an intelligent scheduling platform that runs efficiently while handling various emergencies, has become an urgent problem. In the field of intelligent scheduling, some solutions have appeared, but they generally require additional large-scale installation on, or modification of, the site or the controlled objects, which is costly.
The existing global-vision-based warehouse navigation intelligent vehicle dispatching method (CN201911046869.9) uses the OpenCV open-source vision library to extract characteristic patterns of AGV trolleys, including binarization of HSV images and contour detection with the Canny edge detection algorithm. This approach is susceptible to environmental influence and its robustness is not ideal. In contrast, the present method uses a target detection network to identify and classify targets, which adapts well to various environments and migrates easily, and uses a laser radar for positioning, which gives better precision and suits detection and positioning tasks for a variety of targets. The multi-camera and laser-radar target fusion method and system (CN202111490323.X) detects targets from multi-camera data, performs three-dimensional clustering on laser radar point clouds, and then fuses and matches the two sources of information. Three-dimensional clustering takes a long time, involves handling ground point clouds, and the clustering result is strongly affected by them.
Disclosure of Invention
To address these problems, the invention fuses a depth sensor, lightweight deep-learning target detection, a target classification algorithm and an intelligent decision method, and designs the scheduling system in a modular way, so that state sensing, intelligent decision-making and real-time scheduling are achieved without additional large-scale modification of the site or the controlled objects. Camera frames are sent to a target detection algorithm to detect target objects; the detection results are matched and added to a target tracking list; the tracked objects are then sent to a target classification network for classification while three-dimensional position information is reconstructed from a depth map; finally, the classification results, position information and prior information are sent to a decision module for intelligent decision-making, the results are displayed on a client, and the lower computer is controlled accordingly. Built on deep neural networks and a laser radar, the intelligent scheduling platform can effectively exploit the positions, categories and environmental information of the monitored range without adding extra environmental equipment, so that controlled objects in the range are scheduled accurately, efficiently and in real time.
The invention is realized at least by one of the following technical schemes.
An intelligent scheduling method based on three-dimensional intelligent detection comprises the following steps:
1) The automatic labeling module of the data set acquires the current frame and sends the current frame to the target detection network for detection;
2) Tracking a detection result of the target detection network by using a target tracking network, and outputting a tracking result;
3) Performing target classification on the tracking result by using a target classification network, inputting a target frame and a depth map into a three-dimensional mapping module to reconstruct a three-dimensional position, and outputting three-dimensional coordinates under a world coordinate system;
4) The three-dimensional coordinates of the tracked object, the classification result and the environmental information are input together into an intelligent decision module, intelligent decision-making is carried out using a behavior tree, the decision result is displayed on a client, and the lower computer is controlled and scheduled through serial communication.
Further, the controlled-object positions detected by the target detection network are sent, after a filtering algorithm, to the target tracking network; the target tracking network performs feature-extraction modeling on the tracked object in each frame to obtain an inter-frame correlation measure, and determines the position of the tracked object in the next frame from this correlation, thereby producing the output of the target tracking module.
Further, the results detected by the target detection network are input into the target tracking network. On the first entry into the target tracking network, N trackers are initialized from the results, the number N being equal to the number of detection frames produced by the target detection network. If it is not the first entry into the target tracking module, the remaining trackers predict tracking frames with their Kalman filters, and the overlap between every tracking frame and every input detection frame is calculated and matched, giving three cases: some detection frames are successfully matched with tracking frames, some detection frames cannot be matched, and some tracking frames cannot be matched. If a detection frame is successfully matched with a tracking frame, the Kalman filter of the corresponding tracker is updated with the detection frame; if a tracking frame is not matched with any detection frame, its Kalman filter is updated with the tracking frame itself and the tracker's self-update count is increased by one; if a detection frame cannot be matched, a new tracker is initialized with that detection frame;
all trackers are then pooled together, and it is checked whether each tracker's existence frame count exceeds a threshold; a tracker whose existence frame count exceeds the threshold is deleted, otherwise the following processing is carried out:
first, it is checked whether the overlap between tracking frames is above a threshold; if so, the reset flags of both trackers are set true. Next, it is checked whether the consecutive hit count of each tracker exceeds a threshold; if not, the tracker's reset flag is set true. Then, it is checked whether the tracker's reset count exceeds a threshold; if so, the reset count is set to zero and the tracker's reset flag is set true. Finally, it is checked whether the target confidence exceeds a threshold; if not, the tracker's reset flag is set true. A tracker whose reset flag is true inputs the image information inside its tracking frame into the deep neural network for classification.
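By way of illustration only, a minimal Python sketch of the detection/tracking matching step described above, assuming IoU as the overlap measure and the Hungarian assignment from SciPy; the function and threshold names are not part of the disclosure, and the Kalman prediction/update is left to the surrounding tracker code.

```python
# Hedged sketch: IoU-based matching of detection frames to predicted tracking frames.
# Boxes are (x1, y1, x2, y2) tuples; helper names are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_detections_to_trackers(detections, predicted_tracks, iou_threshold=0.3):
    """Return (matched pairs, unmatched detection indices, unmatched tracker indices)."""
    if not detections or not predicted_tracks:
        return [], list(range(len(detections))), list(range(len(predicted_tracks)))
    cost = np.zeros((len(detections), len(predicted_tracks)))
    for d, det in enumerate(detections):
        for t, trk in enumerate(predicted_tracks):
            cost[d, t] = -iou(det, trk)            # maximize IoU = minimize negative IoU
    rows, cols = linear_sum_assignment(cost)
    matches, unmatched_det, unmatched_trk = [], [], []
    for d in range(len(detections)):
        if d not in rows:
            unmatched_det.append(d)                # detection frame with no tracker
    for t in range(len(predicted_tracks)):
        if t not in cols:
            unmatched_trk.append(t)                # tracking frame with no detection
    for d, t in zip(rows, cols):
        if -cost[d, t] >= iou_threshold:
            matches.append((d, t))                 # update this tracker's Kalman filter
        else:
            unmatched_det.append(d)
            unmatched_trk.append(t)
    return matches, unmatched_det, unmatched_trk
```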
Further, the target classification network uses an EfficientNet feature extraction network as the feature extraction backbone. The image output by the target tracking network is input into the backbone, the backbone's feature extraction result is input into a fully connected layer and a regression layer, the class probabilities corresponding to the image are obtained, and the class with the maximum probability is assigned as the image's class attribute for target classification. The ID obtained by this classification is compared with the ID obtained by the previous classification: if they are the same, the consecutive classification count is increased by one, otherwise it is set to zero. At the same time, the target confidence is reset to the confidence of the classification result and the reset timer is set to zero. The task of the target tracking module is then complete, and all trackers that have not been deleted are kept for the next tracking task.
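For illustration, a minimal sketch of such a classification step, using torchvision's EfficientNet-B0 as a stand-in backbone and a 5-way head (vehicles No. 1 to 5 as in Embodiment 1); the preprocessing and class count are assumptions, not the patented configuration.

```python
# Hedged sketch: classify an image crop from a tracking frame with an EfficientNet backbone.
import torch
import torchvision.transforms as T
from torchvision.models import efficientnet_b0

NUM_CLASSES = 5                                  # assumption: vehicles No. 1-5
model = efficientnet_b0(num_classes=NUM_CLASSES)  # untrained stand-in network
model.eval()

preprocess = T.Compose([T.ToTensor(), T.Resize((224, 224))])

def classify_crop(crop_bgr):
    """Return (class_id, confidence) for a uint8 BGR crop taken from the tracking frame."""
    x = preprocess(crop_bgr[:, :, ::-1].copy()).unsqueeze(0)  # BGR -> RGB, add batch dim
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    conf, cls_id = probs.max(dim=0)
    return int(cls_id), float(conf)               # class with maximum probability + its confidence
```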
Further, the three-dimensional position reconstruction of the three-dimensional mapping module comprises the following steps:
0) The following coordinates are defined:
laser radar coordinate system: with the midpoint of the bottom of the laser radar as the origin, the x-axis points to the front of the laser radar, the z-axis points vertically upward, and the y-axis points horizontally to the left when facing forward;
camera coordinate system: the camera optical center is taken as an origin and faces forward, at the moment, the x-axis is horizontally right, the y-axis is vertically directed to the ground, and the z-axis is parallel to the optical axis and is directed forward;
pixel coordinate system: taking the upper left corner of the image as an origin, horizontally rightward on the x-axis and vertically downward on the y-axis;
1) Starting a thread responsible for receiving data from the laser radar;
2) Converting the laser radar data into a format under a Cartesian coordinate system;
θ_encoder = 2π · (1 - measurement_id / scan_width)

θ_azimuth = -2π · beam_azimuth_angles / 360

φ = 2π · beam_angles / 360

x = (r - n) · cos(θ_encoder + θ_azimuth) · cos(φ) + n · cos(θ_encoder)

y = (r - n) · sin(θ_encoder + θ_azimuth) · cos(φ) + n · sin(θ_encoder)

z = (r - n) · sin(φ)

wherein:

measurement_id is the label ID of the data packet;

scan_width is the value of the horizontal resolution;

beam_angles is the elevation angle of each laser beam;

beam_azimuth_angles is the azimuth angle of each laser beam;

θ_encoder is the rotation angle of the laser radar's built-in encoder;

θ_azimuth is the azimuth angle of the laser radar beam;

φ is the elevation angle of the laser radar beam;

r is range_mm, the sum of the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics and the modulus of the distance vector from the laser radar front-end optics to the detected object;

n is lidar_origin_to_beam_origin_mm, the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics;

x, y and z are the Cartesian coordinates of the point cloud;
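For illustration only, a small Python sketch of this polar-to-Cartesian conversion, following the conversion formulas as written above; angle inputs are taken in degrees, distances in millimetres, and the function and parameter names are assumptions rather than part of the disclosure.

```python
# Hedged sketch: convert one laser radar return to Cartesian coordinates (mm).
import math

def lidar_to_cartesian(measurement_id, scan_width, beam_angle_deg,
                       beam_azimuth_deg, range_mm, origin_to_beam_mm):
    theta_encoder = 2.0 * math.pi * (1.0 - measurement_id / scan_width)
    theta_azimuth = -2.0 * math.pi * beam_azimuth_deg / 360.0
    phi = 2.0 * math.pi * beam_angle_deg / 360.0
    r, n = range_mm, origin_to_beam_mm
    x = (r - n) * math.cos(theta_encoder + theta_azimuth) * math.cos(phi) + n * math.cos(theta_encoder)
    y = (r - n) * math.sin(theta_encoder + theta_azimuth) * math.cos(phi) + n * math.sin(theta_encoder)
    z = (r - n) * math.sin(phi)
    return x, y, z
```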
3) Projecting point cloud data acquired from the laser radar onto the image to obtain a depth map:

Z_c · [u, v, 1]^T = M_1 · M_2 · [X_l, Y_l, Z_l, 1]^T

the depth map is a single-channel image with the same resolution as the camera image, and each pixel is filled with the corresponding Z_c value; (X_l, Y_l, Z_l) are the coordinates of the point cloud in the laser radar coordinate system, u and v are the pixel coordinates of the corresponding pixel in the image, Z_c is the Z-axis value of the corresponding point cloud in the camera coordinate system, M_1 is the intrinsic matrix of the camera, M_2 is the extrinsic matrix from the laser radar to the camera, and M_1 and M_2 are obtained using joint calibration;
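As an illustrative aid, a minimal Python sketch of this projection step, assuming M_1 is a 3x3 intrinsic matrix and M_2 a 3x4 laser-radar-to-camera extrinsic matrix from joint calibration; the function name and array shapes are assumptions.

```python
# Hedged sketch: project lidar points into the image to build a single-channel depth map.
import numpy as np

def build_depth_map(points_lidar, M1, M2, image_height, image_width):
    """points_lidar: (N, 3) array of (X_l, Y_l, Z_l); returns (H, W) depth map of Z_c values."""
    depth = np.zeros((image_height, image_width), dtype=np.float32)
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])  # (N, 4) homogeneous
    cam = M2 @ pts_h.T                          # (3, N) points in the camera frame
    z_c = cam[2]
    pix = M1 @ cam                              # (3, N) homogeneous pixel coordinates
    valid = z_c > 0                             # keep points in front of the camera
    u = np.round(pix[0, valid] / z_c[valid]).astype(int)
    v = np.round(pix[1, valid] / z_c[valid]).astype(int)
    z_valid = z_c[valid]
    inside = (u >= 0) & (u < image_width) & (v >= 0) & (v < image_height)
    depth[v[inside], u[inside]] = z_valid[inside]   # fill each pixel with its Z_c value
    return depth
```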
4) According to the target frame acquired by target identification, the depth information inside the target frame is processed with a k-clustering method:

first, all depth values range inside the target frame are traversed and the average value range_mean of all depth information is obtained; three clusters are initialized, and the three values [range_mean - area_thresh, range_mean, range_mean + area_thresh] are taken as the representative values of the three clusters;

then all depth values range inside the target frame are traversed again: for each depth value, the absolute difference between it and each of the three representative values is calculated, the three absolute differences are compared, and the depth value is assigned to the cluster with the smallest absolute difference. After one round of traversal, the averages of the depth values under the three clusters are computed and taken as the new representative values of the three clusters, and a new round of the loop begins. Iteration stops when the absolute difference between the new and old representative values is smaller than stop_num, or when the number of iterations exceeds iter_num. The representative values of the three clusters are then sorted: the smallest is the foreground depth representative value range_front, and the number of depth values belonging to its cluster is front_num; the largest is taken as the background depth representative value range_background, with background_num depth values in its cluster; the middle one is taken as the middle depth representative value range_middle, with middle_num depth values in its cluster. The depth representative value z_represent of the target is then calculated.
Further, the depth representative value z_represent of the target is calculated in one of two ways:

1. if the device looks at the target horizontally or from below, range_middle is taken as z_represent;

2. if the device looks down at the target, z_represent = (range_front · front_num + range_middle · middle_num + range_background · background_num) / (front_num + middle_num + background_num).
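For illustration, a minimal Python sketch of this three-cluster depth grouping inside a target frame, using the parameter names above (area_thresh, stop_num, iter_num) with the Embodiment 1 values as defaults; the function name and the zero-depth filtering are assumptions.

```python
# Hedged sketch: k=3 clustering of depth values inside a target box and the two
# ways of forming the representative depth z_represent described above.
import numpy as np

def target_depth(depths_in_box, looking_down=True,
                 area_thresh=1.5, stop_num=0.01, iter_num=30):
    d = np.asarray(depths_in_box, dtype=np.float64)
    d = d[d > 0]                                    # ignore pixels without a lidar return
    if d.size == 0:
        return 0.0
    centers = d.mean() + np.array([-area_thresh, 0.0, area_thresh])
    labels = np.zeros(d.size, dtype=int)
    for _ in range(iter_num):
        labels = np.argmin(np.abs(d[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([d[labels == k].mean() if np.any(labels == k) else centers[k]
                                for k in range(3)])
        converged = np.all(np.abs(new_centers - centers) < stop_num)
        centers = new_centers
        if converged:
            break
    order = np.argsort(centers)                     # front, middle, background
    counts = np.array([(labels == k).sum() for k in order], dtype=np.float64)
    front, middle, background = centers[order]
    if not looking_down:                            # device looks at the target level or from below
        return middle
    return (front * counts[0] + middle * counts[1] + background * counts[2]) / counts.sum()
```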
5) Acquiring the coordinates of the target in the world coordinate system:

z_represent · [a, b, 1]^T = M_1 · M_3 · [X_w, Y_w, Z_w, 1]^T

wherein (X_w, Y_w, Z_w) are the coordinates of the point in the world coordinate system; z_represent is the depth representative value of the target; a and b are the pixel coordinates of the center of the target detection frame; M_1 is the intrinsic matrix of the camera; M_3 is the extrinsic matrix from the world coordinate system to the camera coordinate system, obtained through the PnP algorithm and manual calibration.
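As an illustrative sketch of this step: with M_1 (3x3 intrinsics) and M_3 (3x4 world-to-camera extrinsics) known, the relation above is three linear equations in (X_w, Y_w, Z_w) and can be solved directly; the helper name below is an assumption.

```python
# Hedged sketch: recover world coordinates of the box center (a, b) from z_represent.
import numpy as np

def pixel_to_world(a, b, z_represent, M1, M3):
    """Solve z_represent * [a, b, 1]^T = M1 @ M3 @ [X_w, Y_w, Z_w, 1]^T for (X_w, Y_w, Z_w)."""
    P = M1 @ M3                                    # (3, 4) projection matrix
    rhs = z_represent * np.array([a, b, 1.0]) - P[:, 3]
    return np.linalg.solve(P[:, :3], rhs)          # (X_w, Y_w, Z_w)
```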
Further, the target detection network inputs the collected image frames into the feature extraction backbone network, the feature maps of different scales extracted by the backbone are input into the detection neck and the decoupled detection heads, and finally the detection heads perform regression to obtain the rectangular-frame position of the object and the corresponding class probability.
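By way of illustration, a minimal PyTorch sketch of a decoupled detection head of the kind described: one branch regresses the box, one predicts class probabilities, one predicts the foreground/background score; the layer sizes and names are assumptions, not the patented network.

```python
# Hedged sketch: a decoupled detection head operating on one feature map from the neck.
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    def __init__(self, in_channels=256, num_classes=5):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_channels, in_channels, 1), nn.SiLU())
        self.reg_branch = nn.Conv2d(in_channels, 4, 1)             # box (x, y, w, h)
        self.obj_branch = nn.Conv2d(in_channels, 1, 1)             # foreground/background score
        self.cls_branch = nn.Conv2d(in_channels, num_classes, 1)   # class probabilities

    def forward(self, feature_map):
        x = self.stem(feature_map)
        return self.reg_branch(x), self.obj_branch(x), self.cls_branch(x)

# Example: one 20x20 feature map coming out of the detection neck
reg, obj, cls = DecoupledHead()(torch.randn(1, 256, 20, 20))
```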
Further, the target classification network adopts a locally deployed deep-learning neural network: the image cropped out by the joint detection of target detection and target tracking is input into the feature extraction backbone network, the backbone's feature extraction result is input into a fully connected layer and a regression layer, the class probabilities corresponding to the image are obtained, and the class with the highest probability is assigned as the image's class attribute.
Further, the intelligent decision module customizes an intelligent scheduling logic scheme before system start-up according to the specific intelligent scheduling task and performance requirements, and realizes it with a behavior tree. Using a behavior tree algorithm, the position and class information of the controlled targets and the information from the other sensors on site are input into the behavior tree; the scene situation is sensed and strategies are selected according to the pre-built behavior tree flow logic; finally, the intelligent decision result is output through action nodes in the behavior tree for front-end display or lower-computer scheduling.
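For illustration only, a minimal behavior-tree sketch in Python of the kind of decision step described; the node types are generic (Sequence, Selector, Condition, Action) and the condition names loosely mirror the example logic of FIG. 6, but all names and the state dictionary are assumptions.

```python
# Hedged sketch: a tiny behavior tree reading scene state and emitting scheduling commands.
class Node:
    def tick(self, state): ...

class Condition(Node):
    def __init__(self, predicate): self.predicate = predicate
    def tick(self, state): return "SUCCESS" if self.predicate(state) else "FAILURE"

class Action(Node):
    def __init__(self, act): self.act = act
    def tick(self, state): self.act(state); return "SUCCESS"

class Sequence(Node):                       # runs children until one fails
    def __init__(self, *children): self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) != "SUCCESS":
                return "FAILURE"
        return "SUCCESS"

class Selector(Node):                       # runs children until one succeeds
    def __init__(self, *children): self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == "SUCCESS":
                return "SUCCESS"
        return "FAILURE"

tree = Selector(
    Sequence(Condition(lambda s: s["base_power"] < 0.2),
             Action(lambda s: s["commands"].append("recharge_base"))),
    Sequence(Condition(lambda s: s["visitor_present"]),
             Action(lambda s: s["commands"].append("dispatch_service_robot"))),
)
state = {"base_power": 0.1, "visitor_present": False, "commands": []}
tree.tick(state)            # afterwards state["commands"] == ["recharge_base"]
```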
Further, the data set automatic labeling module works as follows: an image sequence of the controlled objects required by the specific task is collected and labeled to form an initial data set; the initial data set is fed into the target detection network for training to obtain preliminary weights; further images are collected to obtain an extended image sequence; the preliminary weights are loaded into the target detection network, which infers on the extended image sequence to obtain pre-selected positions of the target objects; the pre-selected positions are manually confirmed and fine-tuned to obtain an extended data set; the target detection network is then trained with the extended data set, and the process repeats in a loop.
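As a sketch of this loop only, the following Python outline treats the training routine, the inference step and the human review step as injected callables; all names (train, predict_boxes, manual_review) are placeholders, not parts of the disclosed system.

```python
# Hedged sketch: the iterative auto-labeling cycle (pre-label, human fine-tune, retrain).
def auto_label_cycle(initial_images, initial_labels, new_image_batches,
                     train, predict_boxes, manual_review):
    dataset = list(zip(initial_images, initial_labels))   # manually labeled initial data set
    weights = train(dataset)                               # preliminary weights
    for batch in new_image_batches:                        # further collected images
        proposals = [predict_boxes(weights, img) for img in batch]
        reviewed = manual_review(batch, proposals)         # confirm / fine-tune the pre-labels
        dataset.extend(reviewed)                           # extended data set
        weights = train(dataset)                           # retrain and repeat
    return weights, dataset
```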
Compared with the prior art, the invention has the beneficial effects that:
according to the control system, the service function is abstracted into seven modules according to the characteristics of an application scene, the modules are coupled with each other in a low mode, the service function can be flexibly customized according to different requirements of a use scene of a scheduling platform, an intelligent decision module can carry out customization processing according to different tasks and requirements, and the maximum response to specific scheduling requirements can be realized; the invention combines the full-field depth distance detection sensor with the image sensor, the controlled object detection and positioning process is efficient and stable, the positioning progress required by the dispatching task can be realized sufficiently under the condition that the controlled object and the site are not additionally provided with redundant position sensors and other devices, the deployment cost is reduced, and the performance is improved compared with the current intelligent dispatching platform; the intelligent data set labeling process is adopted, so that the dispatching platform can be continuously self-optimized in the actual deployment dispatching process, the precision is continuously improved, and the self-learning function is realized.
Drawings
FIG. 1 is a schematic diagram of a control flow of an intelligent scheduling platform based on three-dimensional intelligent detection in an embodiment of the invention;
FIG. 2 is a schematic diagram of an exemplary target detection architecture according to the present invention;
FIG. 3 is a schematic diagram of an object classification architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a target tracking architecture according to an embodiment of the present invention;
FIG. 5 is a schematic representation of the laser radar detection depth according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an intelligent decision logic according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an intelligent labeling platform for a dataset according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a result displayed at the front end of an intelligent scheduling platform based on three-dimensional intelligent detection according to an embodiment of the present invention;
fig. 9 is a schematic diagram of a front end display of an intelligent scheduling platform based on three-dimensional intelligent detection according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following examples and the accompanying drawings.
Example 1
The intelligent scheduling platform based on three-dimensional intelligent detection comprises a target detection module, a target tracking module, a target classification module, a three-dimensional mapping module, an intelligent decision module, a data set automatic labeling module, a user interface display module and the like. The target detection module detects with a target detection network and inputs the detection result into the target tracking module; the target tracking module tracks the detection result with a target tracking network and inputs the tracking result into the target classification module; the target classification module classifies the tracking result, while the target frame and the depth map are input into the three-dimensional mapping module to reconstruct the three-dimensional position and output three-dimensional coordinates in the world coordinate system; the three-dimensional coordinates of the tracked object, the classification result and environmental information are then input together into the intelligent decision module, which makes intelligent decisions with a behavior tree; the decision result is displayed on the client, and the lower computer is controlled and scheduled through serial-port communication.
The invention provides an intelligent scheduling method based on three-dimensional intelligent detection, which is shown in fig. 1, and specifically comprises the following steps:
1) The current frame is obtained from the camera and sent to the target detection module for detection, where the target detection module adopts a target detection algorithm. The target detection algorithm used in this embodiment is YOLOX, and the target detection network is the YOLOX-s network.
YOLOX-s is divided into three parts: the feature extraction backbone, the detection neck and the decoupled detection heads. The feature extraction backbone is consistent with the feature extraction network of YOLOv5-s, a residual stack of convolution layers, batch normalization layers and activation function layers; YOLOX-s replaces the activation function with SiLU. The detection neck uses a feature pyramid structure for feature fusion. Finally, three decoupled heads respectively output the class prediction of the target frame, the position information of the target frame, and the foreground/background judgement of the target.
The collected image frames are input into the feature extraction backbone network, the feature maps of different scales extracted by the backbone are input into the detection neck and the decoupled detection heads, and finally the detection heads perform regression to obtain the candidate rectangular-frame positions of objects and the corresponding class probabilities.
2) The target detection result of the target detection module is input into the target tracking module. If this is the first entry into the target tracking module, N trackers are initialized from the result, the number N being equal to the number of inputs (i.e. the detection frames produced by the target detection network). Each tracker comprises a Kalman filter (whose state quantities are the pixel coordinates of the center of the detection frame, the area of the detection frame and its aspect ratio), an ID number (defaulting to -1 at initialization), an existence frame count (age), a reset timer reset_count, a consecutive hit count hit_count, a self-update count, a reset flag bit reset_flag, a target confidence prob, and a consecutive classification count.
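For illustration, a minimal Python sketch of this per-tracker state; the field names below follow the description, but where the original text is ambiguous (the self-update count shares the name reset_count with the reset timer) illustrative names are substituted, so all identifiers are assumptions.

```python
# Hedged sketch: the state carried by each tracker in the target tracking module.
from dataclasses import dataclass
import numpy as np

@dataclass
class Tracker:
    kalman_state: np.ndarray                 # (cx, cy, area, aspect_ratio) plus velocities
    track_id: int = -1                       # ID number, -1 until a class is assigned
    age: int = 0                             # number of frames the tracker has existed
    reset_count: int = 0                     # reset timer
    hit_count: int = 0                       # consecutive successful matches
    self_update_count: int = 0               # updates driven by the tracker's own prediction
    reset_flag: bool = False                 # true when re-classification is required
    prob: float = 0.0                        # target confidence
    classify_count: int = 0                  # consecutive identical classifications
```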
If it is not the first entry into the target tracking module, the remaining trackers predict tracking frames with their Kalman filters, and the overlap between every tracking frame and every input detection frame is calculated and matched. Matching uses the Hungarian algorithm with the IoU of the two boxes as the measure (i.e. pairs with high overlap are selected). Three cases result: some detection frames are successfully matched with tracking frames, some detection frames cannot be matched, and some tracking frames cannot be matched. If a detection frame is successfully matched with a tracking frame, the Kalman filter of the corresponding tracker is updated with the detection frame; if a tracking frame is not matched with any detection frame, its Kalman filter is updated with the tracking frame itself and the tracker's self-update count is increased by one; if a detection frame cannot be matched, a new tracker is initialized with it. All trackers are then pooled together and their existence frame counts are checked against a threshold: a tracker exceeding the threshold is deleted; otherwise the following processing is performed. First, it is checked whether the overlap (IoU) between tracking frames is above a threshold; if so, the reset flags of both trackers are set true. Next, it is checked whether the consecutive hit count of each tracker exceeds a threshold; if not, the tracker's reset flag is set true. Then, it is checked whether the tracker's reset count exceeds a threshold; if so, the reset count is set to zero and the tracker's reset flag is set true. Finally, it is checked whether the target confidence exceeds a threshold; if not, the tracker's reset flag is set true. In this series of checks, once a tracker's reset flag has been set true, the subsequent checks are skipped for it. A tracker that survives the whole series of checks (reset flag never set true) has its consecutive classification count and reset timer increased by one. Trackers whose reset flag is true input the image information inside their tracking frames into the target classification module.
3) The target classification network classifies the tracking result with a deep neural network. As shown in fig. 3, the image cropped out by the joint detection of target detection and target tracking is input into the feature extraction backbone network (an EfficientNet feature extraction network); the backbone's feature extraction result is input into a fully connected layer and a regression layer to obtain the class probabilities corresponding to the image, and the class with the highest probability is assigned as the image's class attribute; the classes are vehicles No. 1-5. After classification, the ID obtained by this classification is compared with the ID obtained by the previous classification: if they are the same, the consecutive classification count is increased by one, otherwise it is set to zero. At the same time, the target confidence is reset to the confidence of the classification result and the reset timer is set to zero. The task of the target tracking module is then complete, and all trackers that have not been deleted are kept for the next tracking task.
Inputting a target tracking result and a depth map into a three-dimensional mapping module to reconstruct a three-dimensional position, wherein the three-dimensional position comprises the following steps:
s0) defining a coordinate system:
laser radar coordinate system: with the midpoint of the bottom of the laser radar as the origin, the x-axis points to the front of the laser radar, the z-axis points vertically upward, and the y-axis points horizontally to the left when facing forward;
camera coordinate system: the camera optical center is taken as an origin and faces forward, at the moment, the x-axis is horizontally right, the y-axis is vertically directed to the ground, and the z-axis is parallel to the optical axis and is directed forward;
pixel coordinate system: with the upper left corner of the image as the origin, the x-axis is horizontally to the right and the y-axis is vertically downward.
S1) A thread is started that is dedicated to receiving data from the laser radar;
S2) The laser radar data are converted into Cartesian form according to the following formulas:

θ_encoder = 2π · (1 - measurement_id / scan_width)

θ_azimuth = -2π · beam_azimuth_angles / 360

φ = 2π · beam_angles / 360

x = (r - n) · cos(θ_encoder + θ_azimuth) · cos(φ) + n · cos(θ_encoder)

y = (r - n) · sin(θ_encoder + θ_azimuth) · cos(φ) + n · sin(θ_encoder)

z = (r - n) · sin(φ)

wherein measurement_id is the label ID of the data packet;

scan_width is the value of the horizontal resolution;

beam_angles is the elevation angle of each laser beam;

beam_azimuth_angles is the azimuth angle of each laser beam;

θ_encoder is the rotation angle of the laser radar's built-in encoder;

θ_azimuth is the azimuth angle of the laser radar beam;

φ is the elevation angle of the laser radar beam;

r is range_mm, the sum of the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics and the modulus of the distance vector from the laser radar front-end optics to the detected object;

n is lidar_origin_to_beam_origin_mm, the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics;

x, y and z are the Cartesian coordinates of the point cloud.
S3) According to the formula:

Z_c · [u, v, 1]^T = M_1 · M_2 · [X_l, Y_l, Z_l, 1]^T

the point cloud data acquired from the laser radar are projected onto the image to obtain a depth map; the effect is shown in fig. 5. Each pixel is filled with the corresponding Z_c value. (X_l, Y_l, Z_l) are the coordinates of the point cloud in the laser radar coordinate system, u and v are the pixel coordinates of the corresponding pixel in the image, Z_c is the Z-axis value of the corresponding point cloud in the camera coordinate system, M_1 is the intrinsic matrix of the camera, and M_2 is the extrinsic matrix from the laser radar to the camera. M_1 and M_2 are obtained using joint calibration;
S4) According to the target frame acquired by target identification, the depth information inside the target frame is processed with a k-clustering method: first, all depth values range inside the target frame are traversed and the average value range_mean of all depth information is obtained. Three clusters are initialized, and the three values [range_mean - area_thresh, range_mean, range_mean + area_thresh] are taken as their representative values. Then all depth values range inside the target frame are traversed again: for each depth value, the absolute difference between it and each of the three representative values is calculated, the three absolute differences are compared, and the depth value is assigned to the cluster with the smallest absolute difference. After one round of traversal, the averages of the depth values under the three clusters are taken as new representative values, and a new round of the loop begins. Iteration stops when the absolute difference between the new and old representative values is smaller than stop_num, or when the number of iterations exceeds iter_num. The representative values of the three clusters are then sorted: the smallest is the foreground depth representative value range_front, with front_num depth values in its cluster; the largest is the background depth representative value range_background, with background_num depth values in its cluster; the middle one is the middle depth representative value range_middle, with middle_num depth values in its cluster. The depth representative value z_represent of the target is then calculated in one of two ways:
if the device looks at the target horizontally or from below, range_middle is taken as z_represent;
if the device looks down at the target, z_represent = (range_front · front_num + range_middle · middle_num + range_background · background_num) / (front_num + middle_num + background_num). In this embodiment area_thresh is 1.5, stop_num is 0.01 and iter_num is 30; the three values can be adjusted according to the actual situation.
S5) The coordinates of the target in the world coordinate system are obtained from:

z_represent · [a, b, 1]^T = M_1 · M_3 · [X_w, Y_w, Z_w, 1]^T

wherein (X_w, Y_w, Z_w) are the coordinates of the point in the world coordinate system; z_represent is the depth representative value obtained in the preceding steps; a and b are the pixel coordinates of the center of the target detection frame; M_1 is the intrinsic matrix of the camera; M_3 is the extrinsic matrix from the world coordinate system to the camera coordinate system, obtained through the PnP algorithm and manual calibration.
After the steps, outputting three-dimensional coordinates in a world coordinate system;
4) The three-dimensional coordinates of the tracked objects, together with the classification results and external environment information such as object battery level, the occupancy of field facilities and task time, are input into the intelligent decision module. Following the intelligent scheduling logic scheme customized before system start-up according to the specific intelligent scheduling task and performance requirements, the position and class information of the controlled targets and the information from other sensors on site are input into the behavior tree; the scene situation is sensed and strategies are selected according to the pre-built behavior tree flow logic; finally, the intelligent decision result output by the system through action nodes in the behavior tree is used for front-end display or lower-computer scheduling. An example of the intelligent decision logic is shown in fig. 6;
the intelligent decision logic is as follows: a main decision loop is built under the root node. Under the main decision loop, a series of decision conditions is set, such as whether the base battery is sufficient, whether a visitor is present, and whether a vehicle is in the key area; if a decision condition is met, the corresponding action is executed and traversal continues; if it is not met, execution continues downward. During task execution, a secondary decision loop is entered, with decision conditions such as whether the vehicle battery is sufficient and whether a fetch instruction exists; if a condition is met, the corresponding decision is executed, otherwise the condition loop continues;
5) Information in the system, such as object battery level, the occupancy of field facilities and task time, together with the intelligent decision result, is filtered and displayed on the client, and the lower computer is controlled and scheduled through serial communication; the effect is shown in figs. 8 and 9.
For the data set automatic labeling module, an image sequence of the controlled objects required by the specific task is collected and labeled to form an initial data set, which is fed into the target detection network for training to obtain preliminary weights. Further images are collected to obtain an extended image sequence. The preliminary weights are loaded into the target detection network, which infers on the extended image sequence to obtain pre-selected positions of the target objects. These pre-selected positions are then manually confirmed and fine-tuned to obtain an extended data set, with which the target detection network is trained again, and the process repeats. The user interface is shown in fig. 7. The performance of the target detection module is shown in table 1 below.
TABLE 1 Performance of target detection modules
Class Ave.IOU mAP@0.50 Recall
Car 84.6% 98.99% 99.85%
Wherein the average per frame reasoning time is 3ms.
The performance of the object classification module is shown in table 2 below:
TABLE 2 Performance of the target classification module
Wherein the average per frame reasoning time is 1.758ms.
The embodiment of the invention balances algorithm complexity, detection accuracy and functional completeness, and achieves good results in practice.
Example 2
For the three-dimensional mapping module, not every situation is suitable for installing a laser radar, owing to site limitations and cost constraints. In this case, the three-dimensional coordinate reconstruction using a laser radar in the three-dimensional mapping module of Embodiment 1 is replaced by three-dimensional coordinate reconstruction using a binocular camera: two parallel cameras acquire images simultaneously, feature points are extracted from the two images and matched, and for the matched features the formula

Z = f · b / (x_l - x_r)

is applied, where f is the focal length of the camera, b is the baseline of the binocular camera, and x_l and x_r are the horizontal image coordinates of the matched feature point in the left and right images respectively. The depth Z is thereby calculated, so that the resulting depth map can be fed into the same processing flow as the laser radar depth map. This improvement enhances the applicability of the invention in various environments and reduces installation cost.
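For illustration, a one-function Python sketch of the stereo depth relation above; the focal length is assumed to be expressed in pixels, the baseline in metres, and x_l, x_r as horizontal pixel coordinates of the matched feature, so the names and units are assumptions.

```python
# Hedged sketch: depth from disparity for one matched feature point, Z = f * b / (x_l - x_r).
def stereo_depth(f_pixels, baseline_m, x_left, x_right):
    disparity = x_left - x_right              # horizontal offset between left and right projections
    if disparity <= 0:
        raise ValueError("matched feature must have positive disparity")
    return f_pixels * baseline_m / disparity  # depth Z in the same unit as the baseline
```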
Example 3
In terms of intelligent decision-making, the decisions a user expects may be so varied that they are difficult to describe with a conventional behavior tree algorithm. In this case, the intelligent scheduling link using a behavior tree in Embodiment 1 is replaced by a situation-judgement link using a random forest. Its characteristic is that the possible positions and classification information of the controlled targets and the external sensor information are labeled in advance with the scheduling behaviors or suggestions the user would give; the data-label pairs are encoded and used to train a random forest model; for data the user has not labeled, the model automatically derives the scheduling suggestion it would give, which is then output to the scheduling system for the subsequent flow. This improvement increases the platform's ability to express users' complex scheduling-task requirements and improves its generalization capability.
Example 4
Owing to differing installation budgets and environments, a deep-learning target detection scheme is not appropriate for detecting controlled objects in some situations. In this case, the deep-learning target detection module of Embodiment 1 is replaced by a foreground separation module. The foreground separation module adopts adaptive Gaussian background modeling: 3-5 Gaussian models are used to represent the characteristics of each pixel in the image and to model the background, and each pixel of the current frame is then matched against the model; if the match succeeds, the pixel is a background point, otherwise it is a foreground point. While separating the foreground, the method also continuously updates the parameters of the Gaussian models so that the modeled background stays close to the background of the current video frame, giving it an adaptive character. This improvement enhances the generality of the invention, reduces its implementation cost and enlarges its scope of application.
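For illustration, a short Python sketch of such an adaptive Gaussian-mixture foreground separation using OpenCV's MOG2 background subtractor; the morphological cleanup, area threshold and function name are assumptions added for a usable example, not parts of the disclosure.

```python
# Hedged sketch: adaptive Gaussian mixture background modeling (MOG2) for foreground separation.
import cv2

backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)

def foreground_boxes(frame_bgr, min_area=500):
    """Return bounding boxes of foreground blobs in one video frame."""
    mask = backsub.apply(frame_bgr)                       # also updates the background model
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # suppress isolated noise pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```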
The above embodiments are only intended to aid understanding of the technical solutions of the invention; any modifications and substitutions made by those skilled in the art without departing from the principles of the invention fall within the scope of protection of the invention.

Claims (10)

1. An intelligent scheduling method based on three-dimensional intelligent detection is characterized by comprising the following steps:
1) The automatic labeling module of the data set acquires the current frame and sends the current frame to the target detection network for detection;
2) Tracking a detection result of the target detection network by using a target tracking network, and outputting a tracking result;
3) Performing target classification on the tracking result by using a target classification network, inputting a target frame and a depth map into a three-dimensional mapping module to reconstruct a three-dimensional position, and outputting three-dimensional coordinates under a world coordinate system;
4) The three-dimensional coordinates of the tracked object, the classification result and the environmental information are input together into an intelligent decision module, intelligent decision-making is carried out using a behavior tree, the decision result is displayed on a client, and the lower computer is controlled and scheduled through serial communication.
2. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the controlled object position detected by the target detection network is sent to the target tracking network after a filtering algorithm, the target tracking network performs feature extraction modeling on the tracked object in each frame to obtain an inter-frame correlation measure, and the next frame position of the tracked object is determined according to the inter-frame correlation measure, so that output of a target tracking module result is obtained.
3. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the result detected by the target detection network is input into the target tracking network; if it is the first entry into the target tracking network, N trackers are initialized according to the result, the number N of trackers being the same as the number of detection frames detected by the target detection network; if it is not the first entry into the target tracking module, the remaining trackers predict tracking frames using Kalman filters, and the coincidence degree between every tracking frame and every input detection frame is calculated and matched, giving three cases: some detection frames are successfully matched with tracking frames, some detection frames cannot be matched, and some tracking frames cannot be matched; if a detection frame is successfully matched with a tracking frame, the Kalman filter of the corresponding tracker is updated with the detection frame; if a tracking frame is not matched with any detection frame, its Kalman filter is updated with the tracking frame and the tracker's self-update count is increased by one; if a detection frame cannot be successfully matched, a tracker is initialized with the detection frame;
mixing all trackers together, detecting whether the mixed existence frame number exceeds a threshold value, deleting the trackers if the existence frame number exceeds the threshold value, and if the existence frame number does not exceed the threshold value, carrying out the following processing:
firstly detecting whether the coincidence degree of tracking frames is higher than a threshold value, if so, the reset mark positions of the two trackers are true, and then detecting whether the continuous hit times of all trackers exceeds the threshold value, if not, the reset mark positions of the trackers are true; detecting whether the reset times of the tracker exceeds a threshold value or not, if so, setting the reset times to zero, and meanwhile, setting a reset mark position of the tracker to be true; finally, detecting whether the confidence coefficient of the target exceeds a threshold value, if not, the reset mark position of the tracker is true; the tracker with the true reset flag bit inputs the image information in the tracking frame into the deep neural network for classification.
4. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the target classification network uses an EfficientNet feature extraction network as a feature extraction backbone network, an image output by a target tracking network is input into the feature extraction backbone network, then a feature extraction backbone network feature extraction result is input into a full-connection layer and a regression layer to obtain a class probability corresponding to the image, a class with the maximum probability is selected to be assigned as an image class attribute for target classification, an ID obtained by the current classification is compared with an ID obtained by the last classification, if the ID is the same, the continuous classification times are increased by one, otherwise, the continuous classification times are set to zero, meanwhile, the confidence of the target is reset to the confidence of the classification result, a reset timer is set to zero, then the task of a target tracking module is completed, and all the trackers which are not deleted remain to the next tracking task.
5. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the three-dimensional position reconstruction of the three-dimensional mapping module comprises the following steps:
0) The following coordinates are defined:
laser radar coordinate system: with the midpoint of the bottom of the laser radar as the origin, the x-axis points to the front of the laser radar, the z-axis points vertically upward, and the y-axis points horizontally to the left when facing forward;
camera coordinate system: the camera optical center is taken as an origin and faces forward, at the moment, the x-axis is horizontally right, the y-axis is vertically directed to the ground, and the z-axis is parallel to the optical axis and is directed forward;
pixel coordinate system: taking the upper left corner of the image as an origin, horizontally rightward on the x-axis and vertically downward on the y-axis;
1) Starting a thread responsible for receiving data from the laser radar;
2) Converting the laser radar data into a format under a Cartesian coordinate system;
θ_encoder = 2π · (1 - measurement_id / scan_width)

θ_azimuth = -2π · beam_azimuth_angles / 360

φ = 2π · beam_angles / 360

x = (r - n) · cos(θ_encoder + θ_azimuth) · cos(φ) + n · cos(θ_encoder)

y = (r - n) · sin(θ_encoder + θ_azimuth) · cos(φ) + n · sin(θ_encoder)

z = (r - n) · sin(φ)

wherein:

measurement_id is the label ID of the data packet;

scan_width is the value of the horizontal resolution;

beam_angles is the elevation angle of each laser beam;

beam_azimuth_angles is the azimuth angle of each laser beam;

θ_encoder is the rotation angle of the laser radar's built-in encoder;

θ_azimuth is the azimuth angle of the laser radar beam;

φ is the elevation angle of the laser radar beam;

r is range_mm, the sum of the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics and the modulus of the distance vector from the laser radar front-end optics to the detected object;

n is lidar_origin_to_beam_origin_mm, the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics;

x, y and z are the Cartesian coordinates of the point cloud;
3) Projecting point cloud data acquired from the laser radar onto the image to obtain a depth map:

Z_c · [u, v, 1]^T = M_1 · M_2 · [X_l, Y_l, Z_l, 1]^T

the depth map is a single-channel image with the same resolution as the camera image, and each pixel is filled with the corresponding Z_c value; (X_l, Y_l, Z_l) are the coordinates of the point cloud in the laser radar coordinate system, u and v are the pixel coordinates of the corresponding pixel in the image, Z_c is the Z-axis value of the corresponding point cloud in the camera coordinate system, M_1 is the intrinsic matrix of the camera, M_2 is the extrinsic matrix from the laser radar to the camera, and M_1 and M_2 are obtained using joint calibration;
4) According to the target frame acquired by target identification, the depth information inside the target frame is processed with a k-clustering method:

first, all depth values range inside the target frame are traversed and the average value range_mean of all depth information is obtained; three clusters are initialized, and the three values [range_mean - area_thresh, range_mean, range_mean + area_thresh] are taken as the representative values of the three clusters;

then all depth values range inside the target frame are traversed again: for each depth value, the absolute difference between it and each of the three representative values is calculated, the three absolute differences are compared, and the depth value is assigned to the cluster with the smallest absolute difference. After one round of traversal, the averages of the depth values under the three clusters are computed and taken as the new representative values of the three clusters, and a new round of the loop begins. Iteration stops when the absolute difference between the new and old representative values is smaller than stop_num, or when the number of iterations exceeds iter_num. The representative values of the three clusters are then sorted: the smallest is the foreground depth representative value range_front, with front_num depth values in its cluster; the largest is taken as the background depth representative value range_background, with background_num depth values in its cluster; the middle one is taken as the middle depth representative value range_middle, with middle_num depth values in its cluster. The depth representative value z_represent of the target is then calculated.
6. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 5, wherein the depth representative value z_represent of the target is calculated in one of two ways:

1. if the device looks at the target horizontally or from below, range_middle is taken as z_represent;

2. if the device looks down at the target, z_represent = (range_front · front_num + range_middle · middle_num + range_background · background_num) / (front_num + middle_num + background_num);
5) Acquiring coordinates of a target in a world coordinate system:
Z_represent · [a, b, 1]^T = M1 · M3 · [Xw, Yw, Zw, 1]^T
wherein (Xw, Yw, Zw) are the coordinates of the point in the world coordinate system and Z_represent is the depth representative value of the target; a and b are the pixel coordinates of the center of the target detection frame; M1 is the internal reference matrix of the camera; M3 is the external reference matrix from the world coordinate system to the camera coordinate system, obtained with the PnP algorithm and manual calibration.
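A hedged sketch of step 5), assuming M3 is a 3x4 [R | t] matrix; it simply solves the projection relation above for the world coordinates:

import numpy as np

def box_center_to_world(a, b, z_represent, M1, M3):
    # a, b: pixel coordinates of the detection-frame center; M1: 3x3 intrinsics
    # M3: 3x4 world-to-camera extrinsics [R | t], assumed from PnP plus manual calibration
    pixel = np.array([a, b, 1.0])
    p_cam = z_represent * np.linalg.inv(M1) @ pixel   # point in the camera frame
    R, t = M3[:, :3], M3[:, 3]
    p_world = np.linalg.inv(R) @ (p_cam - t)          # invert Xc = R * Xw + t
    return p_world                                    # (Xw, Yw, Zw)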
7. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the target detection network feeds the collected image frames into a feature extraction backbone network, then feeds the feature maps of different scales extracted by the backbone network into a detection neck and a decoupled detection head, and finally the detection head regresses the rectangular frame position and the corresponding class probability of the object.
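For illustration, a toy PyTorch module showing only the "decoupled head" idea of claim 7 (separate regression and classification branches); the actual backbone, neck and head of the patent are not specified here and all sizes are assumptions:

import torch.nn as nn

class DecoupledHead(nn.Module):
    # separate branches so box regression and class prediction do not share the final layer
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.reg_branch = nn.Conv2d(in_channels, 4, kernel_size=1)            # (x, y, w, h) per cell
        self.cls_branch = nn.Conv2d(in_channels, num_classes, kernel_size=1)  # class scores per cell

    def forward(self, feature_map):
        boxes = self.reg_branch(feature_map)
        class_probs = self.cls_branch(feature_map).sigmoid()
        return boxes, class_probs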
8. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the target classification network adopts a localized deployment method for a deep learning neural network: an image cropped by the combined target detection and target tracking is fed into a feature extraction backbone network, the features extracted by the backbone network are fed into a fully connected layer and a regression layer to obtain the class probabilities of the image, and the class with the largest probability is assigned as the class attribute of the image.
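Similarly, a toy sketch of the claim-8 classification flow (backbone features, a fully connected layer, and selection of the most probable class); the backbone and feature dimension are placeholders:

import torch
import torch.nn as nn

class SimpleClassifier(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone                # any feature extractor returning (N, feat_dim)
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, crop):
        probs = torch.softmax(self.fc(self.backbone(crop)), dim=1)
        return probs.argmax(dim=1), probs       # class with the largest probability, plus all probabilities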
9. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the intelligent decision module customizes an intelligent scheduling logic scheme before the system starts, according to the specific intelligent scheduling task and performance requirements, and implements it with a behavior tree scheme; using the behavior tree algorithm, the position, class information and other associated sensor information of the controlled targets are fed into the behavior tree, the on-site situation is perceived and a strategy is selected according to the pre-constructed behavior tree flow logic, and finally an intelligent decision result is output through an action node of the behavior tree for front-end display or lower-computer scheduling.
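A minimal behavior-tree sketch with hypothetical condition and action nodes; the concrete scheduling logic, node names and blackboard keys below are illustrative assumptions, since claim 9 leaves them task-specific:

class Sequence:
    # runs children in order; fails as soon as one child fails
    def __init__(self, *children): self.children = children
    def tick(self, bb): return all(child.tick(bb) for child in self.children)

class Selector:
    # tries children in order; succeeds as soon as one child succeeds
    def __init__(self, *children): self.children = children
    def tick(self, bb): return any(child.tick(bb) for child in self.children)

class Condition:
    def __init__(self, fn): self.fn = fn
    def tick(self, bb): return self.fn(bb)

class Action:
    def __init__(self, fn): self.fn = fn
    def tick(self, bb): self.fn(bb); return True

# hypothetical logic: if a controlled target is inside the work area, issue a dispatch command,
# otherwise stay idle; the blackboard carries the perceived situation
tree = Selector(
    Sequence(Condition(lambda bb: bb.get("target_in_area", False)),
             Action(lambda bb: bb.setdefault("commands", []).append("dispatch"))),
    Action(lambda bb: None),
)
blackboard = {"target_in_area": True}
tree.tick(blackboard)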
10. The intelligent scheduling method based on three-dimensional intelligent detection according to any one of claims 1 to 9, wherein the automatic data set labeling module comprises: acquiring the image sequence of the controlled objects required by a specific task and labeling an initial data set; feeding the initial data set into the target detection network for training to obtain preliminary weights; further collecting images to obtain an expanded image sequence; loading the preliminary weights into the target detection network and running inference on the expanded image sequence with these weights to obtain preselected positions of the target objects in the expanded image sequence; manually confirming and fine-tuning the preselected positions of the target objects to obtain an expanded data set; and training the target detection network with the expanded data set, the above process being repeated.
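A hedged sketch of the claim-10 labeling loop; every callable passed in (collect_more, infer, refine, train) is a hypothetical placeholder for the corresponding step and not part of the patent:

def auto_label_loop(initial_images, initial_labels, collect_more, infer, refine, train, rounds=3):
    # collect_more()    -> new, unlabeled image sequence
    # infer(w, imgs)    -> preselected boxes predicted with weights w
    # refine(imgs, pre) -> manually confirmed / fine-tuned labels
    # train(imgs, lbls) -> new detector weights
    images, labels = list(initial_images), list(initial_labels)
    weights = train(images, labels)                   # preliminary weights from the initial data set
    for _ in range(rounds):
        new_images = collect_more()                   # expanded image sequence
        preselected = infer(weights, new_images)      # model proposes boxes
        new_labels = refine(new_images, preselected)  # human confirms / adjusts
        images += new_images
        labels += new_labels
        weights = train(images, labels)               # retrain on the expanded data set
    return weights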
CN202211153269.4A 2022-09-21 2022-09-21 Intelligent scheduling method based on three-dimensional intelligent detection Pending CN116109047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211153269.4A CN116109047A (en) 2022-09-21 2022-09-21 Intelligent scheduling method based on three-dimensional intelligent detection

Publications (1)

Publication Number Publication Date
CN116109047A true CN116109047A (en) 2023-05-12

Family

ID=86253379

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116977810A (en) * 2023-09-25 2023-10-31 之江实验室 Multi-mode post-fusion long tail category detection method and system
CN116977810B (en) * 2023-09-25 2024-01-09 之江实验室 Multi-mode post-fusion long tail category detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination