CN116109047A - Intelligent scheduling method based on three-dimensional intelligent detection

Intelligent scheduling method based on three-dimensional intelligent detection

Info

Publication number
CN116109047A
CN116109047A
Authority
CN
China
Prior art keywords
target
detection
frame
network
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211153269.4A
Other languages
Chinese (zh)
Inventor
赵可昕
秦奕
梁俊玮
陈泽明
梁华岳
董博雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202211153269.4A priority Critical patent/CN116109047A/en
Publication of CN116109047A publication Critical patent/CN116109047A/en
Pending legal-status Critical Current

Classifications

    • G06Q 10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q 10/083 Shipping
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y02P 90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention discloses an intelligent scheduling method based on three-dimensional intelligent detection. Camera frames are fed into a target detection algorithm to detect target objects; the detection results are matched against existing tracks and added to a target tracking list; the tracked objects are then sent to a target classification network for classification while their three-dimensional positions are reconstructed from a depth map; finally, the classification results, position information and prior information are fed into a decision module for intelligent decision-making, and the results are displayed on a client and used to control the lower computer. Built on deep neural networks and a laser radar, the intelligent scheduling platform can effectively exploit the positions, categories and environmental information within the monitored range without adding extra environmental equipment, so that controlled objects in the range are scheduled accurately, efficiently and in real time.

Description

Intelligent scheduling method based on three-dimensional intelligent detection
Technical Field
The invention belongs to the field of intelligent scheduling, and relates to an intelligent scheduling method based on three-dimensional intelligent detection.
Background
With the continuous growth of the logistics, transportation and service industries, how to effectively control and dispatch multiple unmanned devices in a scene, such as logistics vehicles and service robots, and to build an intelligent scheduling platform that runs efficiently while handling various emergencies, has become an urgent problem. In the field of intelligent scheduling, some solutions have appeared, but they generally require additional large-scale installation on, or modification of, the site or the controlled objects, which is costly.
The existing global-vision-based warehouse navigation intelligent vehicle dispatching method (CN201911046869.9) uses the OpenCV open-source vision library to extract characteristic patterns of AGV trolleys, including binarization of HSV images and contour detection with the Canny edge detection algorithm. This approach is susceptible to environmental influence and its robustness is not ideal. In contrast, the present method uses a target detection network to identify and classify targets, which adapts well to various environments and migrates easily, and uses a laser radar for positioning, which gives better precision and suits detection and positioning tasks for a variety of targets. The multi-camera and laser-radar target fusion method and system (CN202111490323.X) detects targets from multi-camera data, performs three-dimensional clustering on laser radar point clouds, and then fuses and matches the two sources of information. Three-dimensional clustering takes a long time, involves handling ground point clouds, and the clustering result is strongly affected by them.
Disclosure of Invention
To address these problems, the invention fuses a depth sensor, lightweight deep-learning target detection, a target classification algorithm and an intelligent decision method, and designs the scheduling system in a modular way, so that state sensing, intelligent decision-making and real-time scheduling are achieved without additional large-scale modification of the site or the controlled objects. Camera frames are sent to a target detection algorithm to detect target objects; the detection results are matched and added to a target tracking list; the tracked objects are then sent to a target classification network for classification while three-dimensional position information is reconstructed from a depth map; finally, the classification results, position information and prior information are sent to a decision module for intelligent decision-making, the results are displayed on a client, and the lower computer is controlled accordingly. Built on deep neural networks and a laser radar, the intelligent scheduling platform can effectively exploit the positions, categories and environmental information of the monitored range without adding extra environmental equipment, so that controlled objects in the range are scheduled accurately, efficiently and in real time.
The invention is realized at least by one of the following technical schemes.
An intelligent scheduling method based on three-dimensional intelligent detection comprises the following steps:
1) The automatic labeling module of the data set acquires the current frame and sends the current frame to the target detection network for detection;
2) Tracking a detection result of the target detection network by using a target tracking network, and outputting a tracking result;
3) Performing target classification on the tracking result by using a target classification network, inputting a target frame and a depth map into a three-dimensional mapping module to reconstruct a three-dimensional position, and outputting three-dimensional coordinates under a world coordinate system;
4) The three-dimensional coordinates of the tracked object, the classification result and the environmental information are input together into an intelligent decision module, intelligent decision-making is carried out using a behavior tree, the decision result is displayed on a client, and the lower computer is controlled and scheduled through serial communication.
Further, the controlled-object positions detected by the target detection network are sent, after a filtering algorithm, to the target tracking network; the target tracking network performs feature-extraction modeling on the tracked object in each frame to obtain an inter-frame correlation measure, and determines the position of the tracked object in the next frame from this correlation, thereby producing the output of the target tracking module.
Further, the results detected by the target detection network are input into the target tracking network. On the first entry into the target tracking network, N trackers are initialized from the results, the number N being equal to the number of detection frames produced by the target detection network. If it is not the first entry into the target tracking module, the remaining trackers predict tracking frames with their Kalman filters, and the overlap between every tracking frame and every input detection frame is calculated and matched, giving three cases: some detection frames are successfully matched with tracking frames, some detection frames cannot be matched, and some tracking frames cannot be matched. If a detection frame is successfully matched with a tracking frame, the Kalman filter of the corresponding tracker is updated with the detection frame; if a tracking frame is not matched with any detection frame, its Kalman filter is updated with the tracking frame itself and the tracker's self-update count is increased by one; if a detection frame cannot be matched, a new tracker is initialized with that detection frame;
all trackers are then pooled together, and it is checked whether each tracker's existence frame count exceeds a threshold; a tracker whose existence frame count exceeds the threshold is deleted, otherwise the following processing is carried out:
first, it is checked whether the overlap between tracking frames is above a threshold; if so, the reset flags of both trackers are set true. Next, it is checked whether the consecutive hit count of each tracker exceeds a threshold; if not, the tracker's reset flag is set true. Then, it is checked whether the tracker's reset count exceeds a threshold; if so, the reset count is set to zero and the tracker's reset flag is set true. Finally, it is checked whether the target confidence exceeds a threshold; if not, the tracker's reset flag is set true. A tracker whose reset flag is true inputs the image information inside its tracking frame into the deep neural network for classification.
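By way of illustration only, a minimal Python sketch of the detection/tracking matching step described above, assuming IoU as the overlap measure and the Hungarian assignment from SciPy; the function and threshold names are not part of the disclosure, and the Kalman prediction/update is left to the surrounding tracker code.

```python
# Hedged sketch: IoU-based matching of detection frames to predicted tracking frames.
# Boxes are (x1, y1, x2, y2) tuples; helper names are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_detections_to_trackers(detections, predicted_tracks, iou_threshold=0.3):
    """Return (matched pairs, unmatched detection indices, unmatched tracker indices)."""
    if not detections or not predicted_tracks:
        return [], list(range(len(detections))), list(range(len(predicted_tracks)))
    cost = np.zeros((len(detections), len(predicted_tracks)))
    for d, det in enumerate(detections):
        for t, trk in enumerate(predicted_tracks):
            cost[d, t] = -iou(det, trk)            # maximize IoU = minimize negative IoU
    rows, cols = linear_sum_assignment(cost)
    matches, unmatched_det, unmatched_trk = [], [], []
    for d in range(len(detections)):
        if d not in rows:
            unmatched_det.append(d)                # detection frame with no tracker
    for t in range(len(predicted_tracks)):
        if t not in cols:
            unmatched_trk.append(t)                # tracking frame with no detection
    for d, t in zip(rows, cols):
        if -cost[d, t] >= iou_threshold:
            matches.append((d, t))                 # update this tracker's Kalman filter
        else:
            unmatched_det.append(d)
            unmatched_trk.append(t)
    return matches, unmatched_det, unmatched_trk
```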
Further, the target classification network uses an EfficientNet feature extraction network as the feature extraction backbone. The image output by the target tracking network is input into the backbone, the backbone's feature extraction result is input into a fully connected layer and a regression layer, the class probabilities corresponding to the image are obtained, and the class with the maximum probability is assigned as the image's class attribute for target classification. The ID obtained by this classification is compared with the ID obtained by the previous classification: if they are the same, the consecutive classification count is increased by one, otherwise it is set to zero. At the same time, the target confidence is reset to the confidence of the classification result and the reset timer is set to zero. The task of the target tracking module is then complete, and all trackers that have not been deleted are kept for the next tracking task.
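For illustration, a minimal sketch of such a classification step, using torchvision's EfficientNet-B0 as a stand-in backbone and a 5-way head (vehicles No. 1 to 5 as in Embodiment 1); the preprocessing and class count are assumptions, not the patented configuration.

```python
# Hedged sketch: classify an image crop from a tracking frame with an EfficientNet backbone.
import torch
import torchvision.transforms as T
from torchvision.models import efficientnet_b0

NUM_CLASSES = 5                                  # assumption: vehicles No. 1-5
model = efficientnet_b0(num_classes=NUM_CLASSES)  # untrained stand-in network
model.eval()

preprocess = T.Compose([T.ToTensor(), T.Resize((224, 224))])

def classify_crop(crop_bgr):
    """Return (class_id, confidence) for a uint8 BGR crop taken from the tracking frame."""
    x = preprocess(crop_bgr[:, :, ::-1].copy()).unsqueeze(0)  # BGR -> RGB, add batch dim
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    conf, cls_id = probs.max(dim=0)
    return int(cls_id), float(conf)               # class with maximum probability + its confidence
```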
Further, the three-dimensional position reconstruction of the three-dimensional mapping module comprises the following steps:
0) The following coordinates are defined:
laser radar coordinate system: with the midpoint of the bottom of the laser radar as the origin, the x-axis points to the front of the laser radar, the z-axis points vertically upward, and the y-axis points horizontally to the left when facing forward;
camera coordinate system: the camera optical center is taken as an origin and faces forward, at the moment, the x-axis is horizontally right, the y-axis is vertically directed to the ground, and the z-axis is parallel to the optical axis and is directed forward;
pixel coordinate system: taking the upper left corner of the image as an origin, horizontally rightward on the x-axis and vertically downward on the y-axis;
1) Starting a thread responsible for receiving data from the laser radar;
2) Converting the laser radar data into a format under a Cartesian coordinate system;
θ_encoder = 2π · (1 - measurement_id / scan_width)

θ_azimuth = -2π · beam_azimuth_angles / 360

φ = 2π · beam_angles / 360

x = (r - n) · cos(θ_encoder + θ_azimuth) · cos(φ) + n · cos(θ_encoder)

y = (r - n) · sin(θ_encoder + θ_azimuth) · cos(φ) + n · sin(θ_encoder)

z = (r - n) · sin(φ)

wherein:

measurement_id is the label ID of the data packet;

scan_width is the value of the horizontal resolution;

beam_angles is the elevation angle of each laser beam;

beam_azimuth_angles is the azimuth angle of each laser beam;

θ_encoder is the rotation angle of the laser radar's built-in encoder;

θ_azimuth is the azimuth angle of the laser radar beam;

φ is the elevation angle of the laser radar beam;

r is range_mm, the sum of the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics and the modulus of the distance vector from the laser radar front-end optics to the detected object;

n is lidar_origin_to_beam_origin_mm, the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics;

x, y and z are the Cartesian coordinates of the point cloud;
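For illustration only, a small Python sketch of this polar-to-Cartesian conversion, following the conversion formulas as written above; angle inputs are taken in degrees, distances in millimetres, and the function and parameter names are assumptions rather than part of the disclosure.

```python
# Hedged sketch: convert one laser radar return to Cartesian coordinates (mm).
import math

def lidar_to_cartesian(measurement_id, scan_width, beam_angle_deg,
                       beam_azimuth_deg, range_mm, origin_to_beam_mm):
    theta_encoder = 2.0 * math.pi * (1.0 - measurement_id / scan_width)
    theta_azimuth = -2.0 * math.pi * beam_azimuth_deg / 360.0
    phi = 2.0 * math.pi * beam_angle_deg / 360.0
    r, n = range_mm, origin_to_beam_mm
    x = (r - n) * math.cos(theta_encoder + theta_azimuth) * math.cos(phi) + n * math.cos(theta_encoder)
    y = (r - n) * math.sin(theta_encoder + theta_azimuth) * math.cos(phi) + n * math.sin(theta_encoder)
    z = (r - n) * math.sin(phi)
    return x, y, z
```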
3) Projecting point cloud data acquired from the laser radar onto the image to obtain a depth map:

Z_c · [u, v, 1]^T = M_1 · M_2 · [X_l, Y_l, Z_l, 1]^T

the depth map is a single-channel image with the same resolution as the camera image, and each pixel is filled with the corresponding Z_c value; (X_l, Y_l, Z_l) are the coordinates of the point cloud in the laser radar coordinate system, u and v are the pixel coordinates of the corresponding pixel in the image, Z_c is the Z-axis value of the corresponding point cloud in the camera coordinate system, M_1 is the intrinsic matrix of the camera, M_2 is the extrinsic matrix from the laser radar to the camera, and M_1 and M_2 are obtained using joint calibration;
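As an illustrative aid, a minimal Python sketch of this projection step, assuming M_1 is a 3x3 intrinsic matrix and M_2 a 3x4 laser-radar-to-camera extrinsic matrix from joint calibration; the function name and array shapes are assumptions.

```python
# Hedged sketch: project lidar points into the image to build a single-channel depth map.
import numpy as np

def build_depth_map(points_lidar, M1, M2, image_height, image_width):
    """points_lidar: (N, 3) array of (X_l, Y_l, Z_l); returns (H, W) depth map of Z_c values."""
    depth = np.zeros((image_height, image_width), dtype=np.float32)
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])  # (N, 4) homogeneous
    cam = M2 @ pts_h.T                          # (3, N) points in the camera frame
    z_c = cam[2]
    pix = M1 @ cam                              # (3, N) homogeneous pixel coordinates
    valid = z_c > 0                             # keep points in front of the camera
    u = np.round(pix[0, valid] / z_c[valid]).astype(int)
    v = np.round(pix[1, valid] / z_c[valid]).astype(int)
    z_valid = z_c[valid]
    inside = (u >= 0) & (u < image_width) & (v >= 0) & (v < image_height)
    depth[v[inside], u[inside]] = z_valid[inside]   # fill each pixel with its Z_c value
    return depth
```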
4) According to the target frame acquired by target identification, the depth information inside the target frame is processed with a k-clustering method:

first, all depth values range inside the target frame are traversed and the average value range_mean of all depth information is obtained; three clusters are initialized, and the three values [range_mean - area_thresh, range_mean, range_mean + area_thresh] are taken as the representative values of the three clusters;

then all depth values range inside the target frame are traversed again: for each depth value, the absolute difference between it and each of the three representative values is calculated, the three absolute differences are compared, and the depth value is assigned to the cluster with the smallest absolute difference. After one round of traversal, the averages of the depth values under the three clusters are computed and taken as the new representative values of the three clusters, and a new round of the loop begins. Iteration stops when the absolute difference between the new and old representative values is smaller than stop_num, or when the number of iterations exceeds iter_num. The representative values of the three clusters are then sorted: the smallest is the foreground depth representative value range_front, and the number of depth values belonging to its cluster is front_num; the largest is taken as the background depth representative value range_background, with background_num depth values in its cluster; the middle one is taken as the middle depth representative value range_middle, with middle_num depth values in its cluster. The depth representative value z_represent of the target is then calculated.
Further, the depth representative value z_represent of the target is calculated in one of two ways:

1. if the device looks at the target horizontally or from below, range_middle is taken as z_represent;

2. if the device looks down at the target, z_represent = (range_front · front_num + range_middle · middle_num + range_background · background_num) / (front_num + middle_num + background_num).
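For illustration, a minimal Python sketch of this three-cluster depth grouping inside a target frame, using the parameter names above (area_thresh, stop_num, iter_num) with the Embodiment 1 values as defaults; the function name and the zero-depth filtering are assumptions.

```python
# Hedged sketch: k=3 clustering of depth values inside a target box and the two
# ways of forming the representative depth z_represent described above.
import numpy as np

def target_depth(depths_in_box, looking_down=True,
                 area_thresh=1.5, stop_num=0.01, iter_num=30):
    d = np.asarray(depths_in_box, dtype=np.float64)
    d = d[d > 0]                                    # ignore pixels without a lidar return
    if d.size == 0:
        return 0.0
    centers = d.mean() + np.array([-area_thresh, 0.0, area_thresh])
    labels = np.zeros(d.size, dtype=int)
    for _ in range(iter_num):
        labels = np.argmin(np.abs(d[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([d[labels == k].mean() if np.any(labels == k) else centers[k]
                                for k in range(3)])
        converged = np.all(np.abs(new_centers - centers) < stop_num)
        centers = new_centers
        if converged:
            break
    order = np.argsort(centers)                     # front, middle, background
    counts = np.array([(labels == k).sum() for k in order], dtype=np.float64)
    front, middle, background = centers[order]
    if not looking_down:                            # device looks at the target level or from below
        return middle
    return (front * counts[0] + middle * counts[1] + background * counts[2]) / counts.sum()
```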
5) Acquiring the coordinates of the target in the world coordinate system:

z_represent · [a, b, 1]^T = M_1 · M_3 · [X_w, Y_w, Z_w, 1]^T

wherein (X_w, Y_w, Z_w) are the coordinates of the point in the world coordinate system; z_represent is the depth representative value of the target; a and b are the pixel coordinates of the center of the target detection frame; M_1 is the intrinsic matrix of the camera; M_3 is the extrinsic matrix from the world coordinate system to the camera coordinate system, obtained through the PnP algorithm and manual calibration.
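As an illustrative sketch of this step: with M_1 (3x3 intrinsics) and M_3 (3x4 world-to-camera extrinsics) known, the relation above is three linear equations in (X_w, Y_w, Z_w) and can be solved directly; the helper name below is an assumption.

```python
# Hedged sketch: recover world coordinates of the box center (a, b) from z_represent.
import numpy as np

def pixel_to_world(a, b, z_represent, M1, M3):
    """Solve z_represent * [a, b, 1]^T = M1 @ M3 @ [X_w, Y_w, Z_w, 1]^T for (X_w, Y_w, Z_w)."""
    P = M1 @ M3                                    # (3, 4) projection matrix
    rhs = z_represent * np.array([a, b, 1.0]) - P[:, 3]
    return np.linalg.solve(P[:, :3], rhs)          # (X_w, Y_w, Z_w)
```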
Further, the target detection network inputs the collected image frames into the feature extraction backbone network, the feature maps of different scales extracted by the backbone are input into the detection neck and the decoupled detection heads, and finally the detection heads perform regression to obtain the rectangular-frame position of the object and the corresponding class probability.
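By way of illustration, a minimal PyTorch sketch of a decoupled detection head of the kind described: one branch regresses the box, one predicts class probabilities, one predicts the foreground/background score; the layer sizes and names are assumptions, not the patented network.

```python
# Hedged sketch: a decoupled detection head operating on one feature map from the neck.
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    def __init__(self, in_channels=256, num_classes=5):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_channels, in_channels, 1), nn.SiLU())
        self.reg_branch = nn.Conv2d(in_channels, 4, 1)             # box (x, y, w, h)
        self.obj_branch = nn.Conv2d(in_channels, 1, 1)             # foreground/background score
        self.cls_branch = nn.Conv2d(in_channels, num_classes, 1)   # class probabilities

    def forward(self, feature_map):
        x = self.stem(feature_map)
        return self.reg_branch(x), self.obj_branch(x), self.cls_branch(x)

# Example: one 20x20 feature map coming out of the detection neck
reg, obj, cls = DecoupledHead()(torch.randn(1, 256, 20, 20))
```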
Further, the target classification network adopts a locally deployed deep-learning neural network: the image cropped out by the joint detection of target detection and target tracking is input into the feature extraction backbone network, the backbone's feature extraction result is input into a fully connected layer and a regression layer, the class probabilities corresponding to the image are obtained, and the class with the highest probability is assigned as the image's class attribute.
Further, the intelligent decision module customizes an intelligent scheduling logic scheme before system start-up according to the specific intelligent scheduling task and performance requirements, and realizes it with a behavior tree. Using a behavior tree algorithm, the position and class information of the controlled targets and the information from the other sensors on site are input into the behavior tree; the scene situation is sensed and strategies are selected according to the pre-built behavior tree flow logic; finally, the intelligent decision result is output through action nodes in the behavior tree for front-end display or lower-computer scheduling.
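For illustration only, a minimal behavior-tree sketch in Python of the kind of decision step described; the node types are generic (Sequence, Selector, Condition, Action) and the condition names loosely mirror the example logic of FIG. 6, but all names and the state dictionary are assumptions.

```python
# Hedged sketch: a tiny behavior tree reading scene state and emitting scheduling commands.
class Node:
    def tick(self, state): ...

class Condition(Node):
    def __init__(self, predicate): self.predicate = predicate
    def tick(self, state): return "SUCCESS" if self.predicate(state) else "FAILURE"

class Action(Node):
    def __init__(self, act): self.act = act
    def tick(self, state): self.act(state); return "SUCCESS"

class Sequence(Node):                       # runs children until one fails
    def __init__(self, *children): self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) != "SUCCESS":
                return "FAILURE"
        return "SUCCESS"

class Selector(Node):                       # runs children until one succeeds
    def __init__(self, *children): self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == "SUCCESS":
                return "SUCCESS"
        return "FAILURE"

tree = Selector(
    Sequence(Condition(lambda s: s["base_power"] < 0.2),
             Action(lambda s: s["commands"].append("recharge_base"))),
    Sequence(Condition(lambda s: s["visitor_present"]),
             Action(lambda s: s["commands"].append("dispatch_service_robot"))),
)
state = {"base_power": 0.1, "visitor_present": False, "commands": []}
tree.tick(state)            # afterwards state["commands"] == ["recharge_base"]
```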
Further, the data set automatic labeling module works as follows: an image sequence of the controlled objects required by the specific task is collected and labeled to form an initial data set; the initial data set is fed into the target detection network for training to obtain preliminary weights; further images are collected to obtain an extended image sequence; the preliminary weights are loaded into the target detection network, which infers on the extended image sequence to obtain pre-selected positions of the target objects; the pre-selected positions are manually confirmed and fine-tuned to obtain an extended data set; the target detection network is then trained with the extended data set, and the process repeats in a loop.
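As a sketch of this loop only, the following Python outline treats the training routine, the inference step and the human review step as injected callables; all names (train, predict_boxes, manual_review) are placeholders, not parts of the disclosed system.

```python
# Hedged sketch: the iterative auto-labeling cycle (pre-label, human fine-tune, retrain).
def auto_label_cycle(initial_images, initial_labels, new_image_batches,
                     train, predict_boxes, manual_review):
    dataset = list(zip(initial_images, initial_labels))   # manually labeled initial data set
    weights = train(dataset)                               # preliminary weights
    for batch in new_image_batches:                        # further collected images
        proposals = [predict_boxes(weights, img) for img in batch]
        reviewed = manual_review(batch, proposals)         # confirm / fine-tune the pre-labels
        dataset.extend(reviewed)                           # extended data set
        weights = train(dataset)                           # retrain and repeat
    return weights, dataset
```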
Compared with the prior art, the invention has the beneficial effects that:
according to the control system, the service function is abstracted into seven modules according to the characteristics of an application scene, the modules are coupled with each other in a low mode, the service function can be flexibly customized according to different requirements of a use scene of a scheduling platform, an intelligent decision module can carry out customization processing according to different tasks and requirements, and the maximum response to specific scheduling requirements can be realized; the invention combines the full-field depth distance detection sensor with the image sensor, the controlled object detection and positioning process is efficient and stable, the positioning progress required by the dispatching task can be realized sufficiently under the condition that the controlled object and the site are not additionally provided with redundant position sensors and other devices, the deployment cost is reduced, and the performance is improved compared with the current intelligent dispatching platform; the intelligent data set labeling process is adopted, so that the dispatching platform can be continuously self-optimized in the actual deployment dispatching process, the precision is continuously improved, and the self-learning function is realized.
Drawings
FIG. 1 is a schematic diagram of a control flow of an intelligent scheduling platform based on three-dimensional intelligent detection in an embodiment of the invention;
FIG. 2 is a schematic diagram of an exemplary target detection architecture according to the present invention;
FIG. 3 is a schematic diagram of an object classification architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a target tracking architecture according to an embodiment of the present invention;
FIG. 5 is a schematic representation of the laser radar detection depth according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an intelligent decision logic according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an intelligent labeling platform for a dataset according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a result displayed at the front end of an intelligent scheduling platform based on three-dimensional intelligent detection according to an embodiment of the present invention;
fig. 9 is a schematic diagram of a front end display of an intelligent scheduling platform based on three-dimensional intelligent detection according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated by the following examples and the accompanying drawings.
Example 1
The intelligent scheduling platform based on three-dimensional intelligent detection comprises a target detection module, a target tracking module, a target classification module, a three-dimensional mapping module, an intelligent decision module, a data set automatic labeling module, a user interface display module and the like. The target detection module detects with a target detection network and inputs the detection result into the target tracking module; the target tracking module tracks the detection result with a target tracking network and inputs the tracking result into the target classification module; the target classification module classifies the tracking result, while the target frame and the depth map are input into the three-dimensional mapping module to reconstruct the three-dimensional position and output three-dimensional coordinates in the world coordinate system; the three-dimensional coordinates of the tracked object, the classification result and environmental information are then input together into the intelligent decision module, which makes intelligent decisions with a behavior tree; the decision result is displayed on the client, and the lower computer is controlled and scheduled through serial-port communication.
The invention provides an intelligent scheduling method based on three-dimensional intelligent detection, which is shown in fig. 1, and specifically comprises the following steps:
1) The current frame is obtained from the camera and sent to the target detection module for detection, where the target detection module adopts a target detection algorithm. The target detection algorithm used in this embodiment is YOLOX, and the target detection network is the YOLOX-s network.
YOLOX-s is divided into three parts: the feature extraction backbone, the detection neck and the decoupled detection heads. The feature extraction backbone is consistent with the feature extraction network of YOLOv5-s, a residual stack of convolution layers, batch normalization layers and activation function layers; YOLOX-s replaces the activation function with SiLU. The detection neck uses a feature pyramid structure for feature fusion. Finally, three decoupled heads respectively output the class prediction of the target frame, the position information of the target frame, and the foreground/background judgement of the target.
The collected image frames are input into the feature extraction backbone network, the feature maps of different scales extracted by the backbone are input into the detection neck and the decoupled detection heads, and finally the detection heads perform regression to obtain the candidate rectangular-frame positions of objects and the corresponding class probabilities.
2) The target detection result of the target detection module is input into the target tracking module. If this is the first entry into the target tracking module, N trackers are initialized from the result, the number N being equal to the number of inputs (i.e. the detection frames produced by the target detection network). Each tracker comprises a Kalman filter (whose state quantities are the pixel coordinates of the center of the detection frame, the area of the detection frame and its aspect ratio), an ID number (defaulting to -1 at initialization), an existence frame count (age), a reset timer reset_count, a consecutive hit count hit_count, a self-update count, a reset flag bit reset_flag, a target confidence prob, and a consecutive classification count.
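For illustration, a minimal Python sketch of this per-tracker state; the field names below follow the description, but where the original text is ambiguous (the self-update count shares the name reset_count with the reset timer) illustrative names are substituted, so all identifiers are assumptions.

```python
# Hedged sketch: the state carried by each tracker in the target tracking module.
from dataclasses import dataclass
import numpy as np

@dataclass
class Tracker:
    kalman_state: np.ndarray                 # (cx, cy, area, aspect_ratio) plus velocities
    track_id: int = -1                       # ID number, -1 until a class is assigned
    age: int = 0                             # number of frames the tracker has existed
    reset_count: int = 0                     # reset timer
    hit_count: int = 0                       # consecutive successful matches
    self_update_count: int = 0               # updates driven by the tracker's own prediction
    reset_flag: bool = False                 # true when re-classification is required
    prob: float = 0.0                        # target confidence
    classify_count: int = 0                  # consecutive identical classifications
```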
If it is not the first entry into the target tracking module, the remaining trackers predict tracking frames with their Kalman filters, and the overlap between every tracking frame and every input detection frame is calculated and matched. Matching uses the Hungarian algorithm with the IoU of the two boxes as the measure (i.e. pairs with high overlap are selected). Three cases result: some detection frames are successfully matched with tracking frames, some detection frames cannot be matched, and some tracking frames cannot be matched. If a detection frame is successfully matched with a tracking frame, the Kalman filter of the corresponding tracker is updated with the detection frame; if a tracking frame is not matched with any detection frame, its Kalman filter is updated with the tracking frame itself and the tracker's self-update count is increased by one; if a detection frame cannot be matched, a new tracker is initialized with it. All trackers are then pooled together and their existence frame counts are checked against a threshold: a tracker exceeding the threshold is deleted; otherwise the following processing is performed. First, it is checked whether the overlap (IoU) between tracking frames is above a threshold; if so, the reset flags of both trackers are set true. Next, it is checked whether the consecutive hit count of each tracker exceeds a threshold; if not, the tracker's reset flag is set true. Then, it is checked whether the tracker's reset count exceeds a threshold; if so, the reset count is set to zero and the tracker's reset flag is set true. Finally, it is checked whether the target confidence exceeds a threshold; if not, the tracker's reset flag is set true. In this series of checks, once a tracker's reset flag has been set true, the subsequent checks are skipped for it. A tracker that survives the whole series of checks (reset flag never set true) has its consecutive classification count and reset timer increased by one. Trackers whose reset flag is true input the image information inside their tracking frames into the target classification module.
3) The target classification network classifies the tracking result with a deep neural network. As shown in fig. 3, the image cropped out by the joint detection of target detection and target tracking is input into the feature extraction backbone network (an EfficientNet feature extraction network); the backbone's feature extraction result is input into a fully connected layer and a regression layer to obtain the class probabilities corresponding to the image, and the class with the highest probability is assigned as the image's class attribute; the classes are vehicles No. 1-5. After classification, the ID obtained by this classification is compared with the ID obtained by the previous classification: if they are the same, the consecutive classification count is increased by one, otherwise it is set to zero. At the same time, the target confidence is reset to the confidence of the classification result and the reset timer is set to zero. The task of the target tracking module is then complete, and all trackers that have not been deleted are kept for the next tracking task.
Inputting a target tracking result and a depth map into a three-dimensional mapping module to reconstruct a three-dimensional position, wherein the three-dimensional position comprises the following steps:
s0) defining a coordinate system:
laser radar coordinate system: with the midpoint of the bottom of the laser radar as the origin, the x-axis points to the front of the laser radar, the z-axis points vertically upward, and the y-axis points horizontally to the left when facing forward;
camera coordinate system: the camera optical center is taken as an origin and faces forward, at the moment, the x-axis is horizontally right, the y-axis is vertically directed to the ground, and the z-axis is parallel to the optical axis and is directed forward;
pixel coordinate system: with the upper left corner of the image as the origin, the x-axis is horizontally to the right and the y-axis is vertically downward.
S1) A thread is started that is dedicated to receiving data from the laser radar;
S2) The laser radar data are converted into Cartesian form according to the following formulas:

θ_encoder = 2π · (1 - measurement_id / scan_width)

θ_azimuth = -2π · beam_azimuth_angles / 360

φ = 2π · beam_angles / 360

x = (r - n) · cos(θ_encoder + θ_azimuth) · cos(φ) + n · cos(θ_encoder)

y = (r - n) · sin(θ_encoder + θ_azimuth) · cos(φ) + n · sin(θ_encoder)

z = (r - n) · sin(φ)

wherein measurement_id is the label ID of the data packet;

scan_width is the value of the horizontal resolution;

beam_angles is the elevation angle of each laser beam;

beam_azimuth_angles is the azimuth angle of each laser beam;

θ_encoder is the rotation angle of the laser radar's built-in encoder;

θ_azimuth is the azimuth angle of the laser radar beam;

φ is the elevation angle of the laser radar beam;

r is range_mm, the sum of the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics and the modulus of the distance vector from the laser radar front-end optics to the detected object;

n is lidar_origin_to_beam_origin_mm, the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics;

x, y and z are the Cartesian coordinates of the point cloud.
S3) According to the formula:

Z_c · [u, v, 1]^T = M_1 · M_2 · [X_l, Y_l, Z_l, 1]^T

the point cloud data acquired from the laser radar are projected onto the image to obtain a depth map; the effect is shown in fig. 5. Each pixel is filled with the corresponding Z_c value. (X_l, Y_l, Z_l) are the coordinates of the point cloud in the laser radar coordinate system, u and v are the pixel coordinates of the corresponding pixel in the image, Z_c is the Z-axis value of the corresponding point cloud in the camera coordinate system, M_1 is the intrinsic matrix of the camera, and M_2 is the extrinsic matrix from the laser radar to the camera. M_1 and M_2 are obtained using joint calibration;
S4) According to the target frame acquired by target identification, the depth information inside the target frame is processed with a k-clustering method: first, all depth values range inside the target frame are traversed and the average value range_mean of all depth information is obtained. Three clusters are initialized, and the three values [range_mean - area_thresh, range_mean, range_mean + area_thresh] are taken as their representative values. Then all depth values range inside the target frame are traversed again: for each depth value, the absolute difference between it and each of the three representative values is calculated, the three absolute differences are compared, and the depth value is assigned to the cluster with the smallest absolute difference. After one round of traversal, the averages of the depth values under the three clusters are taken as new representative values, and a new round of the loop begins. Iteration stops when the absolute difference between the new and old representative values is smaller than stop_num, or when the number of iterations exceeds iter_num. The representative values of the three clusters are then sorted: the smallest is the foreground depth representative value range_front, with front_num depth values in its cluster; the largest is the background depth representative value range_background, with background_num depth values in its cluster; the middle one is the middle depth representative value range_middle, with middle_num depth values in its cluster. The depth representative value z_represent of the target is then calculated in one of two ways:
if the device looks at the target horizontally or from below, range_middle is taken as z_represent;
if the device looks down at the target, z_represent = (range_front · front_num + range_middle · middle_num + range_background · background_num) / (front_num + middle_num + background_num). In this embodiment area_thresh is 1.5, stop_num is 0.01 and iter_num is 30; the three values can be adjusted according to the actual situation.
S5) The coordinates of the target in the world coordinate system are obtained from:

z_represent · [a, b, 1]^T = M_1 · M_3 · [X_w, Y_w, Z_w, 1]^T

wherein (X_w, Y_w, Z_w) are the coordinates of the point in the world coordinate system; z_represent is the depth representative value obtained in the preceding steps; a and b are the pixel coordinates of the center of the target detection frame; M_1 is the intrinsic matrix of the camera; M_3 is the extrinsic matrix from the world coordinate system to the camera coordinate system, obtained through the PnP algorithm and manual calibration.
After the steps, outputting three-dimensional coordinates in a world coordinate system;
4) The three-dimensional coordinates of the tracked objects, together with the classification results and external environment information such as object battery level, the occupancy of field facilities and task time, are input into the intelligent decision module. Following the intelligent scheduling logic scheme customized before system start-up according to the specific intelligent scheduling task and performance requirements, the position and class information of the controlled targets and the information from other sensors on site are input into the behavior tree; the scene situation is sensed and strategies are selected according to the pre-built behavior tree flow logic; finally, the intelligent decision result output by the system through action nodes in the behavior tree is used for front-end display or lower-computer scheduling. An example of the intelligent decision logic is shown in fig. 6;
the intelligent decision logic is as follows: a main decision loop is built under the root node. Under the main decision loop, a series of decision conditions is set, such as whether the base battery is sufficient, whether a visitor is present, and whether a vehicle is in the key area; if a decision condition is met, the corresponding action is executed and traversal continues; if it is not met, execution continues downward. During task execution, a secondary decision loop is entered, with decision conditions such as whether the vehicle battery is sufficient and whether a fetch instruction exists; if a condition is met, the corresponding decision is executed, otherwise the condition loop continues;
5) Information in the system, such as object battery level, the occupancy of field facilities and task time, together with the intelligent decision result, is filtered and displayed on the client, and the lower computer is controlled and scheduled through serial communication; the effect is shown in figs. 8 and 9.
For the data set automatic labeling module, an image sequence of the controlled objects required by the specific task is collected and labeled to form an initial data set, which is fed into the target detection network for training to obtain preliminary weights. Further images are collected to obtain an extended image sequence. The preliminary weights are loaded into the target detection network, which infers on the extended image sequence to obtain pre-selected positions of the target objects. These pre-selected positions are then manually confirmed and fine-tuned to obtain an extended data set, with which the target detection network is trained again, and the process repeats. The user interface is shown in fig. 7. The performance of the target detection module is shown in table 1 below.
TABLE 1 Performance of target detection modules
Class Ave.IOU mAP@0.50 Recall
Car 84.6% 98.99% 99.85%
Wherein the average per frame reasoning time is 3ms.
The performance of the object classification module is shown in table 2 below:
TABLE 2 Performance of the target classification module
Wherein the average per frame reasoning time is 1.758ms.
The embodiment of the invention balances algorithm complexity, detection accuracy and functional completeness, and achieves good results in practice.
Example 2
For the three-dimensional mapping module, not every situation is suitable for installing a laser radar, owing to site limitations and cost constraints. In this case, the three-dimensional coordinate reconstruction using a laser radar in the three-dimensional mapping module of Embodiment 1 is replaced by three-dimensional coordinate reconstruction using a binocular camera: two parallel cameras acquire images simultaneously, feature points are extracted from the two images and matched, and for the matched features the formula

Z = f · b / (x_l - x_r)

is applied, where f is the focal length of the camera, b is the baseline of the binocular camera, and x_l and x_r are the horizontal image coordinates of the matched feature point in the left and right images respectively. The depth Z is thereby calculated, so that the resulting depth map can be fed into the same processing flow as the laser radar depth map. This improvement enhances the applicability of the invention in various environments and reduces installation cost.
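For illustration, a one-function Python sketch of the stereo depth relation above; the focal length is assumed to be expressed in pixels, the baseline in metres, and x_l, x_r as horizontal pixel coordinates of the matched feature, so the names and units are assumptions.

```python
# Hedged sketch: depth from disparity for one matched feature point, Z = f * b / (x_l - x_r).
def stereo_depth(f_pixels, baseline_m, x_left, x_right):
    disparity = x_left - x_right              # horizontal offset between left and right projections
    if disparity <= 0:
        raise ValueError("matched feature must have positive disparity")
    return f_pixels * baseline_m / disparity  # depth Z in the same unit as the baseline
```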
Example 3
In terms of intelligent decision-making, the decisions a user expects may be so varied that they are difficult to describe with a conventional behavior tree algorithm. In this case, the intelligent scheduling link using a behavior tree in Embodiment 1 is replaced by a situation-judgement link using a random forest. Its characteristic is that the possible positions and classification information of the controlled targets and the external sensor information are labeled in advance with the scheduling behaviors or suggestions the user would give; the data-label pairs are encoded and used to train a random forest model; for data the user has not labeled, the model automatically derives the scheduling suggestion it would give, which is then output to the scheduling system for the subsequent flow. This improvement increases the platform's ability to express users' complex scheduling-task requirements and improves its generalization capability.
Example 4
Owing to differing installation budgets and environments, a deep-learning target detection scheme is not appropriate for detecting controlled objects in some situations. In this case, the deep-learning target detection module of Embodiment 1 is replaced by a foreground separation module. The foreground separation module adopts adaptive Gaussian background modeling: 3-5 Gaussian models are used to represent the characteristics of each pixel in the image and to model the background, and each pixel of the current frame is then matched against the model; if the match succeeds, the pixel is a background point, otherwise it is a foreground point. While separating the foreground, the method also continuously updates the parameters of the Gaussian models so that the modeled background stays close to the background of the current video frame, giving it an adaptive character. This improvement enhances the generality of the invention, reduces its implementation cost and enlarges its scope of application.
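For illustration, a short Python sketch of such an adaptive Gaussian-mixture foreground separation using OpenCV's MOG2 background subtractor; the morphological cleanup, area threshold and function name are assumptions added for a usable example, not parts of the disclosure.

```python
# Hedged sketch: adaptive Gaussian mixture background modeling (MOG2) for foreground separation.
import cv2

backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)

def foreground_boxes(frame_bgr, min_area=500):
    """Return bounding boxes of foreground blobs in one video frame."""
    mask = backsub.apply(frame_bgr)                       # also updates the background model
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # suppress isolated noise pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```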
The above embodiments are only intended to aid understanding of the technical solutions of the invention; any modifications and substitutions made by those skilled in the art without departing from the principles of the invention fall within the scope of protection of the invention.

Claims (10)

1. An intelligent scheduling method based on three-dimensional intelligent detection is characterized by comprising the following steps:
1) The automatic labeling module of the data set acquires the current frame and sends the current frame to the target detection network for detection;
2) Tracking a detection result of the target detection network by using a target tracking network, and outputting a tracking result;
3) Performing target classification on the tracking result by using a target classification network, inputting a target frame and a depth map into a three-dimensional mapping module to reconstruct a three-dimensional position, and outputting three-dimensional coordinates under a world coordinate system;
4) The three-dimensional coordinates of the tracked object, the classification result and the environmental information are input together into an intelligent decision module, intelligent decision-making is carried out using a behavior tree, the decision result is displayed on a client, and the lower computer is controlled and scheduled through serial communication.
2. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the controlled object position detected by the target detection network is sent to the target tracking network after a filtering algorithm, the target tracking network performs feature extraction modeling on the tracked object in each frame to obtain an inter-frame correlation measure, and the next frame position of the tracked object is determined according to the inter-frame correlation measure, so that output of a target tracking module result is obtained.
3. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the result detected by the target detection network is input into the target tracking network; if it is the first entry into the target tracking network, N trackers are initialized according to the result, the number N of trackers being the same as the number of detection frames detected by the target detection network; if it is not the first entry into the target tracking module, the remaining trackers predict tracking frames using Kalman filters, and the coincidence degree between every tracking frame and every input detection frame is calculated and matched, giving three cases: some detection frames are successfully matched with tracking frames, some detection frames cannot be matched, and some tracking frames cannot be matched; if a detection frame is successfully matched with a tracking frame, the Kalman filter of the corresponding tracker is updated with the detection frame; if a tracking frame is not matched with any detection frame, its Kalman filter is updated with the tracking frame and the tracker's self-update count is increased by one; if a detection frame cannot be successfully matched, a tracker is initialized with the detection frame;
mixing all trackers together, detecting whether the mixed existence frame number exceeds a threshold value, deleting the trackers if the existence frame number exceeds the threshold value, and if the existence frame number does not exceed the threshold value, carrying out the following processing:
firstly detecting whether the coincidence degree of tracking frames is higher than a threshold value, if so, the reset mark positions of the two trackers are true, and then detecting whether the continuous hit times of all trackers exceeds the threshold value, if not, the reset mark positions of the trackers are true; detecting whether the reset times of the tracker exceeds a threshold value or not, if so, setting the reset times to zero, and meanwhile, setting a reset mark position of the tracker to be true; finally, detecting whether the confidence coefficient of the target exceeds a threshold value, if not, the reset mark position of the tracker is true; the tracker with the true reset flag bit inputs the image information in the tracking frame into the deep neural network for classification.
4. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the target classification network uses an EfficientNet feature extraction network as a feature extraction backbone network, an image output by a target tracking network is input into the feature extraction backbone network, then a feature extraction backbone network feature extraction result is input into a full-connection layer and a regression layer to obtain a class probability corresponding to the image, a class with the maximum probability is selected to be assigned as an image class attribute for target classification, an ID obtained by the current classification is compared with an ID obtained by the last classification, if the ID is the same, the continuous classification times are increased by one, otherwise, the continuous classification times are set to zero, meanwhile, the confidence of the target is reset to the confidence of the classification result, a reset timer is set to zero, then the task of a target tracking module is completed, and all the trackers which are not deleted remain to the next tracking task.
5. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the three-dimensional position reconstruction of the three-dimensional mapping module comprises the following steps:
0) The following coordinates are defined:
laser radar coordinate system: with the midpoint of the bottom of the laser radar as the origin, the x-axis points to the front of the laser radar, the z-axis points vertically upward, and the y-axis points horizontally to the left when facing forward;
camera coordinate system: the camera optical center is taken as an origin and faces forward, at the moment, the x-axis is horizontally right, the y-axis is vertically directed to the ground, and the z-axis is parallel to the optical axis and is directed forward;
pixel coordinate system: taking the upper left corner of the image as an origin, horizontally rightward on the x-axis and vertically downward on the y-axis;
1) Starting a thread responsible for receiving data from the laser radar;
2) Converting the laser radar data into a format under a Cartesian coordinate system;
θ_encoder = 2π · (1 - measurement_id / scan_width)

θ_azimuth = -2π · beam_azimuth_angles / 360

φ = 2π · beam_angles / 360

x = (r - n) · cos(θ_encoder + θ_azimuth) · cos(φ) + n · cos(θ_encoder)

y = (r - n) · sin(θ_encoder + θ_azimuth) · cos(φ) + n · sin(θ_encoder)

z = (r - n) · sin(φ)

wherein:

measurement_id is the label ID of the data packet;

scan_width is the value of the horizontal resolution;

beam_angles is the elevation angle of each laser beam;

beam_azimuth_angles is the azimuth angle of each laser beam;

θ_encoder is the rotation angle of the laser radar's built-in encoder;

θ_azimuth is the azimuth angle of the laser radar beam;

φ is the elevation angle of the laser radar beam;

r is range_mm, the sum of the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics and the modulus of the distance vector from the laser radar front-end optics to the detected object;

n is lidar_origin_to_beam_origin_mm, the modulus of the distance vector from the center of the laser radar origin coordinate system to the laser radar front-end optics;

x, y and z are the Cartesian coordinates of the point cloud;
3) Projecting point cloud data acquired from the laser radar onto the image to obtain a depth map:

Z_c · [u, v, 1]^T = M_1 · M_2 · [X_l, Y_l, Z_l, 1]^T

the depth map is a single-channel image with the same resolution as the camera image, and each pixel is filled with the corresponding Z_c value; (X_l, Y_l, Z_l) are the coordinates of the point cloud in the laser radar coordinate system, u and v are the pixel coordinates of the corresponding pixel in the image, Z_c is the Z-axis value of the corresponding point cloud in the camera coordinate system, M_1 is the intrinsic matrix of the camera, M_2 is the extrinsic matrix from the laser radar to the camera, and M_1 and M_2 are obtained using joint calibration;
4) According to the target frame acquired by target identification, the depth information inside the target frame is processed with a k-clustering method:

first, all depth values range inside the target frame are traversed and the average value range_mean of all depth information is obtained; three clusters are initialized, and the three values [range_mean - area_thresh, range_mean, range_mean + area_thresh] are taken as the representative values of the three clusters;

then all depth values range inside the target frame are traversed again: for each depth value, the absolute difference between it and each of the three representative values is calculated, the three absolute differences are compared, and the depth value is assigned to the cluster with the smallest absolute difference. After one round of traversal, the averages of the depth values under the three clusters are computed and taken as the new representative values of the three clusters, and a new round of the loop begins. Iteration stops when the absolute difference between the new and old representative values is smaller than stop_num, or when the number of iterations exceeds iter_num. The representative values of the three clusters are then sorted: the smallest is the foreground depth representative value range_front, with front_num depth values in its cluster; the largest is taken as the background depth representative value range_background, with background_num depth values in its cluster; the middle one is taken as the middle depth representative value range_middle, with middle_num depth values in its cluster. The depth representative value z_represent of the target is then calculated.
6. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 5, wherein the depth representative value z_represent of the target is calculated in one of two ways:

1. if the device looks at the target horizontally or from below, range_middle is taken as z_represent;

2. if the device looks down at the target, z_represent = (range_front · front_num + range_middle · middle_num + range_background · background_num) / (front_num + middle_num + background_num);
5) Acquiring coordinates of a target in a world coordinate system:
Z_represent · [a, b, 1]^T = M1 · M3 · [Xw, Yw, Zw, 1]^T
wherein (Xw, Yw, Zw) are the coordinates of the point in the world coordinate system and Z_represent is the depth representative value of the target; a and b are the pixel coordinates of the center of the target detection frame; M1 is the internal reference matrix of the camera; M3 is the external reference matrix from the world coordinate system to the camera coordinate system, obtained with the PnP algorithm and manual calibration.
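A hedged sketch of step 5), assuming M3 is a 3x4 [R | t] matrix; it simply solves the projection relation above for the world coordinates:

import numpy as np

def box_center_to_world(a, b, z_represent, M1, M3):
    # a, b: pixel coordinates of the detection-frame center; M1: 3x3 intrinsics
    # M3: 3x4 world-to-camera extrinsics [R | t], assumed from PnP plus manual calibration
    pixel = np.array([a, b, 1.0])
    p_cam = z_represent * np.linalg.inv(M1) @ pixel   # point in the camera frame
    R, t = M3[:, :3], M3[:, 3]
    p_world = np.linalg.inv(R) @ (p_cam - t)          # invert Xc = R * Xw + t
    return p_world                                    # (Xw, Yw, Zw)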
7. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the target detection network feeds the collected image frames into a feature extraction backbone network, then feeds the feature maps of different scales extracted by the backbone network into a detection neck and a decoupled detection head, and finally the detection head regresses the rectangular frame position and the corresponding class probability of the object.
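For illustration, a toy PyTorch module showing only the "decoupled head" idea of claim 7 (separate regression and classification branches); the actual backbone, neck and head of the patent are not specified here and all sizes are assumptions:

import torch.nn as nn

class DecoupledHead(nn.Module):
    # separate branches so box regression and class prediction do not share the final layer
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.reg_branch = nn.Conv2d(in_channels, 4, kernel_size=1)            # (x, y, w, h) per cell
        self.cls_branch = nn.Conv2d(in_channels, num_classes, kernel_size=1)  # class scores per cell

    def forward(self, feature_map):
        boxes = self.reg_branch(feature_map)
        class_probs = self.cls_branch(feature_map).sigmoid()
        return boxes, class_probs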
8. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the target classification network adopts a localized deployment method for a deep learning neural network: an image cropped by the combined target detection and target tracking is fed into a feature extraction backbone network, the features extracted by the backbone network are fed into a fully connected layer and a regression layer to obtain the class probabilities of the image, and the class with the largest probability is assigned as the class attribute of the image.
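Similarly, a toy sketch of the claim-8 classification flow (backbone features, a fully connected layer, and selection of the most probable class); the backbone and feature dimension are placeholders:

import torch
import torch.nn as nn

class SimpleClassifier(nn.Module):
    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone                # any feature extractor returning (N, feat_dim)
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, crop):
        probs = torch.softmax(self.fc(self.backbone(crop)), dim=1)
        return probs.argmax(dim=1), probs       # class with the largest probability, plus all probabilities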
9. The intelligent scheduling method based on three-dimensional intelligent detection according to claim 1, wherein the intelligent decision module customizes an intelligent scheduling logic scheme before the system starts, according to the specific intelligent scheduling task and performance requirements, and implements it with a behavior tree scheme; using the behavior tree algorithm, the position, class information and other associated sensor information of the controlled targets are fed into the behavior tree, the on-site situation is perceived and a strategy is selected according to the pre-constructed behavior tree flow logic, and finally an intelligent decision result is output through an action node of the behavior tree for front-end display or lower-computer scheduling.
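A minimal behavior-tree sketch with hypothetical condition and action nodes; the concrete scheduling logic, node names and blackboard keys below are illustrative assumptions, since claim 9 leaves them task-specific:

class Sequence:
    # runs children in order; fails as soon as one child fails
    def __init__(self, *children): self.children = children
    def tick(self, bb): return all(child.tick(bb) for child in self.children)

class Selector:
    # tries children in order; succeeds as soon as one child succeeds
    def __init__(self, *children): self.children = children
    def tick(self, bb): return any(child.tick(bb) for child in self.children)

class Condition:
    def __init__(self, fn): self.fn = fn
    def tick(self, bb): return self.fn(bb)

class Action:
    def __init__(self, fn): self.fn = fn
    def tick(self, bb): self.fn(bb); return True

# hypothetical logic: if a controlled target is inside the work area, issue a dispatch command,
# otherwise stay idle; the blackboard carries the perceived situation
tree = Selector(
    Sequence(Condition(lambda bb: bb.get("target_in_area", False)),
             Action(lambda bb: bb.setdefault("commands", []).append("dispatch"))),
    Action(lambda bb: None),
)
blackboard = {"target_in_area": True}
tree.tick(blackboard)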
10. The intelligent scheduling method based on three-dimensional intelligent detection according to any one of claims 1 to 9, wherein the automatic data set labeling module comprises: acquiring the image sequence of the controlled objects required by a specific task and labeling an initial data set; feeding the initial data set into the target detection network for training to obtain preliminary weights; further collecting images to obtain an expanded image sequence; loading the preliminary weights into the target detection network and running inference on the expanded image sequence with these weights to obtain preselected positions of the target objects in the expanded image sequence; manually confirming and fine-tuning the preselected positions of the target objects to obtain an expanded data set; and training the target detection network with the expanded data set, the above process being repeated.
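A hedged sketch of the claim-10 labeling loop; every callable passed in (collect_more, infer, refine, train) is a hypothetical placeholder for the corresponding step and not part of the patent:

def auto_label_loop(initial_images, initial_labels, collect_more, infer, refine, train, rounds=3):
    # collect_more()    -> new, unlabeled image sequence
    # infer(w, imgs)    -> preselected boxes predicted with weights w
    # refine(imgs, pre) -> manually confirmed / fine-tuned labels
    # train(imgs, lbls) -> new detector weights
    images, labels = list(initial_images), list(initial_labels)
    weights = train(images, labels)                   # preliminary weights from the initial data set
    for _ in range(rounds):
        new_images = collect_more()                   # expanded image sequence
        preselected = infer(weights, new_images)      # model proposes boxes
        new_labels = refine(new_images, preselected)  # human confirms / adjusts
        images += new_images
        labels += new_labels
        weights = train(images, labels)               # retrain on the expanded data set
    return weights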
CN202211153269.4A 2022-09-21 2022-09-21 Intelligent scheduling method based on three-dimensional intelligent detection Pending CN116109047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211153269.4A CN116109047A (en) 2022-09-21 2022-09-21 Intelligent scheduling method based on three-dimensional intelligent detection

Publications (1)

Publication Number Publication Date
CN116109047A true CN116109047A (en) 2023-05-12

Family

ID=86253379

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116977810A (en) * 2023-09-25 2023-10-31 之江实验室 Multi-mode post-fusion long tail category detection method and system
CN116977810B (en) * 2023-09-25 2024-01-09 之江实验室 Multi-mode post-fusion long tail category detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination