CN117078927A - Combined target labeling method, device, equipment and storage medium

Combined target labeling method, device, equipment and storage medium

Info

Publication number
CN117078927A
Authority
CN
China
Prior art keywords
target
labeling
data
result
segmentation result
Prior art date
Legal status
Pending
Application number
CN202310980356.5A
Other languages
Chinese (zh)
Inventor
张晓亮
黄俊维
胡晨曦
Current Assignee
Beijing Jingxiang Technology Co Ltd
Original Assignee
Beijing Jingxiang Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingxiang Technology Co Ltd
Priority to CN202310980356.5A
Publication of CN117078927A
Legal status: Pending

Classifications

    • G01C 21/00: Navigation; navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/1652: Dead reckoning by integrating acceleration or speed (inertial navigation), combined with non-inertial navigation instruments with ranging devices, e.g. LIDAR or RADAR
    • G01C 21/20: Instruments for performing navigational calculations
    • G01C 21/38: Electronic maps specially adapted for navigation; updating thereof
    • G01C 21/3804: Creation or updating of map data
    • G01C 21/3807: Creation or updating of map data characterised by the type of data
    • G01C 21/3811: Point data, e.g. Point of Interest [POI]
    • G01S 17/66: Tracking systems using electromagnetic waves other than radio waves
    • G01S 17/88: Lidar systems specially adapted for specific applications
    • G01S 7/48: Details of systems according to group G01S17/00
    • G01S 7/4802: Analysis of echo signal for target characterisation; target signature; target cross-section
    • G01S 7/4808: Evaluating distance, position or velocity data
    • G01S 7/481: Constructional features, e.g. arrangements of optical elements
    • G06N 3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region
    • G06V 10/757: Matching configurations of points or features
    • G06V 10/761: Proximity, similarity or dissimilarity measures
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 2201/07: Target detection
    • Y02T 10/40: Engine management systems


Abstract

The application discloses a joint target labeling method, device, equipment and storage medium based on images and laser point clouds. Data to be labeled for the same matched target are obtained from image data collected by a vision sensor and laser point cloud data collected by a lidar; the data to be labeled are input into a pre-established target detection model to obtain a target detection result; target segmentation is performed on the detection result to obtain at least one target segmentation result; the at least one target segmentation result is input into a pre-established target optimization model, which generates and optimizes labeling boundaries in the segmentation result. The method and system improve the efficiency and accuracy of target labeling, reduce labeling cost and risk, and provide high-quality labeled data for application tasks such as data analysis and machine learning.

Description

Combined target labeling method, device, equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular to a joint target labeling method, apparatus, device and storage medium based on images and laser point clouds.
Background
Target labeling refers to labeling the targets in video frames. The labeled targets can be used to train a target tracking algorithm and obtain a trained target tracking model, so the accuracy of target labeling directly affects the training of the tracking model. In the prior art, fusion algorithms for multi-source data are immature and data matching and registration algorithms are unstable, so target labeling is error-prone and inefficient. In view of this, how to improve the efficiency of target labeling is a problem to be solved.
It should be noted that the statements herein merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Disclosure of Invention
In view of the above, the present application proposes a method, apparatus, device and storage medium for joint target labeling based on images and laser point clouds, which overcome or at least partially solve the above-mentioned problems.
The embodiment of the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a joint target labeling method based on images and laser point clouds, the method including: obtaining data to be labeled for the same matched target from image data collected by a vision sensor and laser point cloud data collected by a lidar; inputting the data to be labeled into a pre-established target detection model to obtain a target detection result; performing target segmentation on the target detection result to obtain at least one target segmentation result; and inputting the at least one target segmentation result into a pre-established target optimization model, which generates and optimizes labeling boundaries in the target segmentation result.
Preferably, inputting the at least one target segmentation result into a pre-established target optimization model, and generating and optimizing labeling boundaries in the target segmentation result, includes: performing target detection and segmentation on the image data and the laser point cloud data to obtain an image target detection result and a laser point cloud detection result, and generating a labeling boundary in the target segmentation result; optimizing the labeling boundary when it is inaccurate; and merging the optimized boundary results and evaluating the merged result.
Preferably, after inputting the at least one target segmentation result into a pre-established target optimization model and generating and optimizing labeling boundaries in the target segmentation result, the method further includes: inputting the target segmentation result with labeled boundaries into a pre-established weakly supervised learning model; obtaining a labeling result for the target segmentation result from the weakly supervised learning model; analyzing the labeling result and generating labeling suggestions; performing interactive labeling in response to the labeling suggestions; and verifying the interactive labeling result against predefined criteria.
Preferably, after inputting the at least one target segmentation result into a pre-established target optimization model and generating and optimizing labeling boundaries in the target segmentation result, the method further includes: displaying the target segmentation result through a visual interface; and exporting the target segmentation result in different preset labeling formats.
Preferably, obtaining the data to be labeled for the same matched target from the image data collected by the vision sensor and the laser point cloud data collected by the lidar includes: after processing the multimodal dataset formed by the image data and the laser point cloud data, extracting feature points of the same matched target from the data of each modality using a deep learning model and a computer vision model; matching the feature points across modalities by computing the similarity between neighbouring feature points and/or using a deep neural network model; and registering and correcting the multimodal data with a preset feature point registration algorithm to obtain the data to be labeled for the same matched target. The method further includes: correcting and evaluating the feature point registration algorithm according to the matching and registration evaluation results on the multimodal dataset.
Preferably, inputting the data to be labeled into a pre-established target detection model to obtain a target detection result includes: using an interactive labeling algorithm in the target detection model to respond to real-time feedback from annotators and optimize the target detection result.
In a second aspect, an embodiment of the present application further provides a joint target labeling device based on images and laser point clouds, the device including: a data acquisition unit for obtaining data to be labeled for the same matched target from image data collected by a vision sensor and laser point cloud data collected by a lidar; a detection result acquisition unit for inputting the data to be labeled into a pre-established target detection model to obtain a target detection result; a segmentation result acquisition unit for performing target segmentation on the target detection result to obtain at least one target segmentation result; and a labeling boundary unit for inputting the at least one target segmentation result into a pre-established target optimization model and generating and optimizing labeling boundaries in the target segmentation result.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform any of the methods of the first aspect.
In a fourth aspect, embodiments of the present application also provide a computer readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform any of the methods of the first aspect.
The above technical solutions adopted by the embodiments of the present application can achieve at least the following beneficial effects:
First, the multimodal data labeling method fuses multi-source data and improves labeling accuracy and efficiency; the algorithm can be continuously updated from feedback gathered during labeling, maintaining its practicality and reliability. Second, the application provides a semi-automatic labeling and intelligent suggestion algorithm that helps annotators complete labeling tasks quickly and accurately based on manual labeling data and previous labeling records, greatly improving labeling efficiency and quality. In addition, an integrated method for automatically generating and optimizing labeling boundaries is provided, which further improves labeling efficiency and accuracy.
The foregoing is merely an overview of the technical solutions of the present application, which may be implemented according to the contents of the specification. To make the above and other objects, features and advantages of the present application clearer, specific embodiments of the present application are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of a combined target labeling method in an embodiment of the application;
FIG. 2 is a schematic diagram of a joint semi-automatic labeling process in an embodiment of the application;
FIG. 3 is a schematic diagram of a data obtaining process to be marked in an embodiment of the present application;
FIG. 4 is a schematic diagram of a joint target labeling device according to an embodiment of the application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
To address the low efficiency and accuracy of target labeling methods in the prior art, the present application designs an accurate and efficient joint target labeling method based on images and laser point clouds.
Research in the field of target labeling reveals shortcomings in several prior technical schemes. For example, machine learning algorithms are widely applied to target detection, segmentation and recognition and can improve labeling efficiency and accuracy, but they fall short on labeling precision and on data matching and registration. Traditional labeling methods comprise manual labeling and automatic labeling: manual labeling consumes manpower and time, while automatic labeling algorithms are heavily constrained, easily influenced by subjective factors, and prone to inaccurate labels. Multimodal data fusion techniques improve processing efficiency on multimodal data, but still suffer from large differences between data sources and low fusion precision. Traditional data matching and registration algorithms handle data noise and nonlinear transformations poorly, resulting in low matching and registration accuracy and robustness. None of these schemes meets the requirements of large-scale, high-quality data labeling and processing. Based on this research, the present application proposes a semi-automatic joint image and point cloud labeling method built on deep learning, adaptive registration, weakly supervised learning and intelligent suggestion, which optimizes the labeling workflow and improves labeling efficiency and accuracy.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
An embodiment of the present application provides a joint target labeling method, device, equipment and storage medium based on images and laser point clouds. Fig. 1 shows a flow diagram of the joint target labeling method in an embodiment of the application; the method comprises at least the following steps S110 to S140:
step S110, obtaining data to be marked for the same matching target according to image data acquired by a vision sensor and laser point cloud data acquired by a laser radar;
in the present application, the data to be processed are acquired by collection units installed on the equipment. The collection units include at least an image collection unit and a laser point cloud collection unit; in addition, to increase the diversity of the collected data and the accuracy of target labeling, other sensors such as GPS and IMU can collect further required data. The data collected by these sensors are assembled into a multimodal dataset.
The data collected by the different sensors, i.e. the multimodal dataset, are preprocessed. Preprocessing includes data matching and calibration operations such as noise removal and data normalization, which eliminate differences between the data for the same matched target and facilitate subsequent computation and analysis.
In the present application, multimodal data matching and calibration does not merge the data collected by multiple sensors or fuse several independent data streams; rather, it brings data of different modalities into one coordinate frame, for example transferring both the camera coordinates and the lidar coordinates into a single coordinate system such as the vehicle body frame. Through multimodal data matching and calibration, lidar data, camera image data and other sensor data are matched and registered quickly and accurately, improving data quality and accuracy.
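As an illustration of this coordinate unification, the following minimal sketch transforms lidar points into an assumed vehicle body frame with a 4x4 homogeneous extrinsic matrix; the matrix values and function names are illustrative placeholders, not parameters from the patent.

```python
import numpy as np

def to_body_frame(points_xyz: np.ndarray, T_body_from_sensor: np.ndarray) -> np.ndarray:
    """Transform Nx3 sensor-frame points into the vehicle body frame
    using a 4x4 homogeneous extrinsic matrix."""
    n = points_xyz.shape[0]
    homog = np.hstack([points_xyz, np.ones((n, 1))])   # Nx4 homogeneous coords
    return (T_body_from_sensor @ homog.T).T[:, :3]     # back to Nx3

# Placeholder extrinsics: rotation R and lidar mount offset t (assumed values)
R = np.eye(3)
t = np.array([1.2, 0.0, 1.5])
T_body_from_lidar = np.eye(4)
T_body_from_lidar[:3, :3] = R
T_body_from_lidar[:3, 3] = t

lidar_points = np.random.rand(100, 3)              # stand-in point cloud
body_points = to_body_frame(lidar_points, T_body_from_lidar)
```

The same helper, applied with the camera extrinsics, brings image-derived 3D points into the identical frame so the two modalities can be compared point for point.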
After data matching and calibration, feature points or feature descriptors are extracted from the data of each modality using deep learning and computer vision techniques, yielding feature vectors for the data. The feature points extracted from the different modalities are then matched; efficient, high-precision feature point matching is achieved by computing the similarity between neighbouring points or by using methods such as deep neural networks.
After feature point matching, model registration is performed: the computed feature points are used to register the data of different modalities with the Iterative Closest Point (ICP) algorithm, the Probabilistic Data Association (PDA) algorithm, or similar methods. The registered data are then corrected with methods such as Laplacian smoothing, 3D resampling and mesh Laplacian processing to reduce random noise and processing errors introduced by the system.
After model registration, error correction is performed on the data of each modality according to the characteristics and data information of the different data types, ensuring data consistency and accuracy. After error correction, the application also evaluates the effect on the data: the algorithm is tested and verified by evaluating the matching and registration results on the multimodal data, further improving its stability and practicality.
Step S120: input the data to be labeled into a pre-established target detection model to obtain a target detection result.
in the present application, multimodal data fusion is introduced into target recognition and segmentation, combining at least lidar data and camera data. This better handles the sensitivity of traditional image processing algorithms to illumination, occlusion, viewing angle and similar factors.
The application uses deep learning algorithms, including Convolutional Neural Networks (CNNs) and Mask R-CNN, to locate, identify and classify targets. These deep learning algorithms hold great advantages over traditional algorithms in the performance, accuracy and computational speed of object classification and recognition. In addition, the method implements a labeling process with online iterative learning, adopting a reinforcement learning algorithm to optimize the target detection process and further improve its accuracy and efficiency. Furthermore, the application adds interactivity to the algorithm design. Here, interactivity refers to the interaction between a person and the algorithm. In target modeling and detection, interactivity is typically realized through real-time user feedback, error correction, automatic completion and similar interactive modes. For example, during target detection, when the algorithm detects an object, the user can give feedback on the detection result, indicating whether the detection is correct, whether targets were missed, and so on, so that the algorithm's parameters and model can be adjusted. This interactivity lets the algorithm respond to real-time feedback from annotators and improves the target detection algorithm's error rate and localization performance.
Step S130: perform target segmentation on the target detection result to obtain at least one target segmentation result.
step S140, inputting the at least one target segmentation result into a pre-established target optimization model, and generating and optimizing a labeling boundary in the target segmentation result.
As shown in fig. 2, before target segmentation and boundary optimization are performed on the detected data, the input data need preprocessing such as formatting and normalization, which reduces invalid computation and noise and provides a data basis for subsequent processing. Data formatting normalizes and converts the input data into a fixed format so that the algorithm can understand and process it; it includes operations such as data type conversion (e.g. character or text strings to numeric types), missing-value filling and data alignment. Data normalization removes the influence of differing measurement units across data items, ensures that the weights of different feature or input variables are comparable, and improves the stability and accuracy of the algorithm. Normalization methods used in this application include min-max normalization and z-score normalization.
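A minimal sketch of the two normalization schemes named above; the column-wise treatment and the epsilon guard against division by zero are implementation choices of this sketch, not details from the application.

```python
import numpy as np

def min_max_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each feature column into [0, 1]."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo + eps)

def z_score_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Center each feature column at zero mean, unit variance."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

features = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 250.0]])
print(min_max_normalize(features))
print(z_score_normalize(features))
```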
Automatic detection and segmentation of targets in the image and point cloud data are realized through image and point cloud target segmentation techniques. For image data, deep-learning-based detection algorithms (R-CNN, YOLO, etc.) and segmentation algorithms (FCN, Mask R-CNN, etc.) automatically detect and segment the targets; for point cloud data, target detection and segmentation are performed with distance-based methods, morphological segmentation, background subtraction, 3D convolutional neural networks and similar methods.
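As one plausible stand-in for the image-side Mask R-CNN step, the following sketch runs the off-the-shelf Mask R-CNN from torchvision; the pretrained weights, the 0.5 score threshold and the random stand-in image are assumptions for illustration, not the application's trained model.

```python
import torch
import torchvision

# Pretrained Mask R-CNN (recent torchvision versions use the weights= argument)
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)          # stand-in RGB image, values in [0, 1]
with torch.no_grad():
    out = model([image])[0]              # dict with boxes, labels, scores, masks

keep = out["scores"] > 0.5               # confidence threshold (assumed)
boxes = out["boxes"][keep]               # Nx4 boxes as [x1, y1, x2, y2]
masks = out["masks"][keep] > 0.5         # Nx1xHxW binary instance masks
```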
After the targets are segmented, the automatically generated labeling boundaries are optimized; where boundaries are inaccurate, methods such as adaptive weighting further improve labeling precision and accuracy. The automatically generated and optimized labeling results are then merged through joint processing of the image and the point cloud to obtain a more accurate and complete labeling result. Finally, the labeling result is evaluated and verified by testing to ensure its accuracy and reliability.
The automatic labeling boundary algorithm and the corresponding optimization method introduced in this application greatly improve the accuracy, completeness and quality of labeling, as well as the stability and reliability of the labeling results.
In some examples of the present application, inputting the at least one target segmentation result into a pre-established target optimization model and generating and optimizing labeling boundaries in the target segmentation result includes: performing target detection and segmentation on the image data and the laser point cloud data to obtain an image target detection result and a laser point cloud detection result, and generating a labeling boundary in the target segmentation result; optimizing the labeling boundary when it is inaccurate; and merging the optimized boundary results and evaluating the merged result.
After the targets are segmented, the automatically generated labeling boundaries are optimized; where boundaries are inaccurate, methods such as adaptive weighting further improve labeling precision and accuracy. The automatically generated and optimized labeling results are merged through joint processing of the image and the point cloud to obtain a more accurate and complete labeling result, which is then evaluated and verified by testing to ensure its accuracy and reliability.
In some examples of the present application, after inputting the at least one target segmentation result into a pre-established target optimization model and generating and optimizing labeling boundaries in the target segmentation result, the method further includes: inputting the target segmentation result with labeled boundaries into a pre-established weakly supervised learning model; obtaining a labeling result for the target segmentation result from the weakly supervised learning model; analyzing the labeling result and generating labeling suggestions; performing interactive labeling in response to the labeling suggestions; and verifying the interactive labeling result against predefined criteria.
As shown in fig. 2, when labeling a target the application first performs an initial labeling round to obtain preliminary labeling information for the target, including its type, position and size, which serves as the basis for subsequent labeling. The initial labeling in this application can be automatic model labeling or manual labeling; the choice depends on the data scale and the feasibility of the labeling technique. If the data scale is small and the labeling rules are settled, or the annotators have a certain degree of expertise, manual labeling can be adopted; in this case annotators typically work towards task goals such as object removal and confidence ordering. If the data scale is large, or labeling is difficult or the labeling rules are less clear, automatic model labeling can be used for the initial round; concretely, labels can be produced by rules or by a traditional algorithm and then adjusted and refined according to data analysis and feedback. Note that whether labels come from humans or from a model, initial labeling is a critical step: it directly determines the efficiency and accuracy of all subsequent labeling. Therefore, the data type and labeling requirements must be analyzed carefully and the most suitable labeling mode chosen for the specific situation, to ensure the accuracy and reliability of the labeling results.
Then, automatic segmentation of targets in the image and point cloud data is realized through target detection and segmentation techniques for images and point clouds. For image data, deep-learning-based segmentation algorithms (FCN, Mask R-CNN, etc.) built on an image target detection algorithm automatically segment the targets; for point cloud data, target segmentation is performed with distance-based methods, morphological segmentation, background subtraction, 3D convolutional neural networks and similar methods.
The application also realizes rapid labeling and processing of the data through training and prediction with a weakly supervised learning model, built with techniques such as support vector machines and deep neural networks. During labeling and processing, information is fed back from the manual labeling data while the weakly supervised model is updated and optimized, improving its accuracy and robustness.
The weakly supervised learning model can give annotators intelligent suggestions and optimize the labeling workflow. Intelligent suggestions are generated automatically by the algorithm and can include optimization and operation advice, enhancing the algorithm's adaptability and accuracy. In target detection, intelligent suggestions can automatically identify and flag unlabeled targets, propose better labeling boxes, and so on; they greatly increase the speed and accuracy of modeling and detection while reducing the manual labeling workload. By automatically processing and analyzing the data, the system produces labeling suggestions and automatic identification and labeling, guiding annotators to complete labeling tasks quickly and accurately. Suggestions are generated according to the task requirements from image features, prediction results, historical labeling data and the like; to make them comprehensive and accurate they are then filtered and deduplicated, and to keep the algorithm stable the number and direction of suggestions are limited and controlled. Automatic identification and labeling proceeds as follows: a deep learning algorithm automatically identifies the objects to label, labels them according to the algorithm's predictions, generates labeling suggestions with ordering, confidence and other information, and guides annotators through the subsequent labeling work. During labeling, care must be taken to label targets correctly and avoid duplicate or erroneous labels. Multiple rounds of interactive labeling, manual auditing and checking examine the labeling results carefully and comprehensively, improving labeling accuracy and reliability. At the same time, annotator feedback must be followed, and the quality and effect of the labeling suggestions improved promptly in response. Annotators can interactively label on top of the intelligent suggestion results, further improving labeling quality and accuracy.
After interactive labeling, the labeling results are verified: they are evaluated and checked against predefined criteria or by annotator review to ensure their accuracy and reliability. Because labeling is the core of data preprocessing for a deep learning algorithm, the position, category and feature information of every target must be marked accurately, and labeling quality and consistency must be watched throughout to keep the results reliable and stable. The amount of labeled data is one of the key indicators for evaluating, verifying and optimizing a deep learning algorithm: training a high-quality, efficient target detection model requires collecting a large amount of labeled data covering sufficiently rich and diverse object types and scenes, improving data diversity, and introducing data augmentation to exploit the data fully and raise algorithm performance. Labeling results are usually evaluated with indicators such as accuracy, recall, precision and F1 score (the harmonic mean of precision and recall); the influence of label imbalance must be considered, and only a comprehensive analysis of multiple indicators can truly assess the algorithm's quality. Verification usually uses cross-validation, dividing the labeled data into training, validation and test sets while attending to the repeatability, balanced distribution and randomness of the datasets, to maximize the scientific validity and accuracy of the verification results. When evaluation or verification uncovers errors, the labeling results must be revised and optimized promptly; correction methods include fixing labeling errors, cleaning the data, increasing the data volume, and adjusting the parameters of the neural network model.
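A minimal sketch of the evaluation indicators named above, computed from true positive, false positive and false negative counts in the binary case; the example counts are made up.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and their harmonic mean (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: 80 correct labels, 10 spurious, 20 missed
print(precision_recall_f1(tp=80, fp=10, fn=20))  # ~(0.889, 0.800, 0.842)
```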
Through this detailed semi-automatic labeling and intelligent suggestion scheme, labeling efficiency and accuracy can be effectively improved and the labeling workflow optimized. Moreover, combining manual labeling with weakly supervised learning yields better generalization and applicability, adapting to the labeling requirements of different fields and tasks.
In some examples of the present application, after inputting the at least one target segmentation result into a pre-established target optimization model and generating and optimizing labeling boundaries in the target segmentation result, the method further includes: displaying the target segmentation result through a visual interface; and exporting the target segmentation result in different preset labeling formats.
The joint target labeling method provides a visual interface: a visual user interface is designed and implemented so that annotators can conveniently load data, visualize it, label interactively, and save and export labels.
In addition, according to different application scenarios and labeling requirements, the method defines label files in multiple supported formats, including PASCAL VOC, COCO, KITTI, Cityscapes and YOLO.
The method can export labeling results to several labeling formats and supports conversion between them for seamless interoperability with other software. Encryption and signing of exported label files are supported to ensure data security and confidentiality.
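As an illustration of such a format conversion, the following sketch converts one pixel bounding box into a YOLO-style text line (class id followed by normalized center coordinates and size); the helper name and the example numbers are ours, and a real exporter would also handle class maps and per-image label files.

```python
def box_to_yolo(cls_id: int, x1: float, y1: float, x2: float, y2: float,
                img_w: int, img_h: int) -> str:
    """Pixel box [x1, y1, x2, y2] -> 'cls cx cy w h' with values in [0, 1]."""
    cx = (x1 + x2) / 2.0 / img_w
    cy = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A 100x200 box for class 0 in a 1920x1080 image
print(box_to_yolo(0, 860, 440, 960, 640, 1920, 1080))
```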
Through interactive labeling in the visual interface, annotators can inspect, cross-check and modify the labeling results to ensure accuracy. The time and steps of different operations can be counted during labeling, the labeling progress managed and controlled automatically, labeling efficiency optimized, and data accuracy and completeness ensured. The labeling functions of the visual interface are continuously enhanced and can support advanced features such as synchronized multi-camera viewing, multi-user collaborative label editing, automatic recognition of targets and boundaries, and batch synchronized labeling.
In some examples of the present application, obtaining the data to be labeled for the same matched target from the image data collected by the vision sensor and the laser point cloud data collected by the lidar includes: after processing the multimodal dataset formed by the image data and the laser point cloud data, extracting feature points of the same matched target from the data of each modality using a deep learning model and a computer vision model; matching the feature points across modalities by computing the similarity between neighbouring feature points and/or using a deep neural network model; registering and correcting the multimodal data with a preset feature point registration algorithm to obtain the data to be labeled for the same matched target; and correcting and evaluating the feature point registration algorithm according to the matching and registration evaluation results on the multimodal dataset.
In the present application, as shown in fig. 3, step S310 is performed first: the data to be processed are acquired by collection units installed on the equipment, comprising at least an image collection unit and a laser point cloud collection unit; in addition, to increase the diversity and accuracy of data collection, other sensors such as GPS and IMU can collect further required data. The data collected by these sensors are assembled into a multimodal dataset.
Step S320, data preprocessing, is performed on the data collected by the different sensors, i.e. the multimodal dataset. Preprocessing includes data matching and calibration operations such as noise removal and data normalization, which eliminate differences between the data for the same matched target and facilitate subsequent computation and analysis.
As noted above, multimodal data matching and calibration does not merge the data collected by multiple sensors or fuse several independent data streams; rather, it brings data of different modalities into one coordinate frame, for example transferring both the camera coordinates and the lidar coordinates into a single coordinate system such as the vehicle body frame. Through multimodal data matching and calibration, lidar data, camera image data and other sensor data are matched and registered quickly and accurately, improving data quality and accuracy.
After data matching and calibration, feature points or feature descriptors are extracted from the data of each modality with deep learning and computer vision techniques to obtain feature vectors for the data; this is step S330, data feature extraction. Specifically:
for image data, a conventional feature point detection algorithm such as SIFT, SURF or ORB, or a deep learning algorithm such as a CNN, is employed to extract the corresponding feature descriptors. For lidar data, a point cloud feature extraction algorithm such as ISS, SHOT or FPFH extracts feature points from the point cloud data, after which descriptive features of the point cloud are extracted. For other data types, targeted feature extraction is performed with appropriate methods to obtain the corresponding feature vectors. Once the feature vector of each data item is obtained, it is processed and analyzed with related algorithms, such as machine learning, deep learning, clustering and classification algorithms, for application tasks such as classifying, clustering and retrieving the data.
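A sketch of the two families of extractors named above, assuming OpenCV for ORB on images and Open3D for FPFH on point clouds; the feature counts, search radii and synthetic inputs are illustrative defaults, not values from the application.

```python
import cv2
import numpy as np
import open3d as o3d

# --- Image: ORB keypoints + binary descriptors ---
img = np.random.randint(0, 255, (480, 640), dtype=np.uint8)  # stand-in grey image
orb = cv2.ORB_create(nfeatures=2000)
keypoints, descriptors = orb.detectAndCompute(img, None)

# --- Point cloud: FPFH descriptors (normals must be estimated first) ---
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(np.random.rand(1000, 3) * 10)
pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.5, max_nn=30))
fpfh = o3d.pipelines.registration.compute_fpfh_feature(
    pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=100))
# fpfh.data is a 33 x N array: one 33-D FPFH descriptor per point
```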
After the feature vectors are obtained, step S340, feature point matching, is executed: the feature points extracted from the different modalities are matched. Efficient, high-precision feature point matching is achieved by computing the similarity between neighbouring points, by deep neural networks, and by similar means, as follows:
First, mutual information and a symmetric nearest-neighbour algorithm are introduced. Similarity between feature descriptors is computed with a mutual-information measure, which effectively mitigates the under-fitting and over-fitting problems of traditional methods. In addition, a symmetric nearest-neighbour check corrects the matches between points, further improving the accuracy and robustness of feature matching.
Second, a local feature matching algorithm, widely used in computer vision, improves matching efficiency while preserving accuracy. In this application, the local feature matching algorithm yields neighbouring point pairs with strict comparison relations, realizing the matching between feature points and the computation of similarity between neighbouring points.
Third, based on a dissipative-flow and point-pair matching algorithm, the dissipative-flow algorithm is introduced to optimize the feature matching weights, making the method better suited to matching data of different modalities and further improving the accuracy and robustness of the matching result. Meanwhile, the point-pair matching algorithm handles special cases such as unreasonable pairings and global optimal matching, making full use of prior knowledge of the target domain.
Finally, deep learning and computer vision techniques are combined: deep learning methods are used in multimodal data matching and registration to represent and learn the feature descriptors of the different modalities, yielding more accurate similarity information.
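A minimal numpy sketch of the symmetric nearest-neighbour check described above: a pair of descriptors is kept only if each is the other's closest match. Cosine similarity is this sketch's choice of measure; the application's mutual-information measure would slot into the same place.

```python
import numpy as np

def mutual_nn_matches(desc_a: np.ndarray, desc_b: np.ndarray):
    """desc_a: MxD, desc_b: NxD, L2-normalised rows -> list of (i, j) pairs."""
    sim = desc_a @ desc_b.T                 # MxN cosine similarities
    a_to_b = sim.argmax(axis=1)             # best b for each a
    b_to_a = sim.argmax(axis=0)             # best a for each b
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

a = np.random.rand(50, 33); a /= np.linalg.norm(a, axis=1, keepdims=True)
b = np.random.rand(60, 33); b /= np.linalg.norm(b, axis=1, keepdims=True)
print(len(mutual_nn_matches(a, b)), "mutual matches")
```

The same mutual-consistency idea is what OpenCV's BFMatcher performs when constructed with crossCheck=True.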
After feature point matching, step S350, model registration, is performed: the computed feature points are used to register the data of different modalities with the Iterative Closest Point (ICP) algorithm, the Probabilistic Data Association (PDA) algorithm, or similar methods. The registered data are then corrected with methods such as Laplacian smoothing, 3D resampling and mesh Laplacian processing to reduce random noise and processing errors introduced by the system.
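A sketch of the registration step, assuming Open3D's point-to-point ICP as a stand-in for the application's implementation; the correspondence threshold, identity initialisation and synthetic clouds are illustrative.

```python
import numpy as np
import open3d as o3d

# Stand-in clouds: target is the source shifted by a known offset
pts = np.random.rand(500, 3)
source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))
target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts + [0.05, 0.0, 0.0]))

result = o3d.pipelines.registration.registration_icp(
    source, target,
    max_correspondence_distance=0.2,      # metres, an assumed threshold
    init=np.eye(4),
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())

print(result.fitness, result.inlier_rmse)
source.transform(result.transformation)   # apply the recovered 4x4 pose
```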
After model registration, step S360 is performed: error correction is applied to the data of each modality according to the characteristics and data information of the different data types, ensuring data consistency and accuracy, as follows:
for image data, the position and pose of the same object may deviate between images because of the image capturing equipment, environmental conditions and other factors. Therefore, during matching and registration, the differences between image content and feature point descriptors are used to denoise the images and correct distortion. Image correction methods mainly include histogram equalization, grey-level correction, Gaussian filtering and edge detection; they convert the feature point descriptors of the image into more reliable feature vectors and reduce the influence of errors.
For lidar data, although the measurements are accurate, they are affected by the shape, texture and other properties of object surfaces, so different surfaces of the same object may deviate somewhat. Therefore, during matching and registration, the lidar measurement error is used to sample, interpolate, filter and classify the data, improving the reliability and accuracy of the laser data.
For other sensor data such as GPS, positioning accuracy and positioning error have a larger influence, so the GPS data need fitting correction during matching and alignment. Common GPS correction methods include data compensation based on inertial sensors, model-based topology correction and correction based on distance estimation, all of which can improve the accuracy and precision of GPS data.
After error correction, step S370, effect evaluation, is performed on the data, specifically as follows:
for image data, the position and pose of the same object may deviate between images because of the image capturing equipment, environmental conditions and other factors, so image correction is required during matching and registration. Image correction methods in this application include histogram equalization, grey-level correction, Gaussian filtering and edge detection. Taking histogram equalization as an example, this method redistributes the grey values of the image towards a more balanced state, making the contrast more pronounced and feature points easier to detect and match. The procedure is: first, compute the grey histograms of the original image and the image to be matched; second, equalize the histograms so the grey values are distributed more evenly; third, map the equalized histograms back onto the original and target images to obtain the equalized images; finally, detect and match feature points in the equalized images.
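The histogram-equalisation procedure above can be sketched with OpenCV as follows; the random stand-in images are illustrative, and the ORB matching step stands in for "detect and match feature points in the equalised images".

```python
import cv2
import numpy as np

img_ref = np.random.randint(0, 255, (480, 640), dtype=np.uint8)   # stand-in images
img_qry = np.random.randint(0, 255, (480, 640), dtype=np.uint8)

eq_ref = cv2.equalizeHist(img_ref)     # flatten the grey-level histogram
eq_qry = cv2.equalizeHist(img_qry)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(eq_ref, None)
kp2, des2 = orb.detectAndCompute(eq_qry, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)  # symmetric cross-check
matches = matcher.match(des1, des2) if des1 is not None and des2 is not None else []
```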
For lidar data, a certain deviation can exist between different surfaces of the same object owing to factors such as the shape and texture of the object's surface, so lidar data correction is required during matching and registration. The correction methods in the application include filtering, sampling, interpolation and classification processing. Taking sampling as an example: the raw data is resampled to a given resolution, which improves the density and accuracy of the data. The specific process is as follows: first, downsample the raw laser data to reduce the density of the point cloud; second, apply densification and interpolation so that the sampled data is optimally represented, ensuring the accuracy and readability of the data; third, classify the optimized data to improve its reliability and reusability; finally, detect and match the feature points in the optimized data.
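A minimal voxel-grid downsampling sketch for the first of these steps follows; the 0.1 m voxel size and the centroid-averaging scheme are assumed for illustration.

    import numpy as np

    def voxel_downsample(points, voxel=0.1):
        """Average all points falling in the same voxel (points: (N,3) array)."""
        keys = np.floor(points / voxel).astype(np.int64)
        # Group points by voxel index and return one centroid per occupied voxel.
        _, inverse = np.unique(keys, axis=0, return_inverse=True)
        sums = np.zeros((inverse.max() + 1, 3))
        counts = np.zeros(inverse.max() + 1)
        np.add.at(sums, inverse, points)
        np.add.at(counts, inverse, 1)
        return sums / counts[:, None]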
For GPS data, error correction is necessary because positioning accuracy and positioning error have a larger influence. The correction methods include inertial-sensor data compensation, model topology correction and distance-estimation correction. Taking distance-estimation correction as an example: the accuracy of the data is improved by performing a correlation analysis between the distance estimates and the GPS data and then correcting the GPS data. The specific process is as follows: first, acquire the GPS data and the distance-estimation data; second, filter the GPS data to improve its reliability and accuracy; third, perform a correlation analysis between the filtered GPS data and the distance-estimation data to determine the linear relation between them; then correct and optimize the GPS data using this linear relation; finally, detect and match the feature points in the optimized GPS data.
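A minimal sketch of this correction follows, assuming a one-dimensional track and a simple moving-average filter; the window size and the linear model are illustrative assumptions.

    import numpy as np

    def correct_gps(gps, dist_est, window=5):
        # Filtering step: moving average to suppress GPS jitter.
        kernel = np.ones(window) / window
        gps_f = np.convolve(gps, kernel, mode="same")
        # Correlation step: least-squares linear relation gps_f ~ a*dist_est + b.
        A = np.vstack([dist_est, np.ones_like(dist_est)]).T
        (a, b), *_ = np.linalg.lstsq(A, gps_f, rcond=None)
        # Correction step: replace the track with its model-consistent version.
        return a * dist_est + b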
The algorithm is then tested and verified by evaluating the matching and registration results of the multi-modal data, further improving the stability and practicality of the algorithm.
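A minimal sketch of such an evaluation follows, reporting the root-mean-square registration error and an inlier ratio for aligned point clouds; the 0.2 m inlier threshold is an assumed value.

    import numpy as np
    from scipy.spatial import cKDTree

    def evaluate_registration(aligned, target, inlier_thresh=0.2):
        """Distance from each aligned point to its nearest target point."""
        dist, _ = cKDTree(target).query(aligned)
        return {"rmse": float(np.sqrt((dist ** 2).mean())),
                "inlier_ratio": float((dist < inlier_thresh).mean())}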
In some examples of the present application, the inputting the data to be marked into a pre-established target detection model to obtain a target detection result includes: and using an interactive labeling algorithm in the target detection model to respond to real-time feedback of labeling personnel and optimize the target detection result.
In the application, the interactive labeling algorithm refers to the interaction between the annotator and the algorithm. In target modeling and detection, interactivity is typically realized through modes such as real-time user feedback, error correction and automatic completion. For example, during target detection, when the algorithm detects an object, the user can give feedback on the detection result, indicating whether the detection is correct, whether a target was missed, and so on, so that the algorithm's parameters and model can be adjusted. Interactive labeling and intelligent suggestion, while similar in some respects, differ in meaning and implementation in target modeling and detection: interactive labeling requires interaction between a person and the algorithm, whereas intelligent suggestion is an automated design based on the algorithm, with a higher level of automation and intelligence.
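An illustrative sketch (not from the application) of such a feedback loop follows: each annotator verdict on a detection adjusts the confidence threshold the detector applies afterwards. The feedback codes and the update rule are assumptions for illustration.

    def interactive_review(detections, get_feedback, threshold=0.5, step=0.02):
        """detections: list of (label, score) pairs from the detector;
        get_feedback(label, score) returns 'correct', 'wrong' (false
        positive) or 'missed' (the annotator saw an undetected target)."""
        accepted = []
        for label, score in detections:
            if score < threshold:
                continue                           # below the confidence bar
            verdict = get_feedback(label, score)
            if verdict == "correct":
                accepted.append((label, score))
            elif verdict == "wrong":               # false positive: be stricter
                threshold = min(0.95, threshold + step)
            elif verdict == "missed":              # missed target: relax the bar
                threshold = max(0.05, threshold - step)
        return accepted, threshold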
The embodiment of the application also provides a combined target labeling device 400 based on the image and the laser point cloud. As shown in the structural schematic of fig. 4, the device 400 at least comprises: a data-to-be-annotated acquisition unit 410, a detection result acquisition unit 420, a segmentation result acquisition unit 430, and an annotation boundary result acquisition unit 440, wherein:
in one embodiment of the present application, the data to be annotated obtaining unit 410 is specifically configured to: obtaining data to be marked for the same matching target according to image data acquired by a vision sensor and laser point cloud data acquired by a laser radar;
in one embodiment of the present application, the detection result obtaining unit 420 is specifically configured to: inputting the data to be marked into a pre-established target detection model to obtain a target detection result;
in one embodiment of the present application, the segmentation result acquisition unit 430 is specifically configured to: performing target segmentation processing on the target detection result to obtain at least one target segmentation result;
in one embodiment of the present application, the annotation boundary result acquisition unit 440 is specifically configured to: input the at least one target segmentation result into a pre-established target optimization model, and generate and optimize a labeling boundary in the target segmentation result.
It can be understood that the above-mentioned combined target labeling device based on image and laser point cloud can implement each step of the combined target labeling method based on image and laser point cloud provided in the foregoing embodiment, and the relevant explanation about the combined target labeling method based on image and laser point cloud is applicable to the combined target labeling device based on image and laser point cloud, which is not described herein again.
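As an illustrative aid, not part of the application, the following minimal sketch shows how the four units of device 400 might be composed in Python; the class and method names are assumptions.

    class JointTargetLabelingDevice:
        def __init__(self, detector, segmenter, optimizer):
            self.detector = detector          # unit 420: target detection
            self.segmenter = segmenter        # unit 430: target segmentation
            self.optimizer = optimizer        # unit 440: boundary optimization

        def label(self, image_data, point_cloud):
            # Unit 410: fuse image and lidar data for the same matching target.
            data = {"image": image_data, "cloud": point_cloud}
            detections = self.detector(data)              # unit 420
            segments = self.segmenter(detections)         # unit 430
            return [self.optimizer(s) for s in segments]  # unit 440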
Fig. 5 is a schematic structural view of an electronic device according to an embodiment of the present application. Referring to fig. 5, at the hardware level the electronic device includes a processor and, optionally, an internal bus, a network interface and a memory. The memory may include volatile memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, network interface and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one bi-directional arrow is shown in fig. 5, but this does not mean there is only one bus or one type of bus.
The memory is used for storing a program. In particular, the program may include program code, and the program code includes computer operating instructions. The memory may include volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form a combined target labeling device based on the image and the laser point cloud on a logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:
obtaining data to be marked for the same matching target according to image data acquired by a vision sensor and laser point cloud data acquired by a laser radar; inputting the data to be marked into a pre-established target detection model to obtain a target detection result; performing target segmentation processing on the target detection result to obtain at least one target segmentation result; inputting the at least one target segmentation result into a pre-established target optimization model, and generating and optimizing a labeling boundary in the target segmentation result.
The method performed by the joint target labeling device based on the image and the laser point cloud disclosed in the embodiment of fig. 1 of the present application can be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps and logic blocks disclosed in the embodiments of the present application may be implemented or executed by such a processor. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may further execute the method executed by the joint target labeling device based on the image and the laser point cloud in fig. 1, and implement the functions of the joint target labeling device based on the image and the laser point cloud in the embodiment shown in fig. 1, which is not described herein.
The embodiment of the application also provides a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by an electronic device comprising a plurality of application programs, enable the electronic device to perform the method performed by the joint target labeling device based on the image and the laser point cloud in the embodiment shown in fig. 1, and specifically to perform:
obtaining data to be marked for the same matching target according to image data acquired by a vision sensor and laser point cloud data acquired by a laser radar; inputting the data to be marked into a pre-established target detection model to obtain a target detection result; performing target segmentation processing on the target detection result to obtain at least one target segmentation result; inputting the at least one target segmentation result into a pre-established target optimization model, and generating and optimizing a labeling boundary in the target segmentation result.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random-access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element preceded by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A joint target labeling method based on an image and a laser point cloud, characterized by comprising the following steps:
obtaining data to be marked for the same matching target according to image data acquired by a vision sensor and laser point cloud data acquired by a laser radar;
inputting the data to be marked into a pre-established target detection model to obtain a target detection result;
Performing target segmentation processing on the target detection result to obtain at least one target segmentation result;
inputting the at least one target segmentation result into a pre-established target optimization model, and generating and optimizing a labeling boundary in the target segmentation result.
2. The method of claim 1, wherein inputting the at least one target segmentation result into a pre-established target optimization model, generating and optimizing labeling boundaries in the target segmentation result, comprises:
performing target detection and segmentation on the image data and the laser point cloud data to obtain an image target detection result and a laser point cloud detection result, and generating a labeling boundary in the target segmentation result;
optimizing the labeling boundary under the condition of inaccuracy of the labeling boundary;
and carrying out merging processing and evaluating the merging result according to the optimized marked boundary result.
3. The method of claim 2, wherein the at least one target segmentation result is input into a pre-established target optimization model, and further comprising, after generating and optimizing the labeling boundary in the target segmentation result:
inputting the target segmentation result with the marked boundary into a pre-established weak supervision learning model;
Obtaining a labeling result of the target segmentation result according to the weak supervision learning model;
analyzing the labeling result and generating labeling suggestions;
responding to the annotation suggestion, and carrying out interactive annotation;
and verifying the interactive labeling result by using a predefined standard.
4. The method of claim 2, wherein the at least one target segmentation result is input into a pre-established target optimization model, and further comprising, after generating and optimizing the labeling boundary in the target segmentation result:
displaying the target segmentation result through a visual interface;
and deriving different preset labeling formats according to the target segmentation result.
5. The method of claim 1, wherein the obtaining the data to be annotated for the same matching target based on the image data collected by the vision sensor and the laser point cloud data collected by the laser radar comprises:
after processing the multi-mode data set formed by the image data and the laser point cloud data, respectively extracting characteristic points of the same matching target of different mode data by adopting a deep learning model and a computer vision model;
matching the feature points in the different mode data by calculating the similarity between adjacent feature points and/or using a deep neural network model;
And registering and correcting the multi-mode data by using a preset feature point registration algorithm to obtain the data to be marked for the same matching target.
6. The method of claim 5, wherein the method further comprises:
and correcting and evaluating the characteristic point registration algorithm according to the matching and registration evaluation result of the multi-mode dataset.
7. The method of claim 1, wherein the inputting the data to be annotated into the pre-established object detection model to obtain the object detection result comprises:
and using an interactive labeling algorithm in the target detection model to respond to real-time feedback of labeling personnel and optimize the target detection result.
8. A joint target labeling device based on an image and a laser point cloud, the device comprising:
the data to be marked acquisition unit is used for acquiring data to be marked for the same matching target according to the image data acquired by the vision sensor and the laser point cloud data acquired by the laser radar;
the detection result acquisition unit is used for inputting the data to be marked into a pre-established target detection model to obtain a target detection result;
The segmentation result acquisition unit is used for carrying out target segmentation processing on the target detection result to obtain at least one target segmentation result;
the labeling boundary result obtaining unit is used for inputting the at least one target segmentation result into a pre-established target optimization model, and generating and optimizing labeling boundaries in the target segmentation result.
9. An electronic device, comprising: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the joint target labeling method based on the image and the laser point cloud of any of claims 1 to 7.
10. A computer readable storage medium storing one or more programs, which when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform any of the methods of claims 1-7.
CN202310980356.5A 2023-08-04 2023-08-04 Combined target labeling method, device, equipment and storage medium Pending CN117078927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310980356.5A CN117078927A (en) 2023-08-04 2023-08-04 Combined target labeling method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310980356.5A CN117078927A (en) 2023-08-04 2023-08-04 Combined target labeling method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117078927A true CN117078927A (en) 2023-11-17

Family

ID=88703423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310980356.5A Pending CN117078927A (en) 2023-08-04 2023-08-04 Combined target labeling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117078927A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421426A (en) * 2023-12-18 2024-01-19 广东信聚丰科技股份有限公司 Knowledge point labeling method and system based on artificial intelligence assistance
CN117421426B (en) * 2023-12-18 2024-03-05 广东信聚丰科技股份有限公司 Knowledge point labeling method and system based on artificial intelligence assistance
CN117953384A (en) * 2024-03-27 2024-04-30 昆明理工大学 Cross-scene multispectral laser radar point cloud building extraction and vectorization method

Similar Documents

Publication Publication Date Title
CN109685135B (en) Few-sample image classification method based on improved metric learning
CN117078927A (en) Combined target labeling method, device, equipment and storage medium
CN112700408B (en) Model training method, image quality evaluation method and device
CN112365497A (en) High-speed target detection method and system based on Trident Net and Cascade-RCNN structures
CN111126393A (en) Vehicle appearance refitting judgment method and device, computer equipment and storage medium
CN115147363A (en) Image defect detection and classification method and system based on deep learning algorithm
Lauridsen et al. Reading circular analogue gauges using digital image processing
CN111127558A (en) Method and device for determining detection angle of assembly, electronic equipment and storage medium
CN110083724A (en) A kind of method for retrieving similar images, apparatus and system
CN112668365A (en) Material warehousing identification method, device, equipment and storage medium
WO2022237065A1 (en) Classification model training method, video classification method, and related device
CN114663760A (en) Model training method, target detection method, storage medium and computing device
CN111369508A (en) Defect detection method and system for metal three-dimensional lattice structure
Tang Unsupervised and Semi-Supervised Learning in Automatic Industrial Image Inspection
CN117495891B (en) Point cloud edge detection method and device and electronic equipment
CN114155274B (en) Target tracking method and device based on global scalable twin network
CN113407780B (en) Target retrieval method, device and storage medium
CN112529038B (en) Method and device for identifying main board material and storage medium
CN117496540A (en) Method, system and computer readable medium for extracting and classifying inspection report indexes based on OCR (optical character recognition)
CN116148806A (en) Method and device for determining depth of target object and electronic equipment
Vaidya et al. A Novel Approach for Invoice Text Detection and Recognition
Rocha et al. Automatic measurement of fish from images using convolutional neural networks
CN117935174A (en) Intelligent management system and method for vacuum bag film production line
CN116246055A (en) Rotation target detection method, rotation target detection device, rotation target detection equipment and storage medium
CN117710435A (en) Track fastener elastic strip pressure detection method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination