CN116524114A - Automatic labeling method for automatic driving data, electronic equipment and storage medium - Google Patents

Automatic labeling method for automatic driving data, electronic equipment and storage medium

Info

Publication number
CN116524114A
Authority
CN
China
Prior art keywords
scene
data set
result
labeling
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310353975.1A
Other languages
Chinese (zh)
Inventor
莫柠锴
唐圣钦
何明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DeepRoute AI Ltd
Original Assignee
DeepRoute AI Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DeepRoute AI Ltd filed Critical DeepRoute AI Ltd
Priority to CN202310353975.1A priority Critical patent/CN116524114A/en
Publication of CN116524114A publication Critical patent/CN116524114A/en
Pending legal-status Critical Current

Classifications

    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30204 Marker
    • Y02T 10/40 Engine management systems

Abstract

The application discloses an automatic labeling method for automatic driving data, an electronic device, and a storage medium. The method comprises the following steps: acquiring an original automatic driving data set; preprocessing the original automatic driving data set to obtain a scene reconstruction data set, wherein the preprocessing comprises inputting the original automatic driving data set into a multi-task network model to output partial scene reconstruction data in the scene reconstruction data set; performing 4D scene reconstruction using the scene reconstruction data set to reconstruct a 4D scene; and obtaining a labeling result by using the 4D scene so as to realize automatic labeling. In this manner, automatic labeling of mass-produced automatic driving data can be achieved, labeling efficiency is improved, and the precision requirements of automatic labeling are met.

Description

Automatic labeling method for automatic driving data, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of automatic driving data labeling technologies, and in particular, to an automatic labeling method for automatic driving data, an electronic device, and a storage medium.
Background
Neural networks require large amounts of labeled data for supervised training. For example, training a 3D detection network requires considerable manpower to manually label the targets of interest (vehicles, pedestrians, etc.) in 3D space. In the field of automatic driving, such labeling requirements are typically on the order of millions of frames or more, and manual labeling is costly and inefficient. Current auto-labeling methods use pre-trained perception models, referred to as large models, to detect the targets of interest from images and point clouds, after which the detection results are manually checked and corrected.
However, when 3D objects are detected with an image-based large model, the position error is typically around 15%, which falls short of ground-truth accuracy, so high-precision automatic labeling of mass-production data remains a problem.
Disclosure of Invention
The present application at least provides an automatic labeling method for automatic driving data, an electronic device, and a storage medium.
The first aspect of the application provides an automatic labeling method for automatic driving data, which comprises the following steps: acquiring an original automatic driving data set; preprocessing the original automatic driving data set to obtain a scene reconstruction data set, wherein the preprocessing comprises inputting the original automatic driving data set into a multi-task network model so as to output partial scene reconstruction data in the scene reconstruction data set; performing a 4D scene reconstruction using the scene reconstruction dataset to reconstruct a 4D scene; and acquiring a labeling result by using the 4D scene so as to realize automatic labeling.
Wherein the original automatic driving data set comprises: multi-view camera data, forward lidar data, and GPS/IMU information; the scene reconstruction data set comprises multi-view camera image data, multi-view camera internal and external parameters, forward laser point cloud data, a segmentation result, a depth map and a visual descriptor, wherein the segmentation result, the depth map and the visual descriptor are obtained by inputting the original automatic driving data set into the multi-task network model.
Wherein the preprocessing further comprises: acquiring the internal and external parameters of the multi-view camera from the original autopilot data set by using an SFM algorithm; and processing the forward laser radar data in the original automatic driving data set to obtain the forward laser point cloud data.
Wherein said performing a 4D scene reconstruction using said scene reconstruction dataset comprises: performing constraint optimization on the first neural network model by using the scene reconstruction data set and the road surface prior constraint condition to generate a road surface sub-scene; performing constraint optimization on a second neural network model by using the road surface sub-scene and the scene reconstruction data set to generate a static sub-scene; and performing constraint optimization on a third neural network model for each dynamic target in at least one dynamic target in the 4D scene by using the static sub-scene and the scene reconstruction data set to generate at least one dynamic target sub-scene.
The obtaining the labeling result by using the 4D scene includes: obtaining a segmentation result labeling result and a depth map labeling result by utilizing the road surface sub-scene and the static sub-scene; and obtaining at least one dynamic target labeling result by utilizing the at least one dynamic target sub-scene.
Wherein, the obtaining the labeling result by using the 4D scene further includes: and acquiring other labeling results by utilizing the static sub-scene and the at least one dynamic target sub-scene.
Wherein the method further comprises: verifying the labeling result to obtain a verified labeling result, wherein the verified labeling result is used for adjusting the multi-task network model; and, in response to the number of verified labeling results being greater than a preset value, adjusting the multi-task network model by utilizing the verified labeling results.
The labeling results comprise a segmentation result labeling result, a depth map labeling result and at least one dynamic target labeling result; the verifying the labeling result comprises the following steps: and correcting or deleting at least one of the segmentation result labeling result, the depth map labeling result and the at least one dynamic target labeling result.
A second aspect of the present application provides an electronic device, the electronic device including a processor and a memory, the memory storing program data, the processor being configured to execute the program data to implement the automatic labeling method of autopilot data described in the first aspect.
A third aspect of the present application provides a computer-readable storage medium storing a computer program for implementing the automatic labeling method of automated driving data described in the first aspect above, when executed by a processor.
The beneficial effects of the embodiments of the present application are as follows. In contrast to the prior art, the automatic labeling method for automatic driving data provided by the present application acquires an original automatic driving data set; preprocesses the original automatic driving data set to obtain a scene reconstruction data set, wherein the preprocessing comprises inputting the original automatic driving data set into a multi-task network model to output partial scene reconstruction data in the scene reconstruction data set; performs 4D scene reconstruction using the scene reconstruction data set to reconstruct a 4D scene; and obtains labeling results using the 4D scene to realize automatic labeling. In this way, automatic labeling of mass-produced automatic driving data is achieved, labeling efficiency is improved, and the precision requirements of automatic labeling are met.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a flowchart of an embodiment of an automatic labeling method for autopilot data provided in the present application;
FIG. 2 is a schematic diagram of an application scenario of the automatic labeling method for automatic driving data provided in the present application;
FIG. 3 is a schematic structural diagram of an embodiment of an electronic device provided herein;
FIG. 4 is a schematic structural diagram of an embodiment of a computer readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are for purposes of illustration only and are not limiting. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
Referring to fig. 1, fig. 1 is a flowchart of an embodiment of an automatic labeling method for automatic driving data provided in the present application. It should be noted that, if there are substantially the same results, the method of the present application is not limited to the flow sequence shown in fig. 1. As shown in fig. 1, the method includes:
s11: an original autopilot dataset is acquired.
An original autopilot data set is acquired. For example, mass-produced autopilot data is collected by a mass-production autopilot system equipped with a sensor device, and the original autopilot data set is constructed from the information collected by the sensor device. The sensor device includes an image sensor, a radar sensor, a GPS/IMU, and the like, where the radar sensor may be any radar device that meets the accuracy requirements of automatic driving and provides point cloud perception. Image data may be acquired using an image sensor, such as a camera. Point cloud data are collected using radar sensors, such as millimeter wave radar or lidar. The image sensor, radar sensor, and GPS/IMU may be mounted on a mobile device, such as an autonomous vehicle. The lidar may include a mechanical lidar, a semi-solid-state lidar, a solid-state lidar, and the like.
In an application scene, an automatic driving vehicle runs on a road, and image data for describing an environment space where vehicle-mounted equipment is located is acquired through an image sensor arranged on the automatic driving vehicle to obtain an initial data set; acquiring point cloud data for describing an environment space where the vehicle-mounted equipment is located by using a radar sensor to obtain an initial data set; and acquiring GPS/IMU data for describing the position information of the vehicle-mounted equipment by using the GPS/IMU to obtain an initial data set.
Each sensor perceives and captures an initial data set describing the environment space where the vehicle-mounted equipment is located; each initial data set corresponds to one sensor, and at least two sensors capture at least two initial data sets. The types of data in the original automatic driving data set include, but are not limited to, multi-view camera data, forward laser point cloud data, and GPS/IMU data at each moment.
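For illustration, the following Python sketch shows one possible per-frame container for the multi-view camera images, forward lidar points, and GPS/IMU pose described above. It is an illustration only; the class and field names (RawFrame, images, lidar_points, gps_imu_pose, etc.) are assumptions made for this example and are not specified in the disclosure.

```python
# Illustrative sketch only: the patent does not specify concrete data structures.
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np


@dataclass
class RawFrame:
    """One time step of raw autopilot data from a mass-production vehicle."""
    timestamp: float
    images: Dict[str, np.ndarray]   # camera name -> H x W x 3 image
    lidar_points: np.ndarray        # N x 4 forward lidar points (x, y, z, intensity)
    gps_imu_pose: np.ndarray        # 4 x 4 ego pose from GPS/IMU fusion


@dataclass
class RawAutopilotDataset:
    """Sequence of raw frames collected during one drive."""
    frames: List[RawFrame] = field(default_factory=list)

    def add_frame(self, frame: RawFrame) -> None:
        self.frames.append(frame)
```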
S12: the method comprises the steps of preprocessing an original autopilot data set to obtain a scene reconstruction data set, wherein the preprocessing comprises the steps of inputting the original autopilot data set into a multi-task network model to output partial scene reconstruction data in the scene reconstruction data set.
Through preprocessing, the original autopilot data set can be converted into and output as scene reconstruction data. The preprocessing includes inputting the original autopilot data set into a multi-task network model to output partial scene reconstruction data in the scene reconstruction data set; for example, the offline original data are preprocessed with the multi-task network, which outputs partial scene reconstruction data such as segmentation results, depth maps, target detection and tracking results, and visual descriptors.
In this embodiment, the original autopilot data set includes multi-view camera data, forward lidar data, and GPS/IMU information; the scene reconstruction data set includes multi-view camera image data, multi-view camera internal and external parameters, forward laser point cloud data, segmentation results, a depth map, and a visual descriptor.
The original autopilot data set is preprocessed to obtain the scene reconstruction data set, that is, the multi-view camera data, forward lidar data, and GPS/IMU information are preprocessed according to a preset method to obtain the scene reconstruction data set; for example, the multi-view camera data are preprocessed to obtain the multi-view camera image data. The scene reconstruction data set obtained after preprocessing includes multi-view camera image data, multi-view camera internal and external parameters, forward laser point cloud data, segmentation results, a depth map, and a visual descriptor.
The segmentation result, the depth map, and the visual descriptor are obtained by inputting the original automatic driving data set into the multi-task network model and taking the corresponding outputs.
Inputting the original autopilot data set into the multi-task network model, that is, preprocessing the original autopilot data with the multi-task network model, yields partial scene reconstruction data in the scene reconstruction data set, for example the segmentation result, depth map, visual descriptor, and target detection and tracking results.
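As an illustration of this preprocessing step, the sketch below shows a toy multi-task network in PyTorch with a shared backbone and separate heads for the segmentation result, depth map, and visual descriptors mentioned above (detection and tracking heads are omitted for brevity). The architecture, layer sizes, and head set are assumptions made for this example; the actual multi-task network model of the disclosure is not specified at this level of detail.

```python
# Illustrative sketch of a multi-task preprocessing network; not the patented implementation.
import torch
import torch.nn as nn


class MultiTaskNet(nn.Module):
    """Shared backbone with one lightweight head per preprocessing output."""

    def __init__(self, num_classes: int = 19, desc_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, num_classes, 1)   # semantic segmentation logits
        self.depth_head = nn.Conv2d(64, 1, 1)           # per-pixel depth
        self.desc_head = nn.Conv2d(64, desc_dim, 1)     # dense visual descriptors

    def forward(self, image: torch.Tensor) -> dict:
        feats = self.backbone(image)
        return {
            "segmentation": self.seg_head(feats),
            "depth": self.depth_head(feats),
            "descriptors": self.desc_head(feats),
        }


if __name__ == "__main__":
    net = MultiTaskNet()
    outputs = net(torch.randn(1, 3, 256, 512))          # one multi-view camera image
    print({k: tuple(v.shape) for k, v in outputs.items()})
```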
S13: using the scene reconstruction dataset, a 4D scene reconstruction is performed to reconstruct the 4D scene.
According to the scene reconstruction data set obtained after preprocessing, the 4D scene reconstruction can be divided into three sub-tasks: a road surface sub-scene, a static sub-scene, and at least one dynamic target sub-scene. The reconstruction of the 4D scene is thereby completed, and the labeling result is obtained from the reconstructed 4D scene.
Taking, as an example, a 4D scene reconstruction divided into a road surface sub-scene, a static sub-scene, and at least one dynamic target sub-scene: in the 4D scene reconstruction, constraint optimization is first carried out on a first neural network model using the scene reconstruction data set and a road surface prior constraint condition to generate the road surface sub-scene, and the constraint-optimized first neural network model is output as a second neural network model. For areas not covered by the point cloud, a sub-optimal 4D scene reconstruction result is obtained by relying on the multi-view camera image data, segmentation results, depth maps, and road surface prior constraint conditions; this sub-optimal result is then further constrained by the forward laser point cloud data to obtain an overall consistent road surface sub-scene. In addition, the multi-view camera image data is further optimized through the constraint of the visual descriptor, so that problems caused by image noise are avoided.
Then, constraint optimization is carried out on the second neural network model using the road surface sub-scene and the scene reconstruction data set to generate a static sub-scene, and the constraint-optimized second neural network model is output as a third neural network model.
Then, using the static sub-scene and the scene reconstruction data set, constraint optimization is performed on the third neural network model for each of the at least one dynamic target in the 4D scene, so that the third neural network model conforms to the physical laws of the real object.
S14: and obtaining a labeling result by using the 4D scene so as to realize automatic labeling.
The labeling results are obtained using the road surface sub-scene, the static sub-scene, and the at least one dynamic target sub-scene: the segmentation result labeling result and the depth map labeling result are mainly obtained using the road surface sub-scene and the static sub-scene, and the at least one dynamic target labeling result is obtained using the at least one dynamic target sub-scene.
In this embodiment, an original autopilot data set is acquired; the original autopilot data set is preprocessed to obtain a scene reconstruction data set, where the preprocessing includes inputting the original autopilot data set into a multi-task network model to output partial scene reconstruction data in the scene reconstruction data set; a 4D scene reconstruction is performed using the scene reconstruction data set to reconstruct a 4D scene; and labeling results are obtained using the 4D scene to realize automatic labeling. In this way, automatic labeling of mass-produced automatic driving data is achieved, labeling efficiency is improved, and the precision requirements of automatic labeling are met.
In some embodiments, the original autopilot data set includes multi-view camera data, forward lidar data, and GPS/IMU information; the scene reconstruction data set includes multi-view camera image data, multi-view camera internal and external parameters, forward laser point cloud data, a segmentation result, a depth map, and a visual descriptor, where the segmentation result, the depth map, and the visual descriptor are obtained by inputting the original automatic driving data set into the multi-task network model.
Types of data in the original autopilot data set include, but are not limited to, multi-view camera data, forward laser point cloud data, and GPS/IMU data at each moment.
In some embodiments, the preprocessing further comprises: acquiring internal and external parameters of the multi-view camera from an original autopilot data set by using an SFM algorithm; and processing the forward laser radar data in the original automatic driving data set to obtain forward laser point cloud data.
The original autopilot data set is preprocessed to obtain the scene reconstruction data set, where the preprocessing further includes acquiring partial scene reconstruction data from the original autopilot data set using an SFM (Structure From Motion, a three-dimensional reconstruction method) algorithm; this partial scene reconstruction data may include, for example, the internal parameters and external parameters of the multi-view camera, a sparse point cloud, and other auxiliary information.
The forward lidar data in the original automatic driving data set is processed to obtain the forward laser point cloud data, for example semantic information and physical attributes of the point cloud, i.e., dynamic/static attributes, category information, speed, acceleration, related physical quantities, and the like.
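The sketch below illustrates how the outputs of these preprocessing steps could be gathered into a single scene reconstruction data set. Here run_sfm and attach_semantics are hypothetical placeholders standing in for the SFM algorithm and the forward lidar processing described above (they return dummy values), and all dictionary keys are assumptions made for this example.

```python
# Illustrative sketch of assembling the scene-reconstruction data set; placeholder logic only.
from typing import Dict, List
import numpy as np


def run_sfm(images: List[np.ndarray]) -> Dict[str, np.ndarray]:
    """Placeholder for Structure-from-Motion: returns camera internal/external parameters."""
    n = len(images)
    return {
        "intrinsics": np.tile(np.eye(3), (n, 1, 1)),    # one K matrix per view
        "extrinsics": np.tile(np.eye(4), (n, 1, 1)),    # one world-to-camera pose per view
    }


def attach_semantics(lidar_points: np.ndarray) -> np.ndarray:
    """Placeholder for lidar processing: append a class id and dynamic flag per point."""
    labels = np.zeros((lidar_points.shape[0], 2))       # [class_id, is_dynamic]
    return np.hstack([lidar_points, labels])


def build_scene_reconstruction_dataset(images, lidar_points, net_outputs) -> dict:
    cams = run_sfm(images)
    return {
        "images": images,
        "camera_intrinsics": cams["intrinsics"],
        "camera_extrinsics": cams["extrinsics"],
        "point_cloud": attach_semantics(lidar_points),
        "segmentation": net_outputs["segmentation"],
        "depth": net_outputs["depth"],
        "descriptors": net_outputs["descriptors"],
    }
```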
In some embodiments, performing the 4D scene reconstruction using the scene reconstruction data set includes: performing constraint optimization on the first neural network model by using the scene reconstruction data set and the road surface prior constraint condition to generate a road surface sub-scene; performing constraint optimization on the second neural network model by using the road surface sub-scene and the scene reconstruction data set to generate a static sub-scene; and performing constraint optimization on the third neural network model for each dynamic target in at least one dynamic target in the 4D scene by using the static sub-scene and the scene reconstruction data set to generate at least one dynamic target sub-scene.
The 4D scene reconstruction is executed using the scene reconstruction data set. Specifically, constraint optimization is carried out on the first neural network model using the scene reconstruction data set and road surface prior constraint conditions; for example, a neural network is optimized using the multi-view camera image data, multi-view camera internal and external parameters, forward laser point cloud data, segmentation results, depth maps, visual descriptors, and road surface prior constraints to obtain the first neural network, so that the geometric and semantic features of the first neural network conform to the constraints of the point cloud, images, segmentation results, depth maps, visual descriptors, road surface priors, and so on. Further, a road surface sub-scene is generated: in areas not covered by the point cloud, a sub-optimal 4D scene reconstruction result is obtained by relying on the multi-view camera image data, segmentation results, depth maps, and road surface prior constraint conditions, and this sub-optimal result is further constrained by the forward laser point cloud data to obtain an overall consistent road surface sub-scene. The constraint-optimized first neural network model is output as a second neural network model, which shortens the training time to 1/5-1/3 of the original training time. In addition, the multi-view camera image data is further optimized through the constraint of the visual descriptor, so that problems caused by image noise are avoided.
Then, constraint optimization is carried out on the second neural network model using the road surface sub-scene and the scene reconstruction data set to generate a static sub-scene. That is, based on the obtained road surface sub-scene, the constraint-optimized first neural network model is output as the second neural network model, and the complete static scene is further optimized by taking the second neural network model, multi-view camera image data, multi-view camera internal and external parameters, forward laser point cloud data, segmentation results, depth map, and visual descriptor as constraints. On the basis of the second neural network model, further constraint optimization is performed using the scene reconstruction data set to generate the static sub-scene, and the constraint-optimized second neural network model is output as a third neural network model.
Finally, using the static sub-scene and the scene reconstruction data set, constraint optimization is performed for each of the at least one dynamic target in the 4D scene, with the constraint-optimized second neural network model output as a third neural network model. That is, based on the obtained static sub-scene, at least one dynamic target sub-scene is generated by taking the third neural network model, multi-view camera image data, multi-view camera internal and external parameters, forward laser point cloud data, segmentation result, depth map, and visual descriptor as constraints; in other words, one neural network model is optimized for each dynamic target, so that the neural network model of each dynamic target conforms to the physical laws of the real object.
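A highly simplified sketch of this three-stage optimization is given below. Each sub-scene is represented here by a toy neural field and optimized with a placeholder loss; in the actual method the loss would encode the point cloud, image, segmentation, depth, descriptor, and road surface prior constraints discussed above. Only the staging (road surface, then static, then one model per dynamic target) is what the sketch tries to capture, and all names and loss terms are assumptions for illustration.

```python
# Illustrative three-stage constrained optimization; losses are dummy placeholders.
import torch
import torch.nn as nn


class SceneField(nn.Module):
    """Toy neural field: maps (x, y, z, t) to (density, r, g, b)."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, xyzt: torch.Tensor) -> torch.Tensor:
        return self.mlp(xyzt)


def optimise(model: nn.Module, constraint_loss, steps: int = 100) -> nn.Module:
    """Generic constrained optimization used by all three stages."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = constraint_loss(model)
        loss.backward()
        opt.step()
    return model


# Stage 1: road surface sub-scene (image / segmentation / depth / descriptor + road prior + lidar constraints).
road_model = optimise(SceneField(), lambda m: m(torch.randn(128, 4)).pow(2).mean())
# Stage 2: static sub-scene, constrained by the road surface sub-scene and the scene reconstruction data set.
static_model = optimise(SceneField(), lambda m: m(torch.randn(128, 4)).pow(2).mean())
# Stage 3: one model per dynamic target, constrained by the static sub-scene.
dynamic_models = [optimise(SceneField(), lambda m: m(torch.randn(128, 4)).pow(2).mean())
                  for _ in range(3)]
```

In practice, as the text states, each stage would start from the model output by the previous stage (first to second, second to third), which is what shortens the training time of the later stages.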
In some embodiments, obtaining the labeling result using the 4D scene includes: obtaining a segmentation result labeling result and a depth map labeling result using the road surface sub-scene and the static sub-scene; and obtaining at least one dynamic target labeling result using the at least one dynamic target sub-scene.
The 4D scene may include the road surface sub-scene, the static sub-scene, the at least one dynamic target sub-scene, and the like, and the labeling results may be obtained using the 4D scene. The labeling results corresponding to the segmentation result and the depth map, also referred to as the segmentation result labeling result and the depth map labeling result, may be obtained using the road surface sub-scene and the static sub-scene; for example, 2D segmentation labels and depth maps under different viewing angles are obtained by rendering with the second neural network model and the third neural network model. The labeling result corresponding to the at least one dynamic target, also referred to as the dynamic target labeling result, may be obtained using the at least one dynamic target sub-scene; for example, the outputs of the third neural network model at different moments are used as the labeling result of the 4D dynamic target, i.e., the state information of the target, such as its type, size, orientation, and speed.
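The following sketch illustrates, in outline, how labels could be pulled out of the reconstructed sub-scenes: 2D segmentation and depth labels rendered per camera view, and per-target state labels queried from each dynamic-target model over time. Here render_view and query_target_state are hypothetical placeholders that return dummy values; the rendering described above would instead ray-march the optimized neural network models.

```python
# Illustrative label-extraction sketch; placeholder renderer and target query.
import numpy as np


def render_view(model, intrinsics: np.ndarray, extrinsics: np.ndarray, hw=(256, 512)):
    """Placeholder renderer: would ray-march the neural field to produce a 2D
    segmentation map and depth map for the requested camera view."""
    h, w = hw
    segmentation = np.zeros((h, w), dtype=np.int64)
    depth = np.zeros((h, w), dtype=np.float32)
    return segmentation, depth


def query_target_state(dynamic_model, timestamp: float) -> dict:
    """Placeholder: would evaluate the per-target model at one timestamp to
    recover the type, size, orientation, and speed of the dynamic target."""
    return {"t": timestamp, "size": (4.5, 1.8, 1.6), "yaw": 0.0, "speed": 0.0}


def auto_label(static_model, dynamic_models, cameras, timestamps) -> dict:
    labels = {"segmentation": [], "depth": [], "dynamic_targets": []}
    for K, T in cameras:                                   # per-view 2D labels
        seg, dep = render_view(static_model, K, T)
        labels["segmentation"].append(seg)
        labels["depth"].append(dep)
    for model in dynamic_models:                           # per-target 4D states
        labels["dynamic_targets"].append([query_target_state(model, t) for t in timestamps])
    return labels
```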
In some embodiments, using the 4D scene, obtaining the annotation result further comprises: and acquiring other labeling results by utilizing the static sub-scene and at least one dynamic target sub-scene.
Other labeling results may also be obtained using the static sub-scene and the at least one dynamic target sub-scene. For example, the second neural network model, the third neural network model, and other information of interest may be derived simultaneously.
In some embodiments, the method further includes: verifying the labeling result to obtain a verified labeling result, where the verified labeling result is used for adjusting the multi-task network model; and, in response to the number of verified labeling results being greater than a preset value, adjusting the multi-task network model using the verified labeling results.
In this embodiment, verifying the labeling result includes correcting or deleting at least one of the segmentation result labeling result, the depth map labeling result, and the at least one dynamic target labeling result. If the labeling result contains data that does not meet the verification standard, the defective labeling result is corrected or deleted.
The verified, qualified labeling results are added to the data set; once a certain amount of labeling result data has accumulated in the data set, the multi-task network model is adjusted with the labeling result data for optimization and iteration.
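A minimal sketch of this verification-and-update loop is shown below. The threshold value, the verify stand-in (which represents the manual correction or deletion step), and the retrain callback are all assumptions made for illustration; the disclosure only states that the multi-task network model is adjusted once the number of verified labeling results exceeds a preset value.

```python
# Illustrative verification-and-retraining loop; threshold and callbacks are assumptions.
from typing import Callable, List, Optional

RETRAIN_THRESHOLD = 10_000          # the "preset value" of the text; the number here is arbitrary
verified_pool: List[dict] = []


def verify(label: dict) -> Optional[dict]:
    """Stand-in for review: keep labels that pass, drop those that cannot be corrected."""
    return label if label.get("valid", True) else None


def maybe_update_model(new_labels: List[dict], retrain: Callable[[List[dict]], None]) -> None:
    """Accumulate verified labels and fine-tune the multi-task network once enough have gathered."""
    for label in new_labels:
        checked = verify(label)
        if checked is not None:
            verified_pool.append(checked)
    if len(verified_pool) > RETRAIN_THRESHOLD:
        retrain(verified_pool)       # adjust the multi-task network model with the verified results
        verified_pool.clear()
```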
As described above, the labeling result is obtained using the 4D scene, and the 4D scene may include a road surface sub-scene, a static sub-scene, at least one dynamic target sub-scene, and the like. In some embodiments, the labeling results include a segmentation result labeling result, a depth map labeling result, and at least one dynamic target labeling result; verifying the labeling result includes correcting or deleting at least one of the segmentation result labeling result, the depth map labeling result, and the at least one dynamic target labeling result.
The present application is applicable to the following application scenario, described with reference to FIG. 2:
first, an original autopilot dataset is acquired and preprocessed.
And then inputting the original autopilot data set into a multi-task network model to output partial scene reconstruction data in the scene reconstruction data set, simultaneously acquiring internal parameters and external parameters of the multi-view camera from the original autopilot data set by using an SFM algorithm, and processing forward laser radar data in the original autopilot data set to obtain forward laser point cloud data.
Then, using the scene reconstruction dataset, a 4D scene reconstruction is performed to reconstruct the 4D scene.
Specifically, the scene reconstruction data set and a road surface prior constraint condition are used to carry out constraint optimization on a first neural network model, a road surface sub-scene is generated, and the constraint-optimized first neural network model is output as a second neural network model. For areas not covered by the point cloud, a sub-optimal 4D scene reconstruction result is obtained by relying on the multi-view camera image data, segmentation results, depth maps, and road surface prior constraint conditions; this sub-optimal result is then further constrained by the forward laser point cloud data to obtain an overall consistent road surface sub-scene. In addition, the multi-view camera image data is further optimized through the constraint of the visual descriptor, so that problems caused by image noise are avoided.
Then, constraint optimization is carried out on the second neural network model using the road surface sub-scene and the scene reconstruction data set to generate a static sub-scene. On the basis of the second neural network model, further constraint optimization is performed using the scene reconstruction data set to generate the static sub-scene, and the constraint-optimized second neural network model is output as a third neural network model.
Finally, using the static sub-scene and the scene reconstruction data set, constraint optimization is performed on the third neural network model for each of the at least one dynamic target in the 4D scene, so that the third neural network model conforms to the physical laws of the real object.
And then, obtaining a labeling result by using the 4D scene so as to realize automatic labeling.
The segmentation result labeling result and the depth map labeling result are obtained using the road surface sub-scene and the static sub-scene, and at least one dynamic target labeling result is obtained using the at least one dynamic target sub-scene. In addition, other labeling results may be obtained using the static sub-scene and the at least one dynamic target sub-scene.
Then, the labeling result is verified to obtain a verified labeling result, and the verified labeling result is used for adjusting the multi-task network model.
If the number of verified labeling results is greater than the preset value, the multi-task network model is adjusted using the verified labeling results.
According to the flow, the adjustment and update of the multi-task network model and the automatic labeling of the automatic driving data are continuously carried out.
Referring to FIG. 3, FIG. 3 is a schematic structural diagram of an embodiment of an electronic device provided in the present application. The electronic device 30 comprises a memory 302 and a processor 301 coupled to each other, the processor 301 being configured to execute program instructions stored in the memory 302 to implement the steps of any of the automatic labeling method embodiments for autopilot data described above. In one particular implementation scenario, the electronic device 30 may include, but is not limited to, a microcomputer and a server; the electronic device 30 may also include mobile devices such as a notebook computer and a tablet computer, which is not limited herein.
In particular, the processor 301 is configured to control itself and the memory 302 to implement the steps of the automatic labeling method embodiment of any of the above-described autopilot data. The processor 301 may also be referred to as a CPU (Central Processing Unit ). The processor 301 may be an integrated circuit chip with signal processing capabilities. The processor 301 may also be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. In addition, the processor 301 may be commonly implemented by an integrated circuit chip.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an embodiment of a computer readable storage medium provided in the present application. The computer readable storage medium 40 is for storing program instructions 401, which program instructions 401, when executed by a processor, are for implementing the method of:
acquiring an original automatic driving data set; preprocessing the original automatic driving data set to obtain a scene reconstruction data set, wherein the preprocessing comprises inputting the original automatic driving data set into a multi-task network model so as to output partial scene reconstruction data in the scene reconstruction data set; performing a 4D scene reconstruction using the scene reconstruction dataset to reconstruct a 4D scene; and acquiring a labeling result by using the 4D scene so as to realize automatic labeling.
It can be understood that the program instructions 401, when executed by the processor, are further configured to implement the technical solution of any one of the foregoing automatic labeling method embodiments of automatic driving data, which is not described herein in detail.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatuses may be implemented in other manners. For example, the above-described device embodiments are merely illustrative, e.g., the division of the circuits or elements is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing description is only of embodiments of the present application, and is not intended to limit the scope of the patent application, and all equivalent structures or equivalent processes according to the specification and drawings of the present application, or direct or indirect application in other related technical fields, are included in the scope of the patent protection of the present application.

Claims (10)

1. An automatic labeling method for automatic driving data is characterized by comprising the following steps:
acquiring an original automatic driving data set;
preprocessing the original automatic driving data set to obtain a scene reconstruction data set, wherein the preprocessing comprises inputting the original automatic driving data set into a multi-task network model so as to output partial scene reconstruction data in the scene reconstruction data set;
performing a 4D scene reconstruction using the scene reconstruction dataset to reconstruct a 4D scene;
and acquiring a labeling result by using the 4D scene so as to realize automatic labeling.
2. The method of claim 1, wherein
the original automatic driving data set comprises multi-view camera data, forward laser radar data and GPS/IMU information;
the scene reconstruction data set comprises multi-view camera image data, multi-view camera internal and external parameters, forward laser point cloud data, a segmentation result, a depth map and a visual descriptor, wherein the segmentation result, the depth map and the visual descriptor are obtained by inputting the original autopilot data set into the multi-task network model.
3. The method of claim 2, wherein
the preprocessing further comprises:
acquiring the internal and external parameters of the multi-view camera from the original autopilot data set by using an SFM algorithm;
and processing the forward laser radar data in the original automatic driving data set to obtain the forward laser point cloud data.
4. The method of claim 1, wherein
the performing a 4D scene reconstruction using the scene reconstruction dataset, comprising:
performing constraint optimization on the first neural network model by using the scene reconstruction data set and the road surface prior constraint condition to generate a road surface sub-scene;
performing constraint optimization on a second neural network model by using the road surface sub-scene and the scene reconstruction data set to generate a static sub-scene;
and performing constraint optimization on a third neural network model for each dynamic target in at least one dynamic target in the 4D scene by using the static sub-scene and the scene reconstruction data set to generate at least one dynamic target sub-scene.
5. The method of claim 4, wherein
the obtaining the labeling result by using the 4D scene comprises the following steps:
obtaining a segmentation result labeling result and a depth map labeling result by utilizing the road surface sub-scene and the static sub-scene;
and obtaining at least one dynamic target labeling result by utilizing the at least one dynamic target sub-scene.
6. The method of claim 5, wherein
the obtaining the labeling result by using the 4D scene further includes:
and acquiring other labeling results by utilizing the static sub-scene and the at least one dynamic target sub-scene.
7. The method of claim 1, wherein
the method further comprises the steps of:
verifying the labeling result to obtain a verified labeling result, wherein the verified labeling result is used for adjusting the multi-task network model;
and in response to the number of the verified labeling results being greater than a preset value, adjusting the multi-task network model by utilizing the verified labeling results.
8. The method of claim 7, wherein
the labeling results comprise a segmentation result labeling result, a depth map labeling result and at least one dynamic target labeling result;
the verifying the labeling result comprises the following steps:
and correcting or deleting at least one of the segmentation result labeling result, the depth map labeling result and the at least one dynamic target labeling result.
9. An electronic device comprising a processor and a memory, wherein the memory has program data stored therein, the processor being configured to execute the program data to implement the method of any of claims 1-8.
10. A computer readable storage medium for storing a computer program for implementing the method according to any one of claims 1-8 when executed by a processor.
CN202310353975.1A 2023-03-21 2023-03-21 Automatic labeling method for automatic driving data, electronic equipment and storage medium Pending CN116524114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310353975.1A CN116524114A (en) 2023-03-21 2023-03-21 Automatic labeling method for automatic driving data, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310353975.1A CN116524114A (en) 2023-03-21 2023-03-21 Automatic labeling method for automatic driving data, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116524114A true CN116524114A (en) 2023-08-01

Family

ID=87402117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310353975.1A Pending CN116524114A (en) 2023-03-21 2023-03-21 Automatic labeling method for automatic driving data, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116524114A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination