CN114648739A - Data processing method, data processing equipment and automatic driving system


Info

Publication number: CN114648739A
Authority: CN (China)
Prior art keywords: data, point cloud, candidate information, target, image
Legal status: Pending (the status listed is an assumption, not a legal conclusion)
Application number: CN202011507172.XA
Other languages: Chinese (zh)
Inventors: 肖鹏川, 邵振雷, 李泽嵩, 向少卿
Current Assignee: Hesai Technology Co Ltd
Original Assignee: Hesai Technology Co Ltd
Application filed by Hesai Technology Co Ltd
Priority to CN202011507172.XA
Publication of CN114648739A

Classifications

    • G06F 18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 — Matching criteria, e.g. proximity measures
    • G06F 18/24 — Classification techniques
    • G06F 18/253 — Fusion techniques of extracted features
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/084 — Learning methods; backpropagation, e.g. using gradient descent


Abstract

A data processing method comprises the following steps: performing feature extraction on original point cloud data to obtain first point cloud feature data; performing feature extraction on original image data to obtain first image feature data; screening the first point cloud feature data and the first image feature data respectively based on preset target classification candidate information to obtain second point cloud feature data and second image feature data; fusing the second point cloud feature data with the second image feature data to obtain fusion data; and carrying out target recognition on the fusion data to obtain a target recognition result. By adopting this scheme, the processing efficiency and accuracy of data fusion can be guaranteed at a lower implementation cost, and a more accurate target recognition result can be obtained.

Description

Data processing method, data processing equipment and automatic driving system
Technical Field
The invention relates to the field of automatic driving, in particular to a data processing method, data processing equipment and an automatic driving system.
Background
At present, automatic driving systems are often equipped with sensor suites composed of different sensors, such as point cloud acquisition devices, image acquisition devices, integrated navigation devices, and the like. An automatic driving system depends on its sensors to perceive the environment: the more accurate the environment information obtained by the sensors, the more timely and accurate the driving decisions the system can make, thereby realizing automatic driving.
However, among existing sensors for automatic driving systems, the environmental information sensed by a single sensor is limited and cannot meet the complex and variable requirements of a real driving scene. For example, although an image acquisition device can capture abundant environmental information at a low implementation cost and obtain two-dimensional image data with relatively accurate environmental features, it is susceptible to ambient light, resulting in poor image quality. Meanwhile, the technology of existing point cloud acquisition devices is still not mature enough and the available devices are limited; if the automatic driving system has a high accuracy requirement, the implementation cost may increase greatly, and even expensive devices may fail to achieve the required accuracy.
In order to improve the environment perception capability of the automatic driving system, those skilled in the art perform multi-level, multi-space information complementation and optimization on the data of various sensors, thereby obtaining fusion data with richer information and facilitating more complex subsequent processing tasks such as target recognition.
In practical application, the image acquisition device and the point cloud acquisition device can complement each other's information: their data can be fused, and target recognition can then be performed on the fusion data. However, a target identified from the fusion data may deviate considerably from the real target, and the low accuracy of the target recognition result affects the driving decisions of the automatic driving system.
Disclosure of Invention
In view of this, the present invention provides a data processing method, a data processing device, and an automatic driving system, which can ensure the processing efficiency and accuracy of data fusion with a low implementation cost, and further obtain a more accurate target recognition result.
The invention provides a data processing method, which comprises the following steps:
performing feature extraction on the original point cloud data to obtain first point cloud feature data;
performing feature extraction on original image data to obtain first image feature data;
screening the first point cloud feature data and the first image feature data respectively based on preset target classification candidate information to obtain second point cloud feature data and second image feature data;
fusing the second point cloud feature data with the second image feature data to obtain fusion data;
and carrying out target recognition on the fusion data to obtain a target recognition result.
Optionally, the screening the first point cloud feature data based on preset target classification candidate information includes:
performing target screening on the first point cloud feature data based on preset target classification candidate information to obtain three-dimensional target candidate information of the first point cloud feature data;
filtering the three-dimensional target candidate information to obtain optimization target candidate information;
and screening the first point cloud feature data based on the optimization target candidate information to obtain the second point cloud feature data.
Optionally, the screening the first image feature data based on preset target classification candidate information to obtain second image feature data includes:
performing target screening on the first point cloud feature data based on preset target classification candidate information to obtain three-dimensional target candidate information of the first point cloud feature data;
filtering the three-dimensional target candidate information to obtain optimization target candidate information;
converting the coordinate system of the optimization target candidate information into a pixel coordinate system of the original image data to obtain two-dimensional target candidate information;
and screening the first image feature data based on the two-dimensional target candidate information to obtain the second image feature data.
Optionally, the filtering the three-dimensional target candidate information to obtain optimization target candidate information includes:
evaluating the three-dimensional target candidate information, and filtering the three-dimensional target candidate information based on a preset evaluation condition to obtain intermediate target candidate information; and taking the intermediate target candidate information as the optimization target candidate information.
Optionally, before the intermediate target candidate information is used as the optimization target candidate information, the method further includes:
and performing de-overlapping processing on the intermediate target candidate information.
Optionally, before performing the de-overlapping processing on the intermediate target candidate information, the method further includes:
and matching the intermediate target candidate information with an image range corresponding to the original image data, and filtering the intermediate target candidate information based on a preset matching condition.
Optionally, the matching the intermediate target candidate information with the image range corresponding to the original image data includes:
and converting the intermediate target candidate information into a pixel coordinate system of the original image data, and matching with an image range corresponding to the original image data.
Optionally, before the first point cloud feature data and the first image feature data are respectively filtered based on preset target classification candidate information, the method further includes:
the target classification candidate information is set based on the object class and the object orientation.
Optionally, the performing feature extraction on the original point cloud data to obtain first point cloud feature data includes:
dividing the original point cloud data to obtain a plurality of voxel units;
performing feature extraction on the plurality of voxel units to obtain voxel feature data;
and compressing the voxel feature data to obtain the first point cloud feature data.
Optionally, the performing feature extraction on the multiple voxel units to obtain voxel feature data includes:
and respectively carrying out local feature extraction on the plurality of voxel units to obtain the voxel feature data.
Optionally, the respectively performing local feature extraction on the plurality of voxel units to obtain the voxel feature data includes any one of:
respectively performing voxel feature accumulation on the plurality of voxel units to obtain the voxel feature data;
and respectively performing point data logical operations on the plurality of voxel units to obtain the voxel feature data.
Optionally, the compressing the voxel feature data includes:
and compressing the voxel feature data in a specified direction through a convolutional neural network block, wherein the convolutional neural network block comprises a sparse convolutional layer and a submanifold convolutional layer.
Optionally, the data processing method further includes:
before the first point cloud feature data and the first image feature data are respectively screened based on preset target classification candidate information, judging whether the first image feature data has an abnormal condition;
and if the judgment result shows that no abnormal condition exists, screening the first point cloud feature data and the first image feature data respectively based on the preset target classification candidate information.
Optionally, the data processing method further includes:
and if the judgment result shows that an abnormal condition exists, performing target identification on the first point cloud feature data to obtain a target identification result.
Optionally, the data processing method further includes:
and carrying out target identification on the fusion data through a fully connected neural network to obtain a target identification result.
The invention also provides a data processing device, which is connected with the image acquisition device and the point cloud acquisition device and is suitable for executing the data processing method of any one of the above embodiments, wherein the data processing device comprises:
the data acquisition unit is suitable for acquiring original image data of the image acquisition equipment and original point cloud data of the point cloud acquisition equipment;
the feature extraction unit is suitable for performing feature extraction on the original point cloud data to obtain first point cloud feature data, and performing feature extraction on the original image data to obtain first image feature data;
the data screening unit is suitable for screening the first point cloud feature data and the first image feature data respectively according to preset target classification candidate information to obtain second point cloud feature data and second image feature data;
the data fusion unit is suitable for fusing the second point cloud feature data and the second image feature data to obtain fusion data;
and the target identification unit is suitable for carrying out target identification on the fusion data to obtain a target identification result.
The invention also provides an automatic driving system, comprising a point cloud acquisition device, an image acquisition device, and a data processing device respectively connected with the point cloud acquisition device and the image acquisition device, wherein:
the point cloud acquisition equipment is suitable for acquiring original point cloud data;
an image acquisition device adapted to acquire raw image data;
and the data processing equipment is suitable for executing the data processing method of any one of the above embodiments to process the original point cloud data and the original image data.
By adopting the data processing method of the invention, the first point cloud feature data and the first image feature data can be respectively screened based on the preset target classification candidate information, which effectively reduces their data volume, retains more accurate useful data, reduces data redundancy, and improves data quality and data processing efficiency. Moreover, because the screening is carried out according to the same preset target classification candidate information, the obtained second point cloud feature data and second image feature data correspond to each other more strongly; during target recognition, the deviation between the target identified from the fusion data and the real target can be reduced, and the intersection-over-union ratio between them improved. Thus, without expensive precision devices, fusion data with a smaller data volume but higher accuracy can be obtained, the accuracy of the target recognition result is improved, and the processing time of target recognition is shortened, meeting the requirements of an automatic driving system on data accuracy and efficiency and enhancing the automatic driving capability.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed in the description of the present invention or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on them without inventive effort.
FIG. 1 is a flow chart of a data processing method in an embodiment of the invention;
fig. 2 is a flowchart of a method for screening first point cloud feature data according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for obtaining feature data of a second point cloud according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a de-overlap process in an embodiment of the invention;
FIG. 5 is a flowchart of a method for filtering feature data of a first image according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for extracting features of raw point cloud data according to an embodiment of the present invention;
FIG. 7 is a schematic diagram illustrating partitioning of raw point cloud data according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a voxel feature stacking process in an embodiment of the invention;
FIG. 9 is a diagram illustrating a point data logical operation process according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a voxel feature data compression process according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating an anchor region according to an embodiment of the present invention;
fig. 12 is a schematic diagram of target feature extraction performed by a region proposal network according to an embodiment of the present invention;
FIG. 13 is a flow chart of another data processing method in an embodiment of the present invention;
FIG. 14 is a diagram illustrating a target recognition result according to an embodiment of the present invention;
FIG. 15 is a schematic diagram of a mobile carrier provided with multiple image acquisition devices according to an embodiment of the present invention;
fig. 16 is a block diagram of a data processing apparatus according to an embodiment of the present invention;
fig. 17 is a block diagram of a data filtering unit according to an embodiment of the present invention;
fig. 18 is a block diagram of an automatic driving system according to an embodiment of the present invention.
Detailed Description
At present, data fusion methods for image acquisition equipment and point cloud acquisition equipment can be divided into a pre-fusion method and a post-fusion method.
The pre-fusion method processes data at the raw-data level, and may specifically include: projecting the original point cloud data of the point cloud acquisition device onto the original image data, augmenting the pixel information of the image data from pure color information (such as red, green, and blue values) into fused color-and-depth information, and then analyzing the data.
The post-fusion method processes data at the result level, and may specifically include: independently processing the original image data acquired by the image acquisition device and the original point cloud data acquired by the point cloud acquisition device to obtain respective target sensing results, and performing further data analysis by combining the target sensing results of the multiple sensors.
However, the modalities of the point cloud acquisition device and the image acquisition device are different, for example, the poses of the point cloud acquisition device and the image acquisition device are different; the original data collected by the point cloud collecting equipment and the image collecting equipment are different in format and dimension; the original point cloud data of the point cloud acquisition equipment is more sparse than the original image data of the image acquisition equipment, and the like.
Due to the modal difference, the accuracy of the fusion data is difficult to guarantee no matter the pre-fusion method or the post-fusion method.
Therefore, those skilled in the art have applied neural network algorithms to data fusion methods in the hope of improving the fusion effect. However, the structure of such a neural network model is complex, which reduces data processing efficiency; the model requires a large amount of training data and a long parameter-tuning time, and even after this increased time cost, the robustness of the trained model may not meet expectations, so that it cannot adapt to complicated and variable automatic driving scenarios, and the accuracy of the data fusion result cannot be ensured.
In conclusion, the existing data fusion method of the image acquisition equipment and the point cloud acquisition equipment has some problems, so that the data fusion result is not ideal, the result accuracy obtained by carrying out target recognition on the fusion data is low, and the driving decision of the automatic driving system is further influenced.
In order to solve the above technical problem, an embodiment of the present invention provides a flowchart of a data processing method, and referring to fig. 1, the method may include the following steps:
and S11, performing feature extraction on the original point cloud data to obtain first point cloud feature data.
In a specific implementation, the point cloud acquisition device may acquire original point data corresponding to objects in the environment by transmitting and receiving signals, where the original point data may include three-dimensional coordinates and reflectivity. The collection of all such original point data constitutes the original point cloud data. Specifically, the collection of original point data acquired by the point cloud acquisition device in one sampling period is one frame of original point cloud data.
In practical application, according to application scenarios and requirements, different point cloud feature extraction algorithms or neural networks may be adopted to perform feature extraction on the original point cloud data, which is not limited in the embodiment of the present invention.
In specific implementation, according to the actual application scenarios and requirements of the embodiment of the present invention, different selection rules for the original point cloud data may be set. For example, the original point cloud data may be selected frame by frame, so that feature extraction is performed on each frame; alternatively, frames of original point cloud data may be selected according to a preset frame interval value, so that feature extraction is performed on each selected frame. The embodiments of the present invention are not limited in this regard.
And S12, performing feature extraction on the original image data to obtain first image feature data.
In practical application, according to application scenarios and requirements, different image feature extraction algorithms or neural networks may be adopted to perform feature extraction on the original image data, which is not limited in the embodiment of the present invention.
In specific implementation, according to practical application scenarios and requirements of the embodiment of the present invention, different selection rules of the original image data may be set, for example, the original image data may be selected frame by frame, and feature extraction may be performed on each frame of original image data, or the original image data may be selected according to other selection rules, which is not limited in the embodiment of the present invention.
And S13, respectively screening the first point cloud feature data and the first image feature data based on preset target classification candidate information to obtain second point cloud feature data and second image feature data.
In specific implementation, the target classification candidate information may be set according to an actual application scenario, and the setting of the target classification candidate information is not specifically limited in the embodiment of the present invention.
In an embodiment of the present invention, the target classification candidate information may be set based on the object class and the object orientation, so that the target classification candidate information can characterize the object class and the object orientation.
Specifically, the object categories may include: movable objects (e.g., cars, trucks, bicycles, pedestrians, etc.), as well as relatively stationary objects (e.g., trees, grass, signs, etc.); the object orientation may include: front, left side, right side, back, etc. The object categories and the object orientations are arranged and combined, and corresponding target classification candidate information can be set according to the arrangement and combination result.
Further, in an implementation, if the error between the contour sizes corresponding to several views of an object is within an error tolerance range, one of the views may be selected to represent the object orientation. For example, if the error between the contour sizes corresponding to the front and the back of an object is within the error tolerance range, one of the front and the back may be selected as the object orientation; similarly, if the error between the contour sizes corresponding to the left side and the right side of an object is within the error tolerance range, one of them may be selected as the object orientation.
The error tolerance range can be set according to the specific scene. The object's left side can be understood as the left-side view of the object after designating one of its faces (e.g., the front) as a reference, and the object's right side as the right-side view under the same reference. The embodiments of the present invention are not limited in this regard.
For example, if the object categories include cars and signs, and the object orientations include the front and the left side, then target classification candidate information can be set for the front of a car, the left side of a car, the front of a sign, and the left side of a sign respectively.
It should be understood that the above examples are only illustrative, and in practical applications, the object type and the object orientation may be selected according to practical requirements, and the present invention is not limited thereto.
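A minimal Python sketch of such an enumeration, assuming hypothetical class names, orientations, and data layout (none of which are prescribed by this embodiment), could be:

from itertools import product

# Hypothetical object classes and orientations (illustrative only; the
# embodiment sets these according to the actual application scenario).
object_classes = ["car", "sign"]
object_orientations = ["front", "left_side"]

# One piece of target classification candidate information per combination.
target_classification_candidates = [
    {"class": cls, "orientation": ori}
    for cls, ori in product(object_classes, object_orientations)
]
print(target_classification_candidates)
# 4 entries: car/front, car/left_side, sign/front, sign/left_side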
And S14, fusing the second point cloud feature data and the second image feature data to obtain fusion data.
In a specific implementation, the second point cloud feature data and the second image feature data may be stacked according to a preset direction (e.g., a height direction), and the obtained result is used as fusion data.
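A minimal sketch of this stacking step, assuming hypothetical feature shapes and that the two feature sets have already been brought to the same dimensions, could be:

import numpy as np

# Hypothetical feature shapes (assumptions of this sketch).
second_pc_features = np.random.rand(128, 256)
second_img_features = np.random.rand(128, 256)

# Stack along a new axis (the preset direction, e.g. the height direction);
# the result is used as the fusion data.
fusion_data = np.stack([second_pc_features, second_img_features], axis=0)
print(fusion_data.shape)  # (2, 128, 256)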
And S15, performing target recognition on the fusion data to obtain a target recognition result.
In specific implementation, different target recognition candidate parameters can be set according to an actual scene, and target recognition is performed on the fusion data based on the preset target recognition candidate parameters to obtain a target recognition result.
The target recognition candidate parameter may be set according to at least one of an object type, an object position, and an object orientation. Specifically, the object categories may include: at least one of an automobile, truck, bicycle, and pedestrian; the object position may include: at least one of a length, a width, a height, a three-dimensional coordinate, and an angle; the object orientation may include: at least one of the front, the back, the left side and the right side.
It is understood that the fusion data may correspond to one or more objects in the environment, that is, one or more targets may exist in the fusion data. For each target, a target recognition result matching that target can be obtained according to the set target recognition candidate parameters. The number of targets is not particularly limited in the embodiments of the present invention.
In specific implementation, when the data processing method provided by the embodiment of the present invention is adopted to fuse the second point cloud feature data and the second image feature data, a better data fusion effect can be achieved and the accuracy of the target recognition result improved: the Intersection-over-Union (IoU) between a target identified by the method and the real target can exceed 90%, that is, the deviation between them is small. Compared with existing data fusion methods, the method can therefore better meet the data fusion and target recognition requirements of accuracy-sensitive automatic driving systems, and has a wider application range.
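For reference, the intersection-over-union between an identified region and the real target region can be computed as in the following sketch (a two-dimensional axis-aligned simplification; the three-dimensional case adds a height term):

def iou_2d(box_a, box_b):
    """Axis-aligned 2D IoU; boxes are (x_min, y_min, x_max, y_max)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou_2d((0, 0, 10, 10), (1, 1, 11, 11)))  # ~0.68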
From the above, screening the first point cloud feature data and the first image feature data respectively according to the preset target classification candidate information effectively reduces their data volume, retains accurate useful data, reduces data redundancy, and improves data quality and data processing efficiency. Moreover, because the screening is carried out according to the same preset target classification candidate information, the obtained second point cloud feature data and second image feature data correspond to each other more strongly; during target identification, the deviation between the identified target and the real target can be reduced, and their intersection-over-union ratio improved. Thus, without expensive precision devices, fusion data with a smaller data volume but higher accuracy can be obtained, the accuracy of the target identification result is improved, the processing time of target identification is shortened, the requirements of an automatic driving system on data accuracy and efficiency can be met, and the automatic driving capability is enhanced.
In conclusion, the processing efficiency and accuracy of data fusion can be guaranteed with low implementation cost through the scheme, and a more accurate target identification result can be obtained.
It should be understood that steps S11 and S12 in the above embodiment are only examples and do not limit the order of feature extraction for the original point cloud data and the original image data. That is, there is no fixed sequence between steps S11 and S12: they may be executed simultaneously or in a preset order, which is not limited in this embodiment of the present invention.
In a specific implementation, as shown in fig. 2, a flowchart of a method for screening first point cloud feature data is shown, where the method for screening may include:
and S21, performing target screening on the first point cloud feature data based on preset target classification candidate information to obtain three-dimensional target candidate information of the first point cloud feature data.
In specific implementation, based on preset target classification candidate information, a region which is possibly a target in the first point cloud feature data may be screened and labeled, so as to obtain three-dimensional target candidate information of the first point cloud feature data. The three-dimensional target candidate information can represent the distribution condition of target classification candidate information in the first point cloud feature data.
In a specific implementation, if there are a plurality of target classification candidate information, the distribution of each target classification candidate information in the first point cloud feature data is not necessarily the same, so that the number of corresponding three-dimensional target candidate information may also be different. For example, assuming that there are target classification candidate information HX1 and target classification candidate information HX2, when the target classification candidate information HX1 is found in the first point cloud feature data, three-dimensional target candidate information QY1 and QY2 may be obtained, and when the target classification candidate information HX2 is found in the first point cloud feature data, three-dimensional target candidate information QY3 to QY5 may be obtained.
In practical application, a candidate region extraction algorithm may be adopted to label the first point cloud feature data, which is not limited in the embodiment of the present invention.
In a specific implementation, the target classification candidate information may be represented by a candidate box, in other words, the target classification candidate information may be a target classification candidate box. Specifically, since the point cloud data is three-dimensional data, the target classification candidate frame may be a three-dimensional solid frame. For example, the sizes of the target classification candidate frame corresponding to the front side of the automobile and the target classification candidate frame corresponding to the right side of the automobile may be different, which is not limited in the embodiment of the present invention.
The first point cloud feature data can be labeled based on preset target classification candidate frames; the labeling of the target classification candidate frames in the first point cloud feature data reflects the distribution of target classification candidate information in the first point cloud feature data, thereby yielding the three-dimensional target candidate information.
In specific implementation, because the size of a target classification candidate frame may deviate from the object range in the first point cloud feature data, operations such as area scaling and position shifting may be applied to the candidate frame in the first point cloud feature data through a frame regression method, so that it better matches the object range in the first point cloud feature data.
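One common parameterization of such frame regression applies predicted center offsets and logarithmic size scales to a candidate frame; the following two-dimensional sketch assumes that parameterization, which this embodiment does not prescribe:

import math

def apply_frame_deltas(frame, deltas):
    """frame = (cx, cy, w, h); deltas = (dx, dy, dw, dh) predicted by a
    regression head. Center offsets are relative to the frame size; sizes
    are scaled multiplicatively."""
    cx, cy, w, h = frame
    dx, dy, dw, dh = deltas
    return (cx + dx * w,       # position shifting
            cy + dy * h,
            w * math.exp(dw),  # area scaling
            h * math.exp(dh))

print(apply_frame_deltas((5.0, 5.0, 2.0, 4.0), (0.1, -0.05, 0.2, 0.0)))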
And S22, filtering the three-dimensional target candidate information to obtain optimization target candidate information.
S23, screening the first point cloud feature data based on the optimization target candidate information to obtain second point cloud feature data.
By adopting the above scheme, the first point cloud feature data is screened, data redundancy is reduced, more useful feature data is retained, data quality is improved, and data processing efficiency is increased.
In a specific implementation, the three-dimensional target candidate information may be evaluated and filtered according to the evaluation result to obtain the optimization target candidate information. Specifically, fig. 3 shows a flowchart of the method for obtaining optimization target candidate information, which may include the following steps:
and S31, evaluating the three-dimensional target candidate information, and filtering the three-dimensional target candidate information based on a preset evaluation condition to obtain intermediate target candidate information.
In specific implementation, the target classification prediction may be performed on the first point cloud feature data corresponding to each three-dimensional target candidate information to obtain a target classification prediction result corresponding to each three-dimensional target candidate information, and then, the target classification prediction result corresponding to each three-dimensional target candidate information is evaluated through a scoring function.
The target classification prediction result is determined by preset target classification prediction output parameters, and the target classification prediction output parameters can be set according to the requirements of an actual scene.
For example, the target classification prediction output parameter may be set according to at least one of an object class, an object position, and an object orientation. Specifically, the object categories may include: at least one of an automobile, truck, bicycle, and pedestrian; the object position may include: at least one of a length, a width, a height, a three-dimensional coordinate, and an angle; the object orientation may include: at least one of a front side, a back side, a left side, and a right side.
Further, the target classification prediction output parameter corresponding to the object position can be represented by a position correction amount.
In particular implementation, a neural network may be used for target classification prediction. For example, a Region Proposal Network (RPN) may be used to perform target classification prediction on the first point cloud feature data.
In a specific implementation, the scoring function may be a non-linear mapping function, such as a Sigmoid function, a tanh function, a ReLU function, or a softmax function. A non-linear mapping function can highlight the differences between target classification prediction results.
Specifically, each target classification prediction result can be evaluated through the non-linear mapping function, which outputs a corresponding score; whether the score meets the evaluation condition is then judged, the target classification prediction results whose scores meet the evaluation condition are determined, and the corresponding three-dimensional target candidate information is retained. The three-dimensional target candidate information is thereby filtered, yielding intermediate target candidate information.
It is to be understood that the evaluation condition may be set based on the actual situation. For example, the evaluation condition may be a score threshold: if the score of a target classification prediction result is greater than the score threshold, the evaluation condition is met. Alternatively, the evaluation condition may be a ranking threshold: the target classification prediction results are sorted by score, and a result meets the evaluation condition if its rank is smaller than the ranking threshold. The invention is not limited in this regard.
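A sketch of this evaluation and filtering, assuming a softmax scoring function and hypothetical array layouts, could be:

import numpy as np

def filter_candidates(logits, candidates, score_threshold=0.5, top_k=None):
    """Score each candidate's classification prediction with softmax and
    keep those meeting the evaluation condition (a score threshold, or a
    ranking threshold when top_k is given)."""
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    scores = probs.max(axis=1)           # best-class confidence per candidate
    keep = scores > score_threshold      # score-threshold variant
    if top_k is not None:                # ranking-threshold variant
        keep = np.zeros_like(keep)
        keep[np.argsort(-scores)[:top_k]] = True
    return candidates[keep], scores[keep]

logits = np.random.randn(100, 4)         # 100 candidates, 4 classes
candidates = np.random.rand(100, 7)      # hypothetical 3D box parameters
kept, kept_scores = filter_candidates(logits, candidates, top_k=20)
print(kept.shape)                        # (20, 7)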
And S32, taking the intermediate target candidate information as the optimization target candidate information.
In this way, the data volume of the three-dimensional target candidate information can be effectively reduced through the scores obtained by evaluating it, improving data processing efficiency.
In particular implementations, there may be at least partially overlapping intermediate target candidate information that may correspond to the same object in the environment. For example, as shown in fig. 4, an object 4A in the environment is represented by a black circle, and there are 3 pieces of intermediate target candidate information 411, 412, and 413 that partially overlap.
For this reason, further optimization processing may be performed on the intermediate target candidate information to reduce data redundancy, and specifically, with reference to fig. 3, before the taking the intermediate target candidate information as the optimization target candidate information, the method may further include:
and S33, performing overlap elimination processing on the intermediate target candidate information.
With continued reference to fig. 4, the three partially overlapping pieces of intermediate target candidate information 411, 412, and 413 are subjected to de-overlapping processing, and intermediate target candidate information 412 is retained as the de-overlapped result. The de-overlapping processing may use a non-maximum suppression (NMS) method.
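A self-contained sketch of greedy non-maximum suppression over two-dimensional boxes (a simplification; the candidate information here is three-dimensional) could be:

import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over axis-aligned 2D boxes
    (x1, y1, x2, y2): repeatedly keep the highest-scoring box and drop
    the remaining candidates that overlap it too strongly."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # IoU of the best box against the remaining candidates
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_best + area_rest - inter)
        order = rest[iou <= iou_threshold]
    return keep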
In a specific implementation, the fields of view of the point cloud acquisition device and the image acquisition device may also be inconsistent. Therefore, the intermediate target candidate information may be further optimized to reduce data redundancy. With reference to fig. 3, before performing the de-overlapping processing on the intermediate target candidate information, the method may further include:
and S34, matching the intermediate target candidate information with the image range corresponding to the original image data, and filtering the intermediate target candidate information based on a preset matching condition.
The image range is determined according to the length information and the width information of the original image data, and the length information and the width information of the original image data can be determined by the number of pixels of the image acquisition device.
In this way, the de-overlapping processing is carried out after the intermediate target candidate information has been filtered, which avoids information misjudgment and erroneous de-overlapping and improves data accuracy.
In practical application, although the image acquisition device and the point cloud acquisition device are installed on the same mobile carrier, such as an unmanned vehicle, an automobile, or a handheld device, their positions and orientations on the carrier may differ. To obtain the correspondence between the two devices, they may be jointly calibrated to determine the coordinate conversion parameters between them. Then, according to the coordinate conversion parameters, the de-overlapped intermediate target candidate information may be converted into the pixel coordinate system of the original image data and matched with the image range of the original image data.
Specifically, the coordinate transformation parameters may include external parameters and internal parameters, the external parameters obtained by joint calibration may project the intermediate target candidate information to a coordinate system of the image acquisition device, and the internal parameters of the image acquisition device are used to transform the coordinate system of the intermediate target candidate information from the coordinate system of the image acquisition device to a pixel coordinate system, so as to obtain the two-dimensional intermediate target candidate information. The two-dimensional intermediate target candidate information may then be matched to the image range of the original image data.
The coordinate system of the image acquisition device may be the coordinate system defined by the device's field of view.
From the length information, the width information, and the coordinates of the two-dimensionally converted intermediate target candidate information, the portion of the intermediate target candidate information projected outside the image range and the portion projected within the image range can be determined.
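A sketch of this projection and image-range matching, assuming a 4x4 extrinsic matrix and a 3x3 intrinsic matrix from joint calibration, could be:

import numpy as np

def project_to_pixels(points_3d, extrinsic, intrinsic):
    """Project Nx3 points from the point cloud device coordinate system
    into pixel coordinates via the jointly calibrated extrinsic (4x4)
    matrix and the camera intrinsic (3x3) matrix. Points behind the
    image plane (z <= 0) should be excluded beforehand."""
    homo = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])
    cam = (extrinsic @ homo.T).T[:, :3]   # image acquisition device frame
    pix = (intrinsic @ cam.T).T
    return pix[:, :2] / pix[:, 2:3]       # perspective divide -> (u, v)

def inside_image_range(pixels, width, height):
    """Boolean mask of projected points within the image range, which is
    determined by the length and width (pixel counts) of the image."""
    u, v = pixels[:, 0], pixels[:, 1]
    return (u >= 0) & (u < width) & (v >= 0) & (v < height)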
It is understood that the matching condition may be set based on the actual situation. For example, the matching condition may be: retain the portion of the intermediate target candidate information projected within the image range, and filter out the portion projected outside the image range, thereby effectively reducing the data volume. The invention is not limited in this regard.
By adopting the above scheme, the original image data assists in filtering the three-dimensional target candidate information, preventing a single object from corresponding to too much target classification candidate information and reducing data redundancy, so that more accurate information is obtained. Moreover, compared with other precision devices, the image acquisition device has lower hardware cost, saving implementation cost.
In a specific implementation, if the target classification candidate information is a three-dimensional stereo frame, the intermediate target candidate information converted into two-dimensional information under pixel coordinates is a two-dimensional plane frame. A two-dimensional plane frame that is too small contains limited useful information while still adding to the data volume. Therefore, two-dimensional plane frames that do not meet a preset size threshold may be filtered out according to their sizes, so that the two-dimensional intermediate target candidate information is filtered and the data volume reduced.
In a specific implementation, after the optimization target candidate information is obtained by the method according to any one of the above embodiments or by combining multiple embodiments, the features of the area corresponding to the optimization target candidate information in the first point cloud feature data may be extracted to obtain the second point cloud feature data.
In an implementation, as shown in fig. 5, a flowchart of a method for filtering first image feature data is provided, where the method for filtering may include:
and S51, performing target screening on the first point cloud characteristic data based on preset target classification candidate information to obtain three-dimensional target candidate information of the first point cloud characteristic data.
And S52, filtering the three-dimensional target candidate information to obtain optimization target candidate information.
It is understood that steps S51 and S52 can refer to fig. 2 and the description related thereto, and are not repeated herein.
And S53, converting the coordinate system of the optimization target candidate information into the pixel coordinate system of the original image data to obtain two-dimensional target candidate information.
In specific implementation, according to the coordinate conversion parameter obtained by joint calibration, the optimization target candidate information may be converted into the pixel coordinate system of the original image data, so as to obtain two-dimensional target candidate information. Specifically, reference may be made to the description of the related coordinate transformation, which is not described herein again.
And S54, screening the first image characteristic data based on the two-dimensional target candidate information to obtain second image characteristic data.
In specific implementation, after the corresponding region of the two-dimensional target candidate information in the first image feature data is determined, the corresponding features may be extracted to obtain the second image feature data.
Specifically, according to actual requirements and application scenarios, a corresponding neural network may be selected to perform regional feature extraction on the first image feature data. For example, an ROI Align (Region of Interest Align) neural network or an ROI Pooling (Region of Interest Pooling) neural network may be used.
The ROI Align neural network implements a region feature extraction method based on bilinear interpolation; it calibrates feature positions when extracting region features, so extraction of features at incorrect positions can be avoided.
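For illustration, region feature extraction with ROI Align as implemented in torchvision could look as follows; the feature-map shape and the 1/8 spatial scale are assumptions of this sketch, not values given by this embodiment:

import torch
from torchvision.ops import roi_align

# Hypothetical image feature map, shape (batch, channels, H, W), and one
# two-dimensional target candidate frame in (x1, y1, x2, y2) pixel coordinates.
first_image_features = torch.randn(1, 256, 48, 160)
boxes = [torch.tensor([[10.0, 5.0, 90.0, 40.0]])]

# spatial_scale maps pixel coordinates onto the downsampled feature map;
# 1/8 assumes an 8x downsampling backbone.
second_image_features = roi_align(
    first_image_features, boxes, output_size=(7, 7),
    spatial_scale=1.0 / 8, sampling_ratio=2, aligned=True)
print(second_image_features.shape)  # (1, 256, 7, 7)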
By adopting the above scheme, the first image feature data is screened based on the optimization target candidate information, which reduces data redundancy and strengthens the relevance between the first image feature data and the first point cloud feature data, facilitating subsequent data fusion.
In specific implementation, before the second point cloud feature data and the second image feature data are fused, they may be input into a fully connected layer so that their data dimensions are consistent, which improves the intersection-over-union ratio of the data.
In specific implementation, in order to reduce the data processing amount and the complexity of the data process, the original point cloud data may be spatially divided according to a preset division method, and feature extraction may be performed on the point data in a spatial range, so as to reasonably reduce the data amount in different spatial ranges and ensure data validity.
In order to make the feature extraction process of the original point cloud data clearer and easier to implement for those skilled in the art, a detailed description is given below through specific embodiments with reference to the accompanying drawings.
In an embodiment of the present invention, as shown in fig. 6, it is a flowchart of a method for extracting features from original point cloud data, and the method may include:
and S61, dividing the original point cloud data to obtain a plurality of voxel units.
In specific implementation, a preset voxel unit (voxel grid) size parameter is obtained, and the original point cloud data is spatially divided according to the voxel unit size parameter, so that the original point cloud data is divided into adjacent voxel units.
A voxel unit can be regarded as a box-shaped space, and the voxel unit size parameters include: voxel unit depth, voxel unit height, and voxel unit width. These three values may be equal; for example, if the depth, height, and width are all 0.1 m, the voxel unit obtained is a cube. They may also differ; for example, if the depth, height, and width are 0.1 m, 0.2 m, and 0.3 m in turn, the voxel unit obtained is a cuboid. The embodiments of the present invention are not limited in this regard.
In an alternative example, fig. 7 shows a schematic diagram of the division of the original point cloud data. In fig. 7, the original point cloud data is contained in a three-dimensional space P1, and the preset voxel unit size parameters are: voxel unit depth vD, voxel unit height vH, and voxel unit width vW. Dividing the original point cloud data according to these parameters yields a plurality of voxel units as shown in fig. 7, each with depth vD, height vH, and width vW; see, for example, voxel unit P11 in fig. 7.
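A minimal sketch of this division step, assuming point data stored as (x, y, z, reflectivity) rows, could be:

import numpy as np

def voxelize(points, voxel_size=(0.1, 0.2, 0.3)):
    """Group Nx4 original point data (x, y, z, reflectivity) into voxel
    units by integer-dividing coordinates by (vD, vH, vW)."""
    indices = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int64)
    voxels = {}
    for idx, point in zip(map(tuple, indices), points):
        voxels.setdefault(idx, []).append(point)  # points sharing one cell
    return voxels  # grid index -> list of point data in that voxel unit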
In a specific implementation, the more finely the original point cloud data is divided, i.e. the smaller the voxel unit is, the higher the voxelization accuracy of the obtained original point cloud data is, and accordingly, the data amount required to be processed is increased. In the automatic driving process, objects with close distances can influence driving decisions, so that point data of close-distance objects acquired by the point cloud acquisition equipment is more important, and point data of long-distance objects is less important.
Based on this, in order to reduce the data amount, the original point cloud data may be filtered based on preset spatial distance parameters. Specifically, a distance selection space is established with a certain point as the center; original point data within the distance selection space is retained, and original point data outside it is removed. The spatial distance parameters may include: a spatial depth distance parameter, a spatial height distance parameter, and a spatial width distance parameter.
Alternatively, a distance selection space may be established with the point cloud acquisition device as a center, the original point data in the distance selection space may be retained, and the original point data outside the distance selection space may be removed.
Therefore, more important original point data (namely original point data with a short distance) is reserved through the preset spatial distance parameter, and unimportant original point data (namely original point data with a long distance) is filtered, so that the data volume of the original point data is reduced, the number of voxel units can be reduced, and the data effectiveness can be guaranteed.
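A sketch of this distance-based filtering, assuming the distance selection space is an axis-aligned box around a chosen center point, could be:

import numpy as np

def crop_by_distance(points, center, half_extents):
    """Keep original point data inside a distance selection space centered
    at `center`; half_extents holds the spatial depth, height, and width
    distance parameters."""
    offset = np.abs(points[:, :3] - np.asarray(center))
    mask = np.all(offset <= np.asarray(half_extents), axis=1)
    return points[mask]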
It can be understood that the spatial distance parameters are related to the characteristics of the point cloud acquisition device and the application scenario. For example, comparing an urban scene with a desert scene: since an urban scene usually contains more objects, in order to distinguish long-distance objects from short-distance ones, the spatial distance parameters set for the urban scene may be smaller than those set for the desert scene. As another example, comparing the top position and the middle position of a mobile carrier (e.g., a vehicle, an airplane, or another mobile tool): since the top position usually corresponds to fewer objects than the middle position, the spatial distance parameters set when the point cloud acquisition device is installed at the middle position may be smaller than those set when it is installed at the top position.
And S62, performing feature extraction on the plurality of voxel units to obtain voxel feature data.
In a specific implementation, the voxel feature data may be obtained by performing local feature extraction on the plurality of voxel units respectively. A local feature extraction method may be selected according to actual requirements: for example, voxel feature accumulation may be performed on the plurality of voxel units respectively to obtain the voxel feature data; alternatively, point data logical operations may be performed on the plurality of voxel units respectively to obtain the voxel feature data. The present specification is not limited in this regard; examples are given below.
In one implementation example, fig. 8 shows a schematic diagram of the voxel feature stacking process. Taking a single voxel unit as an example, the voxel unit PA contains three original point data A1, A2 and A3, which are drawn in different gray levels in fig. 8.
(1) Calculate the geometric center point A0 of all original point data A1, A2 and A3 in the voxel unit PA, and combine each original point datum with the geometric center position by logical operation and concatenation to obtain the augmented point data A1', A2' and A3'.
For example, the original point data A1 is [x, y, z, r], where x, y, z are the three-dimensional coordinates and r is the reflectivity. The geometric center point A0 of the original point data A1, A2 and A3 is [x', y', z'], where x', y', z' are the coordinates of the geometric center. Logical operation and concatenation then yield the augmented point data A1' = [x, y, z, x-x', y-y', z-z', r]. The augmented point data A2' and A3' are obtained from A2 and A3 in the same way and are not described again here.
As another example, the original point data A1 is [x, y, z, r], where x, y, z are the three-dimensional coordinates and r is the reflectivity. The geometric center point A0 of the original point data A1, A2 and A3 is [x', y', z', r'], where x', y', z' are the coordinates of the geometric center and r' is its reflectivity. Logical operation and concatenation then yield the augmented point data A1' = [x, y, z, x-x', y-y', z-z', r-r']. The augmented point data A2' and A3' follow by analogy.
(2) The augmented point data A1', A2' and A3' are input into the first fully-connected layer, yielding the fully-connected features B1, B2 and B3. The first fully-connected layer maps each augmented point datum into a feature space, re-expanding the augmented point data. It may include a first linear mapping sublayer, a first batch normalization (BN) sublayer and a first non-linear mapping sublayer; the non-linear mapping may use a ReLU function, a tanh function, or the like.
(3) The fully-connected features B1, B2 and B3 are aggregated element by element through a first pooling layer, that is, elements in the same feature dimension of B1, B2 and B3 are aggregated, yielding the locally accumulated feature C1.
The first pooling layer applies max pooling to the fully-connected features, extracting maximum values from B1, B2 and B3 to represent the feature information of the original point data in the voxel unit. For example, for the fully-connected feature B1 = [2, 8, 3], max pooling extracts the maximum value 8.
(4) The locally accumulated feature C1 and the fully-connected features B1, B2 and B3 are input into a concatenation layer, where C1 is concatenated with each of B1, B2 and B3 to obtain the concatenated features D1, D2 and D3. This retains the original information of the point data while also reflecting its local maximum information, increasing the dimensionality and diversity of the output data.
(5) The concatenated features D1, D2 and D3 are input into the second fully-connected layer and the second pooling layer, yielding the voxel feature data E.
It should be understood that the foregoing is merely illustrative; in practice there may be many voxel units, and the voxel feature data is obtained after all of them have undergone voxel feature stacking.
By adopting this voxel feature stacking scheme, highly accurate voxel feature data can be obtained, which in turn safeguards the accuracy of the subsequent fused data and target recognition results. A code sketch of the stacking pipeline follows.
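The steps (1) to (5) above resemble the voxel feature encoding layers used in VoxelNet-style networks. The PyTorch sketch below is one possible reading of the pipeline, assuming the 7-dimensional augmented point layout from the first example; all layer sizes are illustrative, not taken from this disclosure.

```python
import torch
import torch.nn as nn

class VoxelFeatureStacking(nn.Module):
    """One reading of steps (1)-(5); all layer sizes are illustrative."""

    def __init__(self, in_dim=7, mid_dim=32, out_dim=64):
        super().__init__()
        # first fully-connected layer: linear mapping + BN + non-linear mapping
        self.fc1 = nn.Sequential(nn.Linear(in_dim, mid_dim),
                                 nn.BatchNorm1d(mid_dim), nn.ReLU())
        # second fully-connected layer, applied to the concatenated features
        self.fc2 = nn.Sequential(nn.Linear(2 * mid_dim, out_dim),
                                 nn.BatchNorm1d(out_dim), nn.ReLU())

    def forward(self, points):
        # points: (N, 4) original point data [x, y, z, r] of one voxel unit
        center = points[:, :3].mean(dim=0)            # geometric center A0
        aug = torch.cat([points[:, :3],
                         points[:, :3] - center,      # step (1): A1'..A3'
                         points[:, 3:]], dim=1)
        b = self.fc1(aug)                             # step (2): B1..B3
        c = b.max(dim=0, keepdim=True).values         # step (3): pooling -> C1
        d = torch.cat([b, c.expand_as(b)], dim=1)     # step (4): D1..D3
        return self.fc2(d).max(dim=0).values          # step (5): E

vfe = VoxelFeatureStacking().eval()  # eval mode: untrained BN, shapes only
pa = torch.tensor([[1.0, 2.0, 0.5, 0.3],
                   [1.2, 2.1, 0.4, 0.2],
                   [0.9, 1.8, 0.6, 0.4]])            # voxel unit PA
with torch.no_grad():
    print(vfe(pa).shape)                             # torch.Size([64])
```

In practice the module would of course be trained end to end with the rest of the model; eval mode is used here only so the untrained batch normalization runs on a three-point voxel.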
In another implementation example, fig. 9 shows a schematic diagram of the point-data logical operation process. Taking a single voxel unit as an example, the voxel unit PA contains three original point data A1, A2 and A3, drawn in different gray levels in fig. 9.
A logical operation (such as a mean, median or variance operation) is applied to the axis coordinates of the three original point data A1, A2 and A3 in the voxel unit PA, yielding a single feature point coordinate that serves as the voxel feature data F of the voxel unit PA.
Again, this is merely illustrative; in practice there may be many voxel units, and the voxel feature data is obtained after all of them have undergone the point-data logical operation.
Compared with the voxel feature stacking scheme, the logical operation scheme greatly reduces the amount of computation, while the difference in accuracy between the two results is small enough to be regarded as equivalent. If the automatic driving scenario places high demands on real-time performance, the point-data logical operation scheme is therefore more advantageous; a minimal sketch follows.
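A minimal NumPy sketch of the logical operation scheme, assuming each voxel unit is an (N, 3) array of point coordinates; the set of selectable operations is illustrative.

```python
import numpy as np

def voxel_feature_by_logic(points, op="mean"):
    """Collapse the point coordinates of one voxel unit into a single
    feature point coordinate, the voxel feature data F."""
    ops = {"mean": np.mean, "median": np.median, "var": np.var}
    return ops[op](points, axis=0)

pa = np.array([[1.0, 2.0, 0.5],    # original point data A1
               [1.2, 2.1, 0.4],    # A2
               [0.9, 1.8, 0.6]])   # A3
print(voxel_feature_by_logic(pa))  # approx. [1.033 1.967 0.5]
```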
S63: compressing the voxel feature data to obtain the first point cloud feature data.
In a specific implementation, the voxel feature data is compressed along a specified direction by convolutional neural network blocks. The specified direction may include at least one of the height, width and depth directions. The compression may include extracting features of the voxel feature data along the specified direction (for example, the height direction) and screening the voxel feature data.
Given the sparsity of the original point cloud data, ordinary convolution layers would consume a large amount of time and slow down the computation. For this reason, the convolutional neural network block may include a sparse convolution layer: sparse convolution restricts computation to regions where input exists (that is, where there is original point data) and ignores the large regions of the original point cloud that contain no points, thereby reducing the amount of computation.
Further, to limit the dilation of the original point data and reduce computation even more, the convolutional neural network block may include both a sparse convolution layer and sub-manifold convolution layers, where the number of sub-manifold convolution layers may be greater than one, that is, the block contains at least one sub-manifold convolution layer.
In this way, on top of sparse convolution, convolution is restricted to the regions that correspond one-to-one with preset output positions, limiting the dilation of the original point data and improving computational efficiency.
In a specific implementation, several convolutional neural network blocks may be arranged in sequence to gradually extract features of the voxel feature data along the specified direction and to screen the voxel feature data, avoiding the loss of useful information while effectively reducing the data volume of the first point cloud feature data.
The area spanned by the non-specified directions may be referred to as the anchor point area, and the first point cloud feature data then includes the feature data of the anchor point area and the feature data of the specified direction.
For example, fig. 10 shows a schematic diagram of the voxel feature data compression process. Taking one voxel feature in the voxel feature data as an example, the voxel feature Q1 is input into the first convolutional neural network block 1, whose output is input into the second convolutional neural network block 2, and so on up to the N-th convolutional neural network block. Height feature information is extracted step by step and the height dimension is removed, yielding the first point cloud feature data Q2, which comprises the feature Q21 of the anchor point area (spanned by width and depth) and the height feature Q22.
The voxel feature Q1 may be the voxel feature E obtained by voxel feature stacking (see the description of E above) or the voxel feature F obtained by the point-data logical operation (see the description of F above); the invention is not limited in this regard.
After the voxel features of all voxel units have been compressed, the resulting set of anchor point area features together with the set of specified-direction features constitute the first point cloud feature data. The feature data of the anchor point area can be regarded as a two-dimensional feature map viewed along the specified direction; for example, if the specified direction is the height, it can be regarded as a two-dimensional top view.
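Production implementations of this step would use a sparse convolution library, but a dense stand-in illustrates the geometry of the compression: each block below strides along the height (the specified direction) until it collapses, leaving a two-dimensional anchor point area map. All channel counts and grid sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Dense stand-in for the sparse/sub-manifold convolution blocks: each block
# strides along the height (the specified direction) while preserving width
# and depth, so the output approaches a two-dimensional anchor point area map.
blocks = nn.Sequential(
    nn.Conv3d(64, 64, 3, stride=(2, 1, 1), padding=1), nn.ReLU(),
    nn.Conv3d(64, 128, 3, stride=(2, 1, 1), padding=1), nn.ReLU(),
    nn.Conv3d(128, 128, 3, stride=(2, 1, 1), padding=1), nn.ReLU(),
)

voxels = torch.randn(1, 64, 8, 124, 176)  # (batch, C, height, width, depth)
out = blocks(voxels)                      # height compressed 8 -> 4 -> 2 -> 1
bev = out.flatten(1, 2)                   # merge channels and residual height
print(out.shape, bev.shape)  # (1, 128, 1, 124, 176) (1, 128, 124, 176)
```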
In a specific implementation, feature extraction may be performed on the original image data through a deep neural network to obtain the first image feature data. For example, a residual neural network (ResNet) may be used for deep learning to obtain a feature map. Further, a two-dimensional convolution layer may replace the fully-connected layer and non-linear mapping layer at the output of the ResNet, serving as the output layer of the network.
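A sketch of such an image branch, assuming a recent torchvision; the backbone choice (resnet18), the input resolution and the 256-channel 1x1 output convolution are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)                    # residual neural network
feature_layers = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool/fc
out_conv = nn.Conv2d(512, 256, kernel_size=1)        # 2D conv as output layer

image = torch.randn(1, 3, 375, 1242)                 # original image data
first_image_features = out_conv(feature_layers(image))
print(first_image_features.shape)                    # torch.Size([1, 256, 12, 39])
```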
In a specific implementation, the first point cloud feature data may be screened according to the anchor point areas. To help those skilled in the art understand and implement this screening process, it is described in detail below through specific embodiments.
First, the first point cloud feature data may be divided according to the anchor point areas, and, based on the preset target classification candidate information, labeled per anchor point area to obtain the three-dimensional target candidate information.
For example, referring to fig. 11, suppose the first point cloud feature data (not shown in fig. 11) corresponds to four anchor point areas MD1 to MD4; the first point cloud feature data can then be divided into four parts. Based on the preset target classification candidate information HXA and HXB, the first point cloud feature subset of the anchor point area MD1 is labeled, yielding the three-dimensional target candidate information ZQY1 corresponding to HXA and the three-dimensional target candidate information ZQY2 corresponding to HXB. By analogy, the portions of the first point cloud feature data belonging to the other three anchor point areas are labeled as well, so that the whole of the first point cloud feature data is labeled with target classification candidate information, yielding its three-dimensional target candidate information.
It is to be understood that the figures are only schematic. In a specific implementation, the target classification candidate information may be represented by three-dimensional solid boxes, and the resulting three-dimensional target candidate information may likewise be represented by three-dimensional solid boxes, that is, three-dimensional target candidate boxes.
Optionally, bounding box regression may be performed on the three-dimensional target candidate boxes so that they match the actual object extents more closely.
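The labeling step amounts to tiling the preset candidate boxes over every anchor point area, as the sketch below shows; the box sizes and orientations are illustrative assumptions, and the 124 x 176 grid matches the worked example later in this description.

```python
import numpy as np

H, W = 124, 176                                 # anchor point area grid
candidates = np.array([[3.9, 1.6, 1.56, 0.0],   # (l, w, h, yaw), illustrative
                       [3.9, 1.6, 1.56, np.pi / 2],
                       [0.8, 0.6, 1.73, 0.0],
                       [0.8, 0.6, 1.73, np.pi / 2]])
K = len(candidates)

ys, xs = np.meshgrid(np.arange(H) + 0.5, np.arange(W) + 0.5, indexing="ij")
centers = np.stack([xs, ys], axis=-1).reshape(H * W, 1, 2)
anchors = np.concatenate([np.broadcast_to(centers, (H * W, K, 2)),
                          np.broadcast_to(candidates, (H * W, K, 4))], axis=-1)
print(anchors.reshape(-1, 6).shape)   # (87296, 6); with K = 8: 174592 anchors
```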
Next, according to the anchor point areas, a neural network performs target classification prediction on the first point cloud feature data corresponding to each piece of three-dimensional target candidate information, producing a target classification prediction result for each candidate.
Specifically, fig. 12 shows a schematic diagram of a region candidate network performing target feature extraction. Taking one piece of three-dimensional target candidate information in one anchor point area as an example, the first point cloud feature data corresponding to it is determined, and in the region candidate network the point-data features are extracted through n convolutional neural network blocks (ConvBlock), each comprising a convolution layer, a normalization layer and a non-linear operation layer. The extracted features are then passed through deconvolution neural networks (Deconv) to obtain stacked features of consistent dimensions, which pass through a 1 x 1 two-dimensional convolution (Conv2D); according to preset target classification prediction output parameters, m target classification prediction results are obtained, shown in fig. 12 as target classification prediction result 1 through target classification prediction result m. By analogy, target classification prediction results are obtained for all three-dimensional target candidate information, anchor point area by anchor point area.
The three-dimensional target candidate information is then evaluated by a scoring function: for each anchor point area, only the candidate with the highest score is kept; all target classification prediction results are subsequently sorted by score, and the candidates corresponding to the top k scores are retained, yielding the intermediate target candidate information.
Furthermore, the intermediate target candidate information may be matched against the image range corresponding to the original image data and filtered based on a preset matching condition, and de-overlapping processing may be applied to it; refer to the description of the relevant parts above, which is not repeated here.
To facilitate understanding, here is an example with actual numbers. Suppose there are 8 pieces of target classification candidate information, the feature data of the anchor point areas (that is, the two-dimensional feature map) is 124 x 176, and each anchor point area is 1 x 1, so there are 124 x 176 = 21824 anchor point areas in total. Labeling each anchor point area with the 8 candidates yields 124 x 176 x 8 = 174592 pieces of three-dimensional target candidate information. After the target classification prediction results are obtained, they are evaluated with the scoring function, and for each anchor point area only the single highest-scoring candidate is kept, that is, only 1/8 of the data, leaving 21824 candidates. Sorting by score and keeping the top 3000 then reduces the candidate count from 174592 through 21824 down to 3000, filtering the three-dimensional target candidate information efficiently, greatly reducing the data volume and improving the efficiency of subsequent processing.
In addition, matching the intermediate target candidate information against the image range corresponding to the original image data, filtering on the preset matching condition, and de-overlapping can reduce the data volume further, for example from 3000 to 200 pieces of three-dimensional target candidate information.
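The per-area and global score filtering can be sketched in a few lines of PyTorch; the counts reuse the worked example above, and the random scores stand in for the scoring function.

```python
import torch

def filter_candidates(scores, k_top=3000):
    """Keep the best candidate per anchor point area, then the global top-k."""
    best_scores, best_idx = scores.max(dim=1)     # 1 of K per anchor area
    k = min(k_top, best_scores.numel())
    top_scores, top_areas = best_scores.topk(k)   # top-k over all areas
    return top_areas, best_idx[top_areas], top_scores

scores = torch.rand(21824, 8)            # stand-in for the scoring function
areas, cand_ids, top_scores = filter_candidates(scores)
print(areas.shape)                       # torch.Size([3000])
```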
The intermediate target candidate information obtained through this optimization is taken as the optimization target candidate information. The first point cloud feature data is then screened based on the optimization target candidate information to obtain the second point cloud feature data. The coordinate system of the optimization target candidate information is also converted into the pixel coordinate system of the original image data to obtain two-dimensional target candidate information, and the first image feature data is screened based on that two-dimensional target candidate information to obtain the second image feature data.
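The coordinate conversion is a standard rigid transform plus pinhole projection; the sketch below assumes a calibrated 4 x 4 extrinsic matrix and 3 x 3 intrinsic matrix are available, which this disclosure does not specify.

```python
import numpy as np

def to_pixel_coords(points_3d, T_cam_from_cloud, K_intrinsic):
    """Project 3D candidate points into the pixel coordinate system.

    `points_3d`: (n, 3) points in the point cloud coordinate system;
    `T_cam_from_cloud`: 4x4 extrinsic matrix; `K_intrinsic`: 3x3 intrinsics.
    """
    homo = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])
    cam = (T_cam_from_cloud @ homo.T)[:3]    # camera coordinates
    uv = K_intrinsic @ cam
    return (uv[:2] / uv[2]).T                # (n, 2) pixel coordinates
```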
In a specific implementation, fig. 13 is a flowchart of another data processing method, in which steps SA to SE correspond to fig. 1 and its description and are not repeated here. Compared with the method of fig. 1, the method of fig. 13 adds, before the step of respectively screening the first point cloud feature data and the first image feature data based on preset target classification candidate information, a step SF of judging whether the first image feature data has an abnormal condition; if not, it is determined that there is no abnormal condition and the method continues with step SC.
What counts as an abnormal condition of the first image feature data may be defined according to actual needs and the application scenario; for example, it may be frame dropping of the original image data, or a time difference between the original image data and the original point cloud data exceeding a preset time threshold. The invention is not limited in this regard.
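A minimal sketch of such a check, assuming frames carry an acquisition time 't'; the 50 ms threshold is an illustrative value.

```python
def image_features_abnormal(image_frame, cloud_frame, max_dt=0.05):
    """True if the original image data frame was dropped, or if its time
    difference from the original point cloud data exceeds the threshold."""
    if image_frame is None:                 # frame dropping
        return True
    return abs(image_frame["t"] - cloud_frame["t"]) > max_dt
```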
In particular implementations, the fused data may be used to perform more complex data analysis tasks such as target recognition. However, the fusion results obtained by existing methods are poor, which in turn degrades the subsequent target recognition results.
The fused data obtained by the embodiments of the present invention has higher accuracy and a smaller data volume, which improves data processing efficiency and meets the demands of an automatic driving system for data accuracy and high processing efficiency. Even in dynamic, complex driving scenes, both processing efficiency and target recognition quality can be guaranteed, so the data processing method provided by the embodiments of the invention adapts well and facilitates automatic driving.
In a specific implementation, a fully-connected neural network may be used to perform target recognition on the fused data to obtain the target recognition result. Because a fully-connected neural network is robust and applicable to many scenarios, it can process data of any dimensionality, which facilitates data analysis.
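As a sketch, such a fully-connected recognition head might look as follows; the 512-dimensional input and the output split into box parameters, objectness and class scores are assumptions for illustration.

```python
import torch.nn as nn

# Fully-connected recognition head over the fused data; sizes are illustrative.
recognition_head = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 7 + 1 + 10),  # e.g. 7 box params, 1 objectness, 10 classes
)
```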
In a specific implementation, after target recognition is performed on the fused data, the resulting target recognition result can be visualized, with each recognized target represented by a three-dimensional solid box. For the same object, different target recognition results can be shown with different boxes: fig. 14 is a schematic diagram of one target recognition result, in which a black three-dimensional box 14A represents one target recognition result and a gray three-dimensional box 14B represents another. The embodiments of the present specification are not limited to this.
The data processing method provided by the embodiments of the invention yields first point cloud feature data of high accuracy, so even if the first image feature data has an abnormal condition, the subsequent target recognition task can be continued using the first point cloud feature data alone.
For example, with reference to fig. 13, if the determination in step SF is yes, that is, the first image feature data is found to be abnormal, the method proceeds to step SG, in which target recognition is performed on the first point cloud feature data to obtain the target recognition result.
The data processing method provided by the invention is therefore robust: even if the first image feature data is abnormal, a good data analysis result can still be obtained from the first point cloud feature data alone.
In a specific implementation, the data processing method provided by the embodiment of the present invention may be implemented by a target recognition model, where the target recognition model includes various neural networks for implementing the data processing method of the embodiment of the present invention, and the neural networks are connected according to corresponding logic.
Before the target recognition model is used, it must be trained and its parameters adjusted until it converges. However, the model involves numerous parameters, so training requires a large amount of data and repeated adjustment. To keep training time under control, the training data often has to be pruned, which leaves the trained target recognition model with poor robustness, unable to adapt to complex and changing automatic driving scenarios.
For example, image acquisition devices may change their poses on the mobile carrier as needed, and more than one may be mounted: as shown in fig. 15, image acquisition devices Y11, Y12 and Y13 are provided at three positions on the mobile carrier Y1. If the pose states of Y11, Y12 and Y13 differ from the pose state of the image acquisition device used during training, the trained target recognition model may fail to output accurate target recognition results.
The data processing method provided by the embodiments of the invention reduces the volume of data the model must handle, so the target recognition model can be trained with training data covering more scenes.
Specifically, before original image data and original point cloud data are input into it, the initial target recognition model is trained with preset training data, which includes training point cloud data, training image data acquired from different poses together with the corresponding image acquisition device parameters, and reference recognition results.
The error between the target recognition result and the reference recognition result is then computed with a preset loss function. While the error value exceeds an error threshold, the parameters of the target recognition model are adjusted and training continues on the training data; once the error value falls below the threshold, training of the target recognition model is complete.
The parameters of the target recognition model may be adjusted by gradient descent and back propagation; a sketch of such a training loop follows.
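A minimal training-loop sketch under these rules; `model`, `loss_fn` and the batches are placeholders, and the learning rate, threshold and epoch cap are illustrative values.

```python
import torch

def train(model, loss_fn, train_batches, error_threshold=1e-3,
          lr=1e-3, max_epochs=100):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent
    for _ in range(max_epochs):
        for inputs, reference in train_batches:
            error = loss_fn(model(inputs), reference)  # preset loss function
            if error.item() < error_threshold:
                return model                           # training complete
            optimizer.zero_grad()
            error.backward()                           # back propagation
            optimizer.step()                           # adjust parameters
    return model
```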
With this scheme the amount of usable training data grows, strengthening the robustness of the model. During training, training image data collected by image acquisition devices in different poses can be drawn at random from the training set, so that the trained model can adapt to image acquisition devices in any pose during actual driving.
In a specific implementation, the acquisition frequencies of the original point cloud data and the original image data may differ, making their acquisition time information inconsistent. To ease management of the original image data and original point cloud data and of the data derived from them, original image data and original point cloud data falling within a preset time range may be marked with the same timestamp. This yields time-synchronized original image data and original point cloud data, and hence time synchronization between the first image feature data and the first point cloud feature data.
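One way to sketch this pairing, assuming each frame records an acquisition time 't'; the 50 ms window is an illustrative value.

```python
def synchronize(image_frames, cloud_frames, window=0.05):
    """Pair frames whose acquisition times fall within the preset time range
    and mark both with the same timestamp."""
    pairs = []
    for cloud in cloud_frames:
        if not image_frames:
            break
        image = min(image_frames, key=lambda f: abs(f["t"] - cloud["t"]))
        if abs(image["t"] - cloud["t"]) <= window:
            cloud["stamp"] = image["stamp"] = cloud["t"]  # shared timestamp
            pairs.append((cloud, image))
    return pairs
```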
It should be noted that, because a good data analysis result can be obtained even from the first point cloud feature data alone, the data processing method provided by the embodiments of the invention does not strictly require that the original point cloud data and original image data used for feature extraction be time-synchronized.
It will be appreciated that, although various embodiments of the present invention have been described above, the alternatives described for different embodiments may be combined and cross-referenced without conflict, extending to the variety of possible embodiments that may be regarded as disclosed herein.
The embodiments of the invention also provide a data processing device corresponding to the data processing method, described in detail below through specific embodiments with reference to the drawings. It should be understood that the data processing device described below can be regarded as the set of functional modules required to implement the data processing method provided by the embodiments of the invention; in practical applications, its units and modules may be implemented by hardware, by software, or by a combination of the two. The contents of the data processing device described below correspond to the contents of the data processing method described above and may be cross-referenced.
Referring to the block diagram of a data processing device shown in fig. 16, in an embodiment of the invention the data processing device M1 is connected to an image acquisition device and a point cloud acquisition device, and the data processing device M1 may include:
a data obtaining unit M11 adapted to obtain original image data of the image acquisition apparatus and original point cloud data of the point cloud acquisition apparatus;
the feature extraction unit M12 is suitable for performing feature extraction on the original point cloud data to obtain first point cloud feature data, and performing feature extraction on the original image data to obtain first image feature data;
the data screening unit M13 is adapted to respectively screen the first point cloud feature data and the first image feature data according to preset target classification candidate information to obtain second point cloud feature data and second image feature data;
a data fusion unit M14 adapted to fuse the second point cloud feature data with the second image feature data to obtain fused data; and
a target identification unit M15 adapted to perform target identification on the fused data to obtain a target identification result.
By adopting this scheme, the first point cloud feature data and the first image feature data can each be screened based on preset target classification candidate information, effectively reducing their data volume while retaining accurate, useful data; data redundancy falls, and data quality and processing efficiency improve. Moreover, because the screening is performed according to the same preset target classification candidate information, the resulting second point cloud feature data and second image feature data correspond closely to each other. During target recognition this reduces the deviation between the targets recognized from the fused data and the real targets and increases their intersection-over-union. Fused data of smaller volume but higher accuracy is thus obtained without resorting to expensive precision devices, which in turn improves the accuracy of the target recognition results and shortens target recognition time, meeting the demands of an automatic driving system for data accuracy and efficiency and enhancing automatic driving capability.
In conclusion, the scheme ensures both the efficiency and the accuracy of data fusion at a low implementation cost, and thereby yields more accurate target recognition results.
In a specific implementation, as shown in fig. 17, the data filtering unit M13 may include:
the information obtaining subunit M131 is adapted to perform target screening on the first point cloud feature data according to preset target classification candidate information to obtain three-dimensional target candidate information of the first point cloud feature data;
the information filtering subunit M132 is adapted to filter the three-dimensional target candidate information to obtain optimized target candidate information;
and the screening subunit M133 is adapted to screen the first point cloud feature data according to the optimization target candidate information to obtain second point cloud feature data.
In a specific implementation, as shown in fig. 17, the data filtering unit M13 may further include: a coordinate conversion subunit M134, adapted to convert the coordinate system of the optimization target candidate information into the pixel coordinate system of the original image data, so as to obtain two-dimensional target candidate information;
the screening subunit M133 may screen the first image feature data according to the two-dimensional target candidate information to obtain second image feature data.
In a specific implementation, as shown in fig. 17, the information filtering subunit M132 may include:
the first filtering module M1321 is adapted to evaluate the three-dimensional target candidate information, and filter the three-dimensional target candidate information based on a preset evaluation condition to obtain intermediate target candidate information;
an information obtaining module M1322 is adapted to obtain the intermediate target candidate information as the optimization target candidate information.
In a specific implementation, as shown in fig. 17, the information filtering subunit M132 may further include: a second filtering module M1323, located between the first filtering module M1321 and the information obtaining module M1322, where the second filtering module M1323 is adapted to perform de-overlapping processing on the intermediate target candidate information.
In a specific implementation, as shown in fig. 17, the information filtering subunit M132 may further include: a third filtering module M1324, located between the first filtering module M1321 and the second filtering module M1323, where the third filtering module M1324 is adapted to match the intermediate target candidate information with an image range corresponding to the original image data, and filter the intermediate target candidate information based on a preset matching condition.
Further, as shown in fig. 17, the third filtering module M1324 is adapted to convert the intermediate target candidate information into a pixel coordinate system of the original image data, match an image range corresponding to the original image data, and filter the intermediate target candidate information based on a preset matching condition.
In a specific implementation, with continued reference to fig. 16, the data obtaining unit M11 may include:
the voxel division subunit M111 is suitable for dividing the original point cloud data to obtain a plurality of voxel units;
a feature extraction subunit M112, adapted to perform feature extraction on the plurality of voxel units to obtain voxel feature data;
and a data compression subunit M113 adapted to compress the voxel feature data to obtain the first point cloud feature data.
In a specific implementation, as shown in fig. 16, the data processing device M1 may further include:
a data abnormality judging unit M16, located between the feature extraction unit M12 and the data screening unit M13 and connected to the target identification unit M15, adapted to judge whether the first image feature data has an abnormal condition.
The data screening unit M13 is adapted to screen the first point cloud feature data and the first image feature data according to preset target classification candidate information after the data anomaly determination unit M16 determines that the first image feature data has no anomaly.
The target identifying unit M15 is adapted to perform target identification on the first point cloud feature data after the data abnormality determining unit M16 determines that the first image feature data has an abnormal condition, so as to obtain a target identification result.
The embodiment of the invention also provides an automatic driving system corresponding to the data processing method, and the automatic driving system is described in detail through specific embodiments with reference to the attached drawings. It should be understood that, in practical applications, the devices in the automatic driving system described below may be implemented by hardware, software, or a combination of hardware and software; the contents of the automatic driving system described below may be referred to in correspondence with the contents of the data processing method described above.
Referring to a block diagram of an automatic driving system in an embodiment of the present invention shown in fig. 18, in an embodiment of the present invention, the automatic driving system M3 may include: a point cloud acquisition device M31, an image acquisition device M32 and a data processing device M33 placed on a mobile carrier, said data processing device M33 being connectable with said point cloud acquisition device M31 and said image acquisition device M32, respectively, wherein:
a point cloud acquisition device M31 adapted to acquire raw point cloud data;
an image acquisition device M32 adapted to acquire raw image data;
the data processing device M33 is adapted to execute the data processing method provided in any of the above embodiments, and processes the raw point cloud data and the raw image data.
The image acquisition device may include at least one of a digital camera, an infrared camera and a thermal imaging camera. The point cloud acquisition device may include at least one of a laser radar (lidar) and a millimeter-wave radar.
It can be understood that the connection between the data processing device and the image acquisition device may be wired or wireless, with data exchanged over cable or by wireless communication accordingly. The data processing device may exchange data with the image acquisition device directly, or indirectly through a communication relay platform (such as a switch); the embodiments of the invention are not limited in this regard.
Similarly, the connection between the data processing device and the point cloud acquisition device may be wired or wireless, with data exchanged over cable or by wireless communication accordingly; data interaction may likewise be direct or pass through a communication relay platform (such as a switch), and the embodiments of the invention are not limited in this regard.
In particular implementations, the data processing device may include a memory that may store one or more computer-executable instructions and a processor that may invoke the one or more computer-executable instructions to perform the steps of the methods provided by embodiments of the present invention.
The embodiment of the present invention further provides a computer-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed, the steps of the method according to any of the above embodiments of the present invention may be executed. The computer readable storage medium may be various suitable readable storage media such as an optical disc, a mechanical hard disc, a solid state hard disc, and the like. The instructions stored in the computer-readable storage medium may be used to execute the method according to any of the embodiments, which may specifically refer to the embodiments described above and will not be described again.
The computer-readable storage medium may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit, such as memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, compact disc read-only memory (CD-ROM), compact disc recordable (CD-R), compact disc rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of digital versatile disc (DVD), a tape, a cassette, or the like.
The computer instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object-oriented, visual, compiled and/or interpreted programming language.
It will be appreciated that, although various embodiments of the present invention have been described above, the alternatives described for different embodiments may be combined and cross-referenced without conflict, extending to the variety of possible embodiments that may be regarded as disclosed herein.
It is noted that reference to "one embodiment" or "an embodiment" of the present invention means that a particular feature, structure or characteristic may be included in at least one implementation of the present invention. Also, in the description of the present invention, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined in terms of "first," "second," etc. may explicitly or implicitly include one or more of that feature. Moreover, the terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Although the embodiments of the present invention have been disclosed, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (17)

1. A data processing method, comprising:
performing feature extraction on the original point cloud data to obtain first point cloud feature data;
performing feature extraction on original image data to obtain first image feature data;
screening the first point cloud feature data and the first image feature data respectively based on preset target classification candidate information to obtain second point cloud feature data and second image feature data;
fusing the second point cloud characteristic data with the second image characteristic data to obtain fused data;
and performing target recognition on the fused data to obtain a target recognition result.
2. The data processing method according to claim 1, wherein the screening the first point cloud feature data based on preset target classification candidate information comprises:
performing target screening on the first point cloud feature data based on the preset target classification candidate information to obtain three-dimensional target candidate information of the first point cloud feature data;
filtering the three-dimensional target candidate information to obtain optimization target candidate information;
and screening the first point cloud feature data based on the optimization target candidate information to obtain the second point cloud feature data.
3. The data processing method according to claim 1, wherein the screening the first image feature data based on preset target classification candidate information to obtain second image feature data comprises:
performing target screening on the first point cloud feature data based on the preset target classification candidate information to obtain three-dimensional target candidate information of the first point cloud feature data;
filtering the three-dimensional target candidate information to obtain optimization target candidate information;
converting the coordinate system of the optimization target candidate information into the pixel coordinate system of the original image data to obtain two-dimensional target candidate information;
and screening the first image feature data based on the two-dimensional target candidate information to obtain the second image feature data.
4. The data processing method according to claim 2 or 3, wherein the filtering the three-dimensional target candidate information to obtain optimization target candidate information comprises:
evaluating the three-dimensional target candidate information, and filtering it based on a preset evaluation condition to obtain intermediate target candidate information;
and taking the intermediate target candidate information as the optimization target candidate information.
5. The data processing method according to claim 4, further comprising, before the intermediate target candidate information is taken as the optimization target candidate information:
performing de-overlapping processing on the intermediate target candidate information.
6. The data processing method of claim 5, further comprising, before performing the de-overlapping process on the intermediate target candidate information:
and matching the intermediate target candidate information with an image range corresponding to the original image data, and filtering the intermediate target candidate information based on a preset matching condition.
7. The data processing method of claim 6, wherein the matching the intermediate target candidate information with the image range corresponding to the original image data comprises:
and converting the intermediate target candidate information into a pixel coordinate system of the original image data, and matching with an image range corresponding to the original image data.
8. The data processing method according to claim 1, wherein, before the step of respectively screening the first point cloud feature data and the first image feature data based on preset target classification candidate information, the method further comprises:
setting the target classification candidate information based on object class and object orientation.
9. The data processing method of claim 1, wherein the performing feature extraction on the original point cloud data to obtain first point cloud feature data comprises:
dividing the original point cloud data to obtain a plurality of voxel units;
performing feature extraction on the plurality of voxel units to obtain voxel feature data;
and compressing the voxel feature data to obtain the first point cloud feature data.
10. The data processing method according to claim 9, wherein the performing feature extraction on the plurality of voxel units to obtain voxel feature data includes:
performing local feature extraction on each of the plurality of voxel units to obtain the voxel feature data.
11. The data processing method according to claim 10, wherein the performing local feature extraction on each of the plurality of voxel units to obtain the voxel feature data includes any one of:
performing voxel feature stacking on each of the plurality of voxel units to obtain the voxel feature data;
performing a point-data logical operation on each of the plurality of voxel units to obtain the voxel feature data.
12. The data processing method according to claim 9, wherein the compressing the voxel feature data comprises:
compressing the voxel feature data along a specified direction through a convolutional neural network block, wherein the convolutional neural network block comprises a sparse convolution layer and a sub-manifold convolution layer.
13. The data processing method of claim 1, further comprising:
before the first point cloud feature data and the first image feature data are respectively screened based on preset target classification candidate information, judging whether the first image feature data has an abnormal condition;
and if no abnormal condition exists, respectively screening the first point cloud feature data and the first image feature data based on the preset target classification candidate information.
14. The data processing method of claim 13, further comprising:
and if an abnormal condition exists, performing target recognition on the first point cloud feature data to obtain a target recognition result.
15. The data processing method of claim 1, further comprising:
performing target recognition on the fused data through a fully-connected neural network to obtain a target recognition result.
16. A data processing device connected with an image acquisition device and a point cloud acquisition device, and adapted to perform the data processing method of any one of claims 1 to 15, wherein the data processing device comprises:
a data acquisition unit adapted to acquire original image data of the image acquisition device and original point cloud data of the point cloud acquisition device;
a feature extraction unit adapted to perform feature extraction on the original point cloud data to obtain first point cloud feature data, and to perform feature extraction on the original image data to obtain first image feature data;
a data screening unit adapted to respectively screen the first point cloud feature data and the first image feature data according to preset target classification candidate information to obtain second point cloud feature data and second image feature data;
a data fusion unit adapted to fuse the second point cloud feature data with the second image feature data to obtain fused data;
and a target identification unit adapted to perform target identification on the fused data to obtain a target identification result.
17. An automatic driving system, characterized by comprising a point cloud acquisition device, an image acquisition device and a data processing device, the data processing device being connected to the point cloud acquisition device and the image acquisition device respectively, wherein:
the point cloud acquisition equipment is suitable for acquiring original point cloud data;
an image acquisition device adapted to acquire raw image data;
data processing apparatus adapted to perform the data processing method of any one of claims 1 to 15, processing the raw point cloud data and raw image data.
CN202011507172.XA 2020-12-18 2020-12-18 Data processing method, data processing equipment and automatic driving system Pending CN114648739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011507172.XA CN114648739A (en) 2020-12-18 2020-12-18 Data processing method, data processing equipment and automatic driving system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011507172.XA CN114648739A (en) 2020-12-18 2020-12-18 Data processing method, data processing equipment and automatic driving system

Publications (1)

Publication Number Publication Date
CN114648739A true CN114648739A (en) 2022-06-21

Family

ID=81991317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011507172.XA Pending CN114648739A (en) 2020-12-18 2020-12-18 Data processing method, data processing equipment and automatic driving system

Country Status (1)

Country Link
CN (1) CN114648739A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664964A (en) * 2023-07-31 2023-08-29 福思(杭州)智能科技有限公司 Data screening method, device, vehicle-mounted equipment and storage medium
CN116863430A (en) * 2023-07-31 2023-10-10 合肥海普微电子有限公司 Point cloud fusion method for automatic driving
CN116664964B (en) * 2023-07-31 2023-10-20 福思(杭州)智能科技有限公司 Data screening method, device, vehicle-mounted equipment and storage medium
CN116863430B (en) * 2023-07-31 2023-12-22 合肥海普微电子有限公司 Point cloud fusion method for automatic driving

Similar Documents

Publication Publication Date Title
CN108694386B (en) Lane line detection method based on parallel convolution neural network
WO2020094033A1 (en) Method and system for converting point cloud data for use with 2d convolutional neural networks
CN112233097B (en) Road scene other vehicle detection system and method based on space-time domain multi-dimensional fusion
US8139852B2 (en) Color classification method, color recognition method, color classification apparatus, color recognition apparatus, color recognition system, computer program, and recording medium
CN112329747B (en) Vehicle parameter detection method based on video identification and deep learning and related device
CN114648739A (en) Data processing method, data processing equipment and automatic driving system
CN112379231A (en) Equipment detection method and device based on multispectral image
CN112818925B (en) Urban building and crown identification method
CN110532937B (en) Method for accurately identifying forward targets of train based on identification model and classification model
CN111046789A (en) Pedestrian re-identification method
Farag A lightweight vehicle detection and tracking technique for advanced driving assistance systems
CN110738100A (en) camouflage military target identification method and system based on deep learning
CN114550023A (en) Traffic target static information extraction device
CN116502810B (en) Standardized production monitoring method based on image recognition
Moseva et al. Development of a System for Fixing Road Markings in Real Time
CN112800934A (en) Behavior identification method and device for multi-class engineering vehicle
CN113298781B (en) Mars surface three-dimensional terrain detection method based on image and point cloud fusion
CN115588047A (en) Three-dimensional target detection method based on scene coding
CN114972945A (en) Multi-machine-position information fusion vehicle identification method, system, equipment and storage medium
KR102178202B1 (en) Method and apparatus for detecting traffic light
CN117670938B (en) Multi-target space-time tracking method based on super-treatment robot
JPH06348991A (en) Traveling environment recognizer for traveling vehicle
Wei et al. Enhanced Object Detection by Integrating Camera Parameters into Raw Image-Based Faster R-CNN
Astudillo et al. Reducing the breach between simulated and real data for top view images
KR102520676B1 (en) Tree species detection apparatus based on camera, thermal camera, GPS, and LiDAR and Detection method of the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination