CN117078800A - Method and device for synthesizing ground identification based on BEV image - Google Patents
- Publication number: CN117078800A (application CN202310951584.XA)
- Authority
- CN
- China
- Prior art keywords: image, ground, BEV, identifier, ground identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T11/60—Editing figures and text; Combining figures or text (2D image generation)
- G06T7/11—Region-based segmentation (image analysis)
- G06T7/194—Segmentation involving foreground-background segmentation
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06V20/588—Recognition of the road, e.g. of lane markings; recognition of the vehicle driving pattern in relation to the road
- G06T2207/20221—Image fusion; Image merging
- G06T2207/30256—Lane; Road marking
- Y02T10/40—Engine management systems
Abstract
The invention discloses a method and a device for synthesizing ground identifications based on BEV images. The method comprises: acquiring a ground identifier template and a background image, wherein the ground identifier template comprises a ground arrow and a text identifier; projecting the background image to a bird's-eye view to obtain a BEV image, and identifying position coordinates in the BEV image at which to add the ground identifier; and fusing the ground identifier and the BEV image into a target BEV image according to the position coordinates, then converting the target BEV image into an ordinary image under an ordinary viewing angle, the ordinary image containing the ground identifier template. Using the background image and the ground identifier, and aided by road surface segmentation results for lane lines and drivable areas, the method can automatically synthesize labeled target data in batches; in the process, no labels on the data to be synthesized are needed, so new categories can be synthesized directly, which largely alleviates the model-training problem caused by data shortage.
Description
Technical Field
The invention relates to the field of automatic driving image data synthesis, in particular to a method and a device for synthesizing ground marks based on BEV images.
Background
Data augmentation methods are divided into offline and online augmentation; common techniques include geometric transforms and the online augmentation widely used in deep learning, which can improve model generalization to a certain extent. More advanced means such as CycleGAN can also be used for image style transfer. However, in supervised learning these augmentation techniques can only operate on the original labeled data and cannot generate new categories.
The model pre-labeling method refers to obtaining pseudo-labels by running a pre-trained model on the data to be labeled, for example the yolox and yolov7 models widely used in engineering practice. Such a model is trained on a large amount of labeled data and is sensitive to the categories it was trained on, but has no discrimination ability for categories it has not learned.
At present, data synthesis technology has been widely applied in the 2D and 3D fields of automatic driving and is expected to make perception tasks in automatic driving highly efficient. In the 2D field, data synthesis is mostly performed on images; in the 3D field, more complex factors such as background, viewing angle, camera depth, illumination and pose need to be considered. Synthetic data has very wide application and is suitable for almost all machine learning and deep learning tasks. Data can also be generated with generative adversarial networks such as StyleGAN, but StyleGAN-type methods are currently applied mainly to face generation and give unsatisfactory results in other fields.
Disclosure of Invention
Aiming at the technical problems, the invention provides a method and a device for synthesizing a ground identifier based on BEV images, which can realize ground identifier synthesis based on the BEV images.
In a first aspect of the invention, there is provided a method of synthesizing a ground identification based on BEV images, comprising:
acquiring a ground identifier template and a background image, wherein the ground identifier template comprises a ground arrow and a text identifier;
projecting the background image to an overhead view to obtain a BEV image, and identifying position coordinates for adding the ground identifier in the BEV image;
and fusing the ground identifier and the BEV image into a target BEV image according to the position coordinates, and converting the target BEV image into a common image under a common viewing angle, wherein the common image comprises the ground identifier template.
In an alternative embodiment, the projecting the background image to the overhead view obtains a BEV image, and identifies position coordinates in the BEV image for adding the ground identifier, including:
during recognition, the panoramic segmentation model is utilized to segment the background image into a lane line mask and a pavement travelable area mask;
and determining position coordinates between two adjacent lane lines for placing the ground identifier template according to the constraint condition.
In an alternative embodiment, the identifying, using the panorama segmentation model to segment the background image into the lane line and the drivable area of the road surface, comprises:
projecting the background image, the lane line mask and the pavement travelable area mask into the BEV image by inverse perspective transformation;
the road vehicle is filtered according to the road exercisable area mask, and four position point coordinates for placing the ground identifier template in the BEV image are calculated.
In an alternative embodiment, the determining the position coordinates between two adjacent lane lines for placing the ground identifier template according to the constraint condition includes:
and calculating and determining the position coordinates between two adjacent lane lines for placing the ground identifier template according to the width of the lane lines, the distance between the adjacent lane lines and the size of the traffic identifier.
In an alternative embodiment, the method for synthesizing the ground identifier based on the BEV image further comprises performing enhancement processing on the ground identifier template by means of size transformation, shading transformation, gaussian blur, texture transformation and noise addition; the background of the ground identifier is processed into pixels similar to the background image.
In an alternative embodiment, the fusing the ground identifier and the BEV image according to the location coordinates to a target BEV image comprises:
the ground identifier is fused with the background image projected into the BEV image according to the position coordinates.
In an alternative embodiment, the method for synthesizing the ground identifier based on the BEV image further includes outputting a synthesized data tag corresponding to the ground identifier when the target BEV image is transformed into a normal image under a normal viewing angle.
In a second aspect of the invention, there is provided an apparatus for synthesizing a ground identification based on BEV images, comprising:
the acquisition module is used for acquiring a ground identifier template and a background image, wherein the ground identifier template comprises a ground arrow and a text identifier;
the processing module is used for projecting the background image to an overhead view to obtain a BEV image, and identifying position coordinates for adding the ground identifier in the BEV image;
and the synthesis module is used for fusing the ground identifier and the BEV image into a target BEV image according to the position coordinates, and converting the target BEV image into a common image under a common viewing angle, wherein the common image comprises the ground identifier template.
In a third aspect of the present invention, there is provided an electronic apparatus comprising:
at least one processor; and at least one memory communicatively coupled to the processor, wherein: the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method according to the first aspect of the embodiments of the invention.
In a fourth aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored which, when run by a computer, performs the method according to the first aspect of the embodiment of the invention.
According to the method, using the background image and the ground identifier, and aided by road surface segmentation results for the lane lines and the drivable area, labeled target data can be synthesized automatically in batches; in the process, new categories can be synthesized directly without any labels on the data to be synthesized, which largely alleviates the model-training problem caused by data shortage.
Drawings
FIG. 1 is a flow chart of a method for synthesizing a ground identification based on BEV images in accordance with an embodiment of the present invention.
Fig. 2 is a schematic diagram of background color replacement of a ground identifier in an embodiment of the invention.
Fig. 3 is a schematic diagram of a result of recognition of a background image and panoramic segmentation in an embodiment of the present invention.
FIG. 4 is a schematic illustration of the projection of the diagram of FIG. 3 into BEV space.
Fig. 5 is a diagram showing the comparison between the ground identifier before and after synthesis at a common viewing angle in an embodiment of the present invention.
FIG. 6 is a block diagram of an apparatus for synthesizing a ground identification based on BEV images in accordance with an embodiment of the present invention.
Fig. 7 is a schematic structural view of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The invention relates to 2D data synthesis of ground identification data such as ground arrows. Unlike data synthesis in purely 2D scenes, the invention realizes 2D data synthesis based on BEV images, that is, ground identification data synthesis for 2D ordinary images within a 3D scene. The invention first obtains the template of the data to be synthesized and determines its placement position. Template positioning must respect, for example, that the road surface identifier lies on the ground and satisfies the forward-camera rule that near objects appear large and far objects appear small; the background image can therefore be projected to the bird's eye view through inverse perspective transformation (IPM, Inverse Perspective Mapping) to obtain the corresponding bird's eye view. Under the bird's eye view, the positions of the lane lines are first located by the lane line segmentation model, and the positions of vehicles on the road surface are filtered out by the mask of the road surface drivable area; the template position between two adjacent lane lines is then located according to the positional relation of the lane lines and certain prior position constraints, thereby determining the final position of the template. The ground identifier is then Poisson-fused with the background image, enabling it to blend more naturally with the background. Finally, the synthesized data and labels can be used for downstream detection and segmentation tasks. The method comprises the following steps:
referring to fig. 1, fig. 1 is a flowchart of a method for synthesizing a ground identifier based on BEV images according to an embodiment of the present invention. The invention provides a method for synthesizing a ground identifier based on BEV images, which comprises the following steps:
step 100: a ground identifier template and a background image are acquired, wherein the ground identifier template comprises a ground arrow and a text identifier.
This step is used to obtain templates for data synthesis. For a 2D automatic driving scene, a ground identifier template is mainly acquired from an existing template library, from an open-source dataset, or by manual creation; the background image is a road surface image captured by the forward-view camera, chosen so that the road surface is clean, i.e. contains no arrow or text identifier within a certain range. The template library refers to prepared picture templates covering all the data types to be synthesized; some open-source datasets release ready-made templates, for example TT100K (a traffic sign dataset) has released templates for 128 traffic signs, and cut-out targets can also be used as templates. Illustratively, the ground identifiers include indications of going straight, going straight or turning right, going straight or turning left, turning left or right ahead, and other combinations of straight-ahead, left-turn and right-turn indications.
The background image of the BEV image can be obtained from a cloud data lake, and the background image can be projected into the BEV image by using a transformation method. Data enhancement is required for the acquired ground identifier template. The enhancement of the ground identifier template comprises common enhancement, background replacement and other methods, and aims to enhance the diversity and generalization of the template, and the details are described later.
Step 200: and projecting the background image to an overhead view to obtain a BEV image, and identifying position coordinates for adding the ground identifier in the BEV image.
In automatic/assisted driving, detection of lane lines is very important. In the image captured by the forward-view camera, lines that are in fact parallel intersect in the image because of the perspective effect; the inverse perspective transformation (IPM transformation) eliminates this perspective effect. In this step, the background image is projected to the bird's eye view by the inverse perspective transformation to obtain the corresponding bird's eye view, that is, the background image displayed under the bird's eye view. Application scenes of the invention include, but are not limited to, data synthesis under cloud-side automatic driving BEV and, given sufficient vehicle-side computing resources, data synthesis under vehicle-side automatic driving BEV as well.
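The IPM step above is, at its core, a planar homography. As a minimal illustrative sketch (not the patent's calibrated implementation), the NumPy code below shows how pixel coordinates are mapped through an assumed 3x3 transformation matrix `H`; in practice `H` would come from camera intrinsics/extrinsics or from four ground-plane point correspondences:

```python
import numpy as np

def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map Nx2 pixel coordinates through a 3x3 homography H.

    Converts to homogeneous coordinates, applies H, and normalizes by
    the last component -- the core operation behind IPM warping.
    """
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # N x 3
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Illustrative (assumed) homography: identity plus a shift, standing in
# for the real camera-to-BEV transform.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0, 1.0]])
print(apply_homography(H, np.array([[5.0, 5.0]])))  # [[15. 25.]]
```

Warping a whole image with such an `H` is what `cv2.warpPerspective` does in one call.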
After the BEV image is acquired, the lane lines and the road surface drivable areas in the background image are identified using an image segmentation model. The lane lines can then be projected into the BEV image by the inverse perspective transformation; since the identified lane lines exist in the bird's eye view, the placement position of the ground identifier can be located in the bird's eye view according to the lane lines.
Step 300: and fusing the ground identifier and the BEV image into a target BEV image according to the position coordinates, and converting the target BEV image into a common image under a common viewing angle, wherein the common image comprises the ground identifier template.
After the processing of the steps, a lane line mask and a pavement travelable area mask corresponding to each background image can be obtained, and then a basic model for automatic driving training can be obtained. The base model is added with a ground identifier to synthesize training data with labels for automatic driving training.
After determining the location coordinates for adding the ground identifier, the ground identifier and the BEV image are fused into a target BEV image according to the location coordinates.
After synthesizing the road surface identifier under the bird's eye view, the bird's eye view is transformed back to the normal camera viewing angle through a perspective transformation using the inverse of the transformation matrix, so that the road surface image under the forward-view camera is obtained; it satisfies the actual physical rule that near objects appear large and far objects appear small, the corresponding synthesized data label is obtained, and the synthesized data can be used for downstream detection and segmentation tasks.
From the above, the invention utilizes the background image and the ground identifier and, by means of the road surface segmentation results for the lane lines and the drivable area, realizes automatic batch synthesis of labeled target data, thereby achieving data enhancement of BEV images and image enhancement at ordinary viewing angles; in the process, new categories can be synthesized directly without any labels on the data to be synthesized, which largely alleviates the model-training problem caused by data shortage.
Further, the method comprises performing data enhancement and preprocessing on the ground identifier template before data synthesis, including background color processing of the ground identifier, background image preprocessing and the like. Referring to fig. 2, fig. 2 is a schematic diagram of template background color substitution for a ground identifier. For example, the background color of the ground identifier template may be changed directly to the same hue as the background image; illustratively, with 12 arrow templates, all 12 templates undergo background color replacement directly.
In addition, the arrow background color may also be replaced after data synthesis. The ground identifier can be detected with a yolo-series model, and its background processed into pixels similar to the background image, so that the finally synthesized picture looks more realistic.
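As a hedged sketch of the background replacement idea (the function and toy data here are illustrative, not from the patent), the template's non-arrow pixels can be overwritten with the dominant color of the road background:

```python
import numpy as np

def replace_template_background(template, arrow_mask, background):
    """Replace the template's background pixels with the mean color of
    the road-surface background image, keeping arrow pixels intact."""
    bg_color = background.reshape(-1, 3).mean(axis=0)
    out = template.astype(float).copy()
    out[~arrow_mask] = bg_color          # non-arrow pixels get road color
    return out.astype(np.uint8)

# Toy data: 4x4 white template with a 2x2 "arrow", uniform gray road.
template = np.full((4, 4, 3), 255, np.uint8)
arrow_mask = np.zeros((4, 4), bool)
arrow_mask[1:3, 1:3] = True
background = np.full((8, 8, 3), 120, np.uint8)
result = replace_template_background(template, arrow_mask, background)
```

A real pipeline would use the segmented arrow mask and a local patch of road rather than the global mean.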
The ground identifier and the background image can be enhanced by means of size transformation, shading transformation, gaussian blur, texture transformation and noise addition.
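A minimal sketch of such enhancement, using simple brightness scaling and additive Gaussian noise as stand-ins for the full set of transforms (size, shading, blur, texture, noise) named above; the function is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_template(template: np.ndarray) -> np.ndarray:
    """Brightness (shading) scaling plus additive Gaussian noise; size,
    texture and blur transforms from the text would be applied similarly."""
    out = template.astype(float)
    out *= rng.uniform(0.7, 1.3)            # shading transformation
    out += rng.normal(0.0, 5.0, out.shape)  # added Gaussian noise
    return np.clip(out, 0, 255).astype(np.uint8)

aug = augment_template(np.full((8, 8, 3), 128, np.uint8))
```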
Further, in the above step 200, the projecting the background image to the overhead view to obtain a BEV image, and identifying, by using a panorama segmentation model, a position coordinate for adding the ground identifier in the BEV image specifically includes:
when the model is used for recognition, the background image is divided into a lane line mask (mask) and a pavement travelable area mask (mask) by using a Panoptic-deep Lab model; of course, other models that achieve similar results may be used. As shown in fig. 3, the left side view (1) is a background image, the center view (2) is an identified lane line mask, and the right side view (3) is a road surface travelable region mask. Wherein the road vehicles can be filtered according to the road exercisable area mask to reduce the impact on data.
The lane line of the background image is identified through the panoramic segmentation model, so that the position of the lane line can be determined better, and the placement position of the ground identifier is determined according to the lane line. The default lane lines are parallel to each other in the bird's eye view. And determining position coordinates between two adjacent lane lines for placing the ground identifier template according to the constraint condition.
Further, as shown in FIG. 4, the background image, the lane line mask, and the pavement travelable area mask are projected into the BEV image by an inverse perspective transformation. The left view (4) is the background image under the bird's eye view, the center view (5) is the lane line mask under the bird's eye view, and the right view (6) is the pavement travelable area mask under the bird's eye view. As can be seen from fig. 4, the lane lines are parallel to each other in the bird's eye view, and the position point coordinates of the ground identifier can be located from the mutually parallel lane lines in the lane line mask.
Specifically, the position where the road surface identifier can be placed is located by setting constraint conditions. For example, the position coordinates for placing the ground identifier template between two adjacent lane lines are calculated from the width of the lane lines, the distance between adjacent lane lines and the size of the traffic identifier. Illustratively, lane lines are 15 cm wide, adjacent lane lines on an urban road are 3.5 meters apart, and an urban road surface identifier is 4.5 meters long. The road surface identifier is constrained to lie midway between adjacent lane lines, i.e. its coordinate origin is at the midpoint between the adjacent lane lines, and its lateral extent is smaller than the distance between the adjacent lane lines.
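The constraint check described above can be sketched as follows; the function name and the metric lane coordinates are illustrative assumptions, not the patent's code:

```python
def place_identifier(left_x_m: float, right_x_m: float,
                     template_width_m: float):
    """Return the x-origin of a ground identifier centered between two
    adjacent lane lines, or None if it violates the lateral constraint
    that the identifier must be narrower than the lane spacing."""
    spacing = right_x_m - left_x_m
    if template_width_m >= spacing:
        return None
    return left_x_m + spacing / 2.0

# Urban-road figure from the description: adjacent lane lines 3.5 m apart.
origin = place_identifier(0.0, 3.5, 2.0)  # centered at x = 1.75 m
```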
Four position point coordinates for placing the ground identifier template in the BEV image are calculated from the center line between adjacent lane lines and the size of the ground identifier template itself. A homography projective transformation is determined by four point correspondences (eight coordinates); for example, the road surface identifier used for background replacement in fig. 2 is a rectangular icon, and its four corner coordinates can be converted into coordinates in BEV space by a homography transformation algorithm. Since the lane line mask is projected using the same coordinate transformation as the background image under the bird's eye view, the coordinates of the road surface identifier on the lane line mask can also be transformed into BEV space based on the coordinates of the original background image. The ground identifier is then fused, at the position coordinates, with the background image projected into the BEV image by means of Poisson fusion, yielding the target BEV image.
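The corner computation and fusion steps can be sketched as below. The `paste` helper is a deliberately naive stand-in for Poisson fusion (OpenCV's `cv2.seamlessClone` implements the real technique); all names and sizes are illustrative:

```python
import numpy as np

def template_corners(cx, cy, w, h):
    """Four corner coordinates (clockwise from top-left) of an
    axis-aligned template centered at (cx, cy) in BEV space -- the four
    point correspondences a homography solver would consume."""
    return np.array([[cx - w / 2, cy - h / 2],
                     [cx + w / 2, cy - h / 2],
                     [cx + w / 2, cy + h / 2],
                     [cx - w / 2, cy + h / 2]])

def paste(bev: np.ndarray, template: np.ndarray, top: int, left: int):
    """Naive rectangular paste; the patent instead uses Poisson fusion
    so the identifier blends naturally with the road surface."""
    out = bev.copy()
    h, w = template.shape[:2]
    out[top:top + h, left:left + w] = template
    return out

corners = template_corners(10.0, 20.0, 4.0, 6.0)
```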
Further, after synthesizing the road surface arrow identifier under the bird's eye view, the bird's eye view is transformed back to the normal camera viewing angle through a perspective transformation using the inverse of the transformation matrix; when the target BEV image is converted into an ordinary image under an ordinary viewing angle, the synthesized data label corresponding to the ground identifier is output. The ordinary image output by the invention satisfies the physical rule that near objects appear large and far objects appear small, the corresponding synthesized data label is obtained at the same time, and the synthesized data can be used for downstream detection and segmentation tasks. The effect before and after synthesis is shown in fig. 5, in which the left side (7) is the image before synthesis and the right side (8) is the image after synthesis.
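A hedged sketch of the round trip back to the camera view: mapping BEV coordinates through the inverse homography recovers the original image coordinates, which is also how the synthesized labels follow the image back. The matrix `H` below is an arbitrary invertible example, not a calibrated transform:

```python
import numpy as np

def to_camera_view(H_bev: np.ndarray, pts_bev: np.ndarray) -> np.ndarray:
    """Map BEV-space points back to the camera view using the inverse
    of the camera-to-BEV homography."""
    H_inv = np.linalg.inv(H_bev)
    pts_h = np.hstack([pts_bev, np.ones((len(pts_bev), 1))])
    back = pts_h @ H_inv.T
    return back[:, :2] / back[:, 2:3]

# Arbitrary invertible homography standing in for the real calibration.
H = np.array([[1.0, 0.2, 5.0],
              [0.0, 1.5, -3.0],
              [0.0, 0.001, 1.0]])
pt = np.array([[100.0, 200.0]])
fwd = np.hstack([pt, [[1.0]]]) @ H.T   # project into BEV space
fwd = fwd[:, :2] / fwd[:, 2:3]
recovered = to_camera_view(H, fwd)     # round trip recovers pt
```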
The invention performed a synthetic-data ablation experiment on the CeyMo (road marking dataset) public dataset based on the single class of Straight Arrow data, with yolov7 as the model, as shown in Table 1 below. Using self-collected data as background images, 1614 synthesized samples were generated. As can be seen from Table 1, fine-tuning on the synthesized data together with a randomly selected 50% of the Straight Arrow training set from the public CeyMo dataset reaches 0.773 mAP@0.5, which demonstrates that the data synthesis method provided by the invention is effective and reliable.
TABLE 1

| Type | Synthesized data | CeyMo training set | Total training set | CeyMo test set | mAP@0.5 | mAP@0.5:0.95 |
|------|------------------|--------------------|--------------------|----------------|---------|--------------|
| 1 (SA) | 0 | 677 | 677 | 256 | 0.975 | 0.804 |
| 1 (SA) | 1614 | 677 × 0.3 = 203 | 1817 | 256 | 0.327 | 0.215 |
| 1 (SA) | 1614 | 677 × 0.5 = 338 | 1952 | 256 | 0.773 | 0.594 |
The method provided by the invention synthesizes data quickly, can synthesize data automatically in batches, and ensures data diversity. The synthesized data comes with labels: when data are synthesized, the corresponding labels are generated together without manual labeling, saving substantial labeling cost. The synthesized data is also safer in terms of privacy protection.
As shown in fig. 6, the present invention further provides an apparatus for synthesizing a ground identifier based on BEV images, including:
the obtaining module 61 is configured to obtain a ground identifier template and a background image, where the ground identifier template includes a ground arrow and a text identifier.
During recognition, the panoramic segmentation model is utilized to segment the background image into a lane line mask and a pavement travelable area mask; and determining position coordinates between two adjacent lane lines for placing the ground identifier template according to the constraint condition.
Specifically, the background image, the lane line mask and the drivable road area mask are projected into the BEV image through inverse perspective transformation; road vehicles are filtered out according to the drivable road area mask, and the coordinates of the four position points for placing the ground identifier template in the BEV image are calculated.
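The vehicle-filtering step can be sketched as a mask check: a candidate placement region is accepted only if every pixel inside it belongs to the drivable area, which automatically rejects regions overlapping vehicles, since vehicle pixels are not part of the drivable-area mask. A minimal illustration (the function name and arguments are assumptions, not from the patent):

```python
import numpy as np

def region_is_drivable(drivable_mask, corners):
    """Return True if the axis-aligned box spanned by the four corner
    points lies entirely inside the drivable-area mask (nonzero pixels).
    Regions overlapping vehicles fail this check because vehicle pixels
    are excluded from the drivable area."""
    corners = np.asarray(corners)
    xs, ys = corners[:, 0].astype(int), corners[:, 1].astype(int)
    region = drivable_mask[ys.min():ys.max(), xs.min():xs.max()]
    return bool(region.size) and bool(region.all())
```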
A processing module 62 is configured to project the background image into an overhead view to obtain a BEV image, and identify position coordinates in the BEV image for adding the ground identifier.
For example, the position coordinates between two adjacent lane lines for placing the ground identifier template are calculated and determined according to the width of the lane lines, the distance between the adjacent lane lines and the size of the traffic identifier.
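The constraint-based calculation described above might look like the following sketch, which centers the template horizontally between the two lane lines; the function and its parameters are hypothetical illustrations of the constraints (lane-line positions and marking size), not the patent's implementation:

```python
def placement_corners(x_left, x_right, y_top, marking_w, marking_h):
    """Center a marking template of size (marking_w, marking_h) between
    two lane lines at x_left and x_right, and return its four corner
    points in clockwise order starting at the top-left."""
    x0 = x_left + ((x_right - x_left) - marking_w) / 2.0
    return [(x0, y_top),
            (x0 + marking_w, y_top),
            (x0 + marking_w, y_top + marking_h),
            (x0, y_top + marking_h)]
```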
And a synthesis module 63, configured to fuse the ground identifier and the BEV image into a target BEV image according to the position coordinates, and transform the target BEV image into a normal image under a normal viewing angle, where the normal image includes the ground identifier template. For example, the ground identifier is fused with the background image projected into the BEV image according to the position coordinates. The synthesis module 63 is further configured to output a synthetic data tag corresponding to the ground identifier when the target BEV image is transformed into a normal image under a normal viewing angle.
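The fusion performed by the synthesis module can be illustrated as alpha blending at the computed position; `fuse_template` and its signature are assumptions for illustration, not the patent's code:

```python
import numpy as np

def fuse_template(bev, template, alpha, x0, y0):
    """Alpha-blend a marking template into a copy of the BEV image with
    its top-left corner at (x0, y0). alpha is a per-pixel opacity map
    in [0, 1]: 1 on the marking, 0 on the template background."""
    h, w = template.shape[:2]
    out = bev.copy()
    roi = out[y0:y0 + h, x0:x0 + w].astype(float)
    a = alpha[..., None] if alpha.ndim == 2 else alpha
    out[y0:y0 + h, x0:x0 + w] = (a * template + (1.0 - a) * roi).astype(bev.dtype)
    return out
```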
Further, the device for synthesizing the ground identifier based on the BEV image further comprises a preprocessing module, configured to perform enhancement processing on the ground identifier template by means of size transformation, shading transformation, Gaussian blur, texture transformation and noise addition; the background of the ground identifier is processed into pixels similar to the background image.
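Two of the listed enhancement transforms, shading and noise addition, can be sketched as follows; this is a toy illustration under assumed names, not the patent's implementation, and the size, texture and blur transforms would be chained in the same way:

```python
import numpy as np

def augment_template(template, brightness=1.0, noise_std=0.0, seed=0):
    """Apply a shading transform (global brightness scale) and additive
    Gaussian noise to a uint8 marking template."""
    rng = np.random.default_rng(seed)
    out = template.astype(float) * brightness
    if noise_std > 0:
        out = out + rng.normal(0.0, noise_std, out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)
```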
As shown in fig. 7, the present invention further provides an electronic device, including:
at least one processor; and at least one memory communicatively coupled to the processor, wherein: the memory stores program instructions executable by the processor, and the processor invokes the program instructions to perform the method of synthesizing a ground identification based on BEV images described above.
The invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the above method of synthesizing a ground identification based on BEV images.
It is understood that the computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a software distribution medium, and so forth. The computer program comprises computer program code, which may be in source code form, object code form, the form of an executable file, some intermediate form, or the like.
In some embodiments of the present invention, the apparatus for synthesizing a ground identification based on BEV images may include a controller, i.e. a single-chip microcomputer chip integrating a processor, a memory, a communication module and the like. The processor may refer to the processor comprised by the controller. The processor may be a central processing unit (Central Processing Unit, CPU), but may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as would be understood by those reasonably skilled in the art.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two; to clearly illustrate the interchangeability of hardware and software, the elements and steps of the examples have been described above generally in terms of their function. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for synthesizing a ground identification based on BEV images, comprising:
acquiring a ground identifier template and a background image, wherein the ground identifier template comprises a ground arrow and a text identifier;
projecting the background image to an overhead view to obtain a BEV image, and identifying position coordinates for adding the ground identifier in the BEV image;
and fusing the ground identifier and the BEV image into a target BEV image according to the position coordinates, and converting the target BEV image into a common image under a common viewing angle, wherein the common image comprises the ground identifier template.
2. The method of claim 1, wherein projecting the background image into an overhead view to obtain a BEV image and identifying position coordinates in the BEV image for adding the ground identifier comprises:
during identification, segmenting the background image into a lane line mask and a drivable road area mask by using a panoptic segmentation model;
and determining position coordinates between two adjacent lane lines for placing the ground identifier template according to the constraint condition.
3. The method for synthesizing a ground identification based on BEV images according to claim 2, wherein the segmenting, by the panoptic segmentation model, of the background image into the lane line mask and the drivable road area mask comprises:
projecting the background image, the lane line mask and the drivable road area mask into the BEV image through inverse perspective transformation;
filtering out road vehicles according to the drivable road area mask, and calculating the coordinates of four position points for placing the ground identifier template in the BEV image.
4. The method of synthesizing a ground identification based on BEV images according to claim 2, wherein said determining the position coordinates between two adjacent lane lines for placing the ground identifier template according to the constraint condition comprises:
and calculating and determining the position coordinates between two adjacent lane lines for placing the ground identifier template according to the width of the lane lines, the distance between the adjacent lane lines and the size of the traffic identifier.
5. The method of claim 1, further comprising enhancing the ground identifier template by means of size transformation, shading transformation, Gaussian blur, texture transformation and noise addition; and processing the background of the ground identifier into pixels similar to the background image.
6. The method of claim 1, wherein the fusing the ground identifier with the BEV image to a target BEV image according to the location coordinates comprises:
the ground identifier is fused with the background image projected into the BEV image according to the position coordinates.
7. The method of claim 1, further comprising outputting a synthesized data label corresponding to the ground identifier when the target BEV image is transformed into a normal image under a normal viewing angle.
8. An apparatus for synthesizing a ground identification based on BEV images, comprising:
the acquisition module is used for acquiring a ground identifier template and a background image, wherein the ground identifier template comprises a ground arrow and a text identifier;
the processing module is used for projecting the background image to an overhead view to obtain a BEV image, and identifying position coordinates for adding the ground identifier in the BEV image;
and the synthesis module is used for fusing the ground identifier and the BEV image into a target BEV image according to the position coordinates, and converting the target BEV image into a common image under a common viewing angle, wherein the common image comprises the ground identifier template.
9. An electronic device, comprising:
at least one processor; and at least one memory communicatively coupled to the processor, wherein: the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310951584.XA CN117078800A (en) | 2023-07-31 | 2023-07-31 | Method and device for synthesizing ground identification based on BEV image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117078800A true CN117078800A (en) | 2023-11-17 |
Family
ID=88703315
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310951584.XA Pending CN117078800A (en) | 2023-07-31 | 2023-07-31 | Method and device for synthesizing ground identification based on BEV image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117078800A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150103173A1 (en) * | 2013-10-16 | 2015-04-16 | Denso Corporation | Synthesized image generation device |
US20180129887A1 (en) * | 2016-11-07 | 2018-05-10 | Samsung Electronics Co., Ltd. | Method and apparatus for indicating lane |
CN112017262A (en) * | 2020-08-10 | 2020-12-01 | 当家移动绿色互联网技术集团有限公司 | Pavement marker generation method and device, storage medium and electronic equipment |
CN114445592A (en) * | 2022-01-29 | 2022-05-06 | 重庆长安汽车股份有限公司 | Bird view semantic segmentation label generation method based on inverse perspective transformation and point cloud projection |
CN114677458A (en) * | 2022-03-28 | 2022-06-28 | 智道网联科技(北京)有限公司 | Road mark generation method and device for high-precision map, electronic equipment and storage medium |
CN115713678A (en) * | 2022-11-23 | 2023-02-24 | 武汉中海庭数据技术有限公司 | Arrow picture data augmentation method and system, electronic device and storage medium |
CN116245960A (en) * | 2023-02-24 | 2023-06-09 | 武汉光庭信息技术股份有限公司 | BEV top view generation method, system, electronic equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
CAO P.: "Multi-View Frustum PointNet for Object Detection in Autonomous Driving", 2019 IEEE International Conference on Image Processing (ICIP), 15 April 2020 (2020-04-15) *
Yuan Haiwen; Xiao Changshi; Xiu Supu; Wen Yuanqiao; Zhou Chunhui; Xu Zhouhua: "High-speed vehicle detection and localization method based on homography plane projection", Journal of Wuhan University of Technology (Transportation Science & Engineering), no. 01, 15 February 2017 (2017-02-15) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shin et al. | Vision-based navigation of an unmanned surface vehicle with object detection and tracking abilities | |
Taneja et al. | Image based detection of geometric changes in urban environments | |
Muad et al. | Implementation of inverse perspective mapping algorithm for the development of an automatic lane tracking system | |
CN107784038B (en) | Sensor data labeling method | |
CN111666805B (en) | Class marking system for autopilot | |
US10185880B2 (en) | Method and apparatus for augmenting a training data set | |
Bruls et al. | The right (angled) perspective: Improving the understanding of road scenes using boosted inverse perspective mapping | |
Kum et al. | Lane detection system with around view monitoring for intelligent vehicle | |
CN109741241B (en) | Fisheye image processing method, device, equipment and storage medium | |
CN113096003B (en) | Labeling method, device, equipment and storage medium for multiple video frames | |
WO2021155558A1 (en) | Road marking identification method, map generation method and related product | |
CN112651881B (en) | Image synthesizing method, apparatus, device, storage medium, and program product | |
CN111160328A (en) | Automatic traffic marking extraction method based on semantic segmentation technology | |
CN112258610B (en) | Image labeling method and device, storage medium and electronic equipment | |
CN110809766B (en) | Advanced driver assistance system and method | |
CN107798010A (en) | A kind of annotation equipment of sensing data | |
CN117078800A (en) | Method and device for synthesizing ground identification based on BEV image | |
CN116245960A (en) | BEV top view generation method, system, electronic equipment and storage medium | |
CN115861733A (en) | Point cloud data labeling method, model training method, electronic device and storage medium | |
CN112767412B (en) | Vehicle part classification method and device and electronic equipment | |
CN110827340A (en) | Map updating method, device and storage medium | |
Du et al. | Validation of vehicle detection and distance measurement method using virtual vehicle approach | |
Li et al. | Lane detection and road surface reconstruction based on multiple vanishing point & symposia | |
CN202058178U (en) | Character and image correction device | |
JP7383659B2 (en) | Navigation target marking methods and devices, electronic equipment, computer readable media |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||