CN117876987A - Data augmentation method and device - Google Patents

Data augmentation method and device


Publication number
CN117876987A
CN117876987A
Authority
CN
China
Prior art keywords
data
augmentation
image
processing
augmentation processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion.)
Pending
Application number
CN202410039442.0A
Other languages
Chinese (zh)
Inventor
安耀祖 (An Yaozu)
Current Assignee
Beijing Jd Yuansheng Technology Co ltd
Original Assignee
Beijing Jd Yuansheng Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jd Yuansheng Technology Co ltd filed Critical Beijing Jd Yuansheng Technology Co ltd
Priority to CN202410039442.0A
Publication of CN117876987A
Legal status: Pending


Abstract

The invention discloses a data augmentation method and device, relating to the technical field of automatic driving. One embodiment of the method comprises the following steps: acquiring a training data set; performing a first data augmentation process on the training data set to obtain a first augmented data set, wherein the first data augmentation process is any one of an augmentation process set comprising image-level data augmentation processing, instance-level data augmentation processing and object-mask data augmentation processing; performing a second data augmentation process on the first augmented data set to obtain a second augmented data set, wherein the second data augmentation process is at least one of the two remaining augmentation processes in the augmentation process set that differ from the first data augmentation process; and taking the second augmented data set as the result of the data augmentation processing. The embodiment realizes data augmentation of training data for monocular 3D target detection, thereby improving the robustness and generalization capability of a 3D visual detection model.

Description

Data augmentation method and device
Technical Field
The invention relates to the technical field of automatic driving, in particular to a data augmentation method and device.
Background
Monocular 3D target detection is one of the most widely applied computer vision technologies at present. It performs 3D detection on the 2D image information of a camera through a monocular 3D visual detection model to obtain the category of a target object as well as its length, width, height, angle and position/distance in three-dimensional space. Its application scenarios mainly include vehicle automatic driving systems, intelligent robots, intelligent traffic and other fields. A stable monocular 3D visual detection model requires a large amount of rich training data to improve its robustness and generalization ability.
In the process of implementing the present invention, the inventor finds that the following problems exist in the prior art:
Because of individual differences among hardware acquisition devices and differences in acquisition angle and height, publicly available external data sets differ from the actual scene and can only serve as supplementary data for model training, while constructing a large number of scene-rich training data sets is very costly; data augmentation is therefore an effective way to enrich training data. Existing data augmentation methods are mainly applied in the fields of visual 2D target detection and point cloud 3D target detection. For monocular 3D target detection, because the 2D image information of the training data carries corresponding 3D attribute information, existing methods are difficult to apply directly to the augmentation of its training data and cannot meet actual requirements.
Disclosure of Invention
In view of this, the embodiment of the invention provides a data augmentation method and device, which can perform data augmentation on training data for monocular 3D target detection and enrich the training data efficiently and at low cost, so as to improve the robustness and generalization capability of a 3D visual detection model.
To achieve the object, according to an aspect of an embodiment of the present invention, there is provided a method of data augmentation, including:
acquiring a training data set to be subjected to data augmentation processing;
performing a first data augmentation process on the training data set to obtain a first augmented data set, wherein the first data augmentation process is any one of an augmentation process set comprising image-level data augmentation processing, instance-level data augmentation processing and object-mask data augmentation processing;
performing a second data augmentation process on the first augmented data set to obtain a second augmented data set, wherein the second data augmentation process is at least one of the two remaining augmentation processes in the augmentation process set that differ from the first data augmentation process;
taking the second augmented data set as the result of the data augmentation processing.
Optionally, in the case that the first data augmentation process is the image-level data augmentation process, performing the first data augmentation process on the training data set comprises: performing at least one of focal-length scaling and imaging-range adjustment on the image data in the training data set; and performing the second data augmentation process on the first augmented data set comprises: performing at least one of the instance-level data augmentation processing and the object-mask data augmentation processing on the first augmented data set.
Optionally, performing the focal-length scaling on the image data in the training data set comprises: determining a focal length scaling factor for the image data and the three-dimensional coordinate values of the image data; and obtaining a first processing result of the image data augmentation according to the focal length scaling factor and the three-dimensional coordinate values.
Optionally, performing the imaging-range adjustment on the image data in the training data set comprises: determining an imaging range coefficient for the image data; and obtaining a second processing result of the image data augmentation by applying the imaging range coefficient with the center point of the image data as reference.
Optionally, performing the instance-level data augmentation processing on the first augmented data set comprises: acquiring image data in the first augmented data set and the point cloud data corresponding to the image data to form image point cloud pairs; determining, from the image point cloud pairs, the pairs to be copied and the pairs to be pasted; and copying and pasting each pair to be copied into its corresponding pair to be pasted.
Optionally, copying and pasting an image point cloud pair to be copied into the corresponding image point cloud pair to be pasted comprises: copying and pasting the point cloud data of the pair to be copied into the point cloud data of the pair to be pasted; calculating, according to a preset transformation matrix from the point cloud coordinate system to the camera coordinate system and the internal parameters of the camera, the two-dimensional attribute information of the copied point cloud data relative to the image of the pair to be pasted; and, according to the two-dimensional attribute information, copying and pasting the image data of the pair to be copied into the image data of the pair to be pasted.
Optionally, performing the object-mask data augmentation processing on the first augmented data set comprises: acquiring the annotation-box attributes of the image data in the first augmented data set and screening out the image data to be augmented whose annotation-box attribute is non-occluded; determining a mask range and mask proportion according to the application scenario; and determining a deletion area of the image data to be augmented according to the mask range and mask proportion in combination with a preset deletion rule, and deleting the deletion area from the image data to be augmented.
Optionally, in the case that the first data augmentation process is the instance-level data augmentation processing, performing the second data augmentation process on the first augmented data set comprises: performing at least one of the image-level data augmentation processing and the object-mask data augmentation processing on the first augmented data set; or, in the case that the first data augmentation process is the object-mask data augmentation processing, performing the second data augmentation process on the first augmented data set comprises: performing at least one of the image-level data augmentation processing and the instance-level data augmentation processing on the first augmented data set.
According to a second aspect of an embodiment of the present invention, there is provided an apparatus for data augmentation, comprising:
the training data set acquisition module is used for acquiring a training data set to be subjected to data augmentation processing;
the first augmentation processing module is used for performing a first data augmentation process on the training data set to obtain a first augmented data set, wherein the first data augmentation process is any one of an augmentation process set comprising image-level data augmentation processing, instance-level data augmentation processing and object-mask data augmentation processing;
the second augmentation processing module is used for performing a second data augmentation process on the first augmented data set to obtain a second augmented data set, wherein the second data augmentation process is at least one of the two remaining augmentation processes in the augmentation process set that differ from the first data augmentation process;
and the augmentation result determining module is used for taking the second augmentation data set as the result of the data augmentation processing.
According to a third aspect of an embodiment of the present invention, there is provided an electronic device for data augmentation, including:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method provided by the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements the method provided by the first aspect of embodiments of the present invention.
One embodiment of the invention has the following advantages or benefits: a training data set to be subjected to data augmentation is acquired; a first data augmentation process is performed on the training data set to obtain a first augmented data set, the first process being any one of an augmentation process set comprising image-level, instance-level and object-mask data augmentation processing; a second data augmentation process is performed on the first augmented data set to obtain a second augmented data set, the second process being at least one of the two remaining augmentation processes in the set that differ from the first; and the second augmented data set is taken as the result of the data augmentation. This realizes data augmentation of training data for monocular 3D target detection, enriches the training data efficiently and at low cost, and thereby improves the robustness and generalization capability of the 3D visual detection model.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a method of data augmentation according to an embodiment of the present invention;
FIG. 2 is a flow diagram of data augmentation processing at an example level of an embodiment of the present invention;
FIG. 3 is an exemplary diagram of data augmentation processing at an example level of an embodiment of the present invention;
FIG. 4 is a schematic diagram of the result of the data augmentation process of the object mask according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of the main blocks of an apparatus for data augmentation according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 7 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
In the technical scheme of the invention, the acquisition/collection, updating, analysis, use, transmission and storage of the user's personal information comply with the provisions of relevant laws and regulations, are used for legal and reasonable purposes, are not shared, leaked or sold beyond lawful use, and are subject to the supervision and management of the national regulatory authorities. Necessary measures should be taken for the user's personal information: use of or access to the personal information data should be selectively restricted to prevent illegal access, personnel with access to the data should comply with the provisions of relevant laws and regulations, and the security of the user's personal information should be ensured. Furthermore, once such personal information data is no longer needed, the risk should be minimized by limiting or even prohibiting its collection and/or by deleting it.
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
As described above, because of individual differences among hardware acquisition devices and differences in acquisition angle and height, publicly available external data sets differ from the actual scene and can only serve as supplementary data for model training, while constructing a large number of scene-rich training data sets is very costly, so data augmentation is an effective way to enrich training data. Existing augmentation methods are mainly applied to visual 2D target detection and point cloud 3D target detection; for monocular 3D target detection, because the 2D image information of the training data carries corresponding 3D attribute information, these methods are difficult to apply directly and cannot meet actual requirements.
To solve the above problems in the prior art, the present invention proposes a data augmentation method. Considering that data augmentation for monocular 3D target detection needs to synchronously adjust the 2D image and its corresponding 3D attribute information, the method sequentially performs at least two of image-level data augmentation processing, instance-level data augmentation processing and object-mask data augmentation processing on a training data set for monocular 3D target detection, so as to augment the training data, obtain abundant training data efficiently and at low cost, and thereby improve the robustness and generalization capability of the 3D visual detection model.
Fig. 1 is a schematic diagram of a main flow of a data augmentation method according to an embodiment of the present invention, as shown in fig. 1, the data augmentation method according to an embodiment of the present invention includes the following steps S101 to S104.
Step S101, a training data set to be subjected to data augmentation processing is obtained.
Specifically, the system receives and parses a data augmentation request, determines the storage unit of the training data set to be augmented, and acquires the training data set by accessing that storage unit.
Step S102, performing a first data augmentation process on the training data set to obtain a first augmented data set, where the first data augmentation process is any one of an augmentation process set, and the augmentation process set includes an image-level data augmentation process, an instance-level data augmentation process, and an object mask data augmentation process.
Specifically, the embodiment of the invention provides a data augmentation method for monocular 3D target detection. There are obvious differences between monocular 3D target detection on the one hand and visual 2D target detection and point cloud 3D target detection on the other: in a visual 2D detection task both the images and the corresponding target information are 2D, and in a point cloud 3D detection task both the point cloud and the corresponding target information are 3D, whereas monocular 3D target detection obtains 3D target information from 2D images, so the 2D image information of its training data carries corresponding 3D attribute information. Consequently, the 2D images and the corresponding 3D attribute information must be adjusted synchronously when augmenting a training data set for monocular 3D target detection. The embodiment of the invention analyzes the characteristics of the 2D image and performs data augmentation on the training data set from multiple dimensions, using the calibrated external parameters between the laser radar and the camera (the transformation matrix between the radar and the camera coordinate system, and the transformation matrix from the camera coordinate system to the image coordinate system) together with the internal parameters of camera imaging to synchronously augment the 2D images and adjust the 3D attribute information.
Further, the embodiment of the invention provides data augmentation in three dimensions: image-level data augmentation processing, instance-level data augmentation processing and object-mask data augmentation processing. Image-level augmentation operates on the whole image of the training data set: it generates new images with an overall geometric transformation by randomly changing parameters of the camera system such as focal length, receptive field and position, for example producing a new image and the corresponding three-dimensional coordinate values by scaling the focal length of the image data. Instance-level augmentation operates on specific content instances within the images: it generates new image content by randomly pasting target objects from specific images into other scenes, keeping focal length, size and coordinate position consistent in the process. Object-mask augmentation operates on the specific content of an image: it randomly erases part of a 2D target box, which effectively simulates partially occluded target objects.
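As an illustration of the object-mask dimension described above, the random erasure of part of a 2D target box can be sketched as follows. This is a minimal sketch, not code from the patent: the `(x1, y1, x2, y2)` box format, the `mask_ratio` parameter and the function name are assumptions.

```python
import numpy as np

def object_mask_augment(image, box, mask_ratio=0.3, rng=None):
    """Zero out a random sub-region of a 2D target box (x1, y1, x2, y2).

    mask_ratio is the fraction of the box width/height to delete; the
    deleted region is filled with 0, simulating a partially occluded object.
    """
    rng = rng or np.random.default_rng()
    x1, y1, x2, y2 = box
    mw = max(1, int((x2 - x1) * mask_ratio))  # width of the erased patch
    mh = max(1, int((y2 - y1) * mask_ratio))  # height of the erased patch
    # pick a random top-left corner so the patch stays inside the box
    mx = int(rng.integers(x1, max(x1 + 1, x2 - mw)))
    my = int(rng.integers(y1, max(y1 + 1, y2 - mh)))
    out = image.copy()
    out[my:my + mh, mx:mx + mw] = 0
    return out
```

Here the erased patch is axis-aligned and filled with zeros; the patent's "preset deletion rule" could equally prescribe other shapes or fill values.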
Further, one of the image-level, instance-level and object-mask augmentation processes is selected as the first data augmentation process and applied to the training data set, yielding a first augmented data set richer than the original training data set.
Step S103, a second data augmentation process is performed on the first augmented data set to obtain a second augmented data set, where the second data augmentation process is at least one of the two remaining augmentation processes in the augmentation process set that differ from the first data augmentation process.
Specifically, based on the first data augmentation process, the two remaining augmentation processes in the set are determined; for example, if the first process is image-level augmentation, the remaining two are instance-level augmentation and object-mask augmentation. At least one of these two is then selected and applied to the first augmented data set as the second data augmentation process, yielding a rich second augmented data set that meets the requirements of the embodiment of the invention.
It can be understood that, in the embodiment of the invention, the image-level, instance-level and object-mask augmentations are performed on the training data set sequentially, each taking the result of the previous augmentation as the object to be augmented. Two or three of the augmentations are freely selected according to the needs of the scene to obtain a rich augmented data set. For example, when the current scene has little training data and a large amount of rich augmented data is needed, image-level augmentation can be applied to the training data set first, then instance-level augmentation to its result, and finally object-mask augmentation to that result to obtain the final augmented data. When the current scene lacks training images of occluded target objects, instance-level augmentation is applied to the training data set first and object-mask augmentation is then applied to its result to obtain the final augmented data.
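The sequential composition described in this step can be sketched as follows. The stage names and function signatures are illustrative assumptions — the patent specifies only the order of composition, not an API:

```python
# Hypothetical stand-ins for the three augmentation stages; each takes and
# returns a data set (here just a list of sample labels for illustration).
def image_level(dataset):    return [f"img({x})" for x in dataset]
def instance_level(dataset): return [f"inst({x})" for x in dataset]
def object_mask(dataset):    return [f"mask({x})" for x in dataset]

AUGMENTATIONS = {"image": image_level, "instance": instance_level, "mask": object_mask}

def augment(dataset, first, second):
    """Apply a first augmentation, then at least one of the remaining two."""
    assert first in AUGMENTATIONS
    assert second and all(s in AUGMENTATIONS and s != first for s in second)
    result = AUGMENTATIONS[first](dataset)   # first augmented data set
    for name in second:                      # chain the remaining choice(s)
        result = AUGMENTATIONS[name](result)
    return result                            # second augmented data set
```

For the scene with little training data, `augment(data, "image", ["instance", "mask"])` runs all three stages in sequence; for the scene lacking occluded objects, `augment(data, "instance", ["mask"])` runs only the last two.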
According to an embodiment of the present invention, in the case that the first data augmentation process is the image-level data augmentation process, performing the first data augmentation process on the training data set comprises performing at least one of focal-length scaling and imaging-range adjustment on the image data in the training data set; and performing the second data augmentation process on the first augmented data set comprises performing at least one of the instance-level data augmentation processing and the object-mask data augmentation processing on the first augmented data set.
Specifically, when the first data augmentation process is the image-level one, each training sample in the training data set is augmented at the image level, mainly by scaling the focal length and/or adjusting the imaging range of the pictures. Image-level scaling and cropping do not affect the 3D attribute information of the image, so the 3D attributes need no particular attention and only the image-level processing itself matters. Correspondingly, at least one of the instance-level and object-mask augmentations is selected as the second process on the first augmented data set: instance-level augmentation alone, object-mask augmentation alone, instance-level augmentation followed by object-mask augmentation on its result, or object-mask augmentation first in the reverse order.
According to another embodiment of the present invention, performing the focal-length scaling on the image data in the training data set comprises: determining a focal length scaling factor for the image data and the three-dimensional coordinate values of the image data; and obtaining a first processing result of the image data augmentation according to the focal length scaling factor and the three-dimensional coordinate values.
Specifically, scaling the focal length of the image data corresponds to adjusting the size of the target object in the 2D image: setting the focal length scaling factor to s corresponds to scaling the size of the target object in the 2D image by 1/s and to moving the camera forward or backward. The first processing result is then calculated from the focal length scaling factor s and the three-dimensional coordinate values of the image data, combined with the internal parameters of camera imaging. The specific formula is as follows:
\[ \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \sim \begin{bmatrix} s f_u & 0 & c_u \\ 0 & s f_v & c_v \\ 0 & 0 & 1 \end{bmatrix} X \]
where X denotes the three-dimensional coordinate values of the image data, f_u and f_v denote the focal lengths in the two image directions, c_u and c_v denote the translation of the image center point in those directions, and s denotes the focal length scaling factor.
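A minimal numerical sketch of this projection, assuming a standard 3x3 pinhole intrinsic matrix K = [[f_u, 0, c_u], [0, f_v, c_v], [0, 0, 1]] (the patent does not spell out the matrix layout, and the function name is chosen for illustration):

```python
import numpy as np

def project_with_scaled_focal(X_cam, K, s):
    """Project a 3D point in camera coordinates after scaling the focal length by s.

    Only the focal lengths f_u, f_v are multiplied by s; the principal
    point (c_u, c_v) is left unchanged.
    """
    K_s = K.astype(float).copy()
    K_s[0, 0] *= s            # f_u -> s * f_u
    K_s[1, 1] *= s            # f_v -> s * f_v
    uvw = K_s @ X_cam         # homogeneous image coordinates
    return uvw[:2] / uvw[2]   # pixel coordinates (u, v)
```

With s = 2 a point's offset from the principal point doubles, which is the geometric effect this augmentation exploits.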
According to still another embodiment of the present invention, performing the imaging-range adjustment on the image data in the training data set comprises: determining an imaging range coefficient for the image data; and obtaining a second processing result of the image data augmentation by applying the imaging range coefficient with the center point of the image data as reference.
Specifically, the imaging range of the image data is adjusted by randomly narrowing the receptive field range of the camera without changing the focal length. Taking the center point of the image as reference, a central region whose size matches the imaging range coefficient (generally less than 1) is retained, and the edge portion is set to zero. For example, with an imaging range coefficient of 70%, only 70% of the picture size centered on the image is retained and the remaining 30% is filled with 0. Different imaging range coefficients can be set as needed to improve the generalization capability of the model. This can be expressed by the following formula:
\[ I'(u, v) = \begin{cases} I(u, v), & u_a \le u \le u_b \text{ and } v_a \le v \le v_b \\ 0, & \text{otherwise} \end{cases} \]
where u_a and u_b denote the range coordinates of the retained region in the x direction of the image, greater than 0 and less than the width of the original image, and v_a and v_b denote the range coordinates in the y direction, greater than 0 and less than the height of the original image.
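A minimal sketch of the zero-filled central crop, assuming a symmetric retained region around the image center (the patent leaves the exact choice of u_a, u_b, v_a, v_b to the implementation):

```python
import numpy as np

def crop_imaging_range(image, coeff=0.7):
    """Keep only a centered fraction `coeff` of the image and zero the border.

    The focal length is unchanged, so the 3D attribute information of the
    image needs no adjustment; only pixels outside [u_a, u_b] x [v_a, v_b]
    are filled with 0.
    """
    h, w = image.shape[:2]
    ch, cw = int(h * coeff), int(w * coeff)    # retained height / width
    v_a, u_a = (h - ch) // 2, (w - cw) // 2    # top-left of retained region
    out = np.zeros_like(image)
    out[v_a:v_a + ch, u_a:u_a + cw] = image[v_a:v_a + ch, u_a:u_a + cw]
    return out
```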
According to yet another embodiment of the present invention, performing the instance-level data augmentation processing on the first augmented data set comprises: acquiring image data in the first augmented data set and the point cloud data corresponding to the image data to form image point cloud pairs; determining, from the image point cloud pairs, the pairs to be copied and the pairs to be pasted; and copying and pasting each pair to be copied into its corresponding pair to be pasted.
Specifically, the instance-level data augmentation on the first augmented data set performs augmentation based on specific target objects in the images. Because instance-level augmentation copies and pastes target objects, i.e. the instance objects in the image change, the corresponding 3D attribute information changes as well and must be obtained by synchronously augmenting the 3D point cloud data. First, the image data in the first augmented data set and the corresponding point cloud data are acquired to form image point cloud pairs, and the various calibration external parameters, i.e. the transformation matrices, are prepared. Then the pair to be copied (for example, the image point cloud pair of an object) and the pair to be pasted (for example, the image point cloud pair of a scene) are determined from the image point cloud pairs. Finally, each pair to be copied is copied and pasted into its corresponding pair to be pasted.
According to another embodiment of the present invention, copying and pasting an image point cloud pair to be copied into the corresponding pair to be pasted comprises: copying and pasting the point cloud data of the pair to be copied into the point cloud data of the pair to be pasted; calculating, according to a preset transformation matrix from the point cloud coordinate system to the camera coordinate system and the internal parameters of the camera, the two-dimensional attribute information of the copied point cloud data relative to the image of the pair to be pasted; and, according to the two-dimensional attribute information, copying and pasting the image data of the pair to be copied into the image data of the pair to be pasted.
In particular, since instance-level data augmentation involves copying and pasting a target object in the image (that is, the instance objects in the image change), the image must be augmented and the corresponding 3D attribute information adjusted synchronously, using the calibration extrinsics between the laser radar and the camera and the intrinsic parameters of camera imaging. For the augmentation of the point cloud data in the image point cloud pair, the point cloud data to be copied is directly copied and pasted into the point cloud data to be pasted; for example, if the point cloud data to be copied is that of a vehicle and the point cloud data to be pasted is that of a street, the vehicle's point cloud data is copied and pasted into the street's point cloud data. For the augmentation of the image data, the 2D attribute information (coordinates and size) of the point cloud data to be copied within the image to be pasted is first calculated according to the transformation matrix from the point cloud coordinate system of the image point cloud pair to be pasted to the camera coordinate system and the intrinsic parameters of the camera; the image data to be copied in the image point cloud pair to be copied is then transformed and adjusted according to this 2D attribute information and pasted into the image data to be pasted. The specific calculation formula of the 2D attribute information is as follows:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix}$$

wherein $R$ and $T$ represent the transformation (rotation and translation) from the point cloud coordinate system to the camera coordinate system, $[x_w\;\, y_w\;\, z_w]^T$ represents the point cloud coordinates, $Z_c$ represents the distance (depth) information, $K$ represents the intrinsic parameter matrix of the camera, and $u$ and $v$ are the 2D attribute information to be obtained.
Through this instance-level data augmentation processing, the position of the 2D image data corresponding to the point cloud data to be copied within the image data to be pasted is adjusted using the transformation matrix between the laser radar and the camera and the intrinsic parameters of camera imaging, thereby achieving synchronous image data augmentation and adjustment of the 3D attribute information.
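As a minimal sketch of the projection above, a LiDAR point can be mapped to pixel coordinates with NumPy. The calibration values below (intrinsics, identity extrinsics) are hypothetical, chosen only to make the example runnable:

```python
import numpy as np

def project_point_to_image(point_w, R, T, K):
    """Map a 3D point from the point cloud (LiDAR) frame to pixel coordinates,
    implementing Z_c * [u, v, 1]^T = K @ (R @ X_w + T)."""
    p_cam = R @ point_w + T        # point in the camera coordinate system
    z_c = p_cam[2]                 # distance information Z_c (depth)
    u, v, _ = K @ p_cam / z_c      # perspective division
    return u, v, z_c

# Hypothetical calibration: identity extrinsics, a generic intrinsic matrix.
K = np.array([[700.0,   0.0, 640.0],
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
R, T = np.eye(3), np.zeros(3)
u, v, z_c = project_point_to_image(np.array([1.0, 0.5, 10.0]), R, T, K)
```

With these values the point lands at u = (700·1 + 640·10)/10 and v = (700·0.5 + 360·10)/10, i.e. right of and below the principal point, as expected for a point in front of and offset from the camera.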
Fig. 2 is a flow diagram of instance-level data augmentation processing according to an embodiment of the present invention. Image data in the first augmented data set and the point cloud data corresponding to the image data are acquired to form image point cloud pairs; an image point cloud pair A to be copied and an image point cloud pair B to be pasted are determined from the image point cloud pairs, and A is copied; the point cloud data of A is pasted into the point cloud data of B; the two-dimensional attribute information of the point cloud data of A relative to the image data of B is calculated; and the image data of A is pasted into the image data of B according to the two-dimensional attribute information.
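The flow just described can be sketched roughly as follows. This is a simplified illustration under our own assumptions (no occlusion handling or patch rescaling, identity extrinsics, generic intrinsics); all names are ours, not the patent's:

```python
import numpy as np

def copy_paste_instance(points_a, patch_a, points_b, img_b, R, T, K):
    """Paste object A's points into scene B's cloud, project them into B's
    image to find the paste location, then paste A's image patch there."""
    merged_cloud = np.vstack([points_b, points_a])    # step 1: point cloud paste
    cam = R @ points_a.T + T[:, None]                 # A's points, camera frame
    uv = (K @ cam)[:2] / cam[2]                       # step 2: 2D attribute info
    u0, v0 = int(uv[0].min()), int(uv[1].min())       # top-left of projected box
    out = img_b.copy()                                # step 3: image paste
    h, w = patch_a.shape[:2]
    out[v0:v0 + h, u0:u0 + w] = patch_a
    return merged_cloud, out

# Tiny hypothetical example data.
K = np.array([[700.0, 0.0, 640.0], [0.0, 700.0, 360.0], [0.0, 0.0, 1.0]])
R, T = np.eye(3), np.zeros(3)
points_a = np.array([[0.0, 0.0, 10.0], [1.0, 1.0, 10.0]])  # object A's points
points_b = np.zeros((5, 3))                                 # scene B's cloud
patch_a = np.ones((2, 2, 3))                                # A's image patch
img_b = np.zeros((720, 1280, 3))                            # B's image
merged_cloud, out = copy_paste_instance(points_a, patch_a, points_b, img_b, R, T, K)
```

A production version would additionally rescale the patch to the projected box size and check depth ordering before pasting; the sketch only shows how the 2D paste location follows from the 3D points.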
Fig. 3 is an example diagram of instance-level data augmentation processing according to an embodiment of the present invention. The bus on the main road in the left image is taken as the image point cloud pair to be copied and is copied and pasted into the image point cloud pair to be pasted of the street scene, yielding the instance-level data augmentation result shown on the right.
According to yet another embodiment of the present invention, performing the object mask data augmentation processing on the first augmented data set includes: acquiring the marking frame attributes of the image data in the first augmented data set, and screening out the image data to be augmented whose marking frame attribute is non-occluded; determining a mask range and a mask proportion according to the application scene; and determining a deletion region of the image data to be augmented according to the mask range and the mask proportion in combination with a preset deletion rule, and deleting the deletion region from the image data to be augmented.
Specifically, to further enhance the generalization capability of the 3D visual detection model, object mask (ObjectMask) data augmentation processing may also be performed on the first augmented data set. Considering that existing similar methods tend either to mask the entire object in the image or not to mask it at all, this embodiment first acquires image data to be augmented in which the whole object appears in the image and is not occluded. A mask range and a mask proportion are then determined according to the application scene: for example, when the detection target in a street scene is a vehicle, the mask range is defined as the lower half of the image, and several mask proportions (for example 40%, 50%, and 60%) can be set as required, so that multiple augmented images are obtained. Finally, a preset deletion rule is applied (for example, in a street scene the deleted mask region must hug the bottom frame of the image); the deletion region of the image data to be augmented is determined according to these parameters and deleted from the image data to be augmented, giving the object mask augmentation result.
Fig. 4 is a schematic diagram of an object mask data augmentation result according to an embodiment of the present invention. The lower half of the image data to be augmented is taken as the mask range Q, a mask proportion α is selected so that the mask region has area αQ, the mask region is placed flush against the bottom frame according to the deletion rule (the gray region is the deletion region), and the deletion region is deleted to obtain the object mask data augmentation result.
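A minimal sketch of this masking step follows. The lower-half mask range and the zero-fill used for "deletion" are our assumptions; the patent leaves the deletion operation abstract:

```python
import numpy as np

def object_mask_augment(img, mask_ratio):
    """Take the lower half of the image as the mask range Q, and delete
    (here: zero out) a region of area mask_ratio * Q hugging the bottom."""
    h = img.shape[0]
    q_height = h - h // 2                  # mask range Q: lower half of image
    del_height = int(round(q_height * mask_ratio))
    out = img.copy()
    out[h - del_height:, :] = 0            # deletion region flush with bottom
    return out

img = np.full((100, 200, 3), 255, dtype=np.uint8)
aug = object_mask_augment(img, mask_ratio=0.4)  # delete 40% of the lower half
```

Running this with several ratios (0.4, 0.5, 0.6) on the same image yields multiple augmented samples from one original, as the description suggests.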
According to an embodiment of the present invention, in the case where the first data augmentation processing is the instance-level data augmentation processing, performing the second data augmentation processing on the first augmented data set includes: performing at least one of the image-level data augmentation processing and the object mask data augmentation processing on the first augmented data set; or, in the case where the first data augmentation processing is the object mask data augmentation processing, performing the second data augmentation processing on the first augmented data set includes: performing at least one of the image-level data augmentation processing and the instance-level data augmentation processing on the first augmented data set.
Specifically, besides the case where the first data augmentation processing is the image-level data augmentation processing, the first data augmentation processing may be the instance-level data augmentation processing. In that case the training data set is augmented according to the instance-level method described above, with the object of the augmentation simply changed to the training data set, yielding the first augmented data set; at least one of the image-level and object mask data augmentation processings is then performed on the first augmented data set according to the methods described above, yielding the second augmented data set. Similarly, in the case where the first data augmentation processing is the object mask data augmentation processing, the training data set is augmented according to the object mask method, again with the object of the augmentation changed to the training data set, to obtain the first augmented data set; at least one of the image-level and instance-level data augmentation processings is then performed on the first augmented data set to obtain the second augmented data set.
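The two-stage scheme can be sketched as follows. The three augmentation functions are hypothetical stand-ins (they merely tag the data set) for the image-level, instance-level, and object mask processing described above:

```python
import random

# Hypothetical stand-ins: each tags the data set it augments.
def image_level(ds):    return ds + ["image_level"]
def instance_level(ds): return ds + ["instance_level"]
def object_mask(ds):    return ds + ["object_mask"]

AUGMENTATIONS = {"image_level": image_level,
                 "instance_level": instance_level,
                 "object_mask": object_mask}

def two_stage_augment(training_set, first_name, rng=random):
    """Apply any one augmentation first, then at least one of the other two."""
    first_set = AUGMENTATIONS[first_name](training_set)       # first stage
    remaining = [n for n in AUGMENTATIONS if n != first_name]
    second_set = first_set
    for name in rng.sample(remaining, rng.randint(1, 2)):     # second stage
        second_set = AUGMENTATIONS[name](second_set)
    return second_set

result = two_stage_augment(["raw"], "object_mask")
```

The second stage draws one or both of the remaining two augmentations, matching the "at least one of the two remaining augmentation processings" wording.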
Step S104, the second augmented data set is taken as the result of the data augmentation processing.
Specifically, the obtained second augmented data set is integrated and consolidated according to an agreed data format to obtain the data augmentation result, namely augmented training data suitable for monocular 3D target detection.
The data augmentation method provided by the embodiments of the invention can effectively alleviate the class imbalance problem in the training data set and enhance the robustness and generalization capability of the 3D visual detection model.
Fig. 5 is a schematic diagram of the main modules of a data augmentation apparatus according to an embodiment of the present invention. As shown in fig. 5, the data augmentation apparatus 500 mainly includes a training data set acquisition module 501, a first augmentation processing module 502, a second augmentation processing module 503, and an augmentation result determining module 504.
A training data set acquisition module 501, configured to acquire a training data set to be subjected to data augmentation processing;
a first augmentation processing module 502, configured to perform a first data augmentation process on the training data set to obtain a first augmented data set, where the first data augmentation process is any one of an augmentation processing set, and the augmentation processing set includes an image-level data augmentation process, an instance-level data augmentation process, and an object mask data augmentation process;
a second augmentation processing module 503, configured to perform second data augmentation processing on the first augmented data set to obtain a second augmented data set, where the second data augmentation processing is at least one of the two remaining augmentation processings in the augmentation processing set that differ from the first data augmentation processing;
an augmentation result determining module 504, configured to take the second augmentation data set as a result of the data augmentation process.
According to an embodiment of the present invention, in the case that the first data augmentation process is the image-level data augmentation process, the first augmentation processing module 502 is further configured to: at least one of data augmentation processing of zooming focal length and data augmentation processing of adjusting imaging range is performed on the image data in the training data set; the second augmentation processing module 503 is further configured to: at least one of the example-level data augmentation processing and the object mask data augmentation processing is performed on the first augmented data set.
According to another embodiment of the present invention, the first augmentation processing module 502 is further configured to: determining a focal length scaling multiple value of the image data and a three-dimensional coordinate value of the image data; and obtaining a first processing result of the image data augmentation processing according to the focal length scaling multiple value and the three-dimensional coordinate value.
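One common way to realize this focal-zoom augmentation is sketched below, under our own convention (not spelled out in the patent): keep the intrinsics fixed, resize the image by the scaling value s, and divide the annotated depth by s so the projection u' = s·u stays geometrically consistent:

```python
import numpy as np

def zoom_focal_augment(img, boxes_3d, scale):
    """Resize the image by `scale` (nearest-neighbor, to avoid external
    dependencies) and rescale box depth so projections stay consistent."""
    h, w = img.shape[:2]
    ys = np.minimum((np.arange(int(h * scale)) / scale).astype(int), h - 1)
    xs = np.minimum((np.arange(int(w * scale)) / scale).astype(int), w - 1)
    resized = img[ys][:, xs]
    boxes = boxes_3d.copy()
    boxes[:, 2] /= scale            # z' = z / s, since u' = s*u = f*x / (z/s)
    return resized, boxes

img = np.zeros((10, 20, 3), dtype=np.uint8)
boxes = np.array([[1.0, 2.0, 10.0]])          # hypothetical (x, y, z) centers
resized, boxes_aug = zoom_focal_augment(img, boxes, scale=2.0)
```

An equivalent view is that the image resize multiplies the effective focal length by s; either the intrinsics or the 3D depth must then be adjusted, and this sketch adjusts the depth.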
According to a further embodiment of the present invention, the first augmentation processing module 502 is further configured to: determining an imaging range coefficient of the image data; and combining the imaging range coefficient by taking the center point of the image data as a reference to obtain a second processing result of the image data augmentation processing.
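The imaging-range adjustment can be sketched as a centered crop; interpreting the imaging range coefficient as the fraction of the original extent that is kept is our assumption:

```python
import numpy as np

def adjust_imaging_range(img, range_coeff):
    """Keep a centered window whose height and width are range_coeff times
    the original, taking the image center point as the reference."""
    h, w = img.shape[:2]
    ch, cw = int(h * range_coeff), int(w * range_coeff)
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

img = np.arange(100 * 200).reshape(100, 200)
cropped = adjust_imaging_range(img, range_coeff=0.5)
```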
According to a further embodiment of the present invention, the second augmentation processing module 503 is further configured to: acquiring image data in the first augmentation data set and point cloud data corresponding to the image data to form an image point cloud pair; determining an image point cloud pair to be copied and an image point cloud pair to be pasted from the image point cloud pair; and copying and pasting the image point cloud pair to be copied in the corresponding image point cloud pair to be pasted.
According to another embodiment of the present invention, the second augmentation processing module 503 is further configured to: copying and pasting point cloud data to be copied in the image point cloud pair to be copied into point cloud data to be pasted in the image point cloud pair to be pasted; calculating two-dimensional attribute information of the point cloud data to be copied relative to the image data to be pasted in the image point cloud pair to be pasted, according to a preset transformation matrix from the point cloud coordinate system to the camera coordinate system and the intrinsic parameters of the camera; and copying and pasting the image data to be copied in the image point cloud pair to be copied into the image data to be pasted in the image point cloud pair to be pasted according to the two-dimensional attribute information.
According to a further embodiment of the present invention, the second augmentation processing module 503 is further configured to: acquiring the marking frame attributes of the image data in the first augmented data set, and screening out the image data to be augmented whose marking frame attribute is non-occluded; determining a mask range and a mask proportion according to the application scene; and determining a deletion region of the image data to be augmented according to the mask range and the mask proportion in combination with a preset deletion rule, and deleting the deletion region from the image data to be augmented.
According to a further embodiment of the present invention, in the case that the first data augmentation process is the data augmentation process of the instance level, the second augmentation processing module 503 is further configured to: performing at least one of data augmentation processing at the image level and data augmentation processing at the object mask on the first augmented data set; alternatively, in a case where the first data augmentation process is the data augmentation process of the object mask, the second augmentation processing module 503 is further configured to: at least one of the image-level data augmentation processing and the instance-level data augmentation processing is performed on the first augmented data set.
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 is used as a medium to provide communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 605 via the network 604 using the terminal devices 601, 602, 603 to receive or send messages, etc. Various communication client applications, such as a data augmentation application, etc. (by way of example only) may be installed on the terminal devices 601, 602, 603.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (by way of example only) that provides support for data augmentation performed by users with the terminal devices 601, 602, 603. The background management server may acquire a training data set to be subjected to data augmentation processing; perform first data augmentation processing on the training data set to obtain a first augmented data set, where the first data augmentation processing is any one of an augmentation processing set comprising image-level data augmentation processing, instance-level data augmentation processing, and object mask data augmentation processing; perform second data augmentation processing on the first augmented data set to obtain a second augmented data set, where the second data augmentation processing is at least one of the two remaining augmentation processings in the augmentation processing set that differ from the first data augmentation processing; take the second augmented data set as the result of the data augmentation processing, and so on, and feed the processing result (e.g., the augmentation result, by way of example only) back to the terminal device.
It should be noted that, the method for data augmentation provided in the embodiment of the present invention is generally performed by the server 605, and accordingly, the device for data augmentation is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Reference is now made to fig. 7, which is a schematic diagram illustrating the architecture of a computer system suitable for use in implementing an embodiment of the present invention. The terminal device or server shown in fig. 7 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output portion 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. The drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is mounted into the storage section 708 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination thereof.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, for example described as: a processor comprising a training data set acquisition module, a first augmentation processing module, a second augmentation processing module, and an augmentation result determining module.
The names of these modules do not constitute a limitation of the module itself in some cases, and for example, the training data set acquisition module may also be described as "a module for acquiring a training data set to be subjected to data augmentation processing".
In another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: acquire a training data set to be subjected to data augmentation processing; perform first data augmentation processing on the training data set to obtain a first augmented data set, wherein the first data augmentation processing is any one of an augmentation processing set comprising image-level data augmentation processing, instance-level data augmentation processing, and object mask data augmentation processing; perform second data augmentation processing on the first augmented data set to obtain a second augmented data set, wherein the second data augmentation processing is at least one of the two remaining augmentation processings in the augmentation processing set that differ from the first data augmentation processing; and take the second augmented data set as the result of the data augmentation processing.
The technical scheme provided by the embodiments of the invention has the following advantages or beneficial effects: a training data set to be subjected to data augmentation processing is acquired; first data augmentation processing is performed on the training data set to obtain a first augmented data set, where the first data augmentation processing is any one of an augmentation processing set comprising image-level data augmentation processing, instance-level data augmentation processing, and object mask data augmentation processing; second data augmentation processing is performed on the first augmented data set to obtain a second augmented data set, where the second data augmentation processing is at least one of the two remaining augmentation processings in the augmentation processing set that differ from the first data augmentation processing; and the second augmented data set is taken as the data augmentation result. Data augmentation of training data for monocular 3D target detection is thus realized, the training data is enriched efficiently and at low cost, and the robustness and generalization capability of the 3D visual detection model are improved.
The described embodiments do not limit the scope of the invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (11)

1. A method of data augmentation comprising:
acquiring a training data set to be subjected to data augmentation processing;
performing first data augmentation processing on the training data set to obtain a first augmented data set, wherein the first data augmentation processing is any one of an augmentation processing set, and the augmentation processing set comprises image-level data augmentation processing, instance-level data augmentation processing, and object mask data augmentation processing;
performing second data augmentation processing on the first augmented data set to obtain a second augmented data set, wherein the second data augmentation processing is at least one of the two remaining augmentation processings in the augmentation processing set that are different from the first data augmentation processing;
the second augmented data set is used as a result of a data augmentation process.
2. The method of claim 1, wherein, in the case where the first data augmentation process is a data augmentation process at the image level,
performing a first data augmentation process on the training data set, comprising: at least one of data augmentation processing of zooming focal length and data augmentation processing of adjusting imaging range is performed on the image data in the training data set;
performing a second data augmentation process on the first augmented data set, comprising: at least one of the example-level data augmentation processing and the object mask data augmentation processing is performed on the first augmented data set.
3. The method of claim 2, wherein performing a zoom-in data augmentation process on image data in the training dataset comprises:
determining a focal length scaling multiple value of the image data and a three-dimensional coordinate value of the image data;
and obtaining a first processing result of the image data augmentation processing according to the focal length scaling multiple value and the three-dimensional coordinate value.
4. The method of claim 2, wherein the data augmentation process for adjusting the imaging range for the image data in the training dataset comprises:
determining an imaging range coefficient of the image data;
and combining the imaging range coefficient by taking the center point of the image data as a reference to obtain a second processing result of the image data augmentation processing.
5. The method of claim 2, wherein performing the instance-level data augmentation processing on the first augmented data set comprises:
acquiring image data in the first augmentation data set and point cloud data corresponding to the image data to form an image point cloud pair;
determining an image point cloud pair to be copied and an image point cloud pair to be pasted from the image point cloud pair;
and copying and pasting the image point cloud pair to be copied in the corresponding image point cloud pair to be pasted.
6. The method of claim 5, wherein copying and pasting the pair of image point clouds to be copied in the corresponding pair of image point clouds to be pasted comprises:
copying and pasting point cloud data to be copied in the image point cloud pair to be copied into point cloud data to be pasted in the image point cloud pair to be pasted;
calculating two-dimensional attribute information of the point cloud data to be copied relative to the image to be pasted in the point cloud pair of the image to be pasted according to a transformation matrix from a preset point cloud coordinate system to a camera coordinate system and internal parameters of a camera;
and copying and pasting the image data to be copied in the image point cloud pair to be copied into the image data to be pasted in the image point cloud pair to be pasted according to the two-dimensional attribute information.
7. The method of claim 2, wherein performing data augmentation processing of the object mask on the first augmented data set comprises:
acquiring the marking frame attributes of the image data in the first augmented data set, and screening out the image data to be augmented whose marking frame attribute is non-occluded;
determining a mask range and a mask proportion according to an application scene;
and determining a deletion region of the image data to be augmented according to the mask range and the mask proportion in combination with a preset deletion rule, and deleting the deletion region from the image data to be augmented.
8. The method of claim 1, wherein, in the case where the first data augmentation process is the instance-level data augmentation process, performing a second data augmentation process on the first set of augmented data comprises: performing at least one of data augmentation processing at the image level and data augmentation processing at the object mask on the first augmented data set;
or, in the case that the first data augmentation process is the data augmentation process of the object mask, performing a second data augmentation process on the first augmented data set, including: at least one of the image-level data augmentation processing and the instance-level data augmentation processing is performed on the first augmented data set.
9. An apparatus for data augmentation, comprising:
the training data set acquisition module is used for acquiring a training data set to be subjected to data augmentation processing;
the first augmentation processing module is used for performing first data augmentation processing on the training data set to obtain a first augmented data set, wherein the first data augmentation processing is any one of an augmentation processing set, and the augmentation processing set comprises image-level data augmentation processing, instance-level data augmentation processing, and object mask data augmentation processing;
the second augmentation processing module is used for performing second data augmentation processing on the first augmented data set to obtain a second augmented data set, wherein the second data augmentation processing is at least one of the two remaining augmentation processes in the augmentation processing set that differ from the first data augmentation processing;
and the augmentation result determining module is used for taking the second augmented data set as the result of the data augmentation processing.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-8.
11. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any of claims 1-8.
CN202410039442.0A 2024-01-10 2024-01-10 Data augmentation method and device Pending CN117876987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410039442.0A CN117876987A (en) 2024-01-10 2024-01-10 Data augmentation method and device

Publications (1)

Publication Number Publication Date
CN117876987A true CN117876987A (en) 2024-04-12

Family

ID=90577012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410039442.0A Pending CN117876987A (en) 2024-01-10 2024-01-10 Data augmentation method and device

Country Status (1)

Country Link
CN (1) CN117876987A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination