CN111274927A

CN111274927A - Training data generation method and device, electronic equipment and storage medium

Info

Publication number: CN111274927A
Application number: CN202010054351.6A
Authority: CN
Inventors: 王丽雯; 周锴
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2020-06-12

Abstract

The application discloses a training data generation method and device, electronic equipment and a storage medium. The method comprises the following steps: loading a material model and a basic scene based on a graphic engine to obtain a virtual scene; carrying out scene shooting according to a virtual camera arranged on the appointed material model to obtain a scene picture; generating picture marking information based on a material model contained in a scene picture; and taking a scene picture and picture marking information thereof as a piece of training data. The method has the advantages that scene pictures can be obtained in a virtual scene according to the graphic engine and the virtual camera, training data can be generated conveniently and quickly by combining picture marking information, the generation method of the training data is enriched, the generation cost of the training data is reduced, the problem of insufficient training data samples is effectively solved, and a good auxiliary effect is achieved on the model training effect.

Description

Training data generation method and device, electronic equipment and storage medium

Technical Field

The application relates to the field of deep learning, in particular to a training data generation method and device, electronic equipment and a storage medium.

Background

The automatic driving scene has higher requirements on the accuracy of detecting and identifying the road obstacle, and for detecting and identifying the efficient and accurate obstacle, the deep learning technology is generally used in the prior art, but the deep learning technology depends on training data, so that when the training data set is more complete and rich and the quality is higher, the accuracy of the trained model on different test sets is better, namely, the generalization performance is better. However, in practice, training data set resources often need to obtain image collection qualification awarded by the country and then carry out frequent collection activities, and the problems of high sample acquisition cost, low efficiency and difficult acquisition in certain specific scenes exist.

Disclosure of Invention

In view of the above, the present application is proposed to provide a training data generation method, apparatus, electronic device and storage medium that overcome or at least partially solve the above problems.

According to an aspect of the present application, there is provided a method of generating training data, including:

loading a material model and a basic scene based on a graphic engine to obtain a virtual scene; the material model corresponds to an object to be focused in an automatic driving scene;

carrying out scene shooting according to a virtual camera arranged on the appointed material model to obtain a scene picture;

generating picture marking information based on a material model contained in a scene picture;

and taking a scene picture and picture marking information thereof as a piece of training data.

Optionally, at least one of the material model, the base scene, the specified material model, and the virtual camera is determined according to scene configuration information, and the scene configuration information includes at least one of: the method comprises the steps of material model category, material model number, material model deployment position, material model movement parameters, basic scene category, virtual camera viewpoint position and scene picture acquisition frequency.

Optionally, the generating of the picture tagging information based on the material model included in the scene picture includes:

determining material models contained in the scene pictures and axial vertex coordinates of the contained material models in all axial directions of a world coordinate system according to the scene configuration information;

and generating a three-dimensional bounding box of the corresponding material model according to the determined axial vertex coordinates.

Optionally, the material model includes a standard model with complete coordinate information and a non-standard model with incomplete coordinate information;

determining the material model contained in the scene picture and the axial vertex coordinates of the contained material model in each axial direction of the world coordinate system according to the scene configuration information comprises:

for the standard model, directly determining axial vertex coordinates in each axial direction of a world coordinate system according to coordinate information contained in the standard model;

for the non-standard model, analyzing a three-dimensional vertex coordinate set of the non-standard model by using a principal component analysis method, and determining three principal directions and a mass center; and carrying out coordinate transformation according to the determined three-dimensional vertex coordinate set of the main direction and the mass center non-standard model, and determining axial vertex coordinates in each axial direction of the world coordinate system according to the transformed coordinates.

determining a material model contained in the scene picture and a three-dimensional coordinate set of the contained material model according to the scene configuration information;

based on the imaging projection relation between the world coordinate system and the virtual camera coordinate system, projecting each element in the three-dimensional coordinate set of the contained material model onto the picture coordinate system to obtain a two-dimensional coordinate set of the contained material model;

and determining a two-dimensional surrounding frame of the corresponding material model according to the two-dimensional coordinate set.

Optionally, the method further comprises: adding two-dimensional image features into a scene picture, wherein the two-dimensional image features comprise at least one of the following: motion blur features, brightness features, camera distortion features, image compression features.

Optionally, the adding the two-dimensional image feature into the scene picture includes:

and enabling KL divergence between the feature distribution of the scene picture added with the two-dimensional image features and the feature distribution of the real scene picture to be smaller than a preset threshold value.

According to another aspect of the present application, there is provided a training data generation apparatus, including:

the loading unit is used for loading the material model and the basic scene based on the graphic engine to obtain a virtual scene; the material model corresponds to an object to be focused in an automatic driving scene;

the shooting unit is used for shooting a scene according to a virtual camera arranged on the specified material model to obtain a scene picture;

the labeling unit is used for generating image labeling information based on a material model contained in the scene image;

and the training unit is used for taking a scene picture and picture marking information thereof as a piece of training data.

Optionally, the labeling unit is configured to determine, according to the scene configuration information, material models included in the scene picture and axial vertex coordinates of the included material models in each axial direction of a world coordinate system; and generating a three-dimensional bounding box of the corresponding material model according to the determined axial vertex coordinates.

the labeling unit is used for directly determining axial vertex coordinates in all axial directions of a world coordinate system for the standard model according to the coordinate information contained in the standard model; for the non-standard model, analyzing a three-dimensional vertex coordinate set of the non-standard model by using a principal component analysis method, and determining three principal directions and a mass center; and carrying out coordinate transformation according to the determined three-dimensional vertex coordinate set of the main direction and the mass center non-standard model, and determining axial vertex coordinates in each axial direction of the world coordinate system according to the transformed coordinates.

Optionally, the labeling unit is configured to determine, according to the scene configuration information, a material model included in the scene picture and a three-dimensional coordinate set of the included material model; based on the imaging projection relation between the world coordinate system and the virtual camera coordinate system, projecting each element in the three-dimensional coordinate set of the contained material model onto the picture coordinate system to obtain a two-dimensional coordinate set of the contained material model; and determining a two-dimensional surrounding frame of the corresponding material model according to the two-dimensional coordinate set.

Optionally, the apparatus further comprises: a processing unit, configured to add a two-dimensional image feature to the scene picture, where the two-dimensional image feature includes at least one of: motion blur features, brightness features, camera distortion features, image compression features.

Optionally, the processing unit is configured to enable a KL divergence between the feature distribution of the scene picture added with the two-dimensional image feature and the feature distribution of the real scene picture to be smaller than a preset threshold.

In accordance with yet another aspect of the present application, there is provided an electronic device including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method as any one of the above.

According to a further aspect of the application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement a method as in any above.

According to the technical scheme, the material model and the basic scene are loaded based on the graphic engine to obtain the virtual scene; carrying out scene shooting according to a virtual camera arranged on the appointed material model to obtain a scene picture; generating picture marking information based on a material model contained in a scene picture; and respectively taking each scene picture and the picture marking information thereof as a piece of training data. The method has the advantages that scene pictures can be obtained in a virtual scene according to the graphic engine and the virtual camera, training data can be generated conveniently and quickly by combining picture marking information, the generation method of the training data is enriched, the generation cost of the training data is reduced, the problem of insufficient training data samples is effectively solved, and a good auxiliary effect is achieved on the model training effect.

The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 shows a schematic flow diagram of a method of generating training data according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an apparatus for generating training data according to an embodiment of the present application;

FIG. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;

FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application;

FIG. 5a shows a schematic diagram of a three-dimensional bounding box and a two-dimensional bounding box labeled in a picture;

fig. 5b shows a schematic view of the three-dimensional bounding box marked in the picture.

Detailed Description

Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

Fig. 1 shows a schematic flow diagram of a method of generating training data according to an embodiment of the present application. As shown in fig. 1, the method includes:

and step S110, loading the material model and the basic scene based on the graphic engine to obtain a virtual scene. Wherein the material model corresponds to an object to be focused on in an automatic driving scene.

The material model and the basic scene are loaded through the graphic engine, so that various real scenes such as a block, a highway, even an automobile accident scene and the like can be simulated through software. The graphic engine may be a game engine, for example, a Unity3D engine, and may load various material models of vehicles, pedestrians, traffic light facilities, etc. that may occur in the driving environment of the vehicle, and various basic scenes, such as urban roads, rural roads, surrounding buildings, etc., that may occur in the driving environment of the vehicle, based on the game engine, and may finally synthesize a virtual scene of the driving environment of the vehicle.

The material model may be obtained by modeling a three-dimensional scene model for a real scene in the physical world, or by directly using an existing map scene model and other material models, such as: vehicles, pedestrians, traffic signs, etc.

When the material model and the basic scene are loaded, the material library can be deployed according to different scenes, for example, in an urban map scene, material models such as cars and buses can be deployed in a motorway area, material models such as bicycles and tricycles can be deployed in a non-motorway area, and material models such as pedestrians and garbage cans can be deployed in a sidewalk area. And then loading the material models into a game engine for rendering. In order to better conform to the real scene, the material model can be set in the engine to have a fixed moving mode or a fixed motionless mode respectively. Therefore, the real automobile driving environment can be simulated on software through the image engine, and virtual scenes such as automatic driving, logistics distribution, takeaway and the like can be widely simulated.

And step S120, carrying out scene shooting according to the virtual camera arranged on the specified material model to obtain a scene picture.

In order to simulate the information acquisition process of the driving environment and the surrounding scenes in a real scene, a virtual camera can be arranged on a designated material model in a virtual scene for scene shooting, and the appropriate position of the virtual camera can be flexibly set in different detection and identification scenes to determine a more appropriate viewpoint position. For example, in an automatic driving scenario, a viewpoint position may be set on a certain vehicle body model by simulating the position of a camera of a road collection vehicle, and a driving route may be set for the model so that the viewpoint position moves therewith. In this way, it is achieved that the position of the virtual camera is set in the virtual scene to determine the appropriate viewpoint position.

Step S130, generating image marking information based on the material model contained in the scene image.

Similar to the picture information acquired by the camera in the real scene, the virtual camera arranged in the virtual scene can also acquire the corresponding scene picture, and can perform information annotation on the scene pictures acquired by the virtual camera. For example, the scene picture obtained by the virtual camera includes material models such as pedestrians, fire engines, trucks, and the like, and information can be labeled on the material models respectively. Therefore, the scene picture acquired by the virtual camera can be converted into the picture with the labeling information.

Step S140, a scene picture and its picture marking information are used as a piece of training data.

In order to efficiently and accurately detect and recognize the scene pictures, an automatic detection recognition model can be established by using an artificial intelligence deep learning method, and a large number of scene picture sets can be used as training data to train the automatic detection recognition model until a preset training result is achieved. Therefore, the scene picture acquired by the virtual camera and the corresponding scene picture labeling information can be used as training data of the automatic detection recognition model, and the automatic detection recognition model can be trained in a targeted manner.

Therefore, as shown in fig. 1, the method can obtain scene pictures in a virtual scene according to a graphic engine and a virtual camera, and can conveniently and quickly generate training data by combining picture marking information, thereby enriching the generation method of the training data, reducing the generation cost of the training data, effectively solving the problem of insufficient training data samples, and having a good auxiliary effect on the model training effect. In the fields of automatic driving, logistics distribution and takeaway, the automatic detection and identification method has wide and good application.

In an embodiment of the application, in the method, at least one of the material model, the base scene, the specified material model, and the virtual camera is determined according to scene configuration information, and the scene configuration information includes at least one of: the method comprises the steps of material model category, material model number, material model deployment position, material model movement parameters, basic scene category, virtual camera viewpoint position and scene picture acquisition frequency.

Recording a material model and a basic scene in a virtual scene, and setting a virtual camera on a specified material model can be determined through scene configuration information, wherein the configuration information determines the material model according to configuration information such as the type of the material model, the number of the material models, the deployment position of the material model, the movement parameters of the material model and the like, and for example, whether pedestrians, the number of pedestrians, the specific positions of the pedestrians in the virtual scene, the movement speed of the pedestrians and the like exist in the virtual scene can be determined according to the configuration information.

The type of the basic scene can be determined according to the configuration information, for example, an urban road scene is determined according to the scene so as to simulate the surrounding environment of the urban road in the real physical scene. The specific position set by the virtual camera can be determined according to the configuration information, for example, the position of the roof of the mobile collection vehicle where the camera is set, the alignment angle of the camera, and the like can be determined according to the configuration information.

In order to flexibly use different scene task requirements, the acquisition frequency of different scene pictures can be set, for example, in a tracking task, a higher frequency screenshot can be adopted, for example, the frequency is set to 15fps, and in an image detection task, a lower frequency screenshot can be adopted, for example, a screenshot is carried out every 2 seconds.

The method can identify the simulated scene by adopting the preset configuration information, and can flexibly determine different configuration information according to different tasks and different requirements of users, so that richer simulated scenes are constructed, and the method has remarkable convenience and lower cost advantage compared with scene picture data acquisition under real scenes.

In an embodiment of the present application, in the method, generating the picture tagging information based on the material model included in the scene picture includes: determining material models contained in the scene pictures and axial vertex coordinates of the contained material models in all axial directions of a world coordinate system according to the scene configuration information; and generating a three-dimensional bounding box of the corresponding material model according to the determined axial vertex coordinates.

The scene picture acquired by the virtual camera is a two-dimensional image. The peripheral outline characteristics can reflect the characteristics of different material models, and the material models can be detected and identified by acquiring the peripheral outline characteristics. In order to efficiently determine the spatial profile of the material model, a three-dimensional bounding box of the material model can be determined, and specifically, when performing 3D to 2D projection calculation, all vertexes of the 3D model do not need to be projected to an image plane, but only 8 vertexes of the three-dimensional bounding box need to be determined. As shown in fig. 5b, for example, taking a car in a scene picture as an example, according to the scene configuration information, it may be determined that the material model belonging to the scene picture is the car, then it may determine axial vertex coordinates of the car in each axial direction of the world coordinate system, and then according to the six determined axial vertex coordinates, according to the spatial relationship, the eight vertex coordinate values of the three-dimensional bounding box of the corresponding material model may be represented by values of the six axial vertex coordinates, and the minimum three-dimensional bounding box may be obtained according to the eight vertices. Thus, the three-dimensional bounding box of the material model is determined in the scene picture.

In an embodiment of the application, in the method, the material model includes a standard model with complete coordinate information and a non-standard model with incomplete coordinate information; determining a material model contained in a scene picture and axial vertex coordinates of the contained material model in each axial direction of a world coordinate system according to scene configuration information comprises the following steps: for the standard model, directly determining axial vertex coordinates in each axial direction of a world coordinate system according to coordinate information contained in the standard model; for the non-standard model, analyzing a three-dimensional vertex coordinate set of the non-standard model by using a principal component analysis method, and determining three principal directions and a mass center; and carrying out coordinate transformation according to the determined three-dimensional vertex coordinate set of the main direction and the mass center non-standard model, and determining axial vertex coordinates in each axial direction of the world coordinate system according to the transformed coordinates.

The material model comprises two standard models with relatively complete coordinate information, for example, the standard models are obtained through standardized modeling; and non-standard models with incomplete coordinate information, such as models gathered via a network or the like, may not contain complete information. The three-dimensional bounding boxes of the corresponding material models can be determined by adopting different modes according to the two models with different standard degrees.

For the standard model, since the coordinate information of the material model is already clear, the axial vertex coordinates in each axial direction of the world coordinate system can be directly determined according to the contained coordinate information. For example, the 8 vertexes of the three-dimensional bounding box may be determined according to the 3D coordinates of the material model and the maximum value coordinate value max _ X of the X axis, the minimum value coordinate value min _ X of the X axis, the maximum value coordinate value max _ Y of the Y axis, the minimum value coordinate value min _ Y of the Y axis, the maximum value coordinate value max _ Z of the Z axis, and the minimum value coordinate value min _ Z of the Z axis, so as to determine the minimum bounding box of the material model according to the vertex coordinates of the vertexes of the material model.

For the non-standard model, a three-dimensional vertex coordinate set of the non-standard model may be analyzed, and since all vertex coordinates of the material model are known, three Principal directions of the vertex set of the model may be obtained by Principal Component Analysis (PCA). The principal component analysis method is a statistical method, and can convert a group of variables possibly having correlation into a group of linearly uncorrelated variables through orthogonal transformation, and the group of converted variables are called principal components. Thus, three main directions and centroids of the model vertex set can be determined through a principal component analysis method. And then, through covariance calculation and covariance matrix calculation, an eigenvalue and an eigenvector of the covariance matrix can be obtained, wherein the obtained eigenvector is the principal direction.

For the unification of the coordinate system, the vertex coordinates of the material model can be converted to the origin according to the obtained main direction and the centroid, and the main direction is coincident with the direction of the coordinate system. In this way, a unification of the coordinate system is achieved. Then, the maximum coordinate value max _ X of the X axis, the minimum coordinate value min _ X of the X axis, the maximum coordinate value max _ Y of the Y axis, the minimum coordinate value min _ Y of the Y axis, the maximum coordinate value max _ Z of the Z axis and the minimum coordinate value min _ Z of the Z axis are obtained by traversing the model vertex set, and therefore the minimum bounding box of the material model vertex is determined.

Therefore, the minimum bounding box of the vertex of the material model can be determined according to different modes for the material models with different standard degrees.

In an embodiment of the present application, in the method, generating the picture tagging information based on the material model included in the scene picture includes: determining a material model contained in the scene picture and a three-dimensional coordinate set of the contained material model according to the scene configuration information; based on the imaging projection relation between the world coordinate system and the virtual camera coordinate system, projecting each element in the three-dimensional coordinate set of the contained material model onto the picture coordinate system to obtain a two-dimensional coordinate set of the contained material model; and determining a two-dimensional surrounding frame of the corresponding material model according to the two-dimensional coordinate set.

For example, when the material model is a car, fig. 5a shows a schematic diagram of a three-dimensional bounding box and a two-dimensional bounding box marked in a picture, and fig. 5b shows a schematic diagram of a three-dimensional bounding box marked in a picture. As can be seen from fig. 5a and 5b, the three-dimensional bounding box is illustrated in a rectangular parallelepiped perspective view satisfying a perspective relationship in the picture, and the two-dimensional bounding box is illustrated in a rectangular shape. According to the scene configuration information, the material model contained in the scene picture and the three-dimensional coordinate set of the contained material model can be determined. In order to obtain the two-dimensional image and the model coordinates thereof in the image pixel coordinate system, the conversion of the world coordinate system of the three-dimensional model into the camera coordinate system and then into the image coordinate system can be simulated.

Specifically, the position of each target model can be captured for recording, the coordinates of the target model are generated conveniently, then the conversion relationship from the three-dimensional coordinates to the two-dimensional coordinates and from the material model to the scene image is established according to the viewpoint position, the direction, the three-dimensional coordinates of the material model and the imaging projection relationship between the world coordinate system and the virtual camera coordinate system, and each element in the three-dimensional coordinate set of the contained material model is projected onto the picture coordinate system to obtain the two-dimensional coordinate set of the contained material model. Specifically, a camera coordinate system may be established in advance, and then world coordinates of the three-dimensional material model may be converted into the camera coordinate system by means of rigid body transformation. Wherein, O_W(X_W，Y_W，Z_W) Representing the world coordinate system, O_C(X_C，Y_C，Z_C) Representing the camera coordinate system. When a camera coordinate system is established, the focusing center of a camera model can be used as an origin, and the optical axis of the camera is used as Z_CAxis establishing a three-dimensional rectangular coordinate system, X_C，Y_CGenerally parallel to the X, Y of the physical coordinate system of the image and in a front projection model. Then will containThe image coordinate system can be divided into different coordinate systems according to different units, specifically, the image coordinate system can be divided into an image pixel coordinate system by taking a pixel as a unit, and the image coordinate system can be divided into an image physical coordinate system by taking a millimeter as a unit. Therefore, the projection of the material model of the three-dimensional world to the two-dimensional scene image under any view angle can be realized. Then, a two-dimensional bounding box of the corresponding material model can be determined according to the two-dimensional coordinate set.

In an embodiment of the present application, the method further includes: adding two-dimensional image features into the scene picture, wherein the two-dimensional image features comprise at least one of the following: motion blur features, brightness features, camera distortion features, image compression features.

In a virtual scene, there may be some deviation in simulation effect for the simulation of a real physical scene. For example, in an automatic driving scene, due to the influence of a vehicle speed, an image captured by a camera in a real scene may have image blurring caused by motion, a significant difference in image brightness caused by a complex reason of illumination in the real scene, distortion effects such as barrel distortion and pincushion distortion when a camera shoots, and the influence of image compression in different degrees may exist. Therefore, in order to more vividly restore the environment of the automatic driving scene in the real physical scene, certain effect processing can be performed on the scene picture, for example, two-dimensional image features such as a motion blur feature, a brightness feature, a camera distortion feature, an image compression feature and the like can be added into the scene picture, so that the two-dimensional image features generate corresponding processing effects, and the environment of the automatic driving scene in the real physical scene is restored to a greater extent.

In an embodiment of the present application, the adding a two-dimensional image feature to the scene picture in the method includes: and enabling KL divergence between the feature distribution of the scene picture added with the two-dimensional image features and the feature distribution of the real scene picture to be smaller than a preset threshold value.

In order to better balance the degree of the processing effect on the scene picture, when the scene picture is processed, the feature distribution in the sample generation process can be determined according to the feature distribution of the real sample, and the feature distribution is fused. Specifically, the characteristic distribution in the process of generating the sample can be controlled to be close to the characteristic distribution of the real sample, the evaluation can be performed by means of the KL divergence of the characteristic distribution, the specific evaluation parameter can be the KL divergence of the characteristic distribution of the real sample and the KL divergence of the characteristic distribution of the generated sample in the virtual scene, the KL divergence parameter is made to be a smaller positive value in principle, and in the evaluation process, whether the KL divergence parameter is the smaller positive value can be determined by using a cross validation method. Therefore, the characteristics of the generated sample and the real sample in the virtual scene are more similar through fusion, a better simulation effect is obtained, and the training effect is indirectly improved.

Fig. 2 shows a schematic structural diagram of a training data generation apparatus according to an embodiment of the present application. As shown in fig. 2, the training data generation device 200 includes:

and the loading unit 210 is configured to load the material model and the basic scene based on the graphics engine to obtain a virtual scene. Wherein the material model corresponds to an object to be focused on in the automatic driving scene.

And the shooting unit 220 is configured to perform scene shooting according to the virtual camera arranged on the specified material model to obtain a scene picture.

And the labeling unit 230 is configured to generate image labeling information based on the material model included in the scene image.

The training unit 240 is configured to use a scene picture and its picture marking information as a piece of training data.

Therefore, the device shown in fig. 2 can obtain scene pictures in a virtual scene according to a graphic engine and a virtual camera, and can conveniently and quickly generate training data by combining picture marking information, so that the generation method of the training data is enriched, the generation cost of the training data is reduced, the problem of insufficient training data samples is effectively solved, and a good auxiliary effect is achieved on the model training effect. In the fields of automatic driving, logistics distribution and takeaway, the automatic detection and identification method has wide and good application.

In one embodiment of the present application, in the above apparatus, at least one of the material model, the base scene, the specified material model, and the virtual camera is determined according to scene configuration information, and the scene configuration information includes at least one of: the method comprises the steps of material model category, material model number, material model deployment position, material model movement parameters, basic scene category, virtual camera viewpoint position and scene picture acquisition frequency.

In an embodiment of the present application, in the above apparatus, the labeling unit 230 is configured to determine, according to the scene configuration information, material models included in the scene picture and axial vertex coordinates of the included material models in each axial direction of the world coordinate system; and generating a three-dimensional bounding box of the corresponding material model according to the determined axial vertex coordinates.

In an embodiment of the present application, in the apparatus, the material model includes a standard model with complete coordinate information and a non-standard model with incomplete coordinate information; the labeling unit 230 is used for directly determining the axial vertex coordinates of the standard model in each axial direction of the world coordinate system according to the contained coordinate information; for the non-standard model, analyzing a three-dimensional vertex coordinate set of the non-standard model by using a principal component analysis method, and determining three principal directions and a mass center; and carrying out coordinate transformation according to the determined three-dimensional vertex coordinate set of the main direction and the mass center non-standard model, and determining axial vertex coordinates in each axial direction of the world coordinate system according to the transformed coordinates.

In an embodiment of the present application, in the above apparatus, the labeling unit 230 is configured to determine, according to the scene configuration information, a material model included in the scene picture and a three-dimensional coordinate set of the included material model; based on the imaging projection relation between the world coordinate system and the virtual camera coordinate system, projecting each element in the three-dimensional coordinate set of the contained material model onto the picture coordinate system to obtain a two-dimensional coordinate set of the contained material model; and determining a two-dimensional surrounding frame of the corresponding material model according to the two-dimensional coordinate set.

In one embodiment of the present application, the apparatus further includes: the processing unit is used for adding two-dimensional image features into the scene picture, wherein the two-dimensional image features comprise at least one of the following: motion blur features, brightness features, camera distortion features, image compression features.

In an embodiment of the present application, in the above apparatus, the processing unit is configured to make a KL divergence between a feature distribution of the scene picture to which the two-dimensional image feature is added and a feature distribution of the real scene picture smaller than a preset threshold.

It should be noted that, for the specific implementation of each apparatus embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.

In summary, according to the technical scheme of the application, a material model and a basic scene are loaded based on a graphic engine to obtain a virtual scene; carrying out scene shooting according to a virtual camera arranged on the appointed material model to obtain a scene picture; generating picture marking information based on a material model contained in a scene picture; and respectively taking each scene picture and the picture marking information thereof as a piece of training data. The method has the advantages that scene pictures can be obtained in a virtual scene according to the graphic engine and the virtual camera, training data can be generated conveniently and quickly by combining picture marking information, the generation method of the training data is enriched, the generation cost of the training data is reduced, the problem of insufficient training data samples is effectively solved, and a good auxiliary effect is achieved on the model training effect.

It should be noted that:

the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.

In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: this application is intended to cover such departures from the present disclosure as come within known or customary practice in the art to which this invention pertains. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the training data generating apparatus according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.

For example, fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 300 comprises a processor 310 and a memory 320 arranged to store computer executable instructions (computer readable program code). The memory 320 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer readable program code 331 for performing any of the method steps described above. For example, the storage space 330 for storing the computer readable program code may comprise respective computer readable program codes 331 for respectively implementing various steps in the above method. The computer readable program code 331 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 4. FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. The computer readable storage medium 400 has stored thereon a computer readable program code 331 for performing the steps of the method according to the application, readable by a processor 310 of an electronic device 300, which computer readable program code 331, when executed by the electronic device 300, causes the electronic device 300 to perform the steps of the method described above, in particular the computer readable program code 331 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 331 may be compressed in a suitable form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims

1. A method of generating training data, comprising:

2. The method of claim 1, wherein at least one of the material model, the base scene, the specified material model, and the virtual camera is determined based on scene configuration information, the scene configuration information including at least one of: the method comprises the steps of material model category, material model number, material model deployment position, material model movement parameters, basic scene category, virtual camera viewpoint position and scene picture acquisition frequency.

3. The method of claim 2, wherein the generating of the picture annotation information based on the material model included in the scene picture comprises:

4. The method of claim 3, wherein the material model includes a standard model in which coordinate information is complete, and a non-standard model in which coordinate information is incomplete;

5. The method of claim 2, wherein the generating of the picture annotation information based on the material model included in the scene picture comprises:

6. The method of any one of claims 1-5, further comprising: adding two-dimensional image features into a scene picture, wherein the two-dimensional image features comprise at least one of the following: motion blur features, brightness features, camera distortion features, image compression features.

7. The method of claim 6, wherein adding two-dimensional image features to the scene picture comprises:

8. An apparatus for generating training data, comprising:

9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-7.

10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.