CN111680758B - Image training sample generation method and device


Info

Publication number
CN111680758B
CN111680758B
Authority
CN
China
Prior art keywords
texture
dimensional
model
transformation
target object
Prior art date
Legal status
Active
Application number
CN202010543613.5A
Other languages
Chinese (zh)
Other versions
CN111680758A (en)
Inventor
许娅彤
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202010543613.5A
Publication of CN111680758A
Application granted
Publication of CN111680758B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 - Manipulating 3D models or images for computer graphics
    • G06T 19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Abstract

The application discloses a method and a device for generating image training samples, and belongs to the field of computer vision. According to the method, model transformation is performed on an acquired three-dimensional texture model of a target object to obtain a plurality of three-dimensional transformation models, and one or more two-dimensional images corresponding to each three-dimensional transformation model are then generated, so that a large number of image training samples similar to the target object can be obtained. That is, the scheme can automatically and rapidly generate a large number of image training samples meeting the requirements, saving labor and time costs. In addition, because the image training samples in the scheme are obtained from the three-dimensional texture model of the target object, they belong to the same type as the target object and have high fidelity. Therefore, compared with image training samples grabbed from the Internet, the method can simplify the sample processing process, alleviate the problem of sample quality reduction caused by manual operation, and is beneficial to subsequent training of the deep learning network.

Description

Image training sample generation method and device
Technical Field
The present application relates to the field of Computer Vision (CV), and in particular, to a method and apparatus for generating an image training sample.
Background
Deep learning technology is currently widely applied in the CV field. To enhance the generalization capability of a deep learning network and reduce its overfitting, a massive number of image training samples is generally required to train the deep learning network, and how to acquire massive image training samples has become a problem to be solved when deep learning technology is applied to the CV field.
In the related art, image training samples may be acquired by capturing images with a camera or by grabbing images from the Internet. However, since camera acquisition is limited by hardware, labor costs, and the like, it is difficult to acquire images of a sufficient magnitude, and the acquisition period is long. Most images grabbed from the Internet cannot be used directly; the grabbed images need to undergo manual sample enhancement work such as cleaning and labeling, which carries high time and labor costs, and the quality of the samples may be reduced by the manual operation, thereby affecting the training of the deep learning network.
Disclosure of Invention
The application provides a method and a device for generating image training samples, which can save the time and labor costs of acquiring image training samples and can improve sample quality. The technical solution is as follows:
In one aspect, there is provided an image training sample generation method, the method comprising:
acquiring a three-dimensional texture model of a target object;
performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, wherein the model transformation comprises at least one of model deformation and texture transformation;
generating one or more two-dimensional images corresponding to each three-dimensional transformation model in the plurality of three-dimensional transformation models, and taking the generated two-dimensional images as image training samples.
Optionally, the acquiring the three-dimensional texture model of the target object includes:
creating a three-dimensional model of the target object;
and performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
Optionally, when the model transformation includes the texture transformation, the performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models includes:
carrying out semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature areas, wherein each local feature area in the plurality of local feature areas corresponds to one semantic;
according to the semantics corresponding to the local feature areas, a plurality of target texture materials corresponding to each local feature area are obtained from the stored texture materials;
And carrying out texture transformation on the three-dimensional texture model according to a plurality of target texture materials respectively corresponding to the plurality of local characteristic areas to obtain a plurality of three-dimensional transformation models.
Optionally, when the model transformation includes the model deformation, the performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models includes:
obtaining a plurality of groups of structure editing parameters;
and editing the geometric structure of the three-dimensional texture model according to the multiple groups of structure editing parameters to obtain multiple three-dimensional transformation models.
Optionally, the generating one or more two-dimensional images corresponding to each three-dimensional transformation model in the plurality of three-dimensional transformation models includes:
acquiring reference camera parameters;
and projecting each three-dimensional transformation model from one or more different view angles according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the corresponding three-dimensional transformation model.
In another aspect, there is provided an image training sample generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring a three-dimensional texture model of the target object;
the transformation module is used for carrying out model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, and the model transformation comprises at least one of model deformation and texture transformation;
The generation module is used for generating one or more two-dimensional images corresponding to each three-dimensional transformation model in the plurality of three-dimensional transformation models, and taking the generated two-dimensional images as image training samples.
Optionally, the acquiring module includes:
a creating sub-module for creating a three-dimensional model of the target object;
and the mapping sub-module is used for performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
Optionally, when the model transform includes the texture transform, the transform module includes:
the semantic segmentation sub-module is used for carrying out semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature areas, and each local feature area in the plurality of local feature areas corresponds to one semantic;
the first acquisition submodule is used for acquiring a plurality of target texture materials corresponding to each local feature region from the stored texture materials according to the semantics corresponding to the local feature regions;
and the transformation submodule is used for carrying out texture transformation on the three-dimensional texture model according to a plurality of target texture materials respectively corresponding to the plurality of local characteristic areas to obtain a plurality of three-dimensional transformation models.
Optionally, when the model transformation includes the model deformation, the transformation module includes:
the second acquisition submodule is used for acquiring a plurality of groups of structure editing parameters;
and the editing sub-module is used for editing the geometric structure of the three-dimensional texture model according to the multiple groups of structure editing parameters to obtain the multiple three-dimensional transformation models.
Optionally, the generating module includes:
the third acquisition submodule is used for acquiring reference camera parameters;
and the projection sub-module is used for projecting each three-dimensional transformation model from one or more different view angles according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the corresponding three-dimensional transformation model.
In another aspect, a computer device is provided, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used to store a computer program, and the processor is used to execute the program stored in the memory to implement the steps of the image training sample generation method described above.
In another aspect, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the steps of the image training sample generation method described above.
In another aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform the steps of the image training sample generation method described above.
The technical solution provided by the present application can bring at least the following beneficial effects:
by performing model transformation on the acquired three-dimensional texture model of the target object, a plurality of three-dimensional transformation models can be obtained, and one or more two-dimensional images corresponding to each three-dimensional transformation model are then generated, so that a large number of image training samples similar to the target object can be obtained. That is, the solution provided by the present application can rapidly and automatically generate a large number of image training samples meeting the requirements; compared with collecting samples with an actual camera, it saves equipment investment, labor cost, and time cost, and can produce tens of thousands of times more samples in the same amount of time. In addition, because the image training samples in this solution are obtained from the three-dimensional texture model of the target object, they belong to the same type as the target object and have high fidelity. In this way, compared with image training samples grabbed from the Internet, the sample processing process can be simplified, alleviating the problem of sample quality reduction caused by manual operation during sample processing, and facilitating the subsequent training of the deep learning network.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an image training sample generation method provided in an embodiment of the present application;
FIG. 2 is a schematic illustration of a three-dimensional model of a target object created in an embodiment of the present application;
FIG. 3 is a schematic illustration of texture of a target object acquired by an embodiment of the present application;
FIG. 4 is a schematic diagram of a three-dimensional texture model of a target object acquired by an embodiment of the present application;
FIG. 5 is a schematic diagram of a three-dimensional transformation model obtained by model deformation of the three-dimensional texture model shown in FIG. 4 according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a two-dimensional image obtained by projecting a three-dimensional transformation model according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an image training sample generating device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Deep learning technology is currently widely applied in the CV field. To enhance the generalization capability of a deep learning network and reduce its overfitting, a massive number of image training samples is generally required to train the deep learning network, and how to acquire massive image training samples has become a problem to be solved when deep learning technology is applied to the CV field. Through the technical solution provided by the embodiments of the present application, a large number of image training samples meeting the requirements can be rapidly generated. For example, a large number of image training samples of people can be quickly generated and used to train a deep learning network to obtain a deep learning model usable for person recognition and the like; a large number of image training samples of automobiles can be generated to train a deep learning network to obtain a deep learning model usable for vehicle detection and the like; and a large number of image training samples of indoor decoration can be generated to train a deep learning network to obtain a deep learning model usable for environment structure detection and the like.
The image training sample generation method provided in the embodiment of the present application is explained in detail below.
Fig. 1 is a flowchart of an image training sample generating method according to an embodiment of the present application. Referring to Fig. 1, the method includes the following steps.
Step 101: a three-dimensional texture model of the target object is obtained.
In the embodiment of the application, in order to generate a large number of image training samples of a specified type by expansion, one or more two-dimensional images of a target object may first be acquired, and a three-dimensional texture model of the target object may be obtained through a three-dimensional reconstruction method. The specified type may be designated according to actual requirements, and the target object is an actual object of the specified type. For example, if a deep learning network currently needs to be trained for vehicle identification, the specified type may be vehicles, and the target object may be an actual automobile.
In the embodiment of the present application, the specified type may be any type of object, for example, a human body, a vehicle, a table and a chair, a landscape, an interior decoration, and the like.
In some embodiments, after one or more two-dimensional images of the target object are acquired, a three-dimensional model of the target object may be created first, and then the three-dimensional model may be texture mapped according to the texture of the target object, to obtain a three-dimensional texture model.
In the embodiment of the present application, the three-dimensional model of the target object, that is, the three-dimensional geometric structure of the target object, may be reconstructed by a three-dimensional reconstruction algorithm. For example, a three-dimensional model may be reconstructed from the contours of the target object. The three-dimensional reconstruction algorithm may be multi-view stereo (MVS) reconstruction, contour-based shape recovery (Shape from Silhouette, SfS), and the like, which is not limited in this embodiment.
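For illustration only, the following is a minimal shape-from-silhouette (visual hull) sketch in Python with NumPy. The silhouette masks, 3x4 projection matrices, and grid bounds are assumed to be available from calibrated views; all names are illustrative and this is not the patent's implementation.

```python
# Minimal visual-hull carving sketch (assumed inputs: binary masks and 3x4 projection matrices).
import numpy as np

def carve_visual_hull(masks, proj_mats, bounds, resolution=64):
    """Keep the voxels whose projection falls inside the silhouette in every view."""
    (xmin, ymin, zmin), (xmax, ymax, zmax) = bounds
    xs = np.linspace(xmin, xmax, resolution)
    ys = np.linspace(ymin, ymax, resolution)
    zs = np.linspace(zmin, zmax, resolution)
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1).reshape(-1, 3)
    occupied = np.ones(len(grid), dtype=bool)
    homo = np.hstack([grid, np.ones((len(grid), 1))])      # homogeneous voxel centers

    for mask, P in zip(masks, proj_mats):
        pix = homo @ P.T                                   # project voxels into this view
        u = (pix[:, 0] / pix[:, 2]).round().astype(int)
        v = (pix[:, 1] / pix[:, 2]).round().astype(int)
        h, w = mask.shape
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(grid), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]] > 0       # voxel lands on the silhouette
        occupied &= hit                                    # carve away everything else
    return grid[occupied]                                  # surviving voxel centers approximate the shape
```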
After the three-dimensional model of the target object is created, texture mapping can be carried out on the corresponding area of the three-dimensional model according to the acquired textures of each area of the target object, so as to obtain the three-dimensional texture model of the target object.
For example, assuming that the target object is a car, after the three-dimensional model of the car is created, the three-dimensional model of the car may be mapped according to the acquired texture of the car, so as to obtain the three-dimensional texture model of the car.
As another example, assuming that the target object is a person, fig. 2 is a three-dimensional model created from a plurality of acquired two-dimensional images of the target object, fig. 3 is a plurality of acquired textures of the target object, and fig. 4 is a three-dimensional texture model of the target object obtained by mapping the textures shown in fig. 3 to corresponding regions of the three-dimensional model shown in fig. 2.
In other embodiments, the three-dimensional texture model of the target object may be derived directly from one or more two-dimensional images of the target object. For example, one or more two-dimensional images of the target object may be input into an end-to-end deep learning model, which may directly output a three-dimensional texture model of the target object.
In other embodiments, a three-dimensional texture model of a specified type may be obtained from a library of three-dimensional texture models. For example, a three-dimensional texture model of a vehicle may be obtained from a library of three-dimensional texture models.
In this embodiment of the present application, in addition to the above-mentioned capturing one or more two-dimensional images of the target object to reconstruct a three-dimensional texture model of the target object, one or more depth images of the target object and textures of the target object may be captured, three-dimensional reconstruction may be performed on the captured depth images to obtain a three-dimensional model of the target object, and then texture mapping may be performed on the three-dimensional model according to the textures of the target object to obtain the three-dimensional texture model of the target object. The method for performing three-dimensional reconstruction on the depth image may be a three-dimensional reconstruction method based on deep learning, or may be other methods, which is not limited in this application.
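As a hedged illustration of the first step of reconstructing from a captured depth image, the sketch below back-projects each depth pixel into a 3-D point using pinhole intrinsics; the intrinsic values are assumed, and the later fusion, meshing, and texture mapping stages are not shown.

```python
# Back-project a depth image into a point cloud (intrinsics are illustrative assumptions).
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Convert a depth image (depth per pixel) into an N x 3 point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                 # drop pixels without a depth measurement

depth = np.random.rand(480, 640) * 3.0              # stand-in for a captured depth image
cloud = depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```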
Alternatively, a three-dimensional point cloud of the target object may be acquired by a laser device, and the texture of the target object may be acquired by a camera; three-dimensional reconstruction is then performed on the acquired three-dimensional point cloud to obtain a three-dimensional model of the target object, and texture mapping is then performed on the three-dimensional model according to the texture of the target object. The method for performing three-dimensional reconstruction on the three-dimensional point cloud may be a three-dimensional reconstruction method based on deep learning, or may be another method, which is not limited in this application.
Step 102: and carrying out model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models. Wherein the model transformation includes at least one of a model deformation and a texture transformation.
In an embodiment of the present application, after the three-dimensional texture model of the target object is acquired, a model transformation may be performed on the three-dimensional texture model, where the model transformation includes at least one of model deformation and texture transformation. Four implementations of model transformation of a three-dimensional texture model provided by embodiments of the present application will be described.
In a first implementation, when the model transformation includes texture transformation, semantic segmentation may be performed on the three-dimensional texture model to obtain a plurality of local feature regions, where each of the plurality of local feature regions corresponds to a semantic meaning. And then acquiring a plurality of target texture materials corresponding to each local feature region from the stored texture materials according to semantics corresponding to the local feature regions. And then, carrying out texture transformation on the three-dimensional texture model according to a plurality of target texture materials respectively corresponding to the plurality of local characteristic areas to obtain a plurality of three-dimensional transformation models.
In the embodiment of the application, only texture transformation may be performed on the three-dimensional texture model of the target object to obtain a plurality of three-dimensional transformation models. For example, the three-dimensional texture model may first be semantically segmented to obtain a plurality of local feature regions with corresponding semantics, that is, the three-dimensional texture model is divided into a plurality of local feature regions with different semantics. The segmentation may be performed by a deep-learning-based semantic feature extraction method, a semantic segmentation method, a model parameterization method, or another semantic segmentation method.
In this embodiment of the present application, semantic segmentation may be performed directly on the three-dimensional texture model of the target object. For example, semantic segmentation may be performed directly on a three-dimensional texture model of a human body to obtain local feature regions such as hair, face, and limbs, and these regions may be semantically labeled on the three-dimensional texture model to obtain a plurality of local feature regions with corresponding semantics. Alternatively, the three-dimensional texture model of the target object may be projected to multiple view directions to obtain multiple two-dimensional images; these two-dimensional images are then semantically segmented to obtain a plurality of local feature regions with corresponding semantics, and the corresponding regions of the three-dimensional texture model are semantically labeled according to these local feature regions. For example, a three-dimensional texture model of a vehicle may be projected to 6 different view directions to obtain 6 different two-dimensional images; the 6 two-dimensional images are each semantically segmented to obtain a plurality of local feature regions with corresponding semantics, and the corresponding regions of the three-dimensional texture model are then semantically labeled according to these local feature regions, so that each local feature region corresponds to one semantic meaning.
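The following sketch, offered as an assumption-laden illustration rather than the patent's method, transfers per-view 2-D semantic labels back onto the mesh vertices by projecting each vertex into every labeled view and taking a majority vote; `label_maps`, `proj_mats`, and the voting rule are illustrative choices.

```python
# Assign each mesh vertex the semantic label it receives most often across the labeled views.
import numpy as np

def label_vertices(vertices, label_maps, proj_mats, num_classes):
    votes = np.zeros((len(vertices), num_classes), dtype=np.int64)
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])
    for labels, P in zip(label_maps, proj_mats):
        pix = homo @ P.T                                   # project vertices into this view
        u = (pix[:, 0] / pix[:, 2]).round().astype(int)
        v = (pix[:, 1] / pix[:, 2]).round().astype(int)
        h, w = labels.shape
        ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.flatnonzero(ok)
        votes[idx, labels[v[idx], u[idx]]] += 1            # one vote per view for each visible vertex
    return votes.argmax(axis=1)                            # per-vertex semantic label
```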
After semantic segmentation is performed to obtain a plurality of local feature areas with corresponding semantics, a plurality of target texture materials corresponding to each local feature area can be obtained from a plurality of stored texture materials according to the semantics corresponding to the plurality of local feature areas.
In this embodiment of the present application, a plurality of texture materials may be stored on a computer device, organized by semantics, with one or more texture materials stored for each semantic meaning. For example, texture materials may be stored under semantics such as eyes, eyebrows, desktop, and leather; for the semantic meaning of desktop, texture materials of a plurality of different desktops may be stored, and the color, size, highlight, texture pattern, brightness, and the like of each desktop may differ.
In this embodiment of the present application, a plurality of target texture materials corresponding to each local feature region may be obtained from the stored plurality of texture materials according to the determined semantics of the plurality of local feature regions. For any local feature region, the semantics of the obtained target texture materials may be the same as, or related to, the semantics of that local feature region.
It should be noted that the number of target texture materials corresponding to any two obtained local feature regions may be the same or different. In addition, for any local feature region, all the target texture materials with the same or related semantics can be obtained from the stored multiple texture materials, and the target texture materials with the number not exceeding the preset number can also be obtained.
After a plurality of target texture materials corresponding to each local characteristic region are obtained, a plurality of target texture materials corresponding to the local characteristic regions can be used for carrying out texture transformation on the corresponding regions of the three-dimensional texture model, so as to obtain a plurality of three-dimensional transformation models.
For each local feature region, each of the obtained target texture materials corresponding to that region can be used to perform one texture transformation on the region, giving one texture transformation of the local feature region, so that multiple texture transformations of the local feature region can be obtained. In the embodiment of the present application, each texture transformation of any one local feature region may be combined with any of the texture transformations of the remaining local feature regions, to obtain a plurality of three-dimensional transformation models.
For example, assuming that the three-dimensional texture model is determined to have 5 local feature regions with corresponding semantics and each local feature region corresponds to 3 target texture materials, 3 to the 5th power, i.e., 243, three-dimensional transformation models may be obtained by combination. Assuming that the three-dimensional texture model is determined to have 4 local feature regions with corresponding semantics and the 4 local feature regions correspond to 2, 3, 3, and 5 target texture materials respectively, 2×3×3×5=90 three-dimensional transformation models may be obtained by combination, as the sketch below illustrates.
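A small sketch of the combination count described above; the region names and material identifiers are placeholders, and the counts mirror the 2×3×3×5 example.

```python
# Enumerate texture-material combinations: one target texture material per local feature region.
from itertools import product

# Stored texture materials grouped by semantics (all names are placeholders).
material_library = {
    "hair":  ["hair_1", "hair_2"],
    "face":  ["face_1", "face_2", "face_3"],
    "torso": ["torso_1", "torso_2", "torso_3"],
    "shoes": ["shoes_1", "shoes_2", "shoes_3", "shoes_4", "shoes_5"],
}

combinations = list(product(*material_library.values()))
print(len(combinations))  # 2 * 3 * 3 * 5 = 90 texture-transformed models
```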
In a second implementation manner, when the model transformation includes model deformation, multiple groups of structure editing parameters can be obtained first, and then the geometric structure of the three-dimensional texture model is edited according to the multiple groups of structure editing parameters, so as to obtain multiple three-dimensional transformation models.
In the embodiment of the application, only model deformation may be performed on the three-dimensional texture model of the target object, in a plurality of different ways, so as to obtain a plurality of three-dimensional transformation models.
In this embodiment of the present application, a plurality of groups of structure editing parameters may be preset, where each group of structure editing parameters may be used to edit the geometry of the three-dimensional texture model once, so as to obtain a three-dimensional transformation model. For example, according to any set of structure editing parameters, vertex coordinates, normal directions, patch topology, etc. of the three-dimensional texture model may be changed to obtain a three-dimensional transformation model.
The method of model deformation may be an optical flow method, an iterative closest point (ICP) method, a parameterized template-driven method, or another method, which is not limited in the embodiment of the present application.
Illustratively, after model deformation of the three-dimensional texture model shown in FIG. 4 according to a set of structural editing parameters, a three-dimensional transformation model as shown in FIG. 5 may be obtained.
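Purely as an illustration of editing the geometry with groups of structure editing parameters, the sketch below applies a per-axis scale and offset to the vertex coordinates; this simple parameterization is an assumption and not the specific deformation scheme used in the patent.

```python
# Apply one group of structure editing parameters to the mesh vertex coordinates.
import numpy as np

def deform(vertices, params):
    scaled = vertices * params["scale"]              # e.g. stretch the model along each axis
    return scaled + params["offset"]                 # then shift it in space

# Several groups of structure editing parameters yield several deformed models.
param_groups = [
    {"scale": np.array([1.0, 1.2, 1.0]), "offset": np.array([0.0, 0.0, 0.0])},
    {"scale": np.array([0.9, 1.0, 1.1]), "offset": np.array([0.0, 0.05, 0.0])},
]
vertices = np.random.rand(1000, 3)                   # stand-in for the three-dimensional texture model's vertices
deformed_models = [deform(vertices, p) for p in param_groups]
```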
In a third implementation manner, when the model transformation includes model deformation and texture transformation, the three-dimensional texture model may be subjected to texture transformation to obtain a plurality of texture transformation models, and then each texture transformation model in the plurality of texture transformation models is subjected to model deformation to obtain a three-dimensional transformation model corresponding to each texture transformation model.
In the embodiment of the application, the three-dimensional texture model can be subjected to texture transformation and then subjected to model deformation so as to obtain more three-dimensional transformation models. The texture transformation and model deformation methods may be described with reference to the foregoing descriptions, and will not be repeated here.
In a fourth implementation manner, when the model transformation includes model deformation and texture transformation, the model deformation may be performed on the three-dimensional texture model to obtain a plurality of three-dimensional deformation models, and then the texture transformation may be performed on each of the plurality of three-dimensional deformation models to obtain a three-dimensional transformation model corresponding to each of the three-dimensional deformation models.
In the embodiment of the application, the three-dimensional texture model can be subjected to the texture transformation after being subjected to the model deformation so as to obtain more three-dimensional transformation models. The texture transformation and model deformation methods may be described with reference to the foregoing descriptions, and will not be repeated here.
Step 103: one or more two-dimensional images corresponding to each three-dimensional transformation model in the plurality of three-dimensional transformation models are generated, and the generated two-dimensional images are used as image training samples.
In the embodiment of the application, after obtaining a plurality of three-dimensional transformation models, each three-dimensional transformation model may be projected to one or more different viewing directions, one or more corresponding two-dimensional images are generated, and the generated two-dimensional images are used as image training samples.
In the embodiment of the application, the reference camera parameters can be acquired, and each three-dimensional transformation model is projected from one or more different view directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the corresponding three-dimensional transformation model. The reference camera parameters may be camera parameters used in creating the three-dimensional texture model, or may be other camera parameters specified by the user.
In order to ensure the fidelity of the two-dimensional images obtained by projection, the three-dimensional transformation model is projected in a manner that simulates camera shooting, that is, the three-dimensional transformation model is projected according to the reference camera parameters, so that the fidelity of the obtained two-dimensional images is high. The reference camera parameters may include one or more sets of camera parameters, and each set may include camera intrinsic parameters, distortion coefficients, image resolution, view angle range, and the like. In addition, there may be one or more projection view directions, and each view direction may be the direction of a ray passing through one camera viewpoint. When there are a plurality of view directions, each view direction may correspond to one set of camera parameters; since a set of camera parameters can determine a view angle range, each view direction corresponds to a view angle range. The view angle ranges corresponding to different view directions may or may not overlap, and may be the same or different.
It should be noted that, according to the camera internal parameters and the image resolution in the reference camera parameters, the view angle ranges corresponding to the respective view angle directions may be determined, and therefore, in the embodiment of the present application, each view angle range may or may not be specified.
As shown in Fig. 6, assume that the directions from 6 camera viewpoints to the center point of the three-dimensional transformation model are taken as 6 view directions, the 6 camera viewpoints are located on the same horizontal plane, and the view angle ranges corresponding to the 6 view directions evenly divide the 360 degrees of the horizontal plane, that is, the view angle range corresponding to each view direction is 60 degrees and the 6 view angle ranges do not overlap. Projecting one three-dimensional transformation model to these 6 view directions then yields 6 different two-dimensional images. In the embodiment of the present application, the number of view directions, the positions of the camera viewpoints, the line-of-sight directions, the camera intrinsic parameters, the distortion coefficients, the image resolution, and the like may be specified as required. For example, one view direction may be a top-down direction, and the corresponding view angle range may be designated as 30 degrees or 120 degrees, or the like. A simplified projection sketch is given below.
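This sketch, a simplified illustration only, places a pinhole camera at 6 viewpoints spaced around the model on one horizontal plane and projects the model's points into each view; the intrinsic values are assumed, and occlusion handling and image rasterization are omitted.

```python
# Project a model's points from 6 view directions on one horizontal plane (pinhole camera, no occlusion).
import numpy as np

def look_at_extrinsics(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation and translation for a camera at `eye` looking at `target`."""
    z = target - eye
    z = z / np.linalg.norm(z)                        # viewing direction
    x = np.cross(z, up)
    x = x / np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])
    return R, -R @ eye

def project(points, K, R, t):
    cam = points @ R.T + t                           # world -> camera coordinates
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:3]                  # pixel coordinates

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                      # reference camera intrinsic parameters (assumed)
points = np.random.rand(2000, 3) - 0.5               # stand-in for a three-dimensional transformation model
views = []
for angle in np.deg2rad(np.arange(0, 360, 60)):      # 6 viewpoints, 60 degrees apart on the horizontal plane
    eye = np.array([3.0 * np.cos(angle), 3.0 * np.sin(angle), 0.0])
    R, t = look_at_extrinsics(eye)
    views.append(project(points, K, R, t))           # one set of 2-D coordinates per view direction
```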
In the embodiment of the application, model deformation can generate a variety of models with different three-dimensional geometric structures, greatly expanding the number of image training samples along the dimension of three-dimensional geometry. Texture transformation performs semantic structure analysis on the three-dimensional texture model, replaces the corresponding regions with texture materials of the same semantics, and performs combined transformations of multiple texture replacements, greatly expanding the number of image training samples along the dimension of texture content. Finally, two-dimensional images are generated by camera projection and used as image training samples, so the generated image training samples have high fidelity, offer better cost performance than samples acquired with an actual camera, and can flexibly meet user requirements.
In summary, in the embodiment of the present application, by performing model transformation on the acquired three-dimensional texture model of the target object, a plurality of three-dimensional transformation models can be obtained, and one or more two-dimensional images corresponding to each three-dimensional transformation model are then generated, so that a large number of image training samples similar to the target object can be obtained. That is, the solution provided by the present application can rapidly and automatically generate a large number of image training samples meeting the requirements; compared with collecting samples with an actual camera, it saves equipment investment, labor cost, and time cost, and can produce tens of thousands of times more samples in the same amount of time. In addition, because the image training samples in this solution are obtained from the three-dimensional texture model of the target object, they belong to the same type as the target object and have high fidelity. In this way, compared with image training samples grabbed from the Internet, the sample processing process can be simplified, alleviating the problem of sample quality reduction caused by manual operation during sample processing, and facilitating the subsequent training of the deep learning network.
After a sample library is built in the above manner based on the three-dimensional texture model of the target object, the samples can be used to train a deep learning network, and the trained deep learning network can be used for recognizing faces, vehicles, actions, and the like. Because the two-dimensional image samples generated from the three-dimensional texture model are close to real objects and the image capture process is eliminated, the training cost is lower and the recognition accuracy of the trained deep learning network is higher. A minimal sketch of consuming such samples for training follows below.
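As a hedged sketch of feeding the generated samples to a deep learning network; the tensor shapes, two-class labels, and trivial network are placeholder assumptions and not part of the patent.

```python
# Train a placeholder network for one epoch on stand-ins for the generated image training samples.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

images = torch.rand(256, 3, 64, 64)                  # stand-in for generated two-dimensional image samples
labels = torch.randint(0, 2, (256,))                 # stand-in for their annotations
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 2))   # trivial placeholder classifier
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for x, y in loader:                                  # one pass over the generated samples
    optimizer.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    optimizer.step()
```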
Fig. 7 is a schematic structural diagram of an image training sample generating apparatus provided in an embodiment of the present application, where the image training sample generating apparatus may be implemented as part or all of a computer device by software, hardware, or a combination of both. Referring to fig. 7, the apparatus includes: an acquisition module 701, a transformation module 702 and a generation module 703.
An obtaining module 701, configured to obtain a three-dimensional texture model of a target object;
a transformation module 702, configured to perform a model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, where the model transformation includes at least one of model deformation and texture transformation;
the generating module 703 is configured to generate one or more two-dimensional images corresponding to each of the plurality of three-dimensional transformation models, and use the generated two-dimensional images as the image training samples.
Optionally, the acquiring module 701 includes:
the creation sub-module is used for creating a three-dimensional model of the target object;
and the mapping sub-module is used for performing texture mapping on the three-dimensional model according to the texture of the target object to obtain a three-dimensional texture model.
Optionally, when the model transform comprises a texture transform, the transform module 702 comprises:
the semantic segmentation sub-module is used for carrying out semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature areas, and each local feature area in the plurality of local feature areas corresponds to one semantic;
The first acquisition submodule is used for acquiring a plurality of target texture materials corresponding to each local feature region from the stored texture materials according to semantics corresponding to the local feature regions;
and the transformation submodule is used for carrying out texture transformation on the three-dimensional texture model according to a plurality of target texture materials corresponding to the local characteristic areas respectively to obtain a plurality of three-dimensional transformation models.
Alternatively, when the model transformation includes model deformation, transformation module 702 includes:
the second acquisition submodule is used for acquiring a plurality of groups of structure editing parameters;
and the editing sub-module is used for editing the geometric structure of the three-dimensional texture model according to the plurality of groups of structure editing parameters to obtain a plurality of three-dimensional transformation models.
Optionally, the generating module 703 includes:
the third acquisition sub-module is used for acquiring reference camera parameters, wherein the reference camera parameters are camera parameters adopted when the three-dimensional texture model is created;
and the projection sub-module is used for projecting each three-dimensional transformation model from one or more different view angles according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the corresponding three-dimensional transformation model.
In the embodiment of the application, by performing model transformation on the acquired three-dimensional texture model of the target object, a plurality of three-dimensional transformation models can be obtained, and one or more two-dimensional images corresponding to each three-dimensional transformation model are then generated, so that a large number of image training samples similar to the target object can be obtained. That is, the solution provided by the present application can rapidly and automatically generate a large number of image training samples meeting the requirements; compared with collecting samples with an actual camera, it saves equipment investment, labor cost, and time cost, and can produce tens of thousands of times more samples in the same amount of time. In addition, because the image training samples in this solution are obtained from the three-dimensional texture model of the target object, they belong to the same type as the target object and have high fidelity. In this way, compared with image training samples grabbed from the Internet, the sample processing process can be simplified, alleviating the problem of sample quality reduction caused by manual operation during sample processing, and facilitating the subsequent training of the deep learning network.
It should be noted that: the image training sample generating device provided in the above embodiment only uses the division of the above functional modules to illustrate when generating the image training sample, in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the image training sample generating device and the image training sample generating method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments, which are not repeated herein.
Fig. 8 is a block diagram of a computer device 800 according to an embodiment of the present application. The computer device 800 may be a smart phone, tablet computer, notebook computer, desktop computer, or the like.
In general, the computer device 800 includes: a processor 801 and a memory 802.
Processor 801 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor; the main processor is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may integrate a GPU (Graphics Processing Unit) for rendering the content required to be displayed on the display screen. In some embodiments, the processor 801 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 802 is used to store at least one instruction for execution by processor 801 to implement the image training sample generation method provided by the method embodiments herein.
In some embodiments, the computer device 800 may optionally further include: a peripheral interface 803, and at least one peripheral. The processor 801, the memory 802, and the peripheral interface 803 may be connected by a bus or signal line. Individual peripheral devices may be connected to the peripheral device interface 803 by buses, signal lines, or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 804, a touch display 805, a camera 806, audio circuitry 807, a positioning component 808, and a power supply 809.
Peripheral interface 803 may be used to connect at least one Input/Output (I/O) related peripheral to processor 801 and memory 802. In some embodiments, processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The radio frequency circuit 804 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 804 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 804 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other computer devices via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to, the World Wide Web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to collect touch signals on or above the surface of the display 805. The touch signal may be input to the processor 801 as a control signal for processing. At this time, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 805, disposed on the front panel of the computer device 800; in other embodiments, there may be at least two displays 805, disposed respectively on different surfaces of the computer device 800 or in a folded design; in other embodiments, the display 805 may be a flexible display disposed on a curved surface or a folded surface of the computer device 800. The display 805 may even be arranged in an irregular, non-rectangular pattern, that is, a shaped screen. The display 805 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the computer device and the rear camera is disposed on the rear surface of the computer device. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to fuse the main camera and the depth-of-field camera to realize a background blurring function, fuse the main camera and the wide-angle camera to realize panoramic shooting and Virtual Reality (VR) shooting functions, or realize other fused shooting functions. In some embodiments, the camera assembly 806 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
Audio circuitry 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, inputting the electric signals to the processor 801 for processing, or inputting the electric signals to the radio frequency circuit 804 for voice communication. For purposes of stereo acquisition or noise reduction, the microphone may be multiple, each disposed at a different location of the computer device 800. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 807 may also include a headphone jack.
The positioning component 808 is used to locate the current geographic location of the computer device 800 for navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, or the GLONASS system of Russia.
The power supply 809 is used to power the various components in the computer device 800. The power supply 809 may be an alternating current, direct current, disposable battery, or rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the computer device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyroscope sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815, and proximity sensor 816.
The acceleration sensor 811 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the computer device 800. For example, the acceleration sensor 811 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 801 may control the touch display screen 805 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 811. Acceleration sensor 811 may also be used for the acquisition of motion data of a game or user.
The gyro sensor 812 may detect a body direction and a rotation angle of the computer device 800, and the gyro sensor 812 may collect a 3D motion of the user on the computer device 800 in cooperation with the acceleration sensor 811. The processor 801 may implement the following functions based on the data collected by the gyro sensor 812: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
Pressure sensor 813 may be disposed on a side frame of computer device 800 and/or on an underlying layer of touch display 805. When the pressure sensor 813 is disposed on a side frame of the computer device 800, a grip signal of the computer device 800 by a user may be detected, and the processor 801 performs left-right hand recognition or quick operation according to the grip signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the touch display screen 805, the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 805. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 814 is used to collect a fingerprint of a user, and the processor 801 identifies the identity of the user based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the identity of the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 814 may be provided on the front, back, or side of the computer device 800. When a physical key or vendor Logo is provided on the computer device 800, the fingerprint sensor 814 may be integrated with the physical key or vendor Logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the touch display screen 805 based on the intensity of ambient light collected by the optical sensor 815. Specifically, when the intensity of the ambient light is high, the display brightness of the touch display screen 805 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera module 806 based on the ambient light intensity collected by the optical sensor 815.
The proximity sensor 816, also referred to as a distance sensor, is typically provided on the front panel of the computer device 800. The proximity sensor 816 is used to collect the distance between the user and the front of the computer device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the computer device 800 gradually decreases, the processor 801 controls the touch display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front of the computer device 800 gradually increases, the processor 801 controls the touch display screen 805 to switch from the screen-off state to the screen-on state.
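The following minimal sketch illustrates the screen-state switching described above from successive proximity readings; the 5 cm threshold, the function name, and the sample readings are assumptions made purely for this example.

```python
# Hypothetical sketch: toggle screen state from successive proximity readings
# (distance in centimetres between the user and the front panel).
def screen_state(prev_distance: float, distance: float, state: str) -> str:
    """Turn the screen off when the user moves closer, back on when they move away."""
    if distance < prev_distance and distance < 5.0:
        return "off"      # e.g. the device is raised to the ear during a call
    if distance > prev_distance and distance >= 5.0:
        return "on"
    return state          # no change

state = "on"
for prev, cur in [(20.0, 3.0), (3.0, 2.0), (2.0, 25.0)]:
    state = screen_state(prev, cur, state)
    print(cur, state)     # 3.0 off, 2.0 off, 25.0 on
```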
Those skilled in the art will appreciate that the architecture shown in fig. 8 is not limiting and that more or fewer components than shown may be included or that certain components may be combined or that a different arrangement of components may be employed.
In some embodiments, there is also provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of the image training sample generation method of the above embodiments. For example, the computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It is noted that the computer-readable storage medium mentioned in the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above-described embodiments may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, they may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the image training sample generation method described above.
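For illustration only, the sketch below outlines, in highly simplified form, how a program might organize the generation steps described in this application: texture materials are chosen per semantic region and combined at random into transformed models, and each transformed model is rendered from several view directions. The region labels, texture names, camera parameters, function names, and the render_view stub are all assumptions made for this example; they stand in for real semantic segmentation, texture transformation, and camera projection.

```python
# Hypothetical sketch of the generation flow, not the patent's implementation.
import itertools
import random

# Texture materials stored by semantics: each label maps to several materials.
texture_library = {
    "roof": ["tile_red", "tile_grey", "metal"],
    "wall": ["brick", "plaster"],
    "door": ["wood", "steel"],
}

# Local feature regions of one 3D texture model, obtained by semantic segmentation.
regions = {"r1": "roof", "r2": "wall", "r3": "door"}

def render_view(texture_assignment, cam, direction):
    """Stand-in for real projection with intrinsics/distortion; returns a sample id."""
    return f"{'-'.join(texture_assignment.values())}@{cam['resolution']}@{direction}"

def generate_samples(camera_params_list, max_models=6):
    candidates = [texture_library[label] for label in regions.values()]
    combos = list(itertools.product(*candidates))   # every per-region texture combination
    random.shuffle(combos)                           # random combination, as in the method
    samples = []
    for combo in combos[:max_models]:
        assignment = dict(zip(regions.keys(), combo))      # one 3D transformation model
        for cam in camera_params_list:
            for direction in cam["view_directions"]:       # several different view angles
                samples.append(render_view(assignment, cam, direction))
    return samples

cams = [{"resolution": "640x480", "view_directions": ["front", "left", "top"]}]
print(len(generate_samples(cams)))   # 6 models x 3 views = 18 two-dimensional samples
```

With the toy data above, 6 of the 12 possible per-region texture combinations are sampled and each is rendered from 3 view directions, yielding 18 two-dimensional training samples; a real system would substitute genuine texture mapping and projection for the stubs.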
The above embodiments are merely preferred embodiments of the present application and are not intended to limit it; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (6)

1. A method of generating an image training sample, the method comprising:
acquiring a three-dimensional texture model of a target object based on a plurality of two-dimensional images acquired for an actual target object within specified object types, wherein the three-dimensional texture model is obtained by performing texture mapping on corresponding regions of a three-dimensional geometric structure of the target object according to the textures of the regions of the target object, and the specified object types include a human body, a vehicle, a table and chair, a landscape, and an indoor decoration;
performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, wherein the model transformation comprises model deformation and texture transformation, and the model deformation is used for transforming the geometric structure of the three-dimensional texture model;
obtaining reference camera parameters, wherein the reference camera parameters comprise a plurality of groups of camera parameters, and each group of camera parameters comprises camera intrinsic parameters, distortion coefficients, an image resolution, and a view angle range; and projecting each three-dimensional transformation model from a plurality of different view angle directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to each of the plurality of three-dimensional transformation models, the generated two-dimensional images being used as image training samples for the specified object types;
wherein, when the model transformation comprises the texture transformation, performing model transformation on the three-dimensional texture model to obtain the plurality of three-dimensional transformation models comprises:
performing semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature regions, wherein each of the plurality of local feature regions corresponds to one semantic meaning;
obtaining, from stored texture materials and according to the semantics corresponding to the plurality of local feature regions, a plurality of target texture materials corresponding to each local feature region, wherein the stored texture materials are stored according to semantics, each semantic meaning corresponds to a plurality of texture materials, and the target texture materials comprise texture materials whose semantics are the same as or related to those of the corresponding local feature region;
and performing texture transformation on the three-dimensional texture model according to the plurality of target texture materials respectively corresponding to the plurality of local feature regions to obtain the plurality of three-dimensional transformation models, wherein, for each local feature region, texture transformation is performed on that region using each of its target texture materials to obtain one texture transformation of that region, so that a plurality of texture transformations of that region are obtained; and, for any local feature region among the plurality of local feature regions, each texture transformation of that region is randomly combined with the plurality of texture transformations of the remaining local feature regions to obtain the plurality of three-dimensional transformation models.
2. The method of claim 1, wherein the acquiring the three-dimensional texture model of the target object comprises:
creating a three-dimensional model of the target object;
and performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
3. The method of claim 1, wherein, when the model transformation comprises the model deformation, performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models comprises:
obtaining a plurality of groups of structure editing parameters;
and editing the geometric structure of the three-dimensional texture model according to the plurality of groups of structure editing parameters to obtain the plurality of three-dimensional transformation models.
4. An image training sample generation apparatus, the apparatus comprising:
the acquisition module is used for acquiring a three-dimensional texture model of a target object based on a plurality of two-dimensional images acquired for an actual target object within specified object types, wherein the three-dimensional texture model is obtained by performing texture mapping on corresponding regions of a three-dimensional geometric structure of the target object according to the textures of the regions of the target object, and the specified object types include a human body, a vehicle, a table and chair, a landscape, and an indoor decoration;
the transformation module is used for carrying out model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, wherein the model transformation comprises model deformation and texture transformation, and the model deformation is used for transforming the geometric structure of the three-dimensional texture model;
the generation module is used for obtaining reference camera parameters, wherein the reference camera parameters comprise a plurality of groups of camera parameters, and each group of camera parameters comprises camera intrinsic parameters, distortion coefficients, an image resolution, and a view angle range, and for projecting each three-dimensional transformation model from a plurality of different view angle directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to each of the plurality of three-dimensional transformation models, the generated two-dimensional images being used as image training samples for the specified object types;
wherein, when the model transformation comprises the texture transformation, the transformation module comprises:
the semantic segmentation sub-module, used for performing semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature regions, wherein each of the plurality of local feature regions corresponds to one semantic meaning;
the first acquisition sub-module, used for obtaining, from stored texture materials and according to the semantics corresponding to the plurality of local feature regions, a plurality of target texture materials corresponding to each local feature region, wherein the stored texture materials are stored according to semantics, each semantic meaning corresponds to a plurality of texture materials, and the target texture materials comprise texture materials whose semantics are the same as or related to those of the corresponding local feature region;
and the transformation sub-module, used for performing texture transformation on the three-dimensional texture model according to the plurality of target texture materials respectively corresponding to the plurality of local feature regions to obtain the plurality of three-dimensional transformation models, wherein, for each local feature region, texture transformation is performed on that region using each of its target texture materials to obtain one texture transformation of that region, so that a plurality of texture transformations of that region are obtained; and, for any local feature region among the plurality of local feature regions, each texture transformation of that region is randomly combined with the plurality of texture transformations of the remaining local feature regions to obtain the plurality of three-dimensional transformation models.
5. The apparatus of claim 4, wherein the acquisition module comprises:
a creating sub-module for creating a three-dimensional model of the target object;
and the mapping sub-module is used for performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
6. The apparatus of claim 4, wherein when the model transformation comprises the model deformation, the transformation module comprises:
the second acquisition submodule is used for acquiring a plurality of groups of structure editing parameters;
and the editing sub-module is used for editing the geometric structure of the three-dimensional texture model according to the plurality of groups of structure editing parameters to obtain the plurality of three-dimensional transformation models.
CN202010543613.5A 2020-06-15 2020-06-15 Image training sample generation method and device Active CN111680758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010543613.5A CN111680758B (en) 2020-06-15 2020-06-15 Image training sample generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010543613.5A CN111680758B (en) 2020-06-15 2020-06-15 Image training sample generation method and device

Publications (2)

Publication Number Publication Date
CN111680758A CN111680758A (en) 2020-09-18
CN111680758B true CN111680758B (en) 2024-03-05

Family

ID=72435820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010543613.5A Active CN111680758B (en) 2020-06-15 2020-06-15 Image training sample generation method and device

Country Status (1)

Country Link
CN (1) CN111680758B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232385A (en) * 2020-09-27 2021-01-15 北京五八信息技术有限公司 Image processing method and device
CN113240784B (en) * 2021-05-25 2024-01-02 北京达佳互联信息技术有限公司 Image processing method, device, terminal and storage medium
CN114067041B (en) * 2022-01-14 2022-06-14 深圳大学 Material generation method and device of three-dimensional model, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737480B (en) * 2012-07-09 2014-03-05 广州市浩云安防科技股份有限公司 Abnormal voice monitoring system and method based on intelligent video
US20190138786A1 (en) * 2017-06-06 2019-05-09 Sightline Innovation Inc. System and method for identification and classification of objects
CN109145788B (en) * 2018-08-08 2020-07-07 北京云舶在线科技有限公司 Video-based attitude data capturing method and system

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002165757A (en) * 2000-11-30 2002-06-11 Olympus Optical Co Ltd Diagnostic supporting system
CN101346743A (en) * 2005-12-29 2009-01-14 卡尔斯特里姆保健公司 Cross-time and cross-modality medical diagnosis
CN101283911A (en) * 2008-06-05 2008-10-15 华北电力大学 Four dimensional rebuilding method of coronary artery vessels axis
CA2889217A1 (en) * 2015-04-23 2016-10-23 Wu QIU Method for determining volume of irregulary-shaped space
CN106548679A (en) * 2016-02-03 2017-03-29 北京易驾佳信息科技有限公司 A kind of intelligent driving training system
WO2019019019A1 (en) * 2017-07-25 2019-01-31 深圳前海达闼云端智能科技有限公司 Training data generation method and generation apparatus, and image semantics segmentation method therefor
CN107403201A (en) * 2017-08-11 2017-11-28 强深智能医疗科技(昆山)有限公司 Tumour radiotherapy target area and jeopardize that organ is intelligent, automation delineation method
WO2020010979A1 (en) * 2018-07-10 2020-01-16 腾讯科技(深圳)有限公司 Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
CN109829969A (en) * 2018-12-27 2019-05-31 北京奇艺世纪科技有限公司 A kind of data capture method, device and storage medium
CN110852332A (en) * 2019-10-29 2020-02-28 腾讯科技(深圳)有限公司 Training sample generation method and device, storage medium and electronic equipment
CN110837858A (en) * 2019-11-01 2020-02-25 腾讯科技(深圳)有限公司 Network model training method and device, computer equipment and storage medium

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Automatic Segmentation of Colorectal Cancer in 3D MRI by Combining Deep Learning and 3D Level-Set Algorithm - A Preliminary Study; Mumtaz Hussain Soomro et al.; 2018 IEEE 4th Middle East Conference on Biomedical Engineering (MECBME); 20181231; full text *
Comparative transmembrane transports of four typical lipophilic organic chemicals; Xu Yatong (author); Bioresource Technology; 20101101; full text *
Expansion of 3D face sample set based on genetic algorithm; Yun Ge; Springer Science+Business Media; 20120515; full text *
Spatial-temporal Depth De-noising for Kinect based on Texture Edge-assisted Depth Classification; Yatong Xu; Proceedings of the 19th International Conference on Digital Signal Processing, 20-23 August 2014; 20140823; full text *
A calibration method for a binocular stereo vision system; Li Difei; Chen He; Feng Zhigang; Zhao Kejia; Liu Zheng; Gao Hongying; Acta Metrologica Sinica; 20180722 (No. 04); full text *
Action recognition based on spatio-temporal 3D-SIFT operators; Liu Yi; Wang Min; Journal of Huazhong University of Science and Technology (Natural Science Edition); 20111115 (No. S2); full text *
A multi-manifold discriminant learning algorithm based on virtual sample image sets; Dong Xiwei; Application Research of Computers; 20180630; Vol. 35 (No. 6); full text *
Shape-constrained morphable model of three-dimensional face components; Xue Feng; Ding Xiaoqing; Journal of Computer Applications; 20070310 (No. 03); full text *

Also Published As

Publication number Publication date
CN111680758A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
US11205282B2 (en) Relocalization method and apparatus in camera pose tracking process and storage medium
US11678734B2 (en) Method for processing images and electronic device
CN110992493B (en) Image processing method, device, electronic equipment and storage medium
CN111126182B (en) Lane line detection method, lane line detection device, electronic device, and storage medium
CN110097576B (en) Motion information determination method of image feature point, task execution method and equipment
CN110555839A (en) Defect detection and identification method and device, computer equipment and storage medium
CN111680758B (en) Image training sample generation method and device
CN110599593B (en) Data synthesis method, device, equipment and storage medium
CN112287852B (en) Face image processing method, face image display method, face image processing device and face image display equipment
CN112581358B (en) Training method of image processing model, image processing method and device
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
CN111723803B (en) Image processing method, device, equipment and storage medium
CN112257552B (en) Image processing method, device, equipment and storage medium
CN110570460A (en) Target tracking method and device, computer equipment and computer readable storage medium
CN110956580B (en) Method, device, computer equipment and storage medium for changing face of image
CN112565806B (en) Virtual gift giving method, device, computer equipment and medium
CN114170349A (en) Image generation method, image generation device, electronic equipment and storage medium
CN110675412A (en) Image segmentation method, training method, device and equipment of image segmentation model
CN113706678A (en) Method, device and equipment for acquiring virtual image and computer readable storage medium
CN112308103B (en) Method and device for generating training samples
CN110675413B (en) Three-dimensional face model construction method and device, computer equipment and storage medium
CN111784841A (en) Method, apparatus, electronic device, and medium for reconstructing three-dimensional image
CN112967261B (en) Image fusion method, device, equipment and storage medium
CN110335224B (en) Image processing method, image processing device, computer equipment and storage medium
CN109685881B (en) Volume rendering method and device and intelligent equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant