CN111680758A - Image training sample generation method and device - Google Patents

Image training sample generation method and device

Info

Publication number: CN111680758A
Application number: CN202010543613.5A
Authority: CN (China)
Prior art keywords: dimensional, model, texture, transformation, target object
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN111680758B (en)
Inventor: 许娅彤
Current Assignee: Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee: Hangzhou Hikvision Digital Technology Co Ltd
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202010543613.5A
Publication of CN111680758A; application granted; publication of CN111680758B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 19/00: Manipulating 3D models or images for computer graphics
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Abstract

The application discloses an image training sample generation method and device, belonging to the field of computer vision. According to the method and the device, a plurality of three-dimensional transformation models can be obtained by performing model transformation on an acquired three-dimensional texture model of a target object, and one or more two-dimensional images corresponding to each three-dimensional transformation model are generated, so that a large number of image training samples similar to the target object can be obtained. That is, the scheme can quickly and automatically generate a large number of image training samples that meet the requirements, saving labor and time cost. Moreover, because the image training samples are obtained from the three-dimensional texture model of the target object, they belong to the same type as the target object and have high fidelity. Therefore, compared with image training samples captured from the Internet, the method can simplify the sample processing process, reduce the sample quality degradation introduced by manual operation, and facilitate subsequent training of a deep learning network.

Description

Image training sample generation method and device
Technical Field
The present disclosure relates to the field of Computer Vision (CV), and in particular, to a method and an apparatus for generating an image training sample.
Background
Deep learning is now widely applied in the CV field. To enhance the generalization ability of a deep learning network and reduce its overfitting, a large number of image training samples are generally required for training, so how to obtain such a large number of image training samples has become an urgent problem when applying deep learning in the CV field.
In the related art, image training samples may be acquired by capturing images with a camera or by crawling images from the Internet. However, camera acquisition is limited by hardware, labor cost, and the like, so it is difficult to acquire images of sufficient magnitude and the acquisition period is long. Most images crawled from the Internet cannot be used directly: the crawled images need to be manually cleaned and labeled, and sample enhancement work is also required, which incurs high time and labor costs; moreover, the manual operations may degrade sample quality and thus affect the training of the deep learning network.
Disclosure of Invention
The application provides an image training sample generation method and device, which can save the time cost and the labor cost for obtaining an image training sample and can improve the sample quality. The technical scheme is as follows:
in one aspect, a method for generating an image training sample is provided, where the method includes:
acquiring a three-dimensional texture model of a target object;
performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, wherein the model transformation comprises at least one of model deformation and texture transformation;
and generating one or more two-dimensional images corresponding to each three-dimensional transformation model in the plurality of three-dimensional transformation models, and taking the generated two-dimensional images as image training samples.
Optionally, the obtaining a three-dimensional texture model of the target object includes:
creating a three-dimensional model of the target object;
and performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
Optionally, when the model transformation includes the texture transformation, the performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models includes:
performing semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature regions, wherein each local feature region in the plurality of local feature regions corresponds to a semantic;
according to the semantics corresponding to the local feature regions, acquiring a plurality of target texture materials corresponding to each local feature region from a plurality of stored texture materials;
and performing texture transformation on the three-dimensional texture model according to a plurality of target texture materials corresponding to the plurality of local characteristic regions respectively to obtain a plurality of three-dimensional transformation models.
Optionally, when the model transformation includes the model deformation, the performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models includes:
acquiring a plurality of groups of structure editing parameters;
and editing the geometric structures of the three-dimensional texture models respectively according to the multiple groups of structure editing parameters to obtain the multiple three-dimensional transformation models.
Optionally, the generating one or more two-dimensional images corresponding to each of the plurality of three-dimensional transformation models comprises:
acquiring reference camera parameters;
and projecting each three-dimensional transformation model from one or more different view angle directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the corresponding three-dimensional transformation models.
In another aspect, an image training sample generating apparatus is provided, the apparatus including:
the acquisition module is used for acquiring a three-dimensional texture model of the target object;
the transformation module is used for carrying out model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, and the model transformation comprises at least one of model deformation and texture transformation;
and the generating module is used for generating one or more two-dimensional images corresponding to each three-dimensional transformation model in the plurality of three-dimensional transformation models and taking the generated two-dimensional images as image training samples.
Optionally, the obtaining module includes:
a creating submodule for creating a three-dimensional model of the target object;
and the mapping submodule is used for performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
Optionally, when the model transform comprises the texture transform, the transform module comprises:
the semantic segmentation submodule is used for performing semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature areas, and each local feature area in the plurality of local feature areas corresponds to a semantic;
the first obtaining submodule is used for obtaining a plurality of target texture materials corresponding to each local characteristic region from a plurality of stored texture materials according to the semantics corresponding to the local characteristic regions;
and the transformation submodule is used for carrying out texture transformation on the three-dimensional texture model according to a plurality of target texture materials respectively corresponding to the local feature areas to obtain a plurality of three-dimensional transformation models.
Optionally, when the model transformation includes the model deformation, the transformation module includes:
the second obtaining submodule is used for obtaining a plurality of groups of structure editing parameters;
and the editing submodule is used for respectively editing the geometric structures of the three-dimensional texture models according to the multiple groups of structure editing parameters to obtain the multiple three-dimensional transformation models.
Optionally, the generating module includes:
the third acquisition sub-module is used for acquiring reference camera parameters;
and the projection submodule is used for projecting each three-dimensional transformation model from one or more different view angle directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the corresponding three-dimensional transformation models.
In another aspect, a computer device is provided, which includes a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus, the memory is used for storing computer programs, and the processor is used for executing the programs stored in the memory to implement the steps of the image training sample generation method.
In another aspect, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, implements the steps of the image training sample generation method described above.
In another aspect, a computer program product containing instructions is provided, which when run on a computer, causes the computer to perform the steps of the image training sample generation method described above.
The technical scheme provided by the application can at least bring the following beneficial effects:
By performing model transformation on the acquired three-dimensional texture model of the target object, a plurality of three-dimensional transformation models can be obtained, and one or more two-dimensional images corresponding to each three-dimensional transformation model are generated, so that a large number of image training samples similar to the target object can be obtained. That is, the scheme provided by the application can quickly and automatically generate a large number of image training samples that meet the requirements; compared with collecting samples with an actual camera, it saves equipment investment, labor cost, and time cost, and can produce tens of thousands of times more samples, or even more, in the same amount of time. Moreover, because the image training samples in the scheme are obtained from the three-dimensional texture model of the target object, they belong to the same type as the target object and have high fidelity. Therefore, compared with image training samples captured from the Internet, the scheme can simplify the sample processing process, reduce the sample quality degradation caused by manual operation during sample processing, and facilitate subsequent training of a deep learning network.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of an image training sample generation method provided in an embodiment of the present application;
FIG. 2 is a schematic illustration of a three-dimensional model of a target object created in an embodiment of the present application;
FIG. 3 is a schematic diagram of a texture of a target object acquired by an embodiment of the present application;
FIG. 4 is a schematic diagram of a three-dimensional texture model of a target object obtained by an embodiment of the present application;
FIG. 5 is a schematic diagram of a three-dimensional transformation model obtained by performing model transformation on the three-dimensional texture model shown in FIG. 4 according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a projection of a three-dimensional transformation model to obtain a two-dimensional image according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image training sample generating apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Deep learning is now widely applied in the CV field. To enhance the generalization ability of a deep learning network and reduce its overfitting, a large number of image training samples are generally required for training, so how to obtain such a large number of image training samples has become an urgent problem when applying deep learning in the CV field. The technical scheme provided by the embodiments of the present application can quickly generate a large number of image training samples that meet the requirements. For example, a large number of image training samples of people can be generated quickly and used to train a deep learning network to obtain a deep learning model for person recognition and the like; a large number of image training samples of automobiles can be generated and used to train a deep learning network to obtain a deep learning model for vehicle detection and the like; and a large number of image training samples of interior decoration can be generated and used to train a deep learning network to obtain a deep learning model for environment structure detection and the like.
Next, a detailed explanation is given of the image training sample generation method provided in the embodiment of the present application.
Fig. 1 is a flowchart of an image training sample generation method according to an embodiment of the present disclosure. Referring to fig. 1, the method includes the following steps.
Step 101: and acquiring a three-dimensional texture model of the target object.
In the embodiment of the application, in order to generate a large number of image training samples of a specified type by expansion, one or more two-dimensional images of a target object may be acquired first, and a three-dimensional texture model of the target object may be obtained by a three-dimensional reconstruction method. The specified type can be specified according to actual requirements, and the target object is an actual object of the specified type, for example, if the deep learning network needs to be trained currently for vehicle identification, the specified type can be a vehicle, and the target object can be an actual automobile.
It should be noted that, in the embodiment of the present application, the specified type may be any object type, for example, a human body, a vehicle, a table and chair, a landscape, an interior decoration, and the like.
In some embodiments, after acquiring one or more two-dimensional images of the target object, a three-dimensional model of the target object may be created, and then texture mapping may be performed on the three-dimensional model according to the texture of the target object, so as to obtain a three-dimensional texture model.
In the embodiment of the present application, a three-dimensional model of the target object, that is, the three-dimensional geometric structure of the target object, may be reconstructed through a three-dimensional reconstruction algorithm. For example, a three-dimensional model may be reconstructed from the contours of the target object. The three-dimensional reconstruction algorithm may be multi-view stereo (MVS) reconstruction, contour-based shape recovery (Shape from Silhouette, SfS), and the like, which is not limited in this embodiment of the present application.
After the three-dimensional model of the target object is created, texture mapping can be performed on the corresponding region of the three-dimensional model according to the collected texture of each region of the target object, so that the three-dimensional texture model of the target object is obtained.
For example, assuming that the target object is a vehicle, after the three-dimensional model of the vehicle is created, texture mapping may be performed on the three-dimensional model of the vehicle according to the acquired texture of the vehicle, so as to obtain the three-dimensional texture model of the vehicle.
For another example, assuming that the target object is a person, fig. 2 is a three-dimensional model created from a plurality of two-dimensional images of the target object, fig. 3 is a plurality of textures of the target object, and fig. 4 is a three-dimensional texture model of the target object obtained by mapping the textures shown in fig. 3 on corresponding regions of the three-dimensional model shown in fig. 2.
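Purely as an illustrative aside (not part of the patent text), the following minimal Python/NumPy sketch shows one simplified way texture information could be attached to a reconstructed mesh by sampling per-vertex colors from a texture image; the arrays `vertices`, `uv`, and `texture` are assumed inputs, and per-vertex sampling is a stand-in for full per-face texture mapping.

    import numpy as np

    def sample_texture_per_vertex(vertices, uv, texture):
        """Attach texture to a mesh by sampling a color for every vertex.

        vertices: (N, 3) vertex coordinates of the reconstructed model
                  (shown only to indicate the expected mesh layout).
        uv:       (N, 2) per-vertex UV coordinates in [0, 1].
        texture:  (H, W, 3) uint8 texture image acquired from the target object.
        Returns an (N, 3) array of per-vertex RGB colors.
        """
        h, w = texture.shape[:2]
        # Convert normalized UV coordinates to pixel indices (nearest neighbor).
        cols = np.clip((uv[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
        rows = np.clip(((1.0 - uv[:, 1]) * (h - 1)).round().astype(int), 0, h - 1)
        return texture[rows, cols]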
In other embodiments, a three-dimensional texture model of the target object may be derived directly from one or more two-dimensional images of the target object. For example, one or more two-dimensional images of the target object may be input into an end-to-end deep learning model, which may directly output a three-dimensional texture model of the target object.
In other embodiments, a three-dimensional texture model of a specified type may be obtained from a library of three-dimensional texture models. For example, a three-dimensional texture model for a vehicle may be obtained from a library of three-dimensional texture models.
In this embodiment, in addition to acquiring one or more two-dimensional images of the target object to reconstruct its three-dimensional texture model as described above, one or more depth images of the target object and the texture of the target object may also be acquired; the acquired depth images are three-dimensionally reconstructed to obtain a three-dimensional model of the target object, and texture mapping is then performed on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model of the target object. The method for performing three-dimensional reconstruction on the depth images may be a deep-learning-based three-dimensional reconstruction method or another method, which is not limited in this application.
Or, the three-dimensional point cloud of the target object can be acquired through laser equipment, the texture of the target object is acquired through a camera, then, the acquired three-dimensional point cloud is subjected to three-dimensional reconstruction to obtain a three-dimensional model of the target object, and then, texture mapping is carried out on the three-dimensional model according to the texture of the target object. The method for three-dimensional reconstruction of the three-dimensional point cloud may be a three-dimensional reconstruction method based on deep learning, or may be other methods, which is not limited in the present application.
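As a hedged sketch of the point-cloud route (the patent does not prescribe a particular toolkit; Open3D, the file names, and the Poisson depth below are assumptions):

    import open3d as o3d

    # Load a laser-scanned point cloud of the target object (placeholder file name).
    pcd = o3d.io.read_point_cloud("target_object.ply")

    # Poisson surface reconstruction needs consistent normals.
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

    # Reconstruct a triangle mesh; 'depth' trades surface detail against smoothness.
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=9)

    # The mesh is the three-dimensional model of the target object; texture mapping
    # with camera-acquired textures (e.g. as sketched above) would follow.
    o3d.io.write_triangle_mesh("target_object_mesh.ply", mesh)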
Step 102: and carrying out model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models. Wherein the model transformation includes at least one of a model deformation and a texture transformation.
In the embodiment of the present application, after obtaining the three-dimensional texture model of the target object, the three-dimensional texture model may be subjected to model transformation, where the model transformation includes at least one of model deformation and texture transformation. Next, four implementations of model transformation of the three-dimensional texture model provided in the embodiment of the present application will be described.
In a first implementation manner, when the model transformation includes texture transformation, semantic segmentation may be performed on the three-dimensional texture model to obtain a plurality of local feature regions, and each local feature region in the plurality of local feature regions corresponds to a semantic. And then, according to the semantics corresponding to the local feature regions, acquiring a plurality of target texture materials corresponding to each local feature region from the stored texture materials. And then carrying out texture transformation on the three-dimensional texture model according to a plurality of target texture materials corresponding to the local feature areas respectively to obtain a plurality of three-dimensional transformation models.
In the embodiment of the present application, only texture transformation may be performed on the three-dimensional texture model of the target object to obtain a plurality of three-dimensional transformation models. For example, the three-dimensional texture model may first be subjected to semantic segmentation to obtain a plurality of local feature regions each corresponding to a semantic, that is, the three-dimensional texture model is divided into a plurality of local feature regions with different semantics. The segmentation may use a deep-learning-based semantic feature extraction method, a semantic segmentation method, a model parameterization method, or other semantic segmentation methods.
It should be noted that, in the embodiment of the present application, semantic segmentation may be directly performed on the three-dimensional texture model of the target object, for example, semantic segmentation may be directly performed on the three-dimensional texture model of the human body to obtain local feature regions such as hair, face, limbs, and the like, and semantic labeling is performed on these regions on the three-dimensional texture model, so as to obtain a plurality of local feature regions corresponding to semantics. In addition, in the embodiment of the application, the three-dimensional texture model of the target object may be projected to a plurality of view directions to obtain a plurality of two-dimensional images, then the two-dimensional images are subjected to semantic segmentation to obtain a plurality of local feature regions corresponding to semantics, and then the corresponding regions of the three-dimensional texture model are subjected to semantic annotation according to the plurality of local feature regions. For example, a three-dimensional texture model of a vehicle may be projected to 6 different view directions to obtain 6 different two-dimensional images, then the 6 two-dimensional images are subjected to semantic segmentation respectively to obtain a plurality of local feature regions corresponding to semantics, and then semantic labeling is performed on corresponding regions of the three-dimensional texture model according to the plurality of local feature regions, so that each local feature region corresponds to a semantic.
After the semantic segmentation is performed to obtain a plurality of local feature regions corresponding to semantics, a plurality of target texture materials corresponding to each local feature region can be obtained from a plurality of stored texture materials according to the semantics corresponding to the plurality of local feature regions.
In the embodiment of the application, a plurality of texture materials may be stored on the computer device and organized by semantics, with one or more texture materials stored for each semantic. For example, texture materials may be stored under semantics such as eyes, eyebrows, desktop, and leather; for the semantic "desktop", texture materials of a plurality of different desktops may be stored, and the color, size, highlight, texture pattern, brightness, and the like of each desktop may differ.
In this embodiment of the application, a plurality of target texture materials corresponding to each local feature region may be obtained from a plurality of stored texture materials according to the determined semantics corresponding to the plurality of local feature regions, and for any local feature region, the semantics of the obtained plurality of target texture materials may be the same as or related to the semantics of the local feature region.
It should be noted that the number of the acquired target texture materials corresponding to any two local feature regions may be the same or different. In addition, for any local feature region, all the target texture materials with the same or related semantics can be acquired from the stored multiple texture materials, and no more than a preset number of target texture materials can be acquired.
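As an informal sketch only (the semantics and material names are illustrative, not from this application), a material library keyed by semantics and the per-region lookup described above might look like this:

    # Texture materials stored by semantics; each semantic maps to one or more
    # stored materials (the file names here are placeholders).
    TEXTURE_LIBRARY = {
        "hair":    ["hair_black.png", "hair_brown.png", "hair_blond.png"],
        "face":    ["face_light.png", "face_dark.png"],
        "desktop": ["desktop_oak.png", "desktop_glass.png", "desktop_steel.png"],
        "leather": ["leather_black.png", "leather_red.png"],
    }

    def target_materials_for_region(semantic, max_count=None):
        """Return target texture materials whose semantic matches a local feature region.

        If max_count is given, no more than that many materials are returned,
        mirroring the 'no more than a preset number' option mentioned above.
        """
        materials = TEXTURE_LIBRARY.get(semantic, [])
        return materials if max_count is None else materials[:max_count]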
After obtaining a plurality of target texture materials corresponding to each local feature region, the plurality of target texture materials corresponding to the plurality of local feature regions may be used to perform texture transformation on the corresponding region of the three-dimensional texture model, so as to obtain a plurality of three-dimensional transformation models.
For each local feature region, each of the obtained corresponding multiple target texture materials may be used to perform a texture transformation on the local feature region to obtain a texture transformation of the local feature region, so that multiple texture transformations of the local feature region may be obtained. In the embodiment of the present application, each texture transformation of any one of the plurality of local feature regions may be randomly combined with a plurality of texture transformations of the remaining local feature regions except the local feature region to obtain a plurality of three-dimensional transformation models.
For example, assuming that the three-dimensional texture model has 5 local feature regions with corresponding semantics and each local feature region corresponds to 3 target texture materials, 3 to the power of 5, that is, 243 three-dimensional transformation models can be obtained by combination. Assuming that the three-dimensional texture model has 4 local feature regions with corresponding semantics and the 4 local feature regions correspond to 2, 3, 3, and 5 target texture materials, respectively, 2 × 3 × 3 × 5 = 90 three-dimensional transformation models can be obtained by combination.
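The combinatorial expansion in the example above can be reproduced with a few lines of Python (a sketch only; the region names and material lists are illustrative):

    from itertools import product

    # One list of candidate target texture materials per local feature region.
    materials_per_region = {
        "region_a": ["a1", "a2"],
        "region_b": ["b1", "b2", "b3"],
        "region_c": ["c1", "c2", "c3"],
        "region_d": ["d1", "d2", "d3", "d4", "d5"],
    }

    # Each combination assigns one material to every region, i.e. one candidate
    # three-dimensional transformation model.
    combinations = list(product(*materials_per_region.values()))
    print(len(combinations))  # 2 * 3 * 3 * 5 = 90, matching the second example above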
In a second implementation manner, when the model transformation includes model deformation, a plurality of groups of structure editing parameters may be obtained first, and then the geometric structures of the three-dimensional texture models are edited respectively according to the plurality of groups of structure editing parameters, so as to obtain a plurality of three-dimensional transformation models.
In the embodiment of the present application, a plurality of different model deformations may be performed only on the three-dimensional texture model of the target object to obtain a plurality of three-dimensional transformation models.
In the embodiment of the present application, a plurality of groups of structure editing parameters may be preset, and each group of structure editing parameters may be used to edit the geometric structure of the three-dimensional texture model at a time, so as to obtain a three-dimensional transformation model. For example, according to any group of structure editing parameters, the vertex coordinates, normal direction, patch topology and the like of the three-dimensional texture model can be changed, and a three-dimensional transformation model is obtained.
The model deformation method may be an optical flow method, an iterative closest point, a parameterized template driving method, or other methods, which is not limited in this embodiment of the present application.
Illustratively, after model deformation of the three-dimensional texture model shown in FIG. 4 according to a set of structure editing parameters, a three-dimensional transformation model as shown in FIG. 5 may be obtained.
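For concreteness, the sketch below assumes that one group of structure editing parameters is simply a per-axis scale and offset applied to the vertex coordinates; real deformations (optical flow, iterative closest point, parameterized template driving) are more involved, so this is illustrative only:

    import numpy as np

    def apply_structure_edit(vertices, params):
        """Apply one group of structure editing parameters to mesh vertices.

        vertices: (N, 3) vertex coordinates of the three-dimensional texture model.
        params:   dict with illustrative 'scale' and 'offset' entries, each of length 3.
        Returns the deformed (N, 3) vertex coordinates of one transformation model.
        """
        return vertices * np.asarray(params["scale"]) + np.asarray(params["offset"])

    # Several groups of structure editing parameters yield several deformed models.
    parameter_groups = [
        {"scale": (1.0, 1.1, 1.0), "offset": (0.0, 0.0, 0.0)},   # slightly taller
        {"scale": (1.2, 1.0, 1.2), "offset": (0.0, 0.0, 0.0)},   # wider
        {"scale": (0.9, 0.9, 0.9), "offset": (0.0, 0.05, 0.0)},  # smaller, lifted
    ]
    vertices = np.random.rand(1000, 3)  # stand-in vertex array for demonstration
    deformed_models = [apply_structure_edit(vertices, p) for p in parameter_groups]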
In a third implementation manner, when the model transformation includes model deformation and texture transformation, the three-dimensional texture model may be subjected to texture transformation to obtain a plurality of texture transformation models, and then each texture transformation model in the plurality of texture transformation models is subjected to model deformation to obtain a three-dimensional transformation model corresponding to each texture transformation model.
In the embodiment of the application, the three-dimensional texture model can be subjected to texture transformation and then to model deformation, so that more three-dimensional transformation models can be obtained. The methods for texture transformation and model deformation can refer to the foregoing related descriptions, and are not described herein again.
In a fourth implementation manner, when the model transformation includes model deformation and texture transformation, the three-dimensional texture model may be subjected to model deformation to obtain a plurality of three-dimensional deformation models, and then each three-dimensional deformation model in the plurality of three-dimensional deformation models may be subjected to texture transformation to obtain a three-dimensional transformation model corresponding to each three-dimensional deformation model.
In the embodiment of the application, the three-dimensional texture model can be subjected to model deformation and then texture transformation, so that more three-dimensional transformation models can be obtained. The methods for texture transformation and model deformation can refer to the foregoing related descriptions, and are not described herein again.
Step 103: and generating one or more two-dimensional images corresponding to each three-dimensional transformation model in the plurality of three-dimensional transformation models, and taking the generated two-dimensional images as image training samples.
In this embodiment of the application, after obtaining a plurality of three-dimensional transformation models, each three-dimensional transformation model may be projected to one or more different view directions, so as to generate one or more corresponding two-dimensional images, and the generated two-dimensional images are used as image training samples.
In the embodiment of the application, reference camera parameters can be acquired, and each three-dimensional transformation model is projected from one or more different view angle directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the corresponding three-dimensional transformation model. The reference camera parameters may refer to camera parameters used when creating the three-dimensional texture model, or may be other camera parameters specified by the user.
It should be noted that, in order to ensure the fidelity of the projected two-dimensional image, the projection of the three-dimensional transformation model is performed in a manner that simulates camera shooting; that is, the three-dimensional transformation model is projected according to the reference camera parameters, so the fidelity of the obtained two-dimensional image is very high. The reference camera parameters may include one or more sets of camera parameters, and each set may include camera intrinsic parameters, distortion coefficients, image resolution, view angle range, and the like. In addition, there may be one or more projection view angle directions, and each view angle direction may be the direction of a ray passing through one camera viewpoint. When there are multiple projection view angle directions, each view angle direction may correspond to a set of camera parameters, and a set of camera parameters determines a view angle range, so each view angle direction also corresponds to a view angle range; the view angle ranges corresponding to different view angle directions may or may not overlap, and may be the same or different in size.
It should be noted that the view angle range corresponding to a view angle direction can be determined from the camera intrinsic parameters and the image resolution in the reference camera parameters; on this basis, in the embodiment of the present application, each view angle range may or may not be explicitly specified.
For example, as shown in fig. 6, assume that the directions from the illustrated 6 camera viewpoints to the center point of the three-dimensional transformation model are 6 view angle directions, the 6 camera viewpoints are located on the same horizontal plane, and the view angle ranges corresponding to the 6 view angle directions evenly divide the 360 degrees of the horizontal plane, that is, the view angle range corresponding to each view angle direction is 60 degrees and the 6 view angle ranges do not overlap; projecting one three-dimensional transformation model to the 6 view angle directions then yields 6 different two-dimensional images. In the embodiment of the present application, the number of view angle directions, the position of the camera viewpoint, the viewing direction, the camera parameters, the distortion coefficients, the image resolution, and the like may be specified as required. For example, one view angle direction may also be a top-down direction, and the corresponding view angle range may be specified as 30 degrees, 120 degrees, or the like.
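As a hedged sketch of the simulated-camera projection (an ideal pinhole model without distortion; the intrinsic values and viewpoint are placeholders rather than the reference camera parameters of the application):

    import numpy as np

    def project_points(points_world, K, R, t, width, height):
        """Project 3D model points into a 2D image plane with a pinhole camera.

        points_world: (N, 3) vertex coordinates of a three-dimensional transformation model.
        K:            (3, 3) camera intrinsic matrix.
        R, t:         rotation (3, 3) and translation (3,) defining one view direction.
        Returns (N, 2) pixel coordinates and a mask of points that land inside the image.
        """
        cam = points_world @ R.T + t              # world -> camera coordinates
        uv = cam @ K.T                            # camera -> homogeneous pixel coordinates
        uv = uv[:, :2] / uv[:, 2:3]               # perspective division
        inside = ((uv[:, 0] >= 0) & (uv[:, 0] < width) &
                  (uv[:, 1] >= 0) & (uv[:, 1] < height) & (cam[:, 2] > 0))
        return uv, inside

    # Placeholder intrinsics (focal length, principal point) and one camera viewpoint.
    K = np.array([[800.0, 0.0, 320.0],
                  [0.0, 800.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.array([0.0, 0.0, 2.0])   # camera 2 units in front of the model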
In the embodiment of the application, model deformation can generate a variety of models with different three-dimensional geometric structures, greatly expanding the number of image training samples along the dimension of the three-dimensional geometric structure. Texture transformation performs semantic structure analysis on the three-dimensional texture model, replaces each region with texture materials of the same semantic, and performs combined transformations of multiple texture replacements, greatly expanding the number of image training samples along the dimension of texture content. Finally, two-dimensional images are generated by way of camera projection and used as image training samples, so the generated image training samples have high fidelity, offer a better cost-performance ratio than collecting samples with an actual camera, and can flexibly meet user requirements.
In summary, in the embodiment of the present application, a plurality of three-dimensional transformation models can be obtained by performing model transformation on the acquired three-dimensional texture model of the target object, and one or more two-dimensional images corresponding to each three-dimensional transformation model are then generated, so that a large number of image training samples similar to the target object can be obtained. That is, the scheme provided by the application can quickly and automatically generate a large number of image training samples that meet the requirements; compared with collecting samples with an actual camera, it saves equipment investment, labor cost, and time cost, and can produce tens of thousands of times more samples, or even more, in the same amount of time. Moreover, because the image training samples in the scheme are obtained from the three-dimensional texture model of the target object, they belong to the same type as the target object and have high fidelity. Therefore, compared with image training samples captured from the Internet, the scheme can simplify the sample processing process, reduce the sample quality degradation caused by manual operation during sample processing, and facilitate subsequent training of a deep learning network.
After a sample library is constructed based on the three-dimensional texture model of the target object in the above manner, these samples can be used to train a deep learning network, and the trained deep learning network can be used to recognize human faces, vehicles, actions, and the like. Because the two-dimensional image samples generated from the three-dimensional texture model are closer to the real object and the capture process is omitted, the training cost is lower and the recognition accuracy of the trained deep learning network is higher.
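By way of a non-authoritative sketch of how the generated samples might then be consumed (a generic PyTorch-style loop; the sample directory, network choice, and hyperparameters are assumptions, not part of this application):

    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms, models

    # Generated two-dimensional images arranged as an ImageFolder-style sample library.
    dataset = datasets.ImageFolder(
        "generated_samples/",
        transform=transforms.Compose([transforms.Resize((224, 224)),
                                      transforms.ToTensor()]))
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    net = models.resnet18(num_classes=len(dataset.classes))
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(10):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(net(images), labels)
            loss.backward()
            optimizer.step()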
Fig. 7 is a schematic structural diagram of an image training sample generation apparatus provided in an embodiment of the present application, which may be implemented as part or all of a computer device by software, hardware, or a combination of the two. Referring to fig. 7, the apparatus includes: an obtaining module 701, a transforming module 702 and a generating module 703.
An obtaining module 701, configured to obtain a three-dimensional texture model of a target object;
a transformation module 702, configured to perform model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, where the model transformation includes at least one of model deformation and texture transformation;
the generating module 703 is configured to generate one or more two-dimensional images corresponding to each of the multiple three-dimensional transformation models, and use the generated two-dimensional images as image training samples.
Optionally, the obtaining module 701 includes:
a creating submodule for creating a three-dimensional model of the target object;
and the mapping submodule is used for performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
Optionally, when the model transformation comprises a texture transformation, the transformation module 702 comprises:
the semantic segmentation submodule is used for performing semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature areas, and each local feature area in the plurality of local feature areas corresponds to a semantic;
the first obtaining submodule is used for obtaining a plurality of target texture materials corresponding to each local feature region from a plurality of stored texture materials according to the corresponding semantics of the local feature regions;
and the transformation submodule is used for carrying out texture transformation on the three-dimensional texture model according to a plurality of target texture materials respectively corresponding to the local characteristic regions to obtain a plurality of three-dimensional transformation models.
Optionally, when the model transformation includes model deformation, the transformation module 702 includes:
the second obtaining submodule is used for obtaining a plurality of groups of structure editing parameters;
and the editing submodule is used for respectively editing the geometric structure of the three-dimensional texture model according to the multiple groups of structure editing parameters to obtain a plurality of three-dimensional transformation models.
Optionally, the generating module 703 includes:
the third acquisition sub-module is used for acquiring reference camera parameters, wherein the reference camera parameters refer to camera parameters adopted when the three-dimensional texture model is created;
and the projection submodule is used for projecting each three-dimensional transformation model from one or more different view angle directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the corresponding three-dimensional transformation models.
In the embodiment of the application, a plurality of three-dimensional transformation models can be obtained by performing model transformation on the acquired three-dimensional texture model of the target object, and one or more two-dimensional images corresponding to each three-dimensional transformation model are generated, so that a large number of image training samples similar to the target object can be obtained. That is, the scheme provided by the application can quickly and automatically generate a large number of image training samples that meet the requirements; compared with collecting samples with an actual camera, it saves equipment investment, labor cost, and time cost, and can produce tens of thousands of times more samples, or even more, in the same amount of time. Moreover, because the image training samples in the scheme are obtained from the three-dimensional texture model of the target object, they belong to the same type as the target object and have high fidelity. Therefore, compared with image training samples captured from the Internet, the scheme can simplify the sample processing process, reduce the sample quality degradation caused by manual operation during sample processing, and facilitate subsequent training of a deep learning network.
It should be noted that: in the image training sample generation apparatus provided in the above embodiment, when generating an image training sample, only the division of each functional module is illustrated, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the above described functions. In addition, the image training sample generation apparatus and the image training sample generation method provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail, and are not described herein again.
Fig. 8 is a block diagram of a computer device 800 according to an embodiment of the present disclosure. The computer device 800 may be a smart phone, a tablet computer, a notebook computer, or a desktop computer, etc.
Generally, the computer device 800 includes: a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 801 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, the processor 801 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 802 may include one or more computer-readable storage media, which may be non-transitory. Memory 802 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 802 is used to store at least one instruction for execution by processor 801 to implement the image training sample generation method provided by method embodiments herein.
In some embodiments, the computer device 800 may further optionally include: a peripheral interface 803 and at least one peripheral. The processor 801, memory 802 and peripheral interface 803 may be connected by bus or signal lines. Various peripheral devices may be connected to peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 804, a touch screen display 805, a camera 806, an audio circuit 807, a positioning component 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 801 and the memory 802. In some embodiments, the processor 801, memory 802, and peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The Radio Frequency circuit 804 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 804 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 804 converts an electrical signal into an electromagnetic signal to be transmitted, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 804 may communicate with other computer devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to capture touch signals on or above the surface of the display 805. The touch signal may be input to the processor 801 as a control signal for processing. At this point, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 805 may be a front panel disposed on the computer device 800; in other embodiments, the display 805 may be at least two separate displays disposed on different surfaces of the computer device 800 or in a folded design; in other embodiments, the display 805 may be a flexible display, disposed on a curved surface or on a folded surface of the computer device 800. Even further, the display 805 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 805 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-emitting diode), and the like.
The camera assembly 806 is used to capture images or video. Optionally, camera assembly 806 includes a front camera and a rear camera. Generally, a front camera is disposed on a front panel of a computer apparatus, and a rear camera is disposed on a rear surface of the computer apparatus. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 806 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 801 for processing or inputting the electric signals to the radio frequency circuit 804 to realize voice communication. For stereo capture or noise reduction purposes, the microphones may be multiple and located at different locations on the computer device 800. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 807 may also include a headphone jack.
The Location component 808 is used to locate the current geographic Location of the computer device 800 to implement navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the Beidou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
A power supply 809 is used to power the various components in the computer device 800. The power supply 809 can be ac, dc, disposable or rechargeable. When the power supply 809 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the computer device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: acceleration sensor 811, gyro sensor 812, pressure sensor 813, fingerprint sensor 814, optical sensor 815 and proximity sensor 816.
The acceleration sensor 811 may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the computer apparatus 800. For example, the acceleration sensor 811 may be used to detect the components of the gravitational acceleration in three coordinate axes. The processor 801 may control the touch screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 811. The acceleration sensor 811 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 812 may detect a body direction and a rotation angle of the computer device 800, and the gyro sensor 812 may cooperate with the acceleration sensor 811 to acquire a 3D motion of the user with respect to the computer device 800. From the data collected by the gyro sensor 812, the processor 801 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 813 may be disposed on the side bezel of computer device 800 and/or underneath touch display 805. When the pressure sensor 813 is arranged on the side frame of the computer device 800, the holding signal of the user to the computer device 800 can be detected, and the processor 801 performs left-right hand identification or shortcut operation according to the holding signal collected by the pressure sensor 813. When the pressure sensor 813 is disposed at a lower layer of the touch display screen 805, the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 805. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 814 is used for collecting a fingerprint of the user, and the processor 801 identifies the identity of the user according to the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the identity of the user according to the collected fingerprint. Upon identifying that the user's identity is a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying for and changing settings, etc. Fingerprint sensor 814 may be disposed on the front, back, or side of computer device 800. When a physical key or vendor Logo is provided on the computer device 800, the fingerprint sensor 814 may be integrated with the physical key or vendor Logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the touch screen 805 based on the ambient light intensity collected by the optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 805 is increased; when the ambient light intensity is low, the display brightness of the touch display 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.
The proximity sensor 816, also called a distance sensor, is typically disposed on the front panel of the computer device 800. The proximity sensor 816 is used to collect the distance between the user and the front of the computer device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front of the computer device 800 gradually decreases, the processor 801 controls the touch display screen 805 to switch from the bright-screen state to the dark-screen state; when the proximity sensor 816 detects that the distance between the user and the front of the computer device 800 gradually increases, the processor 801 controls the touch display screen 805 to switch from the dark-screen state to the bright-screen state.
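The screen-state switching can be expressed as a small state function of the measured distance. The 5 cm threshold and the state names below are assumptions for illustration, not values from this application.

```python
NEAR_THRESHOLD_CM = 5.0  # assumed distance at which the screen is darkened


def next_screen_state(current: str, distance_cm: float) -> str:
    """Return the next screen state ('bright' or 'dark') from the proximity reading."""
    if distance_cm < NEAR_THRESHOLD_CM and current == "bright":
        return "dark"    # user has moved close, e.g. phone raised to the ear
    if distance_cm >= NEAR_THRESHOLD_CM and current == "dark":
        return "bright"  # user has moved away again
    return current       # otherwise keep the current state


print(next_screen_state("bright", 2.0))   # dark
print(next_screen_state("dark", 20.0))    # bright
```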
Those skilled in the art will appreciate that the structure shown in FIG. 8 does not limit the computer device 800, which may include more or fewer components than those illustrated, combine certain components, or adopt a different arrangement of components.
In some embodiments, a computer-readable storage medium is also provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the image training sample generation method in the above embodiments. For example, the computer readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It is noted that the computer-readable storage medium referred to herein may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
That is, in some embodiments, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the image training sample generation method described above.
The above embodiments are not intended to limit the present application; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (10)

1. An image training sample generation method, characterized in that the method comprises:
acquiring a three-dimensional texture model of a target object;
performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, wherein the model transformation comprises at least one of model deformation and texture transformation;
and generating one or more two-dimensional images corresponding to each of the plurality of three-dimensional transformation models, and taking the generated two-dimensional images as image training samples.
2. The method of claim 1, wherein obtaining the three-dimensional texture model of the target object comprises:
creating a three-dimensional model of the target object;
and performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
3. The method of claim 1, wherein when the model transformation comprises the texture transformation, the performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models comprises:
performing semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature regions, wherein each of the plurality of local feature regions corresponds to one semantic meaning;
according to the semantics corresponding to the local feature regions, acquiring a plurality of target texture materials corresponding to each local feature region from a plurality of stored texture materials;
and performing texture transformation on the three-dimensional texture model according to a plurality of target texture materials corresponding to the plurality of local characteristic regions respectively to obtain a plurality of three-dimensional transformation models.
4. The method of claim 1, wherein when the model transformation comprises the model deformation, the performing model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models comprises:
acquiring a plurality of groups of structure editing parameters;
and editing the geometric structure of the three-dimensional texture model according to the plurality of groups of structure editing parameters, respectively, to obtain the plurality of three-dimensional transformation models.
5. The method of any of claims 1-4, wherein generating one or more two-dimensional images corresponding to each of the plurality of three-dimensional transformation models comprises:
acquiring reference camera parameters;
and projecting each three-dimensional transformation model from one or more different view angle directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the respective three-dimensional transformation model.
6. An image training sample generation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a three-dimensional texture model of the target object;
the transformation module is used for carrying out model transformation on the three-dimensional texture model to obtain a plurality of three-dimensional transformation models, and the model transformation comprises at least one of model deformation and texture transformation;
and the generating module is used for generating one or more two-dimensional images corresponding to each of the plurality of three-dimensional transformation models and taking the generated two-dimensional images as image training samples.
7. The apparatus of claim 6, wherein the acquisition module comprises:
a creating submodule for creating a three-dimensional model of the target object;
and the mapping submodule is used for performing texture mapping on the three-dimensional model according to the texture of the target object to obtain the three-dimensional texture model.
8. The apparatus of claim 6, wherein when the model transformation comprises the texture transformation, the transformation module comprises:
the semantic segmentation submodule is used for performing semantic segmentation on the three-dimensional texture model to obtain a plurality of local feature regions, wherein each of the plurality of local feature regions corresponds to one semantic meaning;
the first obtaining submodule is used for obtaining, from a plurality of stored texture materials, a plurality of target texture materials corresponding to each local feature region according to the semantics corresponding to the plurality of local feature regions;
and the transformation submodule is used for performing texture transformation on the three-dimensional texture model according to the plurality of target texture materials respectively corresponding to the plurality of local feature regions, to obtain the plurality of three-dimensional transformation models.
9. The apparatus of claim 6, wherein when the model transformation comprises the model deformation, the transformation module comprises:
the second obtaining submodule is used for obtaining a plurality of groups of structure editing parameters;
and the editing submodule is used for editing the geometric structure of the three-dimensional texture model according to the plurality of groups of structure editing parameters, respectively, to obtain the plurality of three-dimensional transformation models.
10. The apparatus according to any one of claims 6-9, wherein the generating module comprises:
the third acquisition sub-module is used for acquiring reference camera parameters;
and the projection submodule is used for projecting each three-dimensional transformation model from one or more different view angle directions according to the reference camera parameters to obtain one or more two-dimensional images corresponding to the respective three-dimensional transformation model.
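Read together, claims 1, 4 and 5 describe a generate-by-rendering pipeline: transform the three-dimensional texture model several times, then project each transformed model from one or more viewing directions to obtain two-dimensional training samples. The sketch below is a strongly simplified illustration of that flow, not the claimed implementation: the model is reduced to a vertex array with per-vertex colours, "structure editing" is stood in for by a uniform scale factor, "texture transformation" by a global colour shift, and the reference camera is an assumed pinhole model; all names and parameter values are invented for the example.

```python
import numpy as np


def transform_model(vertices, colors, scale, color_shift):
    """Apply one group of 'structure editing parameters' (here a uniform scale)
    and one texture transformation (here a global colour shift)."""
    return vertices * scale, np.clip(colors + color_shift, 0.0, 1.0)


def project(vertices, yaw, focal=500.0, distance=5.0):
    """Project vertices from one viewing direction with an assumed pinhole camera."""
    c, s = np.cos(yaw), np.sin(yaw)
    rotation = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])  # rotation about the y axis
    camera_points = vertices @ rotation.T + np.array([0.0, 0.0, distance])
    return focal * camera_points[:, :2] / camera_points[:, 2:3]        # (u, v) image coordinates


def generate_samples(vertices, colors, scales, color_shifts, yaws):
    """Enumerate model transformations x viewing directions -> 2D training samples."""
    samples = []
    for scale, shift in zip(scales, color_shifts):
        v, c = transform_model(vertices, colors, scale, shift)
        for yaw in yaws:
            samples.append((project(v, yaw), c))
    return samples


# A unit cube stands in for the three-dimensional texture model of the target object.
cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], dtype=float)
grey = np.full((8, 3), 0.5)
samples = generate_samples(cube, grey,
                           scales=[1.0, 1.2],
                           color_shifts=[0.0, 0.1],
                           yaws=np.deg2rad([0.0, 30.0, 60.0]))
print(len(samples))  # 2 transformed models x 3 viewing directions = 6 samples
```

In a full implementation the projection step would be replaced by rendering the textured, transformed model from each viewing direction, as described in the embodiments above.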
Application CN202010543613.5A (family ID 72435820), filed 2020-06-15, priority date 2020-06-15: Image training sample generation method and device. Status: Active (granted in China).

Publications
CN111680758A (application), published 2020-09-18
CN111680758B (grant), published 2024-03-05

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002165757A (en) * 2000-11-30 2002-06-11 Olympus Optical Co Ltd Diagnostic supporting system
CN101346743A (en) * 2005-12-29 2009-01-14 卡尔斯特里姆保健公司 Cross-time and cross-modality medical diagnosis
CN101283911A (en) * 2008-06-05 2008-10-15 华北电力大学 Four dimensional rebuilding method of coronary artery vessels axis
CN102737480A (en) * 2012-07-09 2012-10-17 广州市浩云安防科技股份有限公司 Abnormal voice monitoring system and method based on intelligent video
CA2889217A1 (en) * 2015-04-23 2016-10-23 Wu QIU Method for determining volume of irregularly-shaped space
CN106548679A (en) * 2016-02-03 2017-03-29 北京易驾佳信息科技有限公司 Intelligent driving training system
US20190138786A1 (en) * 2017-06-06 2019-05-09 Sightline Innovation Inc. System and method for identification and classification of objects
WO2019019019A1 (en) * 2017-07-25 2019-01-31 深圳前海达闼云端智能科技有限公司 Training data generation method and generation apparatus, and image semantics segmentation method therefor
CN107403201A (en) * 2017-08-11 2017-11-28 强深智能医疗科技(昆山)有限公司 Intelligent and automated delineation method for tumour radiotherapy target areas and organs at risk
WO2020010979A1 (en) * 2018-07-10 2020-01-16 腾讯科技(深圳)有限公司 Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
CN109145788A (en) * 2018-08-08 2019-01-04 北京云舶在线科技有限公司 Video-based attitude data capture method and system
CN109829969A (en) * 2018-12-27 2019-05-31 北京奇艺世纪科技有限公司 Data acquisition method, device and storage medium
CN110852332A (en) * 2019-10-29 2020-02-28 腾讯科技(深圳)有限公司 Training sample generation method and device, storage medium and electronic equipment
CN110837858A (en) * 2019-11-01 2020-02-25 腾讯科技(深圳)有限公司 Network model training method and device, computer equipment and storage medium

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
MUMTAZ HUSSAIN SOOMRO et al.: "Automatic Segmentation of Colorectal Cancer in 3D MRI by Combining Deep Learning and 3D Level-Set Algorithm - A Preliminary Study", 2018 IEEE 4th Middle East Conference on Biomedical Engineering (MECBME), 31 December 2018 (2018-12-31) *
XU YATONG: "Comparative transmembrane transports of four typical lipophilic organic chemicals", Bioresource Technology, 1 November 2010 (2010-11-01) *
YATONG XU: "Spatial-temporal Depth De-noising for Kinect based on Texture Edge-assisted Depth Classification", Proceedings of the 19th International Conference on Digital Signal Processing (20-23 August 2014), 23 August 2014 (2014-08-23) *
YUN GE: "Expansion of 3D face sample set based on genetic algorithm", Springer Science+Business Media, 15 May 2012 (2012-05-15) *
LIU Yi; WANG Min: "Action recognition based on the spatio-temporal 3D-SIFT operator", Journal of Huazhong University of Science and Technology (Natural Science Edition), no. 2, 15 November 2011 (2011-11-15) *
LI Difei; CHEN He; FENG Zhigang; ZHAO Kejia; LIU Zheng; GAO Hongying: "A calibration method for a binocular stereo vision system", Acta Metrologica Sinica, no. 04, 22 July 2018 (2018-07-22) *
DONG Xiwei: "Multi-manifold discriminant learning algorithm based on virtual sample image sets", Application Research of Computers, vol. 35, no. 6, 30 June 2018 (2018-06-30) *
XUE Feng; DING Xiaoqing: "Shape-constrained deformation model of three-dimensional face components", Journal of Computer Applications, no. 03, 10 March 2007 (2007-03-10) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232385A (en) * 2020-09-27 2021-01-15 北京五八信息技术有限公司 Image processing method and device
CN113240784A (en) * 2021-05-25 2021-08-10 北京达佳互联信息技术有限公司 Image processing method, device, terminal and storage medium
CN113240784B (en) * 2021-05-25 2024-01-02 北京达佳互联信息技术有限公司 Image processing method, device, terminal and storage medium
CN114067041A (en) * 2022-01-14 2022-02-18 深圳大学 Material generation method and device of three-dimensional model, computer equipment and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant