CN111161387A - Method and system for synthesizing image in stacked scene, storage medium and terminal equipment - Google Patents

Method and system for synthesizing image in stacked scene, storage medium and terminal equipment Download PDF

Info

Publication number
CN111161387A
Authority
CN
China
Prior art keywords
scene
model
dimensional simulation
image
model file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911408992.0A
Other languages
Chinese (zh)
Other versions
CN111161387B (en)
Inventor
易建军
张雅君
谷彦颉
张佳豪
田杰
王晓蕾
盛涛
郑金华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Shanghai Composite Material Science and Technology Co Ltd
Original Assignee
East China University of Science and Technology
Shanghai Composite Material Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology, Shanghai Composite Material Science and Technology Co Ltd filed Critical East China University of Science and Technology
Priority to CN201911408992.0A priority Critical patent/CN111161387B/en
Publication of CN111161387A publication Critical patent/CN111161387A/en
Application granted granted Critical
Publication of CN111161387B publication Critical patent/CN111161387B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/04Texture mapping
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)
  • Studio Circuits (AREA)

Abstract

The application provides a method and a system for synthesizing images in a stacked scene, a storage medium and a terminal device. Using OpenGL and the Bullet physics engine, the stacking relationships between objects are fully considered in the image synthesis process, the stacked poses of parts under real conditions are simulated, and pixel-level annotated images for deep learning are constructed quickly, which greatly reduces the workload of manual image annotation. The scale of the generated image data set, the number of parts contained in a single image, the threshold of the object occlusion rate and the like can be changed by adjusting the corresponding parameters, so that the synthesized image data in the stacked scene has very strong extensibility.

Description

Method and system for synthesizing image in stacked scene, storage medium and terminal equipment
Technical Field
The present application relates to the field of image synthesis, and in particular, to a method and system for synthesizing an image in a stacked scene, a storage medium, and a terminal device.
Background
With the continuous improvement of computer computing capability, various deep learning models are being applied more and more widely. Among them, deep learning models for image processing play a particularly important role.
When these deep learning models are trained, a large number of image samples need to be acquired, that is, a target object is photographed at various angles and positions to obtain a large number of images. The target object is the object that actually needs to be detected, for example, an object to be grasped by a robotic arm, a vehicle license plate, and the like. In these images, the position of the target object needs to be annotated, and the annotated images are used as image samples for training the deep learning model.
The position of the target object is generally annotated manually, that is, the position of the target object in the acquired image is determined by the human eye and then labeled to obtain an image sample. This approach consumes a great deal of labor and time, and the annotation is inefficient.
Therefore, the present application provides a method and a system for synthesizing images in a stacked scene, a storage medium, and a terminal device to solve the above problems.
Content of application
The embodiment of the application provides a method for synthesizing images in a stacked scene, which comprises the following steps: establishing a model file; importing the model file into a three-dimensional simulation scene through OpenGL; acquiring a stacking scene in the three-dimensional simulation scene through the Bullet physics engine; assigning pose information to the model of the model file; loading a virtual camera into the three-dimensional simulation scene; and drawing the image.
In some embodiments, the step of creating a model file comprises mapping the model by UV mapping to texture the surface of the model.
In some embodiments, the step of importing the model file into the three-dimensional simulation scene through OpenGL includes importing the model file into the world coordinate system of the three-dimensional simulation scene through OpenGL, converting the data structure of the model file into the universal data structure of Assimp through the Assimp library to obtain universal data, and loading the model file and the universal data into the three-dimensional simulation scene.
In some embodiments, in the step of obtaining the stacking scene in the three-dimensional simulation scene through the Bullet physics engine, a virtual container is built in the three-dimensional simulation scene, and a virtual scene conforming to physical motion rules is created in the virtual container through the Bullet physics engine, so that the models fall from a fixed point under gravity and collide with each other to form the stacking scene.
In some embodiments, the pose information is six-degree-of-freedom pose information that includes a rotation matrix R and a translation vector t.
In some embodiments, in the step of loading a virtual camera into the three-dimensional simulation scene, the virtual camera converts pose information of the model in the world coordinate system into pose information in the virtual camera coordinate system.
In some embodiments, the information of the image includes an RGB map, a segmentation annotation map, and a depth map.
The embodiment of the application provides a system for synthesizing images in a stacked scene, which comprises: an establishing module for establishing a model file; an importing module for importing the model file into a three-dimensional simulation scene through OpenGL; an obtaining module for obtaining a stacking scene in the three-dimensional simulation scene through the Bullet physics engine; an assigning module for assigning pose information to the model of the model file; a loading module for loading a virtual camera into the three-dimensional simulation scene; and a drawing module for drawing the image.
The embodiment of the application provides a storage medium, wherein a plurality of instructions are stored in the storage medium, and the instructions are suitable for being loaded by a processor to execute the method for synthesizing the images in the stacking scene.
The embodiment of the application provides a terminal device, which comprises a processor and a memory, wherein the processor is electrically connected with the memory, the memory is used for storing instructions and data, and the processor is used for executing the steps in the method for synthesizing the images under the stacking scene.
According to the method and system for synthesizing images in a stacked scene, the storage medium and the terminal device of the present application, OpenGL and the Bullet physics engine allow the stacking relationships between objects to be fully considered in the image synthesis process and the stacked poses of the parts under real conditions to be simulated, so that pixel-level annotated images for deep learning are constructed quickly and the workload of manual image annotation is greatly reduced. The scale of the image data set, the number of parts contained in a single image, the threshold of the object occlusion rate and the like can be changed by adjusting the corresponding parameters, so that the synthesized image data in the stacked scene has strong extensibility.
Drawings
The technical solution and other advantages of the present application will become apparent from the detailed description of the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart illustrating steps of a method for synthesizing an image in a stacked scene according to an embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram of a system for synthesizing images in stacked scenes according to an embodiment of the present disclosure.
Fig. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and the like in the description and in the claims of the present application and in the above-described drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the objects so described are interchangeable under appropriate circumstances. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
In particular embodiments, the drawings discussed below and the various embodiments used to describe the principles of the present disclosure are by way of illustration only and should not be construed to limit the scope of the present disclosure. Those skilled in the art will understand that the principles of the present application may be implemented in any suitably arranged system. Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Further, a terminal according to an exemplary embodiment will be described in detail with reference to the accompanying drawings. Like reference symbols in the various drawings indicate like elements.
The terminology used in the detailed description is for the purpose of describing particular embodiments only and is not intended to be limiting of the concepts of the present application. Unless the context clearly dictates otherwise, expressions used in the singular form encompass expressions in the plural form. In the present specification, it will be understood that terms such as "including," "having," and "containing" are intended to specify the presence of the features, integers, steps, acts, or combinations thereof disclosed in the specification, and are not intended to preclude the presence or addition of one or more other features, integers, steps, acts, or combinations thereof. Like reference symbols in the various drawings indicate like elements.
Referring to FIG. 1, the present embodiment provides a method for synthesizing images in stacked scenes, including steps S11-S16.
Step S11, a model file is created. In this embodiment, the model is mapped by UV mapping so that the surface of the model acquires texture.
Specifically, a 3D model of the object is built at a 1:1 scale in SolidWorks, 1 mm chamfers are added at some object edges so that the illumination results match a real scene, and an .obj file is then exported for post-processing.
If the object surface has texture, the model needs to be further mapped. The model file is imported into the 3DS MAX software, and a bitmap is selected in the texture rendering and applied to the model by means of UV mapping. There are two ways to obtain the map: find a map of a similar metal material in a picture library, or photograph the appearance of the actually purchased part. After the map rendering is finished, a standard 3D model file (.obj), a material library file (.mtl) and the corresponding map file (.tga) are exported. These three files contain all of the automatically generated vertex coordinates, vertex normals and texture coordinates, together with the model data and material information, such as model colors, diffuse reflection maps and specular light maps, in one-to-one correspondence, so that no data is lost when the model is later imported into the scene.
Step S12, the model file is imported into the three-dimensional simulation scene through OpenGL. In this embodiment, the model file is imported into the world coordinate system of the three-dimensional simulation scene through OpenGL, the data structure of the model file is converted into the universal data structure of Assimp through the Assimp library to obtain universal data, and the model file and the universal data are loaded into the three-dimensional simulation scene.
Specifically, when a model is imported into the OpenGL three-dimensional simulation scene, the model file exported in the preparation work needs to be parsed, all useful information extracted, and the result stored in a format that OpenGL can understand. There are many model file formats, each exporting model data in its own way. In the present application the Assimp model import library is used; it can import many different model file formats and loads all the model data into Assimp's common data structures. When Assimp finishes loading the model, all the required data can be extracted from Assimp's data structures. Because Assimp's data structures stay the same regardless of the imported file format, the code is abstracted away from the different file formats and can access the desired data in the same manner.
When a model is imported with Assimp, the entire model is typically loaded into a scene object, which contains the imported model and all the data in the scene. Assimp organizes the scene as a series of nodes, where each node contains indices to the data stored in the scene object, and a node may contain any number of child nodes.
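As an illustration of this import step, the following sketch (not taken from the original disclosure; the Vertex and Mesh structures and the function names are assumptions) shows how a model file could be parsed with the Assimp library and its node hierarchy walked to collect vertex data ready for upload to OpenGL buffers.

```cpp
#include <assimp/Importer.hpp>
#include <assimp/scene.h>
#include <assimp/postprocess.h>
#include <glm/glm.hpp>
#include <stdexcept>
#include <string>
#include <vector>

// Simple vertex layout matching the exported data (positions, normals and
// texture coordinates in one-to-one correspondence).
struct Vertex {
    glm::vec3 position;
    glm::vec3 normal;
    glm::vec2 texCoord;
};

struct Mesh {
    std::vector<Vertex>       vertices;
    std::vector<unsigned int> indices;
};

// Recursively walk the Assimp node hierarchy; each node only stores indices
// into the mesh array owned by the aiScene object.
static void processNode(const aiNode* node, const aiScene* scene, std::vector<Mesh>& out) {
    for (unsigned int i = 0; i < node->mNumMeshes; ++i) {
        const aiMesh* m = scene->mMeshes[node->mMeshes[i]];
        Mesh mesh;
        for (unsigned int v = 0; v < m->mNumVertices; ++v) {
            Vertex vert{};
            vert.position = { m->mVertices[v].x, m->mVertices[v].y, m->mVertices[v].z };
            if (m->HasNormals())
                vert.normal = { m->mNormals[v].x, m->mNormals[v].y, m->mNormals[v].z };
            if (m->mTextureCoords[0])
                vert.texCoord = { m->mTextureCoords[0][v].x, m->mTextureCoords[0][v].y };
            mesh.vertices.push_back(vert);
        }
        for (unsigned int f = 0; f < m->mNumFaces; ++f)
            for (unsigned int k = 0; k < m->mFaces[f].mNumIndices; ++k)
                mesh.indices.push_back(m->mFaces[f].mIndices[k]);
        out.push_back(std::move(mesh));
    }
    for (unsigned int c = 0; c < node->mNumChildren; ++c)
        processNode(node->mChildren[c], scene, out);
}

std::vector<Mesh> loadModel(const std::string& path) {
    Assimp::Importer importer;
    const aiScene* scene = importer.ReadFile(path, aiProcess_Triangulate | aiProcess_GenNormals);
    if (!scene || !scene->mRootNode)
        throw std::runtime_error(importer.GetErrorString());
    std::vector<Mesh> meshes;
    processNode(scene->mRootNode, scene, meshes);
    return meshes;   // ready to be uploaded to OpenGL vertex buffers
}
```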
Step S13, a stacking scene is acquired in the three-dimensional simulation scene through the Bullet physics engine. In this embodiment, a virtual container is built in the three-dimensional simulation scene, and a virtual scene conforming to physical motion rules is created in the virtual container through the Bullet physics engine, so that the models fall from a fixed point under gravity and collide with each other to form the stacked scene.
Specifically, after the model file is loaded into the OpenGL three-dimensional simulation scene using the Assimp library, the model is only placed at a preset fixed position in the scene; elements of the real world such as gravity, collision and deformation are not simulated. A virtual scene conforming to physical motion rules is therefore created, based on the Bullet physics engine, inside a container (similar to an industrial bin) built in the OpenGL scene.
Bullet is an open-source physics engine written in C++. Using the Bullet physics engine avoids physically impossible stacking and ensures the realism of the output images. This information can then be combined for rendering in OpenGL to output the final image. The order of using the Bullet physics engine in conjunction with the OpenGL code is: (1) initialization of the physics engine; (2) loading of the objects; (3) stepping of the engine; (4) removal of the objects; (5) destruction of the engine.
Meanwhile, the json file output with the data set also contains the ID corresponding to each model, the six-degree-of-freedom pose information, and other information needed for subsequent segmentation recognition or part grasping. During this process Bullet calculates and outputs physical information such as the object poses and assigns the image segmentation ID corresponding to each model. That is, when this information is output, the final state of each model retained in the engine can be accessed directly to obtain the relevant data, which is convenient for the subsequent steps.
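A minimal sketch of this Bullet/OpenGL combination is given below. It only illustrates the five-step order listed above under assumed conditions (part shapes, sizes, counts and the simple console output are placeholders), and is not the implementation disclosed here.

```cpp
#include <btBulletDynamicsCommon.h>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    // (1) Initialize the physics engine and set gravity.
    btDefaultCollisionConfiguration config;
    btCollisionDispatcher dispatcher(&config);
    btDbvtBroadphase broadphase;
    btSequentialImpulseConstraintSolver solver;
    btDiscreteDynamicsWorld world(&dispatcher, &broadphase, &solver, &config);
    world.setGravity(btVector3(0, -9.81f, 0));

    // Static floor standing in for the bottom of the virtual container.
    btBoxShape floorShape(btVector3(0.5f, 0.02f, 0.5f));
    btDefaultMotionState floorState(btTransform(btQuaternion::getIdentity(), btVector3(0, 0, 0)));
    btRigidBody::btRigidBodyConstructionInfo floorInfo(0.0f, &floorState, &floorShape);
    btRigidBody floorBody(floorInfo);
    world.addRigidBody(&floorBody);

    // (2) Load the parts: drop simple boxes from a fixed point above the container.
    btBoxShape partShape(btVector3(0.03f, 0.03f, 0.03f));
    std::vector<btRigidBody*> parts;
    for (int i = 0; i < 10; ++i) {
        auto* state = new btDefaultMotionState(
            btTransform(btQuaternion::getIdentity(), btVector3(0, 0.5f + 0.1f * i, 0)));
        btVector3 inertia(0, 0, 0);
        partShape.calculateLocalInertia(0.1f, inertia);
        btRigidBody::btRigidBodyConstructionInfo info(0.1f, state, &partShape, inertia);
        auto* body = new btRigidBody(info);
        world.addRigidBody(body);
        parts.push_back(body);
    }

    // (3) Step the engine until the pile has settled under gravity and collisions.
    for (int step = 0; step < 600; ++step)
        world.stepSimulation(1.0f / 60.0f);

    // Read back the final pose of every part: this is the per-model information
    // (segmentation ID, six-degree-of-freedom pose) that would be written to the
    // json file and handed to OpenGL for rendering.
    for (std::size_t id = 0; id < parts.size(); ++id) {
        btTransform tf;
        parts[id]->getMotionState()->getWorldTransform(tf);
        btVector3 t = tf.getOrigin();
        std::printf("part %zu: position (%.3f, %.3f, %.3f)\n", id, t.x(), t.y(), t.z());
    }

    // (4)/(5) Remove the bodies and release the engine resources.
    for (btRigidBody* body : parts) {
        world.removeRigidBody(body);
        delete body->getMotionState();
        delete body;
    }
    world.removeRigidBody(&floorBody);
    return 0;
}
```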
Step S14, pose information is assigned to the model of the model file. In this embodiment, the pose information is six-degree-of-freedom pose information comprising a rotation matrix R and a translation vector t.
Specifically, each part in the virtual scene has the six-degree-of-freedom pose information of its model in the world coordinate system, and this information plays a vital role in coordinate system conversion and in subsequent deep learning recognition, point cloud registration and grasping. In the present application, the 6-degree-of-freedom pose information of a part is described by a translation vector t and a rotation matrix R.
In a three-dimensional scene, when a point P(x, y, z) is rotated about the x, y or z axis, the transformed point P'(x', y', z') is obtained as follows.

The transformation for rotation about the x-axis is shown in equation (1).

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad (1)$$

The transformation for rotation about the y-axis is shown in equation (2).

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad (2)$$

The transformation for rotation about the z-axis is shown in equation (3).

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad (3)$$

The overall rotation matrix R is therefore given by equation (4).

$$R = R_z(\gamma)\, R_y(\beta)\, R_x(\alpha) \quad (4)$$

The three-dimensional translation transformation is obtained by left-multiplying the position coordinates, written in homogeneous form, by the translation matrix, as shown in equation (5).

$$\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \quad (5)$$

The translation vector t is therefore given by equation (6).

$$t = \begin{bmatrix} t_x & t_y & t_z \end{bmatrix}^{\mathrm{T}} \quad (6)$$
The pose of a model in the scene is described by a rotation matrix R and a translation vector t, and each point x of the model can be converted from the model coordinate system to the world coordinate system through Rx + t. After the models are loaded, collision detection is performed through the Bullet physics engine to obtain the stacking scene, and the Bullet physics engine also retains the position and pose information of each model. The translation vector t is the position of the current model and the rotation matrix corresponds to its orientation, so the pose information only needs to be output directly from the information retained by the engine.
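For illustration only (a hypothetical helper, not the original code), the R and t retained by the Bullet engine can be read from a rigid body's transform and applied to a model point as Rx + t:

```cpp
#include <btBulletDynamicsCommon.h>
#include <glm/glm.hpp>

// Convert the transform kept by Bullet into the R / t representation used here,
// then map one model-space point into the world coordinate system via R*x + t.
glm::vec3 modelPointToWorld(const btRigidBody& body, const glm::vec3& x) {
    btTransform tf;
    body.getMotionState()->getWorldTransform(tf);

    const btMatrix3x3& R = tf.getBasis();   // rotation matrix R
    const btVector3&   t = tf.getOrigin();  // translation vector t

    btVector3 p = R * btVector3(x.x, x.y, x.z) + t;  // R*x + t
    return { p.x(), p.y(), p.z() };
}
```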
Step S15, a virtual camera is loaded into the three-dimensional simulation scene. In this embodiment, the virtual camera converts the pose information of the model in the world coordinate system into pose information in the virtual camera coordinate system.
Specifically, similar to the real world, in order to obtain an image of the stacked scene as a data set, a camera needs to be configured at a certain position in the virtual three-dimensional simulation scene, and the stacked scene needs to be photographed to obtain the image.
OpenGL itself does not have the concept of a camera, but a camera can be simulated by moving all the objects in the scene in the opposite direction, creating the illusion that the camera moves rather than the scene. Put simply, the camera is the position of the current viewport in the window, the scene rendered by that viewport is the scene photographed by the camera, and after the drawing of the current scene is finished its entire content is read and saved as an image for output.
A camera configured in OpenGL changes the position of the scene's viewport according to the input of external devices, that is, an FPS-style camera is configured so that it can move freely in the 3D scene. Although the camera configured in OpenGL allows free movement, a specific fixed position is required when the finally constructed virtual data set is acquired. Therefore the final output consists of two project files, an observer and a generator: the observer allows the camera position to be moved freely, while the generator involves no interaction after its parameters are initialized and automatically constructs the files of the virtual data set. The initial position and orientation of the camera need to be set in advance at a suitable position; in this experiment the camera is placed at a certain height directly above the container.
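A brief sketch of such a fixed "generator" camera is shown below (the height and orientation values are assumptions made for illustration); the view matrix produced by glm::lookAt is what converts a pose from the world coordinate system into the camera coordinate system.

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

// Fixed camera placed directly above the container, looking straight down.
glm::mat4 makeGeneratorView() {
    const glm::vec3 eye(0.0f, 1.0f, 0.0f);     // assumed height above the bin
    const glm::vec3 center(0.0f, 0.0f, 0.0f);  // center of the container
    const glm::vec3 up(0.0f, 0.0f, -1.0f);     // any direction not parallel to the view axis
    return glm::lookAt(eye, center, up);
}

// Convert a model pose from the world coordinate system to the camera coordinate system.
glm::mat4 worldPoseToCamera(const glm::mat4& modelPoseWorld, const glm::mat4& view) {
    return view * modelPoseWorld;   // T_camera = V * T_world
}
```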
In step S16, an image is drawn. In this embodiment, the information of the image includes an RGB map, a segmentation label map, and a depth map.
Specifically, the rendered scene in the current viewport needs to be saved as a picture. During the experiment, various values of the buffer area, such as depth, color and the like, are read by calling functions in OpenGL.
The tasks at this stage are the storage of the RGB map, the segmentation annotation map, the depth map and other extension elements. The storage processes of these images share a similar structure, with only slight differences between drawing the different images. The rough process is as follows: (1) specify the buffer to be read, indicating that the display mode applies to the front-facing surface of the object (i.e., the surface that can be seen); (2) read the buffer specified in the previous step; the buffer is enabled at this point, and by selecting different parameters in this step the color or depth values are stored sequentially into an array; (3) allocate a new bitmap; (4) write the color or depth values stored in the array to each corresponding pixel of the bitmap in row-column order; (5) save the bitmap and return whether it was saved successfully; (6) finally, release the bitmap and delete the array to free the memory, otherwise a memory leak will occur.
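A condensed sketch of this readback sequence follows (the image size, file names and the PPM/raw output formats are assumptions made for illustration, not the original storage format):

```cpp
#include <GL/gl.h>
#include <fstream>
#include <string>
#include <vector>

// Read the currently rendered viewport and save the color buffer as a PPM image
// and the depth buffer as raw floats. Rows are flipped because OpenGL's origin
// is the bottom-left corner of the viewport.
void saveViewport(int width, int height, const std::string& prefix) {
    // (1)/(2) Specify and read the buffer showing the front-facing (visible) surfaces.
    glReadBuffer(GL_FRONT);

    std::vector<unsigned char> rgb(static_cast<size_t>(width) * height * 3);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, rgb.data());

    std::vector<float> depth(static_cast<size_t>(width) * height);
    glReadPixels(0, 0, width, height, GL_DEPTH_COMPONENT, GL_FLOAT, depth.data());

    // (3)/(4) Write the pixels row by row into the output "bitmap" (here a PPM file).
    std::ofstream color(prefix + "_rgb.ppm", std::ios::binary);
    color << "P6\n" << width << " " << height << "\n255\n";
    for (int row = height - 1; row >= 0; --row)
        color.write(reinterpret_cast<const char*>(&rgb[static_cast<size_t>(row) * width * 3]),
                    static_cast<std::streamsize>(width) * 3);

    std::ofstream d(prefix + "_depth.raw", std::ios::binary);
    for (int row = height - 1; row >= 0; --row)
        d.write(reinterpret_cast<const char*>(&depth[static_cast<size_t>(row) * width]),
                static_cast<std::streamsize>(width) * sizeof(float));

    // (5)/(6) The vectors go out of scope here, releasing the memory automatically.
}
```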
In the experiment, the RGB map, the segmentation annotation map and the depth map of the stacked scene are saved. The method provided by the application can quickly extract these three types of images, but is not limited to them.
Referring to fig. 2, the present embodiment provides a system for synthesizing images in a stacked scene, which includes an establishing module 11, an importing module 12, an obtaining module 13, an assigning module 14, a loading module 15, and a drawing module 16.
The establishing module 11 is used for establishing a model file. In this embodiment, the model is mapped by UV mapping so that the surface of the model acquires texture.
Specifically, a 3D model of the object is built at a 1:1 scale in SolidWorks, 1 mm chamfers are added at some object edges so that the illumination results match a real scene, and an .obj file is then exported for post-processing.
If the object surface has texture, the model needs to be further mapped. The model file is imported into the 3DS MAX software, and a bitmap is selected in the texture rendering and applied to the model by means of UV mapping. There are two ways to obtain the map: find a map of a similar metal material in a picture library, or photograph the appearance of the actually purchased part. After the map rendering is finished, a standard 3D model file (.obj), a material library file (.mtl) and the corresponding map file (.tga) are exported. These three files contain all of the automatically generated vertex coordinates, vertex normals and texture coordinates, together with the model data and material information, such as model colors, diffuse reflection maps and specular light maps, in one-to-one correspondence, so that no data is lost when the model is later imported into the scene.
The importing module 12 is configured to import the model file into the three-dimensional simulation scene through OpenGL. In this embodiment, the model file is imported into the world coordinate system of the three-dimensional simulation scene through OpenGL, the data structure of the model file is converted into the universal data structure of Assimp through the Assimp library to obtain universal data, and the model file and the universal data are loaded into the three-dimensional simulation scene.
Specifically, when a model is imported into the OpenGL three-dimensional simulation scene, the model file exported in the preparation work needs to be parsed, all useful information extracted, and the result stored in a format that OpenGL can understand. There are many model file formats, each exporting model data in its own way. In the present application the Assimp model import library is used; it can import many different model file formats and loads all the model data into Assimp's common data structures. When Assimp finishes loading the model, all the required data can be extracted from Assimp's data structures. Because Assimp's data structures stay the same regardless of the imported file format, the code is abstracted away from the different file formats and can access the desired data in the same manner.
When a model is imported with Assimp, the entire model is typically loaded into a scene object, which contains the imported model and all the data in the scene. Assimp organizes the scene as a series of nodes, where each node contains indices to the data stored in the scene object, and a node may contain any number of child nodes.
The obtaining module 13 is configured to obtain a stacking scene in the three-dimensional simulation scene through the Bullet physics engine. In this embodiment, a virtual container is built in the three-dimensional simulation scene, and a virtual scene conforming to physical motion rules is created in the virtual container through the Bullet physics engine, so that the models fall from a fixed point under gravity and collide with each other to form the stacked scene.
Specifically, after the model file is loaded into the OpenGL three-dimensional simulation scene using the Assimp library, the model is only placed at a preset fixed position in the scene; elements of the real world such as gravity, collision and deformation are not simulated. A virtual scene conforming to physical motion rules is therefore created, based on the Bullet physics engine, inside a container (similar to an industrial bin) built in the OpenGL scene.
Bullet is an open-source physics engine written in C++. Using the Bullet physics engine avoids physically impossible stacking and ensures the realism of the output images. This information can then be combined for rendering in OpenGL to output the final image. The order of using the Bullet physics engine in conjunction with the OpenGL code is: (1) initialization of the physics engine; (2) loading of the objects; (3) stepping of the engine; (4) removal of the objects; (5) destruction of the engine.
Meanwhile, the json file output with the data set also contains the ID corresponding to each model, the six-degree-of-freedom pose information, and other information needed for subsequent segmentation recognition or part grasping. During this process Bullet calculates and outputs physical information such as the object poses and assigns the image segmentation ID corresponding to each model. That is, when this information is output, the final state of each model retained in the engine can be accessed directly to obtain the relevant data, which is convenient for the subsequent steps.
The assigning module 14 is configured to assign pose information to the model of the model file. In this embodiment, the pose information is six-degree-of-freedom pose information comprising a rotation matrix R and a translation vector t.
Specifically, each part in the virtual scene has the six-degree-of-freedom pose information of its model in the world coordinate system, and this information plays a vital role in coordinate system conversion and in subsequent deep learning recognition, point cloud registration and grasping. In the present application, the 6-degree-of-freedom pose information of a part is described by a translation vector t and a rotation matrix R.
In a three-dimensional scene, when a point P(x, y, z) is rotated about the x, y or z axis, the transformed point P'(x', y', z') is obtained as follows.

The transformation for rotation about the x-axis is shown in equation (1).

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad (1)$$

The transformation for rotation about the y-axis is shown in equation (2).

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad (2)$$

The transformation for rotation about the z-axis is shown in equation (3).

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad (3)$$

The overall rotation matrix R is therefore given by equation (4).

$$R = R_z(\gamma)\, R_y(\beta)\, R_x(\alpha) \quad (4)$$

The three-dimensional translation transformation is obtained by left-multiplying the position coordinates, written in homogeneous form, by the translation matrix, as shown in equation (5).

$$\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} \quad (5)$$

The translation vector t is therefore given by equation (6).

$$t = \begin{bmatrix} t_x & t_y & t_z \end{bmatrix}^{\mathrm{T}} \quad (6)$$
The pose of a model in the scene is described by a rotation matrix R and a translation vector t, and each point x of the model can be converted from the model coordinate system to the world coordinate system through Rx + t. After the models are loaded, collision detection is performed through the Bullet physics engine to obtain the stacking scene, and the Bullet physics engine also retains the position and pose information of each model. The translation vector t is the position of the current model and the rotation matrix corresponds to its orientation, so the pose information only needs to be output directly from the information retained by the engine.
The loading module 15 is configured to load a virtual camera into the three-dimensional simulation scene. In this embodiment, the virtual camera converts the pose information of the model in the world coordinate system into the pose information in the virtual camera coordinate system.
Specifically, similar to the real world, in order to obtain an image of the stacked scene as a data set, a camera needs to be configured at a certain position in the virtual three-dimensional simulation scene, and the stacked scene needs to be photographed to obtain the image.
OpenGL itself does not have the concept of a camera, but a camera can be simulated by moving all the objects in the scene in the opposite direction, creating the illusion that the camera moves rather than the scene. Put simply, the camera is the position of the current viewport in the window, the scene rendered by that viewport is the scene photographed by the camera, and after the drawing of the current scene is finished its entire content is read and saved as an image for output.
A camera configured in OpenGL changes the position of the scene's viewport according to the input of external devices, that is, an FPS-style camera is configured so that it can move freely in the 3D scene. Although the camera configured in OpenGL allows free movement, a specific fixed position is required when the finally constructed virtual data set is acquired. Therefore the final output consists of two project files, an observer and a generator: the observer allows the camera position to be moved freely, while the generator involves no interaction after its parameters are initialized and automatically constructs the files of the virtual data set. The initial position and orientation of the camera need to be set in advance at a suitable position; in this experiment the camera is placed at a certain height directly above the container.
The drawing module 16 is used for drawing images. In this embodiment, the information of the image includes an RGB map, a segmentation annotation map, and a depth map.
Specifically, the rendered scene in the current viewport needs to be saved as a picture. During the experiment, various values of the buffer area, such as depth, color and the like, are read by calling functions in OpenGL.
The tasks at this stage are the storage of the RGB map, the segmentation annotation map, the depth map and other extension elements. The storage processes of these images share a similar structure, with only slight differences between drawing the different images. The rough process is as follows: (1) specify the buffer to be read, indicating that the display mode applies to the front-facing surface of the object (i.e., the surface that can be seen); (2) read the buffer specified in the previous step; the buffer is enabled at this point, and by selecting different parameters in this step the color or depth values are stored sequentially into an array; (3) allocate a new bitmap; (4) write the color or depth values stored in the array to each corresponding pixel of the bitmap in row-column order; (5) save the bitmap and return whether it was saved successfully; (6) finally, release the bitmap and delete the array to free the memory, otherwise a memory leak will occur.
In the experiment, the RGB map, the segmentation annotation map and the depth map of the stacked scene are saved. The method provided by the application can quickly extract these three types of images, but is not limited to them.
Referring to fig. 3, an embodiment of the present application further provides a terminal device 200, where the terminal device 200 may be a computer or other devices. As shown in fig. 3, the terminal device 200 includes a processor 201 and a memory 202. The processor 201 is electrically connected to the memory 202.
The processor 201 is a control center of the terminal device 200, connects various parts of the entire terminal device by using various interfaces and lines, and performs various functions of the terminal device and processes data by running or loading an application program stored in the memory 202 and calling data stored in the memory 202, thereby performing overall monitoring of the terminal device.
In this embodiment, the terminal device 200 is provided with a plurality of memory partitions, the plurality of memory partitions includes a system partition and a target partition, the processor 201 in the terminal device 200 loads instructions corresponding to processes of one or more application programs into the memory 202 according to the following steps, and the processor 201 runs the application programs stored in the memory 202, so as to implement various functions:
establishing a model file;
importing the model file to a three-dimensional simulation scene through OpenGL;
acquiring a stacking scene in the three-dimensional simulation scene through the Bullet physics engine;
assigning pose information to the model of the model file;
loading a virtual camera to the three-dimensional simulation scene; and
drawing the image.
In specific implementation, the above modules may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and specific implementation of the above modules may refer to the foregoing method embodiments, which are not described herein again.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by instructions controlling associated hardware, and the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor. To this end, the present application provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any method for synthesizing images in stacked scenes provided by the embodiments of the present application.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps of any method for synthesizing images in a stacked scene provided in the embodiments of the present application, they can achieve the beneficial effects achievable by any such method; for details, see the foregoing embodiments, which are not repeated here. The above operations may be implemented as in the foregoing embodiments and are not described in detail here.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The present application, through OpenGL and the Bullet physics engine, fully considers the stacking relationships between objects in the image synthesis process, simulates the stacked poses of parts under real conditions, and quickly constructs pixel-level annotated images for deep learning, which greatly reduces the workload of manual image annotation. The scale of the generated image data set, the number of parts contained in a single image, the threshold of the object occlusion rate and the like can be changed by adjusting the corresponding parameters, so that the synthesized image data in the stacked scene has very strong extensibility.
The method and system for synthesizing images in a stacked scene, the storage medium, and the terminal device provided in the embodiments of the present application are described in detail above, and a specific example is applied in the description to explain the principle and the implementation of the present application, and the description of the embodiments above is only used to help understand the technical solution and the core idea of the present application; those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present disclosure as defined by the appended claims.

Claims (10)

1. A method for synthesizing an image in a stacked scene, comprising:
establishing a model file;
importing the model file to a three-dimensional simulation scene through OpenGL;
acquiring a stacking scene in the three-dimensional simulation scene through the Bullet physics engine;
assigning pose information to the model of the model file;
loading a virtual camera to the three-dimensional simulation scene; and
drawing the image.
2. The method according to claim 1, wherein the step of establishing a model file comprises mapping the model by UV mapping so that the surface of the model acquires texture.
3. The method according to claim 1, wherein the step of importing the model file into the three-dimensional simulation scene through OpenGL comprises importing the model file into a world coordinate system of the three-dimensional simulation scene through OpenGL, converting a data structure of the model file into a universal data structure of Assimp through the Assimp library to obtain universal data, and loading the model file and the universal data into the three-dimensional simulation scene.
4. The method according to claim 1, wherein in the step of obtaining the stacking scene in the three-dimensional simulation scene through the Bullet physics engine, a virtual container is built in the three-dimensional simulation scene, and a virtual scene conforming to physical motion rules is created in the virtual container through the Bullet physics engine, so that the models fall from a fixed point under gravity and collide with each other to form the stacking scene.
5. The method according to claim 1, wherein the pose information is six-degree-of-freedom pose information including a rotation matrix R and a translation vector t.
6. The method according to claim 1, wherein in the step of loading a virtual camera into the three-dimensional simulation scene, the virtual camera converts pose information of the model in the world coordinate system into pose information in the virtual camera coordinate system.
7. The method according to claim 1, wherein the information of the image includes an RGB map, a segmentation annotation map, and a depth map.
8. A system for synthesizing an image in a stacked scene, comprising:
the establishing module is used for establishing a model file;
the import module is used for importing the model file into a three-dimensional simulation scene through OpenGL;
the obtaining module is used for obtaining a stacking scene in the three-dimensional simulation scene through the Bullet physics engine;
the assigning module is used for assigning pose information to the model of the model file;
the loading module is used for loading a virtual camera to the three-dimensional simulation scene; and
the drawing module is used for drawing the image.
9. A storage medium having stored thereon a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the method for synthesizing images in a stacked scene according to any one of claims 1 to 7.
10. A terminal device, comprising a processor and a memory, the processor being electrically connected to the memory, the memory being configured to store instructions and data, and the processor being configured to perform the steps of the method for synthesizing images in a stacked scene according to any one of claims 1 to 7.
CN201911408992.0A 2019-12-31 2019-12-31 Method and system for synthesizing images in stacked scene, storage medium and terminal equipment Active CN111161387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911408992.0A CN111161387B (en) 2019-12-31 2019-12-31 Method and system for synthesizing images in stacked scene, storage medium and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911408992.0A CN111161387B (en) 2019-12-31 2019-12-31 Method and system for synthesizing images in stacked scene, storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN111161387A true CN111161387A (en) 2020-05-15
CN111161387B CN111161387B (en) 2023-05-30

Family

ID=70559909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911408992.0A Active CN111161387B (en) 2019-12-31 2019-12-31 Method and system for synthesizing images in stacked scene, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN111161387B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951333A (en) * 2020-07-27 2020-11-17 中国科学院深圳先进技术研究院 Automatic six-dimensional attitude data set generation method, system, terminal and storage medium
CN112308910A (en) * 2020-10-10 2021-02-02 达闼机器人有限公司 Data generation method and device and storage medium
CN112419141A (en) * 2020-07-29 2021-02-26 上海幻电信息科技有限公司 Picture processing method and device and computer equipment
CN113297701A (en) * 2021-06-10 2021-08-24 清华大学深圳国际研究生院 Simulation data set generation method and device for multiple industrial part stacking scenes
CN116416444A (en) * 2021-12-29 2023-07-11 广东美的白色家电技术创新中心有限公司 Object grabbing point estimation, model training and data generation method, device and system
US11827203B2 (en) 2021-01-14 2023-11-28 Ford Global Technologies, Llc Multi-degree-of-freedom pose for vehicle navigation
WO2024008081A1 (en) * 2022-07-04 2024-01-11 梅卡曼德(北京)机器人科技有限公司 Image generation method and model training method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226838A (en) * 2013-04-10 2013-07-31 福州林景行信息技术有限公司 Real-time spatial positioning method for mobile monitoring target in geographical scene
US20180246562A1 (en) * 2016-11-18 2018-08-30 David Seth Peterson Virtual Built Environment Mixed Reality Platform
CN109671161A (en) * 2018-11-06 2019-04-23 天津大学 Immersion terra cotta warriors and horses burning makes process virtual experiencing system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226838A (en) * 2013-04-10 2013-07-31 福州林景行信息技术有限公司 Real-time spatial positioning method for mobile monitoring target in geographical scene
US20180246562A1 (en) * 2016-11-18 2018-08-30 David Seth Peterson Virtual Built Environment Mixed Reality Platform
CN109671161A (en) * 2018-11-06 2019-04-23 天津大学 Immersion terra cotta warriors and horses burning makes process virtual experiencing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FU Tai et al.: "A design method for an aircraft CAD model projection image library with accurate poses", Computer Science *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951333A (en) * 2020-07-27 2020-11-17 中国科学院深圳先进技术研究院 Automatic six-dimensional attitude data set generation method, system, terminal and storage medium
CN112419141A (en) * 2020-07-29 2021-02-26 上海幻电信息科技有限公司 Picture processing method and device and computer equipment
CN112308910A (en) * 2020-10-10 2021-02-02 达闼机器人有限公司 Data generation method and device and storage medium
CN112308910B (en) * 2020-10-10 2024-04-05 达闼机器人股份有限公司 Data generation method, device and storage medium
US11827203B2 (en) 2021-01-14 2023-11-28 Ford Global Technologies, Llc Multi-degree-of-freedom pose for vehicle navigation
CN113297701A (en) * 2021-06-10 2021-08-24 清华大学深圳国际研究生院 Simulation data set generation method and device for multiple industrial part stacking scenes
CN116416444A (en) * 2021-12-29 2023-07-11 广东美的白色家电技术创新中心有限公司 Object grabbing point estimation, model training and data generation method, device and system
WO2024008081A1 (en) * 2022-07-04 2024-01-11 梅卡曼德(北京)机器人科技有限公司 Image generation method and model training method

Also Published As

Publication number Publication date
CN111161387B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN111161387B (en) Method and system for synthesizing images in stacked scene, storage medium and terminal equipment
US11748934B2 (en) Three-dimensional expression base generation method and apparatus, speech interaction method and apparatus, and medium
US6529207B1 (en) Identifying silhouette edges of objects to apply anti-aliasing
US8289320B2 (en) 3D graphic rendering apparatus and method
CN111145330B (en) Human model rendering method and device, electronic equipment and storage medium
CN107330964B (en) Display method and system of complex three-dimensional object
CN110163831B (en) Method and device for dynamically displaying object of three-dimensional virtual sand table and terminal equipment
AU2017279678A1 (en) Fast rendering of quadrics
CN108830923B (en) Image rendering method and device and storage medium
US9588651B1 (en) Multiple virtual environments
CN113297701B (en) Simulation data set generation method and device for multiple industrial part stacking scenes
CN116977522A (en) Rendering method and device of three-dimensional model, computer equipment and storage medium
CN113593027B (en) Three-dimensional avionics display control interface device
CN115082609A (en) Image rendering method and device, storage medium and electronic equipment
CN111161398A (en) Image generation method, device, equipment and storage medium
CN111161388B (en) Method, system, device and storage medium for generating retail commodity shelf images
US9704290B2 (en) Deep image identifiers
CN111653175A (en) Virtual sand table display method and device
CN109145688A (en) The processing method and processing device of video image
Marek et al. Optimization of 3d rendering in mobile devices
CN113289337B (en) UE-based dynamic generation method for simulated global minimap
CN113126944B (en) Depth map display method, display device, electronic device, and storage medium
CN115018975A (en) Data set generation method and device, electronic equipment and storage medium
CN110969674B (en) Method and device for generating winding drawing, terminal equipment and readable storage medium
Clayton Metal Programming Guide: Tutorial and Reference Via Swift

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant