WO2024008081A1 - Image generation method and model training method - Google Patents
Image generation method and model training method
- Publication number
- WO2024008081A1 (PCT application PCT/CN2023/105730)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- instance
- annotation
- synthesized
- objects
- Prior art date
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Definitions
- the present disclosure relates to the field of artificial intelligence, and in particular, to an image generation method and a model training method.
- with the development of artificial intelligence technology, its application fields are becoming increasingly broad. For example, with the popularity of online shopping, artificial intelligence technology can, taking the logistics field as an example, realize intelligent sorting of items that does not rely on manual labor, improving sorting efficiency and reducing labor costs.
- an image generation method, including:
- acquiring an image to be synthesized, where the objects in the image to be synthesized at least include real objects;
- acquiring an instance group, where the instance group includes at least one first instance;
- the first instance includes an image of a single object, and the image of the single object includes a point cloud image constructed based on an annotation image and a depth image corresponding to the annotation image;
- synthesizing a virtual image based on the image to be synthesized and the instance group, where the objects in the virtual image include the objects in the image to be synthesized and the objects corresponding to the instance group.
- the present disclosure also provides a model training method that uses virtual images obtained by any one of the above image generation methods as training samples, and the training method includes:
- for the first training of the model, selecting the collected real images and the annotations corresponding to the real images as training samples to train the model, wherein the objects placed in the image content of the real images are all real objects;
- for non-first-time training of the model, randomly selecting images and their corresponding annotations from the collected real images and the synthesized virtual images as training samples to train the model.
- the present disclosure also provides an image generation device, including:
- An acquisition module configured to acquire an image to be synthesized, where the objects in the image to be synthesized at least include real objects;
- the acquisition module is also used to acquire an instance group.
- the instance group includes at least one first instance.
- the first instance includes an image of a single object.
- the image of a single object includes a point cloud image constructed based on an annotation image and a depth image corresponding to the annotation image;
- a synthesis module configured to synthesize and obtain a virtual image based on the image to be synthesized and the instance group, where the objects in the virtual image include objects in the image to be synthesized and objects corresponding to the instance group.
- the present disclosure also provides an electronic device, including:
- one or more processors;
- a memory configured to store executable instructions that, when executed by the one or more processors, implement the image generation method or the model training method as described above.
- the present disclosure also provides a chip.
- the chip includes a memory and a processor. Codes and data are stored in the memory.
- the memory is coupled to the processor.
- the processor runs the program in the memory so that the chip performs any one of the above image generation methods or model training methods.
- the present disclosure also provides a program product, including: a computer program, which when the program product is run on a computer, causes the computer to execute any one of the above image generation methods or model training methods.
- the present disclosure also provides a computer program, which when the computer program is executed by a processor, is used to perform any one of the above image generation methods or model training methods.
- the present disclosure provides an image generation method and a model training method.
- in the image generation method, a virtual image is obtained by acquiring an image to be synthesized that at least includes a real object and an instance group that includes at least one first instance, and synthesizing the at least one first instance with the image to be synthesized;
- wherein the instance group includes at least one first instance, the first instance includes an image of a single object, and the image of the single object includes a point cloud image constructed based on an annotation image and a depth image corresponding to the annotation image.
- the virtual images obtained by this solution can be used as training samples. Compared with collecting real images as training samples, synthesizing virtual images yields a larger number of training samples and richer sample scenes, so that effective expansion of the training samples can be achieved efficiently and conveniently.
- Figure 1 is a schematic diagram of an application scenario of an example of this disclosure
- FIG. 2 is a schematic flowchart of an image generation method provided by Embodiment 1 of the present disclosure
- FIG. 3 is a schematic flowchart of another image generation method provided by Embodiment 1 of the present disclosure.
- FIG. 4 is a schematic flowchart of yet another image generation method provided by Embodiment 1 of the present disclosure.
- Figure 5 is an example of a scene for image synthesis
- FIG. 6 is a schematic flowchart of yet another image generation method provided by Embodiment 1 of the present disclosure.
- Figure 7 is a schematic flow chart of a model training method provided in Embodiment 3 of the present disclosure.
- Figure 8 is a network architecture corresponding to the image database construction method provided by the embodiment of the present disclosure.
- Figure 9 is a schematic flowchart of an image generation method provided in Embodiment 4 of the present disclosure.
- Figure 10 is a schematic flow chart of another image generation method provided by Embodiment 4 of the present disclosure.
- Figure 11 shows a marking method that uses color to distinguish different areas
- Figure 12 is another way of labeling different objects by color
- Figure 13 is a schematic flow chart of yet another image generation method provided by Embodiment 4 of the present disclosure.
- Figure 14 is a schematic flowchart of yet another image generation method provided in Embodiment 4 of the present disclosure.
- FIG. 15 is a schematic flowchart of yet another image generation method provided by Embodiment 4 of the present disclosure.
- Figure 16 is another way of labeling different objects by color
- FIG 17 is a schematic flowchart of yet another image generation method provided by Embodiment 4 of the present disclosure.
- Figure 18 is a schematic flowchart of yet another image generation method provided by Embodiment 4 of the present disclosure.
- Figure 19 is a schematic structural diagram of an image generation device provided in Embodiment 5 of the present disclosure.
- Figure 20 is a schematic structural diagram of a model training device provided in Embodiment 5 of the present disclosure.
- Figure 21 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present disclosure.
- FIG. 22 is a schematic structural diagram of another electronic device provided in Embodiment 5 of the present disclosure.
- artificial intelligence is usually implemented through machine learning, in which a model is trained based on learning samples to establish a complete intelligent model and realize intelligent operation of the machine.
- a large number of images of objects are used as training samples to establish an intelligent model that can intelligently identify objects and sort them, thereby realizing intelligent sorting of objects by machines.
- one means of obtaining samples is to collect images of real objects through a collection device. Since the accuracy of the model depends on a large number of training samples, this method has certain problems when a large number of samples needs to be obtained, such as high cost and long time consumption. Moreover, limited by the number of real objects, the number of samples that can be obtained is also limited, which is not conducive to sample expansion.
- FIG. 1 is a schematic diagram of an application scenario of an example of this disclosure. As shown in Figure 1, the image to be synthesized and the instance group can be obtained, and a virtual image is obtained by compositing the image to be synthesized and the instance group. This virtual image can subsequently be used as a training sample for training artificial intelligence models.
- Deep learning tasks are usually able to quickly complete the training of complex models.
- sufficient image data sets are required as training samples, and the size and quality of the training samples directly determine the performance of the final image understanding engine.
- image databases are usually built by manually collecting various sample images.
- the inventor found in the research that before generating the virtual image, the annotated image and the depth image corresponding to the annotated image can also be obtained.
- based on the annotated image and the corresponding depth image, the depth information of the object annotation area is determined; a point cloud image of the object is generated based on the depth information of the object annotation area; and an image database is then constructed based on the point cloud image of the object.
- the above process does not require manual collection of sample data, and can automatically obtain point cloud images based on annotated images and depth images, thereby building an image database based on point cloud images, improving the efficiency of building an image database, and reducing the labor cost of building an image database.
- the point cloud image can be obtained from the image database as the image of a single object in the instance group to provide original data support for the generated virtual image.
- FIG 2 is a schematic flowchart of an image generation method provided by Embodiment 1 of the present disclosure. As shown in Figure 2, the image generation method includes:
- Step 21 Obtain the image to be synthesized, and the objects in the image to be synthesized at least include real objects;
- Step 22 Obtain an instance group.
- the instance group includes at least one first instance.
- the first instance includes an image of a single object.
- the image of a single object includes a point cloud image constructed based on the annotation image and the depth image corresponding to the annotation image;
- Step 23 Synthesize and obtain a virtual image based on the image to be synthesized and the instance group.
- the objects in the virtual image include objects in the image to be synthesized and objects corresponding to the instance group.
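- The three steps above form a single synthesis round. The following is a minimal sketch, in which the three helper functions are hypothetical placeholders for the operations detailed in the later embodiments and are not defined by the disclosure:

```python
# Minimal sketch of one synthesis round (steps 21-23).
# The three helpers are illustrative placeholders, not part of the disclosure.
def generate_virtual_image_once(acquire_image_to_synthesize,
                                acquire_instance_group,
                                synthesize):
    image_to_synthesize = acquire_image_to_synthesize()   # step 21: contains at least real objects
    instance_group = acquire_instance_group()             # step 22: one or more first instances
    virtual_image = synthesize(image_to_synthesize, instance_group)  # step 23: synthesized result
    return virtual_image
```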
- the execution subject of this embodiment is an image generation device.
- the image generation device can be implemented through a computer program, such as application software; or it can be implemented as a medium storing the relevant computer program, such as a USB flash drive or cloud disk; alternatively, it can be implemented through a physical device integrated or installed with the relevant computer program, such as a chip.
- the image to be synthesized refers to an image in which at least real objects exist.
- Real objects here refer to objects that actually exist.
- the method of image collection is not limited; for example, images can be collected through an image collection device such as a camera or video camera.
- the number of real objects in the image to be synthesized is not limited, for example, it can include one or more.
- the objects in the image to be synthesized may also include objects that are not directly collected by the image acquisition device, for example, objects placed in the image through synthesis means.
- the first instance in the instance group may be one or multiple, that is to say, the objects added to the image to be synthesized may be one or multiple.
- an instance includes an image of a single object, and the single-object image may be the above-mentioned point cloud image.
- multiple instances can be established in advance.
- an instance group can be selected from the multiple pre-established instances to be synthesized with the image to be synthesized.
- the method of creating an instance is not limited.
- the instance can be obtained by collecting an image of a single object through an image acquisition device.
- the instance can also be obtained by identifying an image of a single object from an image including multiple objects.
- there are many ways to identify a single object; for example, the outer contour of a single object in an image can be identified based on outer-contour recognition technology, and an image of the single object can then be obtained.
- the image of a single object may include a point cloud image constructed based on the annotation image and the depth image corresponding to the annotation image.
- the implementation of the point cloud image is described based on the following embodiments.
- step 22 may specifically include: obtaining an instance group from an instance database, where the instance database includes multiple instances. Combined with the scenario example, after each instance is created, the created instance can be stored in the instance database. When a virtual image needs to be generated later, the image to be synthesized can be obtained, at least one instance is selected from the instance database as an instance group, and the image to be synthesized and the instance group are synthesized to obtain a virtual image.
- an image to be synthesized including at least a real object and an instance group including at least one first instance are obtained, and a virtual image is obtained by synthesizing the at least one first instance with the image to be synthesized. Compared with the method of obtaining images by collecting real objects, the method of obtaining virtual images through synthesis is more efficient and convenient.
- the implementation scenarios of this embodiment are not limited.
- the final virtual image can be obtained by compositing multiple times.
- the implementation scene of step 21 is the scene synthesized for the first time, that is, no instance group has been added to the image to be synthesized at this time.
- the objects in the image to be synthesized during the initial synthesis may all be real objects.
- the image to be synthesized may be an image directly collected by an image acquisition device.
- step 21 specifically includes: obtaining the collected real image as the image to be synthesized.
- step 21 specifically includes: using a historically synthesized virtual image as the image to be synthesized.
- In an example of a scene with multiple rounds of synthesis: first, obtain a real image through an image acquisition device as the image to be synthesized for the first round, where the objects in the real image are all real objects; then obtain the instance group that needs to be added to the image to be synthesized, where the method of obtaining the instance group is not limited, including but not limited to random selection; next, synthesize the current image to be synthesized with the instance group to obtain the first synthesized virtual image. Subsequently, this virtual image is used as the image to be synthesized for the next round, and an instance group is again selected for the second synthesis to obtain the second synthesized virtual image. By analogy, in each subsequent round, the virtual image synthesized in the previous round is used as the current image to be synthesized and an instance group is obtained for synthesis, until the final virtual image is obtained after multiple rounds of synthesis.
- a virtual image is obtained by acquiring an image to be synthesized that at least includes a real object and an instance group that includes at least one first instance, and synthesizing the at least one first instance with the image to be synthesized.
- the instance group includes at least one first instance, the first instance includes an image of a single object, and the image of the single object includes a point cloud image constructed based on an annotation image and a depth image corresponding to the annotation image.
- the virtual images obtained by this solution can be used as training samples. Compared with collecting real images as training samples, synthesizing virtual images yields a larger number of training samples and richer sample scenes, so that effective expansion of the training samples can be achieved efficiently and conveniently.
- FIG. 3 is a schematic flowchart of another image generation method provided by Embodiment 1 of the present disclosure. Based on any example, step 23 includes:
- Step 31 Determine the placement position of the object corresponding to the instance group in the image to be synthesized
- Step 32 Detect whether the object corresponding to the instance group collides with other surrounding objects at the placement position;
- Step 33 If no collision occurs, the virtual image is synthesized by placing the object corresponding to the instance group at the placement position in the image to be synthesized.
- as an example, if none of the objects corresponding to the first instances collides with other objects, it is determined that the objects corresponding to the instance group do not collide at the current placement position; if the object corresponding to any first instance collides with another object, it is determined that the objects corresponding to the instance group collide at the current placement position.
- in this way, the placement position of the objects corresponding to the instance group in the image to be synthesized is adjusted through collision detection, thereby improving the authenticity and reliability of the synthesized virtual image. In turn, the model trained with such virtual images as samples has higher accuracy and reliability.
- Figure 4 is a schematic flowchart of yet another image generation method provided by Embodiment 1 of the present disclosure. After step 32, it also includes:
- Step 41 If a collision occurs, re-execute the step of determining the placement position of the object corresponding to the instance group in the image to be synthesized; when the number of collision detections reaches the preset first threshold, re-execute the step of obtaining the instance group, where the reacquired instance group is different from the previously acquired instance group.
- Figure 5 is an example of a scene for image synthesis.
- the instance group includes a cylinder and a camera. The placement positions determined this time are: the cylinder is placed at position 1 and the camera is placed at position 2. Then it is detected whether the cylinder collides with other objects at position 1 and whether the camera collides with other objects at position 2. In the example shown in the figure, the cylinder collides but the camera does not. Since at least one instance collides, the placement positions are re-determined.
- the placement position of the cylinder is re-determined as position 3, and the placement position of the camera remains position 2.
- collision detection is then performed again for the cylinder and the camera against surrounding objects. Assume that neither the cylinder nor the camera collides with other objects.
- the instance group and the image to be synthesized are synthesized to obtain the virtual image as shown in the figure.
- the placement position of the instance group can be updated by exchanging the positions of the first instances, or by changing the positions of some or all of the first instances.
- the number of times the position is repeatedly determined does not exceed a predetermined first threshold.
- the first threshold is 20 times.
- an upper limit is set for the number of times the placement position is repeatedly determined. For a certain instance group, if the number of collision detections exceeds the first threshold, it is determined that the objects in the instance group have no suitable placement in the image to be synthesized. Therefore, return to step 22 to reselect a new instance group, where the new instance group is different from the previous instance group. It should be noted that the difference here includes the case where the first instances in the instance group are partially different and the case where they are all different.
- Scenario 1: The instance in the previously selected instance group is instance A, and the instance in the new instance group is instance B;
- Scenario 2: The instance in the previously selected instance group is instance A, and the instances in the new instance group are instance B and other instances, where the other instances do not include A;
- Scenario 3: The instances in the previously selected instance group are instances A, B, and C, and the instances in the new instance group are instances B, C, and D; that is, some instances in the new instance group are different from those in the previously selected instance group;
- Scenario 4: The instances in the previously selected instance group are instances A, B, and C, and the instances in the new instance group are instances D and E; that is, the instances in the new instance group are completely different from those in the previously selected instance group. A minimal sketch of this placement-and-retry procedure is given below.
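- The following is a minimal sketch of the placement-and-retry logic of steps 31 to 41, assuming a simple 2D axis-aligned bounding-box overlap test as the collision check (the disclosure does not specify the collision test) and hypothetical object attributes such as `bounding_box` and `box_at`:

```python
MAX_PLACEMENT_ATTEMPTS = 20  # the "first threshold" in the text

def boxes_collide(box_a, box_b):
    """Axis-aligned bounding-box overlap test; boxes are (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return not (ax + aw <= bx or bx + bw <= ax or
                ay + ah <= by or by + bh <= ay)

def try_place_instance_group(image_to_synthesize, instance_group, propose_positions):
    """Return placement positions for the instance group, or None if the group
    cannot be placed without collision within the attempt budget (the caller
    then re-selects a different instance group, as in step 41)."""
    existing_boxes = [obj.bounding_box for obj in image_to_synthesize.objects]
    for _ in range(MAX_PLACEMENT_ATTEMPTS):
        positions = propose_positions(image_to_synthesize, instance_group)
        candidate_boxes = [inst.box_at(pos)
                           for inst, pos in zip(instance_group, positions)]
        # The group is accepted only if no first instance collides with
        # existing objects or with the other instances in the group.
        collision = any(
            boxes_collide(candidate, other)
            for i, candidate in enumerate(candidate_boxes)
            for other in existing_boxes + candidate_boxes[:i]
        )
        if not collision:
            return positions
    return None
```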
- FIG. 6 is a schematic flowchart of yet another image generation method provided by Embodiment 1 of the present disclosure.
- determining the placement position of the object corresponding to the instance group in the image to be synthesized includes:
- Step 61 Obtain the placement height of the object corresponding to the instance group based on the average height of the objects placed in the image content of the image to be synthesized.
- the placement height is determined based on the average height and the predetermined fluctuation range;
- Step 62 Randomly select a position on the plane corresponding to the placement height as the placement position of the object corresponding to the instance group in the image to be synthesized.
- to determine the placement position, the placement height can be determined first.
- the placement height can also be called the depth of the object corresponding to the instance group in the image to be synthesized.
- the placement height can be determined by calculating the average height of the existing objects in the image to be synthesized, using the average height as a benchmark, selecting a height fluctuation range, and taking a height value within that range as the placement height of the object corresponding to the instance group in the image to be synthesized.
- for example, the height fluctuation range can be 5 mm to 10 mm; that is, a height within 5 mm to 10 mm above or below the average height can be used as the placement height of the object corresponding to the instance group in the image to be synthesized.
- a position is randomly selected on the horizontal plane and used as the placement position of the object corresponding to the instance group in the image to be synthesized.
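- The following is a minimal sketch of this placement rule, assuming object heights and plane extents are available as plain numbers; the 5 mm to 10 mm fluctuation range is taken from the example above:

```python
import random

def sample_placement_position(existing_object_heights, plane_x_range, plane_y_range,
                              fluctuation_mm=(5.0, 10.0)):
    """Sample (x, y, height) for the objects corresponding to the instance group."""
    avg_height = sum(existing_object_heights) / len(existing_object_heights)
    # Fluctuate 5-10 mm above or below the average height of existing objects.
    offset = random.uniform(*fluctuation_mm) * random.choice((-1.0, 1.0))
    placement_height = avg_height + offset
    # Randomly pick a point on the plane corresponding to that height.
    x = random.uniform(*plane_x_range)
    y = random.uniform(*plane_y_range)
    return x, y, placement_height
```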
- the number of objects in the image content of the virtual image does not exceed a predetermined second threshold.
- the second threshold may be preset, for example, the second threshold may be 30.
- the second threshold that specifies the number of objects can be set in advance.
- the second threshold is the maximum number of objects in the virtual image, and the second threshold can be optionally 30.
- the number of first instances in the instance group can be determined according to the number of objects in the image to be synthesized. For example, assuming that the number of objects in the image to be synthesized is 20, the number of first instances in the instance group can be set to not exceed 10.
- the second threshold can also be used to determine whether the composition is complete.
- if each synthesized instance group includes multiple first instances, then before each synthesis it can be determined, based on the number of objects in the current image to be synthesized and the number of first instances in the current instance group, whether the sum of the two exceeds the second threshold. If the second threshold is exceeded, the number of first instances may be reduced.
- if each synthesized instance group includes only one first instance, it can be detected before each synthesis whether the number of objects in the current image to be synthesized reaches the second threshold; if it does, this image is used as the final image and is not synthesized further.
- the quantity of the instance group is adjusted so that the adjusted sum of the quantities does not exceed the second threshold.
- a second threshold that specifies the number of objects can be set in advance.
- the second threshold is the maximum number of objects in the virtual image. Before synthesizing the image to be synthesized with the instance group to obtain the virtual image, it may first be calculated whether the sum of the number of objects in the image to be synthesized and the number of objects in the instance group exceeds the second threshold. If the second threshold is not exceeded, the synthesis operation can continue. If the second threshold is exceeded, the number of first instances in the instance group is adjusted to ensure that the sum of the number of objects in the image to be synthesized and the number of objects in the instance group does not exceed the second threshold.
- in some examples, after the virtual image is synthesized, the method also includes:
- using the virtual image as the current image to be synthesized and performing the step of obtaining the instance group again, until the number of objects placed in the image content of the currently obtained virtual image reaches the second threshold, or the instances in the current instance database have all been traversed.
- the number of objects in the virtual image can be counted. If the number of objects in the virtual image does not exceed the second threshold, the obtained virtual image can be used as a new image to be synthesized, a new instance group is selected from the instance database, and the new image to be synthesized and the new instance are combined into a new virtual image, and count the number of objects in the new virtual image again.
- the new virtual image continues to be used as the image to be synthesized until the number of objects in the obtained virtual image reaches the second threshold, or until, even though the number of objects does not reach the second threshold, all instances have been traversed and there are no new suitable instances for further synthesis.
- the number of objects in the virtual image can be flexibly adjusted to improve the efficiency and authenticity of virtual image generation.
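- The following is a minimal sketch of the iterative synthesis loop described above, in which the latest virtual image is repeatedly used as the new image to be synthesized until the object count reaches the second threshold (e.g. 30) or the instance database is exhausted; `select_instance_group` and `synthesize` are hypothetical helpers, and each instance is assumed to be used at most once, consistent with traversing the instance database:

```python
SECOND_THRESHOLD = 30  # maximum number of objects in the virtual image

def generate_virtual_image(real_image, instance_database,
                           select_instance_group, synthesize):
    image_to_synthesize = real_image
    remaining = list(instance_database)
    while len(image_to_synthesize.objects) < SECOND_THRESHOLD and remaining:
        # Never add more first instances than the remaining object budget allows.
        budget = SECOND_THRESHOLD - len(image_to_synthesize.objects)
        instance_group = select_instance_group(remaining, max_size=budget)
        if not instance_group:           # no suitable instances left
            break
        virtual_image = synthesize(image_to_synthesize, instance_group)
        for instance in instance_group:  # mark these instances as traversed
            remaining.remove(instance)
        image_to_synthesize = virtual_image  # use the result for the next round
    return image_to_synthesize
```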
- the generated virtual images can be used as training samples.
- the first instance also includes annotations corresponding to the single object, and the annotations include point cloud data and gripper annotations.
- the virtual images obtained in this embodiment can be used as training samples for the intelligent sorting model to achieve intelligent sorting that does not rely on manual labor.
- intelligent sorting usually uses intelligent models to control robotic arms to grab and sort items. Therefore, in one example, the virtual image can be applied to a gripper generation model, which is used to generate a gripper for grabbing an object based on the input object image, so that the robotic arm can execute the corresponding pose with reference to the gripper generated by the model to realize object grabbing.
- to provide the samples required for model training, each instance is pre-labeled.
- the labeling includes point cloud data and gripper labeling.
- objects in the virtual image synthesized based on the image to be synthesized and the instance carry annotations.
- the virtual images carrying annotations can be used for model training.
- memory training can be performed on the image of the object corresponding to each first instance in the virtual image and the label corresponding to the object.
- the label includes the gripper label and point cloud data for the object.
- the corresponding grippers can be generated based on the training memory to simulate grabbing the objects. In this way, when similar objects are targeted, grippers can be generated based on the outer contours of the objects to complete grabbing.
- an image to be synthesized including at least a real object and an instance group including at least one first instance are obtained, and a virtual image is obtained by synthesizing the at least one first instance and the image to be synthesized.
- the virtual images obtained by this solution can be used as training samples. Compared with collecting real images as training samples, synthesizing virtual images yields a larger number of training samples and richer sample scenes, so that effective expansion of the training samples can be achieved efficiently and conveniently.
- the present disclosure also provides a method for establishing a training sample library, where the training sample library at least includes virtual images obtained as in any of the above examples.
- a training sample library can be established, which mainly provides training sample data for model training.
- the training sample library of intelligent robotic arms includes a large number of virtual object images and corresponding annotations of virtual objects.
- the generated virtual images can be used to build a dedicated sample library, or can be added to an original sample library to form a training sample library including both virtual objects and real objects.
- the training database includes a large number of images and annotations of single objects.
- the latest training sample corresponding to the generated gripper is stored in the training sample library.
- This embodiment uses a large number of images and annotations of virtual objects as training samples in the training sample library, and generates corresponding annotations for the images of real objects in the images to be synthesized and stores them together as training samples in the training sample library, so that the training sample library is richer and more complete, and the available training data is more diverse.
- FIG. 7 is a schematic flowchart of a model training method provided in Embodiment 3 of the present disclosure.
- the training method includes:
- Step 71 For the first training of the model, select the collected real images and the annotations corresponding to the real images as training samples to train the model; among them, the objects placed in the image content of the real images are all real objects;
- Step 72 For non-first-time training of the model, randomly select images and their corresponding annotations as training samples from the collected real images and synthesized virtual images to train the model.
- machine learning can first determine the model, and then train the model based on the training sample data.
- the model can simply be understood as a function.
- training the model means using existing training sample data to determine the parameters of the function through optimization and other methods.
- the function with its parameters determined is the result of training; finally, new data is input into the trained model to obtain a result.
- the model training of the robot can input the image of the object and the corresponding annotation as a training sample into the robot training model, and determine the parameters of the robot training model.
- the first training of the model should select images of real objects and the annotations corresponding to the real objects as training samples, and perform memory training on the images of real objects and the annotations corresponding to the objects.
- for subsequent training, images and corresponding annotations can be randomly selected from real images and virtual images as training samples for memory training.
- the model is used to generate a gripper corresponding to an object in the image to be processed based on the input image to be processed.
- the training model of the robotic arm determines the parameters of the training model based on the training samples
- the image of the object is used as the input sample of the model, and the corresponding gripper is generated based on the trained model for the input object image.
- the accuracy of the training model can be improved by completing model training on images and annotations of real objects and virtual objects respectively.
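- The following is a minimal sketch of this two-stage sampling rule, where the real and virtual sample sets are assumed to be lists of (image, annotation) pairs:

```python
import random

def select_training_batch(real_samples, virtual_samples, batch_size, is_first_training):
    """First training uses only collected real images; later training draws
    samples at random from the union of real and synthesized virtual images."""
    pool = real_samples if is_first_training else real_samples + virtual_samples
    return random.sample(pool, min(batch_size, len(pool)))
```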
- before the above step 22, the point cloud image also needs to be constructed, that is, the image of a single object in the instance group is constructed. It should be understood that the point cloud image can be stored in an image database, as shown in Figure 8.
- Figure 8 shows the network architecture corresponding to the image database construction method of the present disclosure, which includes a terminal device 1 and a server 2.
- the terminal device 1 communicates with the server 2.
- the user performs image annotation on the two-dimensional RGB image on the terminal device 1.
- the two-dimensional RGB image includes at least one object, such as annotating the ungraspable area of the object, the overlapping area of the object, and the unpressed area of the object in the two-dimensional RGB image.
- the labeled two-dimensional RGB image is the labeled image, in which each two-dimensional RGB image has its corresponding depth image.
- Server 2 obtains the annotated image and the depth image corresponding to the annotated image; determines the depth information of the object annotation area based on the annotated image and the corresponding depth image; generates a point cloud image of the object based on the depth information of the object annotation area; and builds an image database based on the point cloud image of the object. No manual collection of sample data is required, and point cloud images can be obtained automatically based on annotated images and depth images, thereby building an image database based on point cloud images, improving the efficiency of building an image database, and reducing the labor cost of building an image database.
- FIG. 9 is a schematic flow chart of an image generation method provided in Embodiment 4 of the present disclosure.
- the execution subject of the image generation method provided in this embodiment is an image generation device, and the image generation device is located in an electronic device.
- the image generation method provided by this embodiment includes the following steps:
- Step 91 Obtain the annotated image and the depth image corresponding to the annotated image.
- the annotated image is a two-dimensional image.
- an annotation image and a depth image corresponding to the annotation image are obtained, where the annotation image is a two-dimensional image, and the area where the object is located is pre-marked in the two-dimensional image to form an annotation image, and the annotation image includes at least one object annotation area.
- the depth image, also known as the range image, refers to an image in which the distance (depth) from the image collector to each point in the scene is used as the pixel value. It directly reflects the geometry of the visible surface of the scene.
- Step 92 Determine the depth information of the object labeling area based on the labeling image and the corresponding depth image.
- the depth information of the object's annotation area is determined based on the annotation image and the depth image corresponding to the annotation image.
- the depth information is a depth value. Specifically, the depth information is determined by the annotation pixel information.
- Step 93 Generate a point cloud image of the object based on the depth information of the object annotation area.
- a point cloud image of the object is generated based on the depth information of the object annotation area, and the point cloud image of the object can be used as a training sample for the neural network model.
- Step 94 Construct an image database based on the point cloud image of the object.
- an image database is constructed based on the point cloud image of the object, which is used as a neural network sample database, and the corresponding neural network model is trained using the point cloud image of the object.
- the image database includes instances in the above instance database.
- in this embodiment, the annotated image and the depth image corresponding to the annotated image are obtained, the depth information of the object annotation area is determined based on the annotated image and the corresponding depth image, and a point cloud image of the object is generated based on the depth information of the object annotation area.
- the image database is used to provide training samples for subsequent deep learning for grasping. There is no need to manually collect sample data.
- point cloud images can be obtained automatically based on annotated images and depth images, thereby building an image database based on point cloud images, improving the efficiency of building an image database and reducing the labor cost of building an image database.
- the depth information of the object annotation area can be determined in a variety of ways to obtain a point cloud image.
- semantic segmentation and instance segmentation are used as examples for detailed explanation. Other implementation methods are not particularly limited in this embodiment.
- FIG. 10 is a schematic flowchart of another image generation method provided in Embodiment 4 of the present disclosure. Based on the first embodiment, this embodiment takes semantic segmentation as an example to illustrate the implementation method of determining the depth information of the object labeling area. Specifically, it includes the following steps:
- Step 101 Determine the object annotation area corresponding to the semantic segmentation annotation image.
- the annotated image includes a semantic segmentation annotated image, where semantic segmentation refers to the process of assigning each pixel in the image to a class label.
- semantic segmentation is considered to be pixel-level image classification, and pixels belonging to the same class are grouped together, so semantic segmentation understands the image at the pixel level.
- the semantic segmentation annotated image is an annotated image that has been semantically segmented.
- Semantic segmentation annotated images include 2D semantic segmentation annotated images and 3D semantic segmentation annotated images.
- the method of generating an annotated image can be any of the following: a two-dimensional image is subjected to 2D semantic segmentation to generate a 2D semantic segmentation annotated image; a two-dimensional image is subjected to 3D semantic segmentation to generate a 3D semantic segmentation annotated image; or a two-dimensional image is subjected to 2D instance segmentation to generate a 2D instance segmentation annotated image.
- when a two-dimensional image is subjected to 2D semantic segmentation to generate a 2D semantic segmentation annotation image, the ungraspable area of the object, the overlapped area of the object, and the non-overlapped area of the object are marked in advance.
- different areas can be filled with different colors to distinguish them, or filled with different patterns to distinguish them.
- Figure 11 is a marking method that uses color to distinguish different areas. For each object on the surface in the figure, the ungraspable area, the overlapped area, and/or the non-overlapped area of the object are respectively marked.
- red represents the ungraspable area of an object, that is, Not Graspable;
- green represents the graspable (non-overlapped) area of the object, that is, Graspable;
- yellow represents the overlapped area of the object, that is, Overlap.
- the labeled two-dimensional image is the 2D semantic segmentation annotation image, and the ungraspable area of the object, the overlapped area of the object, and the non-overlapped area of the object are the object annotation areas.
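- The following is a minimal sketch of a color-to-class mapping for these 2D semantic segmentation annotations; the exact RGB values are illustrative assumptions, since the disclosure only names the colors:

```python
import numpy as np

# Assumed RGB values for the named annotation colors.
COLOR_TO_CLASS = {
    (255, 0, 0): "not_graspable",   # red: ungraspable area
    (0, 255, 0): "graspable",       # green: non-overlapped (graspable) area
    (255, 255, 0): "overlap",       # yellow: overlapped area
}

def mask_for_class(annotation_rgb, class_name):
    """Return a boolean mask of the pixels annotated with the given class."""
    color = next(c for c, name in COLOR_TO_CLASS.items() if name == class_name)
    return np.all(annotation_rgb == np.array(color), axis=-1)
```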
- when a two-dimensional image is subjected to 3D semantic segmentation to generate a 3D semantic segmentation annotation image, overlapped objects, non-overlapped objects, slightly overlapped objects, and objects whose overlap cannot be judged are annotated in advance. To distinguish them, different objects can be filled with different colors or with different patterns. See Figure 12, which is another way of labeling different objects by color. All objects on the surface in the figure are labeled to indicate whether each object is an overlapped object, a non-overlapped object, a slightly overlapped object, or an object whose overlap cannot be judged.
- red represents objects that are overlapped, that is, Overlap
- blue represents objects that are not overlapped, that is, Non-Overlap
- green represents objects for which it cannot be judged by the naked eye whether they are overlapped, or objects with slight overlap that does not affect suction, that is, Uncertain.
- the annotated 2D image is a 3D semantic segmentation annotated image.
- the area where a non-overlapped object is located is the object annotation area, and/or the area where a slightly overlapped object is located is the object annotation area.
- Step 102 Determine the depth information of the object labeling area based on the object labeling area corresponding to the semantic segmentation annotation image and the depth image corresponding to the semantic segmentation annotation image.
- the semantic segmentation annotation image corresponds to the depth image one-to-one
- the depth information, that is, the depth value, of the object annotation area is determined based on the object annotation area corresponding to the semantic segmentation annotation image and the depth image corresponding to the semantic segmentation annotation image.
- the depth information of the object annotation area can be accurately determined based on the object annotation area and depth image corresponding to the semantic segmentation annotation image.
- FIG. 13 is a schematic flowchart of yet another image generation method provided in Embodiment 4 of the present disclosure. Based on the first embodiment, step 101 is further refined, specifically including the following steps:
- Step 101a Obtain an annotation file corresponding to the semantic segmentation annotation image, and parse the annotation file to obtain annotation pixel information.
- the annotation file corresponding to the semantic segmentation annotation image is obtained.
- the annotation file records the pixel information of the area where the annotated object is located, that is, the annotation pixel information.
- the annotation file is parsed to obtain the annotation pixel information.
- Step 101b Determine the object annotation area corresponding to the semantic segmentation annotation image based on the annotation pixel information.
- the object annotation area in the semantic segmentation annotation image is determined based on annotation pixel information, where the annotation pixel information includes annotation pixel positions, and the object annotation area in the semantic segmentation annotation image can be determined based on the annotation pixel positions.
- the object annotation area in the image can be accurately identified based on the pre-recorded annotation file.
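- The following is a minimal sketch of parsing an annotation file into annotation pixel information; the JSON layout (keys "regions", "label", "pixels") is a hypothetical format for illustration, since the disclosure does not fix a file format:

```python
import json

def parse_annotation_file(path):
    """Map each label to the list of (row, col) pixel positions it covers."""
    with open(path, "r", encoding="utf-8") as f:
        annotation = json.load(f)
    label_to_pixels = {}
    for region in annotation.get("regions", []):
        label_to_pixels.setdefault(region["label"], []).extend(
            (int(row), int(col)) for row, col in region["pixels"]
        )
    return label_to_pixels
```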
- Figure 14 is a schematic flowchart of another image generation method provided in Embodiment 4 of the present disclosure. Based on the first embodiment, step 102 is further refined, specifically including the following steps:
- Step 102a Determine a second position in the depth image corresponding to the first position according to the first position of the object labeling area in the labeling image.
- the first position of the object annotation area corresponding to the semantic segmentation annotation image in the semantic segmentation annotation image is obtained.
- the first position is the annotation pixel position of the object annotation area corresponding to the semantic segmentation annotation image.
- based on the first position of the object annotation area in the semantic segmentation annotation image, the second position corresponding to the first position in the depth image corresponding to the semantic segmentation annotation image is determined; the second position is the annotation pixel position of the object annotation area in the depth image, and the first position corresponds to the second position.
- Step 102b Determine the depth information of the second position in the depth image as the depth information of the object labeling area.
- the depth information of the second position in the depth image corresponding to the semantic segmentation annotation image is determined as the depth information of the object annotation area.
- the annotation image corresponds to the depth image one-to-one. According to the one-to-one correspondence between the annotation image and the depth image, the second position corresponding to the first position in the depth image can be accurately determined, thereby obtaining depth information.
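- The following is a minimal sketch of steps 102a and 102b: because the annotation image and the depth image are pixel-aligned, the second position is simply the same pixel coordinates in the depth image, and the depth information is the value read there. The arrays are assumed to be NumPy images of equal size:

```python
import numpy as np

def depth_for_annotation_area(depth_image, annotation_mask):
    """annotation_mask: boolean array marking the object annotation area."""
    rows, cols = np.nonzero(annotation_mask)       # first positions in the annotation image
    depths = depth_image[rows, cols]               # same (second) positions in the depth image
    return np.stack([rows, cols, depths], axis=1)  # one (row, col, depth) triple per pixel
```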
- FIG. 15 is a schematic flowchart of yet another image generation method provided in Embodiment 4 of the present disclosure. Based on the first embodiment, this embodiment takes instance segmentation as an example to illustrate the implementation method of determining the depth information of the object labeling area. Specifically, it includes the following steps:
- Step 151 Determine the object annotation area corresponding to the instance segmentation annotation image.
- the annotated image includes an instance segmentation annotation image, where instance segmentation not only requires pixel-level classification, but also distinguishes different instances on the basis of specific categories.
- an annotated image that has undergone instance segmentation is an instance segmentation annotation image.
- when a two-dimensional image is subjected to 2D instance segmentation to generate a 2D instance segmentation annotation image, the overlapped objects, non-overlapped objects, slightly overlapped objects, and objects whose overlap cannot be judged in the two-dimensional image are pre-annotated. To distinguish them, different objects can be filled with different colors or with different patterns.
- Figure 16 is another way of labeling different objects by color. All objects on the surface in the figure are labeled to indicate whether each object is an overlapped object, a non-overlapped object, a slightly overlapped object, or an object whose overlap cannot be judged.
- red represents objects that are overlaid
- blue represents objects that are not overlaid
- green represents objects for which it cannot be judged by the naked eye whether they are overlapped, or objects with slight overlap that does not affect suction.
- the labeled two-dimensional image is the 2D instance segmentation annotation image, where the area where non-overlapped objects are located is the object annotation area, and/or the area where slightly overlapped objects are located is the object annotation area.
- Step 152 Determine the depth information of the object labeling area based on the object labeling area corresponding to the instance segmentation labeling image and the depth image corresponding to the instance segmentation labeling image.
- the instance segmentation annotation image corresponds to the depth image one-to-one
- the depth information, that is, the depth value, of the object annotation area is determined based on the object annotation area corresponding to the instance segmentation annotation image and the depth image corresponding to the instance segmentation annotation image.
- the depth information of the object annotation area can be accurately determined based on the object annotation area and the depth image corresponding to the instance segmentation annotation image.
- step 151 is further refined, specifically including the following steps:
- Step 151a Obtain the annotation file corresponding to the instance segmentation annotation image, and parse the annotation file to obtain annotation pixel information.
- the annotation file corresponding to the instance segmentation annotation image is obtained.
- the annotation file records the pixel information of the area where the annotated object is located, that is, the annotation pixel information.
- the annotation file is parsed to obtain the annotation pixel information.
- Step 151b Determine the object annotation area corresponding to the instance segmentation annotation image based on the annotation pixel information.
- the object annotation area in the instance segmentation annotation image is determined based on the annotation pixel information, where the annotation pixel information includes annotation pixel positions, and the object annotation area in the instance segmentation annotation image can be determined based on the annotation pixel positions.
- the object annotation area in the image can be accurately identified based on the pre-recorded annotation file.
- step 152 is further refined, specifically including the following steps:
- Step 152a Determine the second position corresponding to the first position in the depth image based on the first position of the object labeling area in the labeling image.
- the first position of the object annotation area corresponding to the instance segmentation annotation image in the instance segmentation annotation image is obtained.
- the first position is the annotation pixel position of the object annotation area corresponding to the instance segmentation annotation image.
- based on the first position of the object annotation area in the instance segmentation annotation image, the second position corresponding to the first position in the depth image corresponding to the instance segmentation annotation image is determined; the second position is the annotation pixel position of the object annotation area in the depth image, and the first position corresponds to the second position.
- Step 152b Determine the depth information of the second position in the depth image as the depth information of the object labeling area.
- the depth information of the second position in the depth image corresponding to the instance segmentation annotation image is determined as the depth information of the object annotation area.
- the annotation image corresponds to the depth image one-to-one. According to the one-to-one correspondence between the annotation image and the depth image, the second position corresponding to the first position in the depth image can be accurately determined, thereby obtaining depth information.
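- Because the annotation image and the depth image are pixel-aligned, the "second position" is simply the same pixel coordinates looked up in the depth image. A minimal sketch, assuming the depth image is an HxW array of depth values:

```python
import numpy as np

def depth_of_annotation_area(depth_image: np.ndarray, pixel_positions):
    """Read the depth values of the object annotation area.

    depth_image     : HxW array, pixel-aligned with the annotation image.
    pixel_positions : iterable of (row, col) first positions taken from the
                      annotation; the same coordinates are the second positions
                      in the depth image.
    """
    rows, cols = zip(*pixel_positions)
    return depth_image[np.array(rows), np.array(cols)]
```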
- Figure 17 is a schematic flowchart of yet another image generation method provided in Embodiment 4 of the present disclosure. Based on the first embodiment, step 93 is further refined, specifically including the following steps:
- Step 931 Obtain the camera internal parameters, and generate point cloud information of the object labeling area based on the camera internal parameters and depth information.
- camera intrinsic parameters are obtained, where the camera intrinsic parameters include the camera focal length and the offset of the camera optical axis in the image coordinate system; the unit of the camera focal length is pixels, and the unit of the offset of the camera optical axis in the image coordinate system is also pixels.
- the point cloud information of the object labeling area is further generated based on the camera internal parameters and the depth information of the object labeling area.
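- For reference, the back-projection underlying this step is the standard pinhole-camera relation; with an annotated pixel (u, v), its depth value d, focal lengths f_x, f_y in pixels, and optical-axis offset (c_x, c_y), the point coordinates follow as below. This is the conventional formula, stated here for clarity rather than quoted from the source.

```latex
x = \frac{(u - c_x)\,d}{f_x}, \qquad y = \frac{(v - c_y)\,d}{f_y}, \qquad z = d
```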
- Step 932 Generate a point cloud image based on the point cloud information of the object label area.
- a point cloud image is generated based on the point cloud information of the object's annotation area, and the corresponding neural network model is trained using the point cloud image of the object.
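- Putting steps 931 and 932 together, a minimal sketch of generating the point cloud of an object annotation area from the camera intrinsics and per-pixel depth could look as follows; the array shapes, unit conventions, and the way the result is organised are assumptions for illustration.

```python
import numpy as np

def annotation_area_to_point_cloud(depth_image, mask, fx, fy, cx, cy):
    """Back-project the masked pixels of a depth image into a 3D point cloud.

    depth_image : HxW array of depth values (e.g. in metres).
    mask        : HxW boolean array marking the object annotation area.
    fx, fy      : focal lengths in pixels; cx, cy: optical-axis offset in pixels.
    Returns an Nx3 array of (x, y, z) points.
    """
    v, u = np.nonzero(mask)            # pixel rows (v) and columns (u)
    d = depth_image[v, u]
    valid = d > 0                      # ignore missing depth readings
    u, v, d = u[valid], v[valid], d[valid]
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.stack([x, y, d], axis=-1)

# A "point cloud image" can then be obtained, for example, by keeping the points
# organised in the original HxW pixel grid instead of flattening them.
```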
- Figure 18 is a schematic flowchart of yet another image generation method provided in Embodiment 4 of the present disclosure. Based on the first embodiment, step 94 is further refined, specifically including the following steps:
- Step 941 Obtain the annotation type of the annotated image, and establish a corresponding image sub-database according to the annotation type.
- the annotation type of the annotated image is obtained, where the annotation type includes semantic segmentation class images and instance segmentation class images, and a corresponding image sub-database is established according to the annotation type, for example by establishing at least one semantic segmentation class image sub-database and at least one instance segmentation class image sub-database.
- Step 942 Establish a correspondence between the annotated image and the point cloud image of the corresponding object.
- a correspondence between each annotated image and the point cloud image of the object corresponding to that annotated image is established, so that each annotated image and its corresponding point cloud image are associated with each other.
- Step 943 Store the annotated images of the same annotation type and the corresponding point cloud images into the same image sub-database.
- to facilitate image lookup, annotated images of the same type and their corresponding point cloud images are stored in the same image sub-database: annotated images of the semantic segmentation class and their corresponding point cloud images are stored in the same semantic segmentation class image sub-database, and annotated images of the instance segmentation class and their corresponding point cloud images are stored in the same instance segmentation class image sub-database.
- to facilitate subsequent training of the neural network model, images of the same type and their corresponding point cloud images are stored in the same image sub-database.
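- One way to realise steps 941 to 943 is a simple on-disk layout with one sub-database per annotation type and an index linking each annotated image to its point cloud; the directory names and index format below are illustrative assumptions, not taken from the source.

```python
import json
import shutil
from pathlib import Path

def store_sample(db_root, annotation_type, annotated_image, point_cloud_file):
    """Store an annotated image and its point cloud in the sub-database matching
    its annotation type ('semantic_segmentation' or 'instance_segmentation')."""
    sub_db = Path(db_root) / annotation_type        # one sub-database per type
    sub_db.mkdir(parents=True, exist_ok=True)
    img_dst = sub_db / Path(annotated_image).name
    pcd_dst = sub_db / Path(point_cloud_file).name
    shutil.copy(annotated_image, img_dst)
    shutil.copy(point_cloud_file, pcd_dst)

    # Record the correspondence between the annotated image and its point cloud.
    index_path = sub_db / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else {}
    index[img_dst.name] = pcd_dst.name
    index_path.write_text(json.dumps(index, indent=2))
```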
- step 94 is further refined, specifically including the following steps:
- Step 94a Determine the corresponding object pose information based on the point cloud image of the object.
- the corresponding object pose information is determined based on the object's point cloud image, and the object pose information is the position and attitude of the object in the coordinate system.
- Pose estimation plays a very important role in the field of computer vision; it is widely applied in estimating robot pose from visual sensors for control, robot navigation, and similar tasks.
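- The text does not prescribe a particular pose-estimation algorithm; purely as an illustration of step 94a, the sketch below estimates a coarse object pose from a point cloud by taking the centroid as the position and the principal axes of the centred points as the orientation.

```python
import numpy as np

def estimate_pose_from_point_cloud(points: np.ndarray):
    """Coarse pose estimate for an Nx3 point cloud.

    Returns (position, rotation): position is the centroid, rotation is a 3x3
    matrix whose columns are the principal axes of the cloud.
    """
    position = points.mean(axis=0)
    centred = points - position
    # Singular vectors of the centred cloud give the principal axes.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    rotation = vt.T
    if np.linalg.det(rotation) < 0:    # keep a right-handed frame
        rotation[:, -1] *= -1
    return position, rotation
```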
- Step 94b Construct an image database based on the point cloud image of the object and the corresponding object pose information.
- an image database is constructed based on the object's point cloud image and corresponding object pose information, and the object's point cloud image and corresponding object pose information are stored in the image database.
- the image database is used to provide training samples for subsequent grasping-oriented deep learning, so there is no need to manually collect sample data.
- point cloud images can be obtained automatically from annotated images and depth images, so the image database can be constructed from the point cloud images of objects and the corresponding object pose information, which improves the efficiency of building the image database and reduces the labor cost of building it.
- Figure 19 is a schematic structural diagram of an image generation device provided in Embodiment 5 of the present disclosure. As shown in Figure 19, the device includes:
- the acquisition module 191 is used to acquire the image to be synthesized, and the objects in the image to be synthesized at least include real objects;
- the acquisition module 191 is also used to acquire an instance group.
- the instance group includes at least one first instance.
- the first instance includes an image of a single object.
- the image of a single object includes a point cloud image constructed based on the annotation image and the depth image corresponding to the annotation image;
- the synthesis module 192 is used to synthesize and obtain a virtual image based on the image to be synthesized and the instance group.
- the objects in the virtual image include objects in the image to be synthesized and objects corresponding to the instance group.
- the obtaining module 191 is also used to, before the instance group is obtained: obtain the annotated image and the depth image, where the annotated image is a two-dimensional image;
- the device further includes a determination module, used to: determine the depth information of the object annotation area based on the annotated image and the corresponding depth image; generate a point cloud image of the object based on the depth information of the object annotation area; and construct an image database based on the point cloud image of the object.
- the annotated image includes: a semantic segmentation annotation image;
- the determination module, when determining the depth information of the object annotation area based on the annotated image and the corresponding depth image, is used to: determine the object annotation area corresponding to the semantic segmentation annotation image; and determine the depth information of the object annotation area based on the object annotation area corresponding to the semantic segmentation annotation image and the depth image corresponding to the semantic segmentation annotation image;
- or, the annotated image includes: an instance segmentation annotation image;
- the determination module, when determining the depth information of the object annotation area based on the annotated image and the corresponding depth image, is used to: determine the object annotation area corresponding to the instance segmentation annotation image; and determine the depth information of the object annotation area based on the object annotation area corresponding to the instance segmentation annotation image and the depth image corresponding to the instance segmentation annotation image.
- the determination module, when determining the depth information of the object annotation area based on the object annotation area corresponding to the semantic segmentation annotation image and the depth image corresponding to the semantic segmentation annotation image, is used to: determine, based on the first position of the object annotation area in the annotated image, the second position corresponding to the first position in the depth image; and determine the depth information of the second position in the depth image as the depth information of the object annotation area.
- the determination module, when generating the point cloud image of the object based on the depth information of the object annotation area, is used to: obtain camera intrinsic parameters and generate point cloud information of the object annotation area based on the camera intrinsic parameters and the depth information; and generate the point cloud image based on the point cloud information of the object annotation area.
- the image database includes: a plurality of image sub-databases
- the determination module, when constructing the image database based on the point cloud image of the object, is used to: obtain the annotation type of the annotated image and establish a corresponding image sub-database according to the annotation type; establish a correspondence between the annotated image and the point cloud image of the corresponding object; and store annotated images of the same annotation type and the corresponding point cloud images in the same image sub-database.
- the determination module, when constructing the image database based on the point cloud image of the object, is further used to: determine the corresponding object pose information based on the point cloud image of the object; and construct the image database based on the point cloud image of the object and the corresponding object pose information.
- the synthesis module 192 is used to: determine the placement position, in the image to be synthesized, of the object corresponding to the instance group; and detect whether the object corresponding to the instance group, at that placement position, collides with other surrounding objects;
- if no collision occurs, the virtual image is obtained by placing the object corresponding to the instance group at the placement position in the image to be synthesized;
- if a collision occurs, the step of determining the placement position of the object corresponding to the instance group in the image to be synthesized is re-executed; once the number of collision detections reaches the preset first threshold, the step of obtaining the instance group is re-executed, where the re-obtained instance group is different from the previously obtained instance group (see the placement sketch after this list).
- obtaining the instance group includes obtaining the instance group from an instance database, where the instance database includes a plurality of instances.
- the determination module, when determining the placement position of the object corresponding to the instance group in the image to be synthesized, is used to: obtain the placement height of the object corresponding to the instance group based on the average height of the objects placed in the image content of the image to be synthesized, where the placement height is determined based on the average height and a predetermined fluctuation range; and randomly select a position on the plane corresponding to the placement height as the placement position of the object corresponding to the instance group in the image to be synthesized.
- the acquisition module 191, when acquiring the image to be synthesized, is used to: obtain a collected real image as the image to be synthesized; or use a virtual image obtained by historical synthesis as the image to be synthesized.
- the number of objects in the image content of the virtual image does not exceed a predetermined second threshold
- before the virtual image is synthesized based on the image to be synthesized and the instance group, the determination module is further used to: determine whether the sum of the number of objects in the image to be synthesized and the number of instances in the instance group exceeds the second threshold;
- if the sum exceeds the second threshold, the number of instances in the instance group is adjusted so that the adjusted sum does not exceed the second threshold.
- after the virtual image is synthesized based on the image to be synthesized and the instance group, the determination module is further used to: count the number of objects placed in the image content of the virtual image;
- if the number of objects does not exceed the predetermined second threshold, the virtual image is used as the current image to be synthesized and the step of obtaining the instance group is performed again, until the number of objects placed in the image content of the currently obtained virtual image reaches the second threshold, or the instances in the current instance database have all been traversed.
- the first instance also includes an annotation corresponding to the single object, and the annotation includes point cloud data and a gripper annotation.
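- To make the placement logic of the synthesis module more concrete, here is a minimal sketch of the retry loop described above: the placement height is drawn around the average height of objects already in the scene, a position is picked at random on that plane, collisions are checked, and after a preset number of failed attempts a different instance group is requested. The `scene` object and the helpers `check_collision`, `place`, and `sample_instance_group` are hypothetical stand-ins for whatever scene representation is actually used, and the numeric values are example assumptions.

```python
import random

FIRST_THRESHOLD = 20        # max collision checks per instance group (example value)
HEIGHT_FLUCTUATION = 0.01   # +/- 10 mm around the average height (assumed units: metres)

def choose_placement(scene, instance_group, check_collision, place, sample_instance_group):
    """Place an instance group into the scene to be synthesized, retrying on collision."""
    attempts = 0
    while True:
        avg_h = sum(obj.height for obj in scene.objects) / len(scene.objects)
        height = avg_h + random.uniform(-HEIGHT_FLUCTUATION, HEIGHT_FLUCTUATION)
        x, y = scene.random_position_on_plane(height)      # random point on that plane
        if not check_collision(scene, instance_group, (x, y, height)):
            return place(scene, instance_group, (x, y, height))   # synthesized image
        attempts += 1
        if attempts >= FIRST_THRESHOLD:
            # No workable position found: fetch a different instance group and start over.
            instance_group = sample_instance_group(exclude=instance_group)
            attempts = 0
```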
- Figure 20 is a schematic structural diagram of a model training device provided in Embodiment 5 of the present disclosure. As shown in Figure 20, the device includes:
- the virtual image obtained by using any one of the image generation methods is used as a training sample.
- the training device includes:
- the acquisition module 2001 is used for the first training of the model, selecting the collected real images and the annotations corresponding to the real images as training samples to train the model; wherein, the objects placed in the image content of the real images are all real objects;
- the acquisition module 2001 is also used, for non-first-time training of the model, to randomly select images and their corresponding annotations from the collected real images and the synthesized virtual images as training samples to train the model.
- the model is used to generate a gripper corresponding to an object in the image to be processed based on the input image to be processed.
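- The training schedule described above (real images only for the first round, then a random mix of real and synthesized virtual images) can be sketched as follows; `train_one_round` and the sample collections are placeholders, not an API from the source.

```python
import random

def training_samples(real_samples, virtual_samples, first_round: bool):
    """Return a shuffled list of (image, annotation) pairs for one training round."""
    if first_round:
        pool = list(real_samples)                     # first training: real images only
    else:
        pool = list(real_samples) + list(virtual_samples)
    random.shuffle(pool)
    return pool

# Example schedule:
# round 0: train_one_round(model, training_samples(real, virtual, first_round=True))
# round k: train_one_round(model, training_samples(real, virtual, first_round=False))
```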
- the present disclosure also provides a computer storage medium, including computer executable instructions stored thereon, which when executed by a processor implement the image generation method or the model training method as described above.
- the present disclosure also relates to a computer storage medium including computer executable instructions stored thereon, which when executed by the processor 2101 implement the image generation method or the model training method as described above.
- the present disclosure also provides a computer program, which when the computer program is executed by a processor, is used to perform any one of the above image generation methods or model training methods.
- FIG 21 is a schematic structural diagram of an electronic device provided in Embodiment 5 of the present disclosure. As shown in Figure 21, the electronic device includes:
- One or more processors 2101 are connected to the memory 2102 through the system bus;
- the memory 2102 is used to store executable instructions that, when executed by the one or more processors 2101, implement the image generation method or the model training method as described above.
- FIG 22 is a schematic structural diagram of another electronic device provided in Embodiment 5 of the present disclosure.
- the electronic device 800 can be a computer, a digital broadcast terminal, a messaging device, a tablet device, a personal digital assistant, a server, a server cluster, etc.
- the electronic device may include one or more of the following components: processing component 802 , memory 804 , power supply component 806 , multimedia component 808 , audio component 810 , input/output (I/O) interface 812 , sensor component 814 , and communications component 816 .
- Processing component 802 generally controls the overall operation of the electronic device, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
- the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method.
- processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components.
- processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
- Memory 804 is configured to store various types of data to support operations in the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, etc.
- Memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
- Power supply component 806 provides power to various components of the electronic device.
- Power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic devices.
- Multimedia component 808 includes a screen that provides an output interface between the electronic device and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. A touch sensor can not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
- multimedia component 808 includes a front-facing camera and/or a rear-facing camera.
- the front camera and/or the rear camera can receive external multimedia data.
- Each front-facing camera and rear-facing camera can be a fixed optical lens system or have a focal length and optical zoom capabilities.
- Audio component 810 is configured to output and/or input audio signals.
- audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in operating modes, such as call mode, recording mode, and voice recognition mode. The received audio signal may be further stored in memory 804 or sent via communication component 816 .
- audio component 810 also includes a speaker for outputting audio signals.
- the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
- Sensor component 814 includes one or more sensors for providing various aspects of status assessment for the electronic device.
- the sensor component 814 can detect the open/closed state of the electronic device and the relative positioning of components (for example, the display and keypad of the electronic device); the sensor component 814 can also detect a change in position of the electronic device or of a component of the electronic device, the presence or absence of user contact with the electronic device, the orientation or acceleration/deceleration of the electronic device, and temperature changes of the electronic device.
- Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
- Sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- Communication component 816 is configured to facilitate wired or wireless communications between electronic devices and other devices. Electronic devices can access wireless networks based on communication standards, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, communications component 816 also includes a near field communications (NFC) module to facilitate short-range communications.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- electronic device 2200 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above image generation method or model training method.
- This embodiment also provides a chip.
- the chip includes a memory and a processor. Codes and data are stored in the memory.
- the memory is coupled to the processor.
- the processor runs the program in the memory so that the chip executes the image generation method or model training method provided by the above embodiments.
- This embodiment also provides a computer program, which, when executed by a processor, is used to execute the image generation method or model training method provided by the various embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Mechanical Engineering (AREA)
- Robotics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Computational Linguistics (AREA)
- Processing Or Creating Images (AREA)
- Image Processing (AREA)
Abstract
The present disclosure provides an image generation method and a model training method. In the image generation method, an image to be synthesized that includes at least a real object and an instance group that includes at least one first instance are obtained, and the at least one first instance is synthesized with the image to be synthesized to obtain a virtual image, where the instance group includes at least one first instance, the first instance includes an image of a single object, and the image of the single object includes a point cloud image constructed based on an annotation image and a depth image corresponding to the annotation image. The virtual image obtained by this solution can be used as a training sample; compared with collecting real images as training samples, the training samples obtained by synthesizing virtual images are more plentiful and cover richer sample scenarios, so that training samples can be expanded effectively, efficiently, and conveniently.
Description
本申请要求于2022年07月04日提交中国专利局、申请号为202210780030.3、申请名称为“虚拟图像的生成方法、装置、设备、介质及产品”、以及2022年08月19日提交中国专利局、申请号为202211003504.X、申请名称为“图像数据库构建方法、装置、设备、存储介质及产品”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
本公开涉及人工智能领域,尤其涉及一种图像生成方法和模型训练方法。
随着人工智能技术的发展,其应用领域越来越广。例如伴随网络购物的普及,以物流领域为例,基于人工智能技术可以实现不依赖人工的智能分拣物品,以改善人工分拣的效率和人力成本等问题。
实际应用中,人工智能模型需要基于训练样本进行训练,因此训练样本影响模型的质量。以智能分拣模型为例,如果缺乏充足的训练样本,会导致训练得到的模型灵活程度低,比如只能根据教程完成单一的抓取和安装,无法根据采集的图像中不同的位置的物体做出相应的判断等。因此如何有效扩充训练样本,成为亟待解决的问题。
背景技术部分的内容仅仅是公开发明人所知晓的技术,并不当然代表本领域的现有技术。
有鉴于现有的一个或多个缺陷,本公开提供一种图像生成方法,包括:
获取待合成图像,所述待合成图像中的物体至少包括真实物体;
获取实例组,所述实例组包括至少一个第一实例,所述第一实例包括单个物体的图像,所述单个物体的图像包括基于标注图像和所述标注图像对应的深度图像构建的点云图像;
基于所述待合成图像和所述实例组,合成获得虚拟图像,所述虚拟图像中的物体包括所述待合成图像中的物体和所述实例组对应的物体。
本公开还提供一种模型训练方法,采用如权图像生成方法中任一项所述的图像生成方法获得的虚拟图像作为训练样本,所述训练方法包括:
针对模型的首次训练,选取采集得到的真实图像以及所述真实图像对应的标注,作为训练样本对模型进行训练;其中,所述真实图像的图像内容中放置的物体均为真实物体;
针对所述模型的非首次训练,从采集得到的真实图像和合成获得的虚拟图像中,随机选取图像以及所述图像对应的标注作为训练样本,对所述模型进行训练。
本公开还提供一种图像生成装置,包括:
获取模块,用于获取待合成图像,所述待合成图像中的物体至少包括真实物体;
所述获取模块,还用于获取实例组,所述实例组包括至少一个第一实例,所述第一实例包括单个物体的图像,所述单个物体的图像包括基于标注图像和所述标注图像对应的深度图像构建的点云图像;
合成模块,用于基于所述待合成图像和所述实例组,合成获得虚拟图像,所述虚拟图像中的物体包括所述待合成图像中的物体和所述实例组对应的物体。
本公开还提供一种电子设备,包括:
一个或多个处理器;
存储器,用于存储可执行指令,所述可执行指令在被所述一个或多个处理器执行时,实施如上所述的图像生成方法或模型训练方法。
本公开还提供一种芯片,所述芯片包括存储器、处理器,所述存储器中存储代码和数据,所述存储器与所述处理器耦合,所述处理器运行所述存储器中的程序使得所述芯片用于执行上述任一项所述的图像生成方法或模型训练方法。
本公开还提供一种程序产品,包括:计算机程序,当所述程序产品在计算机上运行时,使得所述计算机执行上述任一项所述的图像生成方法或模型训练方法。
本公开还提供一种计算机程序,当所述计算机程序被处理器执行时,用于执行上述任一项所述的图像生成方法或模型训练方法。
本公开提供一种图像生成方法和模型训练方法,该图像生成方法通过获取至少包括真实物体的待合成图像以及包括至少一个第一实例的实例组,将至少一个第一实例与待合成图像合成获得虚拟图像,其中,实例组包括至少一个第一实例,第一实例包括单个物体的图像,单个物体的图像包括基于标注图像和标注图像对应的深度图像构建的点云图像。该方案获得的虚拟图像,可用作训练样本,相比于通过采集手段获得真实图像作为训练样本,通过合成虚拟图像获得的训练样本,数量更加充足,且样本场景更为丰富,能够高效便捷地实现训练样本的有效扩充。
构成本公开的一部分的附图用来提供对本公开的进一步理解,本公开的示意性实施例及其说明用于解释本公开,并不构成对本公开的不当限定。在附图中:
图1为本公开示例的应用场景示意图;
图2为本公开实施例一提供的一种图像生成方法的流程示意图;
图3为本公开实施例一提供的另一种图像生成方法的流程示意图;
图4为本公开实施例一提供的又一种图像生成方法的流程示意图;
图5为进行图像合成的场景示例图;
图6为本公开实施例一提供的又一种图像生成方法的流程示意图;
图7为本公开实施例三提供的一种模型训练方法的流程示意图;
图8为本公开实施例提供的图像数据库构建方法对应的网络架构;
图9为本公开实施例四提供的一种图像生成方法的流程示意图;
图10为本公开实施例四提供的另一种图像生成方法的流程示意图;
图11为一种以颜色区分不同区域的标注方式;
图12为另一种以颜色区分不同物体的标注方式;
图13为本公开实施例四提供的又一种图像生成方法的流程示意图;
图14为本公开实施例四提供的还一种图像生成方法的流程示意图;
图15为本公开实施例四提供的再一种图像生成方法的流程示意图;
图16为再一种以颜色区分不同物体的标注方式;
图17为本公开实施例四提供的又一种图像生成方法的流程示意图;
图18为本公开实施例四提供的又一种图像生成方法的流程示意图;
图19为本公开实施例五提供的一种图像生成装置的结构示意图;
图20为本公开实施例五提供的一种模型训练装置的结构示意图;
图21为本公开实施例五提供的一种电子设备的结构示意图;
图22为本公开实施例五提供的另一种电子设备的结构示意图。
在下文中,仅简单地描述了某些示例性实施例。正如本领域技术人员可认识到的那样,在不脱离本公开的精神或范围的情况下,可通过各种不同方式修改所描述的实施例。因此,附图和描述被认为本质上是示例性的而非限制性的。
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
目前,人工智能通常通过机器学习实现,所述机器学习根据学习样本进行训练,建立完善的智能模型,实现机器的智能操作。比如以智能分拣的场景举例,以大量物体的图片作为训练样本,建立能够智能识别物体并进行分拣的智能模型,从而实现机器对物体的智能分拣。作为示例,获取样本的一种手段为通过采集装置采集真实物体的图像。由于模型精度依赖大量的训练样本,因此,面对需获取大量样本的情况,上述手段存在一定问题,比如,成本较高,耗时较长。并且,受限于真实物体的数量,能获取的样本量也会存在限制,不利于样本扩充。
为了更加高效便捷地实现样本扩充,发明人通过研究发现,可以采用虚拟样本进行模型训练,通过生成虚拟样本可以有效地扩充样本,相应地,基于丰富的样本能够建立精度更高的人工智能模型。仍以智能分拣的场景举例,可以生成虚拟图像作为训练样本,实现样本的有效扩充。图1为本公开示例的应用场景示意图。如图1所示,可以获取待合成图像与实例组,通过将待合成图像与实例组进行合成,获得虚拟图像。该虚拟图像后续可作为训练样本,用于训练人工智能模型。
此外,随着深度学习的不断发展,已经被应用于包括计算机视觉、机器翻译、图像分析等领域,深度学习任务通常是能够快速完成复杂模型的训练。在训练过程中需要足够多的图像数据集作为训练样本,且训练样本的规模和质量直接决定了最终图像理解引擎的性能。目前对于图像数据库通常都是采用人工采集的方式,人为收集各种样本图像,从而构建样本数据库。
随着深度学习模型越复杂,涉及的参数越多,需要用户预先收集大量的样本图像以训练出精确度更高的模型,使得用户工作量较大,影响了深度学习进度,采用人工采集图像构建数据库时,效率较低且人力成本较高。
所以针对现有技术中深度学习时由人工构建数据库的方式效率较低的问题,发明人在研究中发现,生成虚拟图像之前,还可以获取标注图像以及标注图像对应的深度图像,根据标注图像以及对应的深度图像确定物体标注区域的深度信息,基于物体标注区域的深度信息,生成物体的点云图像,之后根据物体的点云图像构建图像数据库。上述过程无需人工收集样本数据,能够基于标注图像以及深度图像自动得到点云图像,从而基于点云图像构建图像数据库,提高构建图像数据库的效率,减少构建图像数据库的人力成本。进而,可以从图像数据库中获取点云图像作为实例组中的单个物体的图像,为生成的虚拟图像提供原始数据支持。
需要说明的是,本公开中对于术语的简要说明,仅是为了方便理解接下来描述的实施方式,而不是意图限定本公开的实施方式。除非另有说明,这些术语应当按照其普通和通常的含义理解。
下面以具体的实施例对本公开的技术方案以及本公开的技术方案进行详细说明。下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。在本公开的描述中,除非另有明确的规定和限定,各术语应在本领域内做广义理解。下面将结合附图,对本公开的实施例进行描述。
实施例一
图2为本公开实施例一提供的一种图像生成方法的流程示意图,如图2所示,所述图像生成方法,包括:
步骤21、获取待合成图像,待合成图像中的物体至少包括真实物体;
步骤22、获取实例组,实例组包括至少一个第一实例,第一实例包括单个物体的图像,单个物体的图像包括基于标注图像和标注图像对应的深度图像构建的点云图像;
步骤23、基于待合成图像和实例组,合成获得虚拟图像,虚拟图像中的物体包括待合成图像中的物体和实例组对应的物体。
本实施例的执行主体为图像生成装置,该图像生成装置可以通过计算机程序实现,例如,应用软件等;或者,也可以实现为存储有相关计算机程序的介质,例如,U盘、云盘等;再或者,还可以通过集成或安装有相关计算机程序的实体装置实现,例如,芯片等。
其中,待合成图像指图像中至少存在真实物体的图像。这里的真实物体指真实存在的物体。举例来说,假设某图像是通过对真实摆放的物体进行图像采集获得的,那么可理解为该图像中的物体包括真实物体。可选的,图像采集的方式不限,比如可以通过照相机或摄像头等图像采集装置进行采集。需要说明的是,待合成图像中真实物体的数量不限,比如可以包括一个或者多个。此外,待合成图像中的物体除了包括真实物体之外,也可以包括并非直接通过图像采集装置采集到的物体,例如,通过合成手段放置至图像中的物体。
可选的,实例组中的第一实例可以为一个也可以为多个,也就是说加入待合成图像中的物体可以为一个或者多个。其中,实例包括单个物体的图像,该单个物体图像可以是上述的点云图像。实际应用中,可以预先建立多个实例,后续当需要生成虚拟图像时,可以从预先建立的多个实例中选取实例组与待合成图像进行合成。其中,建立实例的方法不限。比如,可以通过图像采集装置采集单个物体的图像来获得实例。再比如,也可以从包括多个物体的图像中,识别获得其中单个物体的图像来获得实例。后一种方式中,识别单个物体的手段有多种,例如,可以基于物体的外轮廓识别技术,识别出图像中单个物体的外轮廓,进而获得该单个物体的图像。
可选的,单个物体的图像可以是包括基于标注图像和标注图像对应的深度图像所构建的点云图像,该点云图像的实现基于下述实施例进行说明。
实际应用中,可以建立实例数据库,以便于各个实例的存储和维护。故在一个示例中,步骤22具体可以包括:从实例数据库中获取实例组,实例数据库包括多个实例。结合场景示例,各个实例建立后,可将建立的实例存储在实例数据库中。后续当需要生成虚拟图像时,可以获取待合成图像,并从实例数据库中选取至少一个实例作为实例组,将待合成图像与实例组进行合成,获得虚拟图像。本实施例中,获取至少包括真实物体的待合成图像以及包括至少一个第一实例的实例组,将至少一个第一实例与待合成图像合成获得虚拟图像。相比于依赖采集真实物体获得图像的方式,通过合成获得虚拟图像的方式更加高效便捷。
需要说明的是,本实施例的实施场景不限。举例来说,实际应用中可以通过多次进行合成来获得最终的虚拟图像。作为一种示例,步骤21的实施场景为首次合成的场景,即此时的待合成图像中并未加入过实例组。可选的,初次合成时的待合成图像中的物体可均为真实物体,例如待合成图像可以为图像采集装置直接采集的图像。相应的,步骤21具体包括:获取采集得到的真实图像,作为待合成图像。作为另一种示例,步骤21的实施场景为非首次合成的场景,即此时的待合成图像中已加入过实例组。相应的,步骤21具体包括:将历史合成获得的虚拟图像,作为待合成图像。
结合多次进行合成的场景示例:首先,通过图像采集装置采集获得真实图像,作为首次合成的待合成图像,该真实图像中的物体均为真实物体;获取需要加入待合成图像的实例组,其中实例组的获取方式不限,比如包括但不限于随机选取等;之后将当前的待合成图像和实例组合成,获得首次合成的虚拟图像。后续,该虚拟图像作为下一次合成的待合成图像,并再一次选取实例组进行第二次合成,得到第二次合成的虚拟图像。以此类推,后续的每次合成中,以上一次历史合成的虚拟图像,作为本次的待合成图像,获取实例组进行合成,直至经过多次合成后得到最终的虚拟图像。
本实施例提供的图像生成方法中,通过获取至少包括真实物体的待合成图像以及包括至少一个第一实例的实例组,将至少一个第一实例与待合成图像合成获得虚拟图像,其中,实例组包括至少一个第一实例,第一实例包括单个物体的图像,单个物体的图像包括基于标注图像和标注图像对应的深度图像构建的点云图像。该方案获得的虚拟图像,可用作训练样本,相比于通过采集手段获得真实图像作为训练样本,通过合成虚拟图像获得的训练样本,数量更加充足,且样本场景更为丰富,能够高效便捷地实现训练样本的有效扩充。
具体的,在进行待合成图像与实例组的合成时,为了保证最终得到的虚拟图像的可靠性和真实性,需要考虑实例组对应的物体在待合成图像中的放置位置。在一种示例中,图3为本公开实施例一提供的另一种图像生成方法的流程示意图,在任一示例的基础上,步骤23包括:
步骤31、确定实例组对应的物体在待合成图像中的放置位置;
步骤32、检测实例组对应的物体在放置位置下,与周边的其它物体是否发生碰撞;
步骤33、若未发生碰撞,则通过将实例组对应的物体放置在待合成图像中的放置位置,合成获得虚拟图像。
结合场景示例:实际应用中,当把实例组中的第一实例与待合成图像进行合成时,如果发生物体间的位置碰撞,会影响合成的虚拟图像的质量。故本示例中,获取待合成图像和实例组后,先确定实例组对应的物体在待合成图像中的本次放置位置;基于确定的本次放置位置,检测实例组对应的物体放置在该位置时是否会与其它物体发生碰撞;若没有发生碰撞,则表明本次确定的放置位置较为合理,则相应的,将实例组对应的物体放置在该位置下,实现待合成图像和实例组的合成,获得虚拟图像。其中,确定放置位置的方式不限。例如,可以在待合成图像中随机选取第一实例的位置。
可选的,如果检测到实例组对应的物体在本次确定的放置位置下,与其它物体发生碰撞,则需要重新选取放置位置。作为示例,若所有第一实例对应的物体均未与其它物体发生碰撞,则判定为实例组对应的物体在当前的放置位置下未发生碰撞;若存在任一第一实例对应的物体与其它物体发生碰撞,则判定为实例组对应的物体在当前的放置位置下发生碰撞。通过检测实例组在放置位置下的碰撞情况,调整实例组对应的物体在待合成图像中的放置位置,从而提高合成获得的虚拟图像的真实性和可靠性。进而以虚拟图像为样本训练获得的模型的精度和可靠性更高。
在一种示例中,图4为本公开实施例一提供的又一种图像生成方法的流程示意图,步骤32之后,还包括:
步骤41、若发生碰撞,则重新执行确定实例组对应的物体在待合成图像中的放置位置的步骤,直至碰撞检测的次数达到预设的第一阈值时,重新执行获取实例组的步骤,其中重新获取的实例组不同于之前获取的实例组。
结合场景示例:如图5所示,图5为进行图像合成的场景示例图。在对待合成图像和实例组进行合成时,首先确定本次的放置位置,如图5所示,假设实例组包括一个圆柱体与一个照相机,确定本次的放置位置包括,圆柱体的放置位置为位置1,照相机的放置位置为位置2;之后,检测圆柱体在位置1下与其它物体是否发生碰撞,以及检测照相机在位置2下与其它物体是否发生碰撞;结合图示的举例,若圆柱体发生碰撞,照相机未发生碰撞。由于存在发生碰撞的实例,故重新确定放置位置,例如,重新确定圆柱体的放置位置为位置3,照相机的放置位置为位置2,再次检测圆柱体和照相机与周边物体的碰撞情况,假设圆柱体和照相机均未与其它物体发生碰撞,则基于将圆柱体放置在位置3下,将照相机放置在位置2下,进行实例组与待合成图像的合成,获得如图所示的虚拟图像。
其中,确定位置的具体方式不限,比如,针对包含多个第一实例的实例组,可以通过互换各第一实例的位置,更新实例组的摆放位置,或者,也可以通过改变部分或所有第一实例的位置,来更新实例组的摆放位置。
在一个示例中,设定重复确定位置的次数不超过预定的第一阈值。可选的,第一阈值为20次。具体的,为了提高处理效率,设定了重复确定放置位置的次数上限,对于某实例组来说,假设检测到其发生碰撞的次数超过了第一阈值,则判定为实例组中的物体在待合成图像中没有合适的放置位置。故返回执行步骤22,以重新选取新的实例组,且新的实例组不同于之前的实例组。需要说明的是,这里的不同包括实例组中的第一实例部分不同和实例组中的第一实例均不同的情形。举例来说,包括但不限于下列情形:情形一:之前选取的实例组中的实例为实例A,新的实例组中的实例为实例B;情形二:之前选取的实例组中的实例为实例A,新的实例组中的实例为实例B与其他实例,其他实例不包括A;情形三:之前选取的实例组中的实例为实例A、B、C,新的实例组中的实例为实例B、C、D,即新的实例组的实例,存在部分实例不同于与之前选取的实例组中的实例;情形四:之前选取的实例组中的实例为实例A、B、C,新的实例组中的实例为实例D、E,即新的实例组的实例完全不同于与之前选取的实例组中的实例。
在一种示例中,图6为本公开实施例一提供的又一种图像生成方法的流程示意图,在步骤31中,确定实例组对应的物体在待合成图像中的放置位置,包括:
步骤61、根据待合成图像的图像内容中放置的物体的平均高度,获得实例组对应的物体的放置高度,放置高度基于平均高度和预定的波动范围确定;
步骤62、在放置高度对应的平面上随机选取位置,作为实例组对应的物体在待合成图像中的放置位置。
结合场景示例,确定实例组对应的物体在待合成图像中的放置位置,放置位置可首先确定出放置高度,放置高度也可以称为实例组对应的物体在待合成图像中的深度,当确定出放置高度后,再确定出在放置高度对应的平面上的位置。放置高度的确认,可通过计算待合成图像中已有物体的平均高度,然后将平均高度为基准,选取一个高度的波动范围,将波动范围内的高度值作为实例组对应的物体在待合成图像中的放置高度。高度的波动范围可选为5mm-10mm,即在平均高度上下5mm-10mm的波动范围内均可作为实例组对应的物体在待合成图像中的放置高度。当确定出放置高度后,基于放置高度对应的水平面,再在水平面上随机选取位置,并将其作为实例组对应的物体在待合成图像中的放置位置。通过考虑高度和平面位置来确定物体在待合成图像中的摆放位置,能够生成更加自然贴合真实,且场景更为丰富的虚拟图像。
在一种示例中,虚拟图像的图像内容中的物体数量不超过预定的第二阈值。
其中,第二阈值可以预先设定,例如,第二阈值可以为30。结合场景示例:可提前设定规定物体数量的第二阈值,第二阈值为虚拟图像中物体数量的最大值,第二阈值可选为30。基于第二阈值,可以根据待合成图像中物体的数量确定实例组中第一实例的数量,比如,假设待合成图像中已有物体的数量为20个,则可设定实例组中第一实例的数量不超过10个。作为示例,第二阈值还可用于判定合成是否结束。比如,在多次合成获得虚拟图像的场景下,每次合成得到的虚拟图像中的会增加实例组中的第一实例对应的物体,故在一个示例中,假设每次合成的实例组包括多个第一实例,则可在每次合成前先根据当前待合成图像中的物体数量和当前实例组中第一实例的数量,判断两者之和是否会超过第二阈值。如果超过第二阈值,可以减小第一实例的数量。在另一个示例中,假设每次合成的实例组仅包括一个第一实例,则可在每次合成前检测当前待合成图像中的物体数量是否达到第二阈值,如果达到第二阈值,则将该图像作为最终图像,不再继续合成。
在一种可能的方式中,基于待合成图像和实例组,合成获得虚拟图像之前,还包括:
判断待合成图像中的物体数量与实例组的数量之和是否超过第二阈值;
若数量之和超过第二阈值,则调整实例组的数量,使得调整后的数量之和不超过第二阈值。
结合场景示例,为了避免得到的虚拟图像中的物体数量过多,可提前设定规定物体数量的第二阈值,第二阈值为虚拟图像中物体数量的最大值,在将待合成图像与实例组合成得到虚拟图像之前,可先计算待合成图像中物体的数量与实例组中物体数量之和是否超过第二阈值。若没有超过第二阈值,可继续合成的操作,若超过第二阈值,则将实例组中第一实例的数量进行调整,保障待合成图像中物体的数量与实例组中物体数量之和不会超过第二阈值。
在另一种可能的方式中,基于待合成图像和实例组,合成获得虚拟图像之后,还包括:
统计虚拟图像的图像内容中放置的物体数量;
若物体数量未超过预定的第二阈值,则将虚拟图像作为当前的待合成图像,再次执行获取实例组的步骤,直至当前获得的虚拟图像的图像内容中放置的物体数量达到第二阈值,或者当前实例数据库中的实例被遍历结束。
可选的,当待合成图像与实例组合成得到虚拟图像后,可统计虚拟图像中物体的数量。若虚拟图像中物体的数量没有超过第二阈值,可将得到的虚拟图像作为新的待合成图像,重新从实例数据库中选取新的实例组,将新的待合成图像与新的实例组合成新的虚拟图像,并再次统计新的虚拟图像中物体的数量。若新的虚拟图像中的物体数量仍然没有达到第二阈值,则继续将新的虚拟图像作为待合成图像,直到得到的虚拟图像中的物体数量达到第二阈值,或者即使得到的虚拟图像中的物体数量没有达到第二阈值,但是所有实例被遍历结束,即没有新的合适的实例进行后续合成为止。
通过上述示例可以灵活调整虚拟图像中的物体数量,提高虚拟图像生成的效率和真实性。
实际应用中,生成的虚拟图像可作为训练样本。在一种示例中,第一实例还包括单个物体对应的标注,标注包括点云数据和夹爪标注。
具体的,人工智能需要依靠训练数据进行训练,构建出完善的智能模型。作为示例,本实施例中获得的虚拟图像可作为智能分拣模型的训练样本,实现不依赖人工的智能分拣。实际应用中,智能分拣通常通过智能模型控制机械手臂实现物品的抓取分拣。故在一个示例中,虚拟图像可应用于夹爪生成模型,该模型用于根据输入的物体图像,生成用于抓取该物体的夹爪,以通过机械手臂参照模型生成的夹爪,执行相应的位姿实现物体的抓取。
结合上述场景示例:针对模型训练所需的样本,预先对各实例进行标注,该标注包括点云数据和夹爪标注。相应的,基于待合成图像和实例合成得到的虚拟图像中的物体携带标注。后续,当需要对模型进行训练时,可使用携带标注的虚拟图像进行模型训练。比如,当机械手臂需要训练时,可以对虚拟图像中每个第一实例对应的物体的图像以及物体对应的标注进行记忆训练,标注中包括针对物体的夹爪标注以及点云数据。当遇到类似物体时,可以根据训练记忆,对物体模拟生成对应的夹爪进行抓取,以此实现针对此类物体时,可根据物体的外轮廓生成夹爪,完成对物体的抓取。
本实施例提供的图像生成方法中,获取至少包括真实物体的待合成图像以及包括至少一个第一实例的实例组,将至少一个第一实例与待合成图像合成获得虚拟图像。该方案获得的虚拟图像,可用作训练样本,相比于通过采集手段获得真实图像作为训练样本,通过合成虚拟图像获得的训练样本,数量更加充足,且样本场景更为丰富,能够高效便捷地实现训练样本的有效扩充。
实施例二
本公开还提供一种训练样本库的建立方法,训练样本库至少包括如上任一项示例获得的虚拟图像。
在实际应用中,在机器学习中,需要首先确定出训练模型,然后根据训练样本确定出训练模型中的参数,比如可以建立训练样本库,训练样本库主要为模型训练提供训练样本数据。比如,智能机械手臂的训练样本库中包括大量的虚拟物体图像,以及虚拟物体对应标注。为了扩充,可以基于前述在训练样本库的基础上,生成的虚拟图像可以专门建立一个样本库,也可以加入到原来的样本库中,形成包括虚拟物体与真实物体的训练样本库。
结合场景示例,训练数据库中包括大量单个物体的图像与标注,训练样本库中存储的实例范围越大,那么可提供的训练数据就越丰富完善,所以可将得到的虚拟图像中真实物体的图像与对应生成夹爪最为新的训练样本存储到训练样本库中。本实施例将大量虚拟物体的图像与标注作为训练样本库中的训练样本,并将待合成图像中的真实物体的图像生成对应的标注后作为训练样本一起存入到训练样本库中,可使得训练样本库更加丰富完善,可提供的训练数据更加多样化。
实施例三
在一种示例中,图7为本公开实施例三提供的一种模型训练方法的流程示意图,训练方法包括:
步骤71、针对模型的首次训练,选取采集得到的真实图像以及真实图像对应的标注,作为训练样本对模型进行训练;其中,真实图像的图像内容中放置的物体均为真实物体;
步骤72、针对模型的非首次训练,从采集得到的真实图像和合成获得的虚拟图像中,随机选取图像以及图像对应的标注作为训练样本,对模型进行训练。
结合场景示例,机器学习可以先确定模型,然后根据训练样本数据训练模型,模型简单说可以理解为函数。确定模型是认为这些数据的特征符合哪个函数,训练模型就是用已有的训练样本数据,通过一些最优化方法其他方法确定函数的参数,参数确定后的函数就是训练的结果,最后把新的数据输入到训练好的模型中得到结果。相关技术中,机械手的模型训练可以将物体的图像以及对应的标注作为训练样本输入到机械手训练模型中,确定出机械手训练模型的参数。为了保障模型训练结果的正确性,针对模型的首次训练应该选取真实物体的图像以及真实物体对应的标注作为训练样本,对真实物体的图像以及物体对应的标注进行记忆训练。有了首次的训练记忆,在之后的训练中,训练样本可从真实图像与虚拟图像中随机选取图像以及对应的标注进行记忆训练,确定出模型的参数后,当模型遇到类似物体时,可以根据训练记忆,对物体模拟生成对应的夹爪进行抓取,完成机械手臂针对任意物品都能准确抓取的任务。
在一种示例中,模型用于根据输入的待处理图像,生成待处理图像中的物体对应的夹爪。
结合场景示例,当机械手臂的训练模型根据训练样本确定出训练模型的参数后,将物体的图像作为模型的输入样本,根据训练好的模型对输入的物体图像生成对应的夹爪。
本实施例通过对真实物体与虚拟物体的图像和标注分别完成模型的训练,可提高训练模型的正确率。
实施例四
在上述步骤22之前,还需要对点云图像进行构建,即可以是实例组中单个物体的图像的构建,应理解:点云图像可以存储在图像数据库中,如图8(图8为本公开实施例提供的图像数据库构建方法对应的网络架构)所示,先对图像数据库构建方法对应的网络架构进行说明,包括:终端设备1及服务器2。终端设备1与服务器2进行通信连接。用户在终端设备1上对二维RGB图像进行图像标注,二维RGB图像中包括至少一个物体,如对二维RGB图像中物体不可抓区域、物体被压叠区域、物体未被压区域进行标注,如对二维图像中被压物体、未被压物体以及轻微压叠物体进行标注,标注后的二维RGB图像为标注图像,其中,各二维RGB图像有其对应的深度图像。服务器2获取标注图像及标注图像对应的深度图像;根据标注图像及对应的深度图像确定物体标注区域的深度信息;基于物体标注区域的深度信息,生成物体的点云图像;根据物体的点云图像构建图像数据库。需人工收集样本数据,能够基于标注图像以及深度图像自动得到点云图像,从而基于点云图像构建图像数据库,提高构建图像数据库的效率,减少构建图像数据库的人力成本。
图9为本公开实施例四提供的一种图像生成方法的流程示意图,如图9所示,本实施例提供的图像生成方法的执行主体为图像生成装置,该图像生成装置位于电子设备中,则本实施例提供的图像生成方法包括以下步骤:
步骤91,获取标注图像及标注图像对应的深度图像,标注图像为二维图像。
本实施例中,获取标注图像以及标注图像对应的深度图像,其中,标注图像为二维图像,预先在二维图像中标注出物体所在区域形成标注图像,标注图像中至少包括一个物体标注区域。标注图像与深度图像一一对应,标注图像与深度图像的像素点之间具有一对一的对应关系。其中,深度图像(英文:depth image)也被称为距离影像(英文:range image),是指将从图像采集器到场景中各点的距离(深度)作为像素值的图像,它直接反映了景物可见表面的几何形状。
步骤92,根据标注图像及对应的深度图像确定物体标注区域的深度信息。
本实施例中,根据标注图像以及标注图像对应深度图像确定物体标注区域的深度信息,深度信息为深度值,具体地,深度信息由标注像素信息确定。
步骤93,基于物体标注区域的深度信息,生成物体的点云图像。
本实施例中,基于物体标注区域的深度信息生成物体的点云图像,物体的点云图像可作为神经网络模型的训练样本。
步骤94,根据物体的点云图像构建图像数据库。
本实施例中,根据物体的点云图像构建图像数据库,用于作为神经网络样本数据库,采用物体的点云图像训练相应的神经网络模型。
可选的,图像数据库包括上述的实例数据库中的实例。
本实施例中,获取标注图像以及标注图像对应的深度图像,根据标注图像以及对应的深度图像确定物体标注区域的深度信息,基于物体标注区域的深度信息,生成物体的点云图像,从而根据物体的点云图像构建图像数据库。图像数据库用于为后续抓取深度学习提供训练样本,无需人工收集样本数据,能够基于标注图像以及深度图像自动得到点云图像,从而基于点云图像构建图像数据库,提高构建图像数据库的效率,减少构建图像数据库的人力成本。
在本实施例中,可以通过多种方式来确定物体标注区域的深度信息,从而获得点云图像,在本实施例中以语义分割和实例分割为例,来进行详细说明,对于其他的实现方式,本实施例不做特别限制。
图10为本公开实施例四提供的另一种图像生成方法的流程示意图,在一实施例的基础上,本实施例以语义分割为例,来说明确定物体标注区域的深度信息的实现方式,具体包括以下步骤:
步骤101,确定语义分割标注图像对应的物体标注区域。
本实施例中,标准图像包括语义分割标注图像,其中,语义分割是指将图像中的每个像素归于类标签的过程,将语义分割认为是像素级别的图像分类,属于同一类的像素都要被归为一类,因此语义分割是从像素级别来理解图像的。语义分割标注图像为经语义分割的标注图像。语义分割标注图像包括2D语义分割标注图像以及3D语义分割标注图像。
可选地,生成标注图像的方式可以为以下任意一种,二维图像经2D语义分割生成2D语义分割标注图像,二维图像经3D语义分割生成3D语义分割标注图像,二维图像经2D实例分割生成2D实例分割标注图像。
二维图像经2D语义分割生成2D语义分割标注图像:预先对二维图像中的物体不可抓区域、物体被压叠区域及物体未被压区域进行标注,为了区分不同的区域,可以将不同区域填充不同的颜色加以区分,或将不同区域填充不同的图案加以区分。参见图11,图11为一种以颜色区分不同区域的标注方式,针对图中表层各物体分别标注出物体不可抓区域和/或物体被压叠区域和/或物体未被压区域。在二维图像中红色表示为某物体的物体不可抓区域,即Not Graspable,绿色表示为物体被压叠区域,即Graspable,黄色为物体未被压区域,即Overlap,标注后的二维图像为2D语义分割标注图像,物体不可抓区域、物体被压叠区域及物体未被压区域为物体标注区域。
二维图像经3D语义分割生成3D语义分割标注图像:预先对二维图像中的被压叠物体、未被压叠物体、轻微压叠物体及无法判断是否被压叠的物体进行标注,为了区分不同的物体,可以将不同物体填充不同的颜色加以区分,或将不同物体填充不同的图案加以区分。参见图12,图12为另一种以颜色区分不同物体的标注方式,针对图中表层全部物体进行标注,确定物体是属于被压叠物体或未被压叠物体或轻微压叠物体或无法判断是否被压叠的物体。在二维图像中红色表示为被压叠物体,即Overlap,蓝色表示为未被压叠物体,即Non-Overlap,绿色为肉眼无法判断是否被压叠的物体或不影响吸取的轻微压叠物体,即Uncertain,标注后的二维图像为3D语义分割标注图像。其中,未被压物体所在区域为物体标注区域,和/或轻微压叠物体所在区域为物体标注区域。
步骤102,根据语义分割标注图像对应的物体标注区域及语义分割标注图像对应的深度图像确定物体标注区域的深度信息。
本实施例中,语义分割标注图像与深度图像一一对应,根据语义分割标注图像对应的物体标注区域及语义分割标注图像对应的深度图像确定物体标注区域的深度信息,即深度值。
本实施例中,根据语义分割标注图像对应的物体标注区域及深度图像能够准确确定物体标注区域的深度信息。
进一步地,图13为本公开实施例四提供的又一种图像生成方法的流程示意图,在一实施例的基础上,对步骤101进行了进一步细化,具体包括以下步骤:
步骤101a,获取语义分割标注图像对应的标注文件,解析标注文件获取标注像素信息。
本实施例中,获取语义分割标注图像对应的标注文件,标注文件中记录了标注的物体所在的区域的像素信息即标注像素信息,解析标注文件获取标注像素信息。
步骤101b,根据标注像素信息确定语义分割标注图像对应的物体标注区域。
本实施例中,根据标注像素信息确定语义分割标注图像中的物体标注区域,其中,标注像素信息包括标注像素位置,根据标注像素位置能够确定语义分割标注图像中的物体标注区域。
本实施例中,基于预先记录的标注文件能够能够准确识别到图像中的物体标注区域。
进一步地,图14为本公开实施例四提供的还一种图像生成方法的流程示意图,在一实施例的基础上,对步骤102进行了进一步细化,具体包括以下步骤:
步骤102a,根据物体标注区域在标注图像中的第一位置,确定深度图像中与第一位置对应的第二位置。
本实施例中,获取语义分割标注图像对应的物体标注区域在语义分割标注图像中的第一位置,第一位置为语义分割标注图像对应的物体标注区域的标注像素位置,根据语义分割标注图像对应的物体标注区域在语义分割标注图像中的第一位置确定语义分割标注图像对应的深度图像中与第一位置对应的第二位置,第二位置为深度图像对应的物体标注区域的标注像素位置,第一位置与第二位置对应。
步骤102b,将深度图像中第二位置的深度信息确定为物体标注区域的深度信息。
本实施例中,将语义分割标注图像对应的深度图像中第二位置的深度信息确定为物体标注区域的深度信息。
本实施例中,标注图像与深度图像一一对应,根据标注图像与深度图像一一对应能够准确确定深度图像中与第一位置对应的第二位置,从而得到深度信息。
图15为本公开实施例四提供的再一种图像生成方法的流程示意图,在一实施例的基础上,本实施例以实例分割为例,来说明确定物体标注区域的深度信息的实现方式,具体包括以下步骤:
步骤151,确定实例分割标注图像对应的物体标注区域。
本实施例中,标准图像包括实例分割标注图像,其中,实例分割不但要进行像素级别的分类,还需在具体的类别基础上区别开不同的实例。实例分割标注图像为经实例分割的标注图像。
二维图像经2D实例分割生成2D实例分割标注图像:预先对二维图像中的被压叠物体、未被压叠物体、轻微压叠物体及无法判断是否被压叠的物体进行标注,为了区分不同的物体,可以将不同物体填充不同的颜色加以区分,或将不同物体填充不同的图案加以区分。
参见图16,图16为再一种以颜色区分不同物体的标注方式,针对图中表层全部物体进行标注,确定物体是属于被压叠物体或未被压叠物体或轻微压叠物体或无法判断是否被压叠的物体。在二维图像中红色表示为被压叠物体,蓝色表示为未被压叠物体,绿色为肉眼无法判断是否被压叠的物体或不影响吸取的轻微压叠物体,标注后的二维图像为2D实例分割标注图像,其中,未被压物体所在区域为物体标注区域,和/或轻微压叠物体所在区域为物体标注区域。
步骤152,根据实例分割标注图像对应的物体标注区域及实例分割标注图像对应的深度图像确定物体标注区域的深度信息。
本实施例中,实例分割标注图像与深度图像一一对应,根据实例分割标注图像对应的物体标注区域及实例分割标注图像对应的深度图像确定物体标注区域的深度信息,即深度值。
本实施例中,根据分割分割标注图像对应的物体标注区域及深度图像能够准确确定物体标注区域的深度信息。
进一步地,在一实施例的基础上,对步骤151进行了进一步细化,具体包括以下步骤:
步骤151a,获取实例分割标注图像对应的标注文件,解析标注文件获取标注像素信息。
本实施例中,获取实例分割标注图像对应的标注文件,标注文件中记录了标注的物体所在的区域的像素信息即标注像素信息,解析标注文件获取标注像素信息。
步骤151b,根据标注像素信息确定实例分割标注图像对应的物体标注区域。
本实施例中,根据标注像素信息确定实例分割标注图像中的物体标注区域,其中,标注像素信息包括标注像素位置,根据标注像素位置能够确定实例分割标注图像中的物体标注区域。
本实施例中,基于预先记录的标注文件能够能够准确识别到图像中的物体标注区域。
进一步地,在一实施例的基础上,对步骤152进行了进一步细化,具体包括以下步骤:
步骤152a,根据物体标注区域在标注图像中的第一位置,确定深度图像中与第一位置对应的第二位置。
本实施例中,获取实例分割标注图像对应的物体标注区域在实例分割标注图像中的第一位置,第一位置为实例分割标注图像对应的物体标注区域的标注像素位置,根据实例分割标注图像对应的物体标注区域在实例分割标注图像中的第一位置确定实例分割标注图像对应的深度图像中与第一位置对应的第二位置,第二位置为深度图像对应的物体标注区域的标注像素位置,第一位置与第二位置对应。
步骤152b,将深度图像中第二位置的深度信息确定为物体标注区域的深度信息。
本实施例中,将实例分割标注图像对应的深度图像中第二位置的深度信息确定为物体标注区域的深度信息。
本实施例中,标注图像与深度图像一一对应,根据标注图像与深度图像一一对应能够准确确定深度图像中与第一位置对应的第二位置,从而得到深度信息。
进一步地,图17为本公开实施例四提供的又一种图像生成方法的流程示意图,在一实施例的基础上,对步骤93进行了进一步细化,具体包括以下步骤:
步骤931,获取相机内参,并根据相机内参以及深度信息生成物体标注区域的点云信息。
本实施例中,获取相机内参,其中,相机内参包括相机焦距以及相机光轴在图像坐标系中的偏移量,其中,相机焦虑的单位为像素,相机光轴在图像坐标系中的偏移量的单位为像素。进一步根据相机内参以及物体标注区域的深度信息生成物体标注区域的点云信息。
步骤932,根据物体标注区域的点云信息,生成点云图像。
本实施例中,根据物体标注区域的点云信息,生成点云图像,采用物体的点云图像训练相应的神经网络模型。
本实施例中,无需用户收集样本数据,能够自动得到点云图像,从而基于点云图像构建图像数据库,减少了用户的工作量。
进一步地,图18为本公开实施例四提供的又一种图像生成方法的流程示意图,在一实施例的基础上,对步骤94进行了进一步细化,具体包括以下步骤:
步骤941,获取标注图像的标注类型,并根据标注类型建立对应的图像子数据库。
本实施例中,获取标注图像的标注类型,其中,标注图像的标注类型包括语义分割类图像以及实例分割类图像,根据标注类型建立对应的图像子数据库,如至少建立一个语义分割类图像子数据库以及至少建立一个实例分割类图像子数据库。
步骤942,建立标注图像及对应的物体的点云图像之间的对应关系。
本实施例中,建立各标注图像以及各标注图像对应物体的点云图像之间的对应关系,将各标注图像以及各标注图像对应物体的点云图像之间关联起来。
步骤943,将同一标注类型的标注图像及对应的点云图像存储至相同的图像子数据库。
本实施例中,为了便于查找图像,将同一类型的标注图像以及对应的点云图像存储在同一图像子数据库,将语义分割类图像的标注图像及对应的点云图像存储至相同的语义分割类图像子数据库中,将实例分割类图像的标注图像及对应的点云图像存储至相同的实例分割类图像子数据库中。
本实施例中,为了便于后续神经网络模型的训练,将同一类型的图像以及对应的点云图像存储至同一图像子数据库。
进一步地,在一实施例的基础上,对步骤94进行了进一步细化,具体包括以下步骤:
步骤94a,根据物体的点云图像确定对应的物体位姿信息。
本实施例中,根据物体的点云图像确定对应的物体位姿信息,物体位姿信息为物体在坐标系下的位置和姿态。位姿估计在计算机视觉领域十分重要的角色,在使用视觉传感器估计机器人位姿进行控制、机器人导航等方面都有着极大的应用。
步骤94b,基于物体的点云图像及对应的物体位姿信息构建图像数据库。
本实施例中,基于物体的点云图像及对应的物体位姿信息构建图像数据库,将物体的点云图像及对应的物体位姿信息存储至图像数据库中。图像数据库用于为后续抓取深度学习提供训练样本,无需人工收集样本数据,能够基于标注图像以及深度图像自动得到点云图像,从而基于物体的点云图像及对应的物体位姿信息构建图像数据库,提高构建图像数据库的效率,减少构建图像数据库的人力成本。
实施例五
图19为本公开实施例五提供的一种图像生成装置的结构示意图,如图19所示,该装置包括:
获取模块191,用于获取待合成图像,待合成图像中的物体至少包括真实物体;
获取模块191,还用于获取实例组,实例组包括至少一个第一实例,第一实例包括单个物体的图像,单个物体的图像包括基于标注图像和标注图像对应的深度图像构建的点云图像;
合成模块192,用于基于待合成图像和实例组,合成获得虚拟图像,虚拟图像中的物体包括待合成图像中的物体和实例组对应的物体。
根据本公开的一个方面,在获取实例组之前,获取模块191还用于:
获取标注图像及深度图像,标注图像为二维图像;
确定模块,用于:
根据标注图像及对应的深度图像确定物体标注区域的深度信息;
基于物体标注区域的深度信息,生成物体的点云图像;
根据物体的点云图像构建图像数据库。
根据本公开的一个方面,标注图像包括:语义分割标注图像;
确定模块,根据标注图像及对应的深度图像确定物体标注区域的深度信息,用于:
确定语义分割标注图像对应的物体标注区域;
根据语义分割标注图像对应的物体标注区域及语义分割标注图像对应的深度图像确定物体标注区域的深度信息;
或,标注图像包括:实例分割标注图像;
确定模块,根据标注图像及对应的深度图像确定物体标注区域的深度信息,用于:
确定实例分割标注图像对应的物体标注区域;
根据实例分割标注图像对应的物体标注区域及实例分割标注图像对应的深度图像确定物体标注区域的深度信息。
根据本公开的一个方面,确定模块,根据语义分割标注图像对应的物体标注区域及语义分割标注图像对应的深度图像确定物体标注区域的深度信息,用于:
根据物体标注区域在标注图像中的第一位置,确定深度图像中与第一位置对应的第二位置;
将深度图像中第二位置的深度信息确定为物体标注区域的深度信息。
根据本公开的一个方面,确定模块,用于:
获取相机内参,并根据相机内参以及深度信息生成物体标注区域的点云信息;
根据物体标注区域的点云信息,生成点云图像。
根据本公开的一个方面,图像数据库包括:多个图像子数据库;
确定模块根据物体的点云图像构建图像数据库,用于:
获取标注图像的标注类型,并根据标注类型建立对应的图像子数据库;
建立标注图像及对应的物体的点云图像之间的对应关系;
将同一标注类型的标注图像及对应的点云图像存储至相同的图像子数据库。
根据本公开的一个方面,确定模块根据物体的点云图像构建图像数据库,用于:
根据物体的点云图像确定对应的物体位姿信息;
基于物体的点云图像及对应的物体位姿信息构建图像数据库。
根据本公开的一个方面,合成模块192,用于:
确定实例组对应的物体在待合成图像中的放置位置;
检测实例组对应的物体在放置位置下,与周边的其它物体是否发生碰撞;
若未发生碰撞,则通过将实例组对应的物体放置在待合成图像中的放置位置,合成获得虚拟图像;
若发生碰撞,则重新执行确定实例组对应的物体在待合成图像中的放置位置的步骤,直至碰撞检测的次数达到预设的第一阈值时,重新执行获取实例组的步骤,其中重新获取的实例组不同于之前获取的实例组。
根据本公开的一个方面,获取实例组包括从实例数据库中获取实例组,实例数据库包括多个实例。
根据本公开的一个方面,确定模块,确定实例组对应的物体在待合成图像中的放置位置,用于:
根据待合成图像的图像内容中放置的物体的平均高度,获得实例组对应的物体的放置高度,放置高度基于平均高度和预定的波动范围确定;
在放置高度对应的平面上随机选取位置,作为实例组对应的物体在待合成图像中的放置位置。
根据本公开的一个方面,获取模块191,获取待合成图像,用于:
获取采集得到的真实图像,作为待合成图像;或者,
将历史合成获得的虚拟图像,作为待合成图像。
根据本公开的一个方面,虚拟图像的图像内容中的物体数量不超过预定的第二阈值;
在基于待合成图像和实例组,合成获得虚拟图像之前,确定模块,还用于:
判断待合成图像中的物体数量与实例组的数量之和是否超过第二阈值;
若数量之和超过第二阈值,则调整实例组的数量,使得调整后的数量之和不超过第二阈值。
根据本公开的一个方面,在基于待合成图像和实例组,合成获得虚拟图像之后,确定模块,还用于:
统计虚拟图像的图像内容中放置的物体数量;
若物体数量未超过预定的第二阈值,则将虚拟图像作为当前的待合成图像,再次执行获取实例组的步骤,直至当前获得的虚拟图像的图像内容中放置的物体数量达到第二阈值,或者当前实例数据库中的实例被遍历结束。
根据本公开的一个方面,第一实例还包括单个物体对应的标注,标注包括点云数据和夹爪标注。
实施例六
图20为本公开实施例五提供的一种模型训练装置的结构示意图,如图20所示,该装置包括:
采用如权图像生成方法中任一项的图像生成方法获得的虚拟图像作为训练样本,训练装置包括:
获取模块2001,用于针对模型的首次训练,选取采集得到的真实图像以及真实图像对应的标注,作为训练样本对模型进行训练;其中,真实图像的图像内容中放置的物体均为真实物体;
获取模块2001,还用于针对模型的非首次训练,从采集得到的真实图像和合成获得的虚拟图像中,随机选取图像以及图像对应的标注作为训练样本,对模型进行训练。
根据本公开的一个方面,模型用于根据输入的待处理图像,生成待处理图像中的物体对应的夹爪。
实施例七
本公开还提供一种计算机存储介质,包括存储于其上的计算机可执行指令,所述可执行指令在被处理器执行时实施如上所述的图像生成方法或模型训练方法。
本公开还涉及一种计算机存储介质,包括存储于其上的计算机可执行指令,所述可执行指令在被处理器2101执行时实施如上所述的图像生成方法或模型训练方法。
本公开还提供一种计算机程序,当所述计算机程序被处理器执行时,用于执行上述任一项所述的图像生成方法或模型训练方法。
图21为本公开实施例五提供的一种电子设备的结构示意图,如图21所示,该电子设备包括:
一个或多个处理器2101(图中以一个为例),该一个或多个处理器2101通过系统总线与存储器2102连接;
存储器2102,用于存储可执行指令,所述可执行指令在被所述一个或多个处理器2101执行时,实施如上所述的图像生成方法或模型训练方法。
图22为本公开实施例五提供的另一种电子设备的结构示意图,如图22所示,该电子设备800可以是计算机,数字广播终端,消息收发设备,平板设备,个人数字助理,服务器,服务器集群等。
电子设备可以包括以下一个或多个组件:处理组件802,存储器804,电源组件806,多媒体组件808,音频组件810,输入/输出(I/ O)接口812,传感器组件814,以及通信组件816。
处理组件802通常控制电子设备的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件802可以包括一个或多个处理器820来执行指令,以完成上述的方法的全部或部分步骤。此外,处理组件802可以包括一个或多个模块,便于处理组件802和其他组件之间的交互。例如,处理组件802可以包括多媒体模块,以方便多媒体组件808和处理组件802之间的交互。
存储器804被配置为存储各种类型的数据以支持在电子设备的操作。这些数据的示例包括用于在电子设备上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器804可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
电源组件806为电子设备的各种组件提供电力。电源组件806可以包括电源管理系统,一个或多个电源,及其他与为电子设备生成、管理和分配电力相关联的组件。
多媒体组件808包括在电子设备和用户之间的提供一个输出接口的屏幕。在一些实施例中,屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件808包括一个前置摄像头和/或后置摄像头。当电子设备处于操作模式,如拍摄模式或视频模式时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光学变焦能力。
音频组件810被配置为输出和/或输入音频信号。例如,音频组件810包括一个麦克风(MIC),当电子设备处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器804或经由通信组件816发送。在一些实施例中,音频组件810还包括一个扬声器,用于输出音频信号。
I/ O接口812为处理组件802和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。
传感器组件814包括一个或多个传感器,用于为电子设备提供各个方面的状态评估。例如,传感器组件814可以检测到电子设备的打开/关闭状态,组件的相对定位,例如组件为电子设备的显示器和小键盘,传感器组件814还可以检测电子设备或电子设备一个组件的位置改变,用户与电子设备接触的存在或不存在,电子设备方位或加速/减速和电子设备的温度变化。传感器组件814可以包括接近传感器,被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件814还可以包括光传感器,如CMOS或CCD图像传感器,用于在成像应用中使用。在一些实施例中,该传感器组件814还可以包括加速度传感器,陀螺仪传感器,磁传感器,压力传感器或温度传感器。
通信组件816被配置为便于电子设备和其他设备之间有线或无线方式的通信。电子设备可以接入基于通信标准的无线网络,如WiFi,2G或3G,或它们的组合。在一个示例性实施例中,通信组件816经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,通信组件816还包括近场通信(NFC)模块,以促进短程通信。例如,在NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。
在示例性实施例中,电子设备2200可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述图像生成方法或模型训练方法。
本实施例还提供一种芯片,所述芯片包括存储器、处理器,所述存储器中存储代码和数据,所述存储器与所述处理器耦合,所述处理器运行所述存储器中的程序使得所述芯片用于执行上述各种实施方式提供的图像生成方法或模型训练方法。
本实施例还提供一种计算机程序,当所述计算机程序被处理器执行时,用于执行前述各种实施方式提供的图像生成方法或模型训练方法。
最后应说明的是:以上所述仅为本公开的优选实施例而已,并不用于限制本公开,尽管参照前述实施例对本公开进行了详细的说明,对于本领域的技术人员来说,其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换。凡在本公开的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本公开的保护范围之内。
Claims (18)
- An image generation method, characterized by comprising: obtaining an image to be synthesized, where objects in the image to be synthesized at least comprise real objects; obtaining an instance group, where the instance group comprises at least one first instance, the first instance comprises an image of a single object, and the image of the single object comprises a point cloud image constructed based on an annotation image and a depth image corresponding to the annotation image; and synthesizing a virtual image based on the image to be synthesized and the instance group, where objects in the virtual image comprise the objects in the image to be synthesized and objects corresponding to the instance group.
- The method according to claim 1, characterized in that, before the obtaining an instance group, the method further comprises: obtaining the annotation image and the depth image, where the annotation image is a two-dimensional image; determining depth information of an object annotation area based on the annotation image and the corresponding depth image; generating a point cloud image of an object based on the depth information of the object annotation area; and constructing an image database based on the point cloud image of the object.
- The method according to claim 2, characterized in that the annotation image comprises a semantic segmentation annotation image, and the determining depth information of an object annotation area based on the annotation image and the corresponding depth image comprises: determining the object annotation area corresponding to the semantic segmentation annotation image; and determining the depth information of the object annotation area based on the object annotation area corresponding to the semantic segmentation annotation image and the depth image corresponding to the semantic segmentation annotation image; or, the annotation image comprises an instance segmentation annotation image, and the determining depth information of an object annotation area based on the annotation image and the corresponding depth image comprises: determining the object annotation area corresponding to the instance segmentation annotation image; and determining the depth information of the object annotation area based on the object annotation area corresponding to the instance segmentation annotation image and the depth image corresponding to the instance segmentation annotation image.
- The method according to claim 3, characterized in that the determining the depth information of the object annotation area based on the object annotation area corresponding to the semantic segmentation annotation image and the depth image corresponding to the semantic segmentation annotation image comprises: determining, based on a first position of the object annotation area in the annotation image, a second position corresponding to the first position in the depth image; and determining depth information of the second position in the depth image as the depth information of the object annotation area.
- The method according to claim 2, characterized in that the generating a point cloud image of an object based on the depth information of the object annotation area comprises: obtaining camera intrinsic parameters, and generating point cloud information of the object annotation area based on the camera intrinsic parameters and the depth information; and generating the point cloud image based on the point cloud information of the object annotation area.
- The method according to claim 2, characterized in that the image database comprises a plurality of image sub-databases, and the constructing an image database based on the point cloud image of the object comprises: obtaining an annotation type of the annotation image, and establishing a corresponding image sub-database according to the annotation type; establishing a correspondence between the annotation image and the point cloud image of the corresponding object; and storing annotation images of the same annotation type and the corresponding point cloud images in the same image sub-database.
- The method according to claim 2, characterized in that the constructing an image database based on the point cloud image of the object comprises: determining corresponding object pose information based on the point cloud image of the object; and constructing the image database based on the point cloud image of the object and the corresponding object pose information.
- The method according to claim 1, characterized in that the synthesizing a virtual image based on the image to be synthesized and the instance group comprises: determining a placement position, in the image to be synthesized, of the object corresponding to the instance group; detecting whether the object corresponding to the instance group, at the placement position, collides with other surrounding objects; if no collision occurs, obtaining the virtual image by placing the object corresponding to the instance group at the placement position in the image to be synthesized; and if a collision occurs, re-executing the step of determining the placement position, in the image to be synthesized, of the object corresponding to the instance group, until the number of collision detections reaches a preset first threshold, and then re-executing the step of obtaining an instance group, where the re-obtained instance group is different from the previously obtained instance group.
- The method according to claim 1, characterized in that the obtaining an instance group comprises obtaining the instance group from an instance database, where the instance database comprises a plurality of instances.
- The method according to claim 8, characterized in that the determining a placement position, in the image to be synthesized, of the object corresponding to the instance group comprises: obtaining a placement height of the object corresponding to the instance group based on an average height of objects placed in the image content of the image to be synthesized, where the placement height is determined based on the average height and a predetermined fluctuation range; and randomly selecting a position on a plane corresponding to the placement height as the placement position, in the image to be synthesized, of the object corresponding to the instance group.
- The method according to claim 1, characterized in that the obtaining an image to be synthesized comprises: obtaining a collected real image as the image to be synthesized; or using a virtual image obtained by historical synthesis as the image to be synthesized.
- The method according to claim 1, characterized in that the number of objects in the image content of the virtual image does not exceed a predetermined second threshold, and before the synthesizing a virtual image based on the image to be synthesized and the instance group, the method further comprises: determining whether the sum of the number of objects in the image to be synthesized and the number of instances in the instance group exceeds the second threshold; and if the sum exceeds the second threshold, adjusting the number of instances in the instance group so that the adjusted sum does not exceed the second threshold.
- The method according to claim 11, characterized in that, after the synthesizing a virtual image based on the image to be synthesized and the instance group, the method further comprises: counting the number of objects placed in the image content of the virtual image; and if the number of objects does not exceed the predetermined second threshold, using the virtual image as the current image to be synthesized and performing the step of obtaining an instance group again, until the number of objects placed in the image content of the currently obtained virtual image reaches the second threshold, or the instances in the current instance database have all been traversed.
- The method according to any one of claims 1 to 13, characterized in that the first instance further comprises an annotation corresponding to the single object, and the annotation comprises point cloud data and a gripper annotation.
- A model training method, characterized in that a virtual image obtained by the image generation method according to any one of claims 1 to 14 is used as a training sample, the training method comprising: for the first training of a model, selecting collected real images and annotations corresponding to the real images as training samples to train the model, where the objects placed in the image content of the real images are all real objects; and for non-first training of the model, randomly selecting images and annotations corresponding to the images from the collected real images and the synthesized virtual images as training samples to train the model.
- The model training method according to claim 15, characterized in that the model is used to generate, based on an input image to be processed, a gripper corresponding to an object in the image to be processed.
- A computer storage medium, characterized by comprising computer-executable instructions stored thereon, where the executable instructions, when executed by a processor, implement the method according to any one of claims 1 to 16.
- A program product, characterized by comprising a computer program, where the program product, when run on a computer, causes the computer to execute the method according to any one of claims 1 to 16.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210780030.3 | 2022-07-04 | ||
CN202210780030.3A CN115082795A (zh) | 2022-07-04 | 2022-07-04 | 虚拟图像的生成方法、装置、设备、介质及产品 |
CN202211003504.XA CN115408544A (zh) | 2022-08-19 | 2022-08-19 | 图像数据库构建方法、装置、设备、存储介质及产品 |
CN202211003504.X | 2022-08-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024008081A1 true WO2024008081A1 (zh) | 2024-01-11 |
Family
ID=89454408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/105730 WO2024008081A1 (zh) | 2022-07-04 | 2023-07-04 | 图像生成方法和模型训练方法 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024008081A1 (zh) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583509A (zh) * | 2018-12-12 | 2019-04-05 | 南京旷云科技有限公司 | 数据生成方法、装置及电子设备 |
CN111161387A (zh) * | 2019-12-31 | 2020-05-15 | 华东理工大学 | 堆叠场景下合成图像的方法及系统、存储介质、终端设备 |
CN111784774A (zh) * | 2020-07-06 | 2020-10-16 | 北京京东乾石科技有限公司 | 目标检测方法、装置、计算机可读介质及电子设备 |
CN112132213A (zh) * | 2020-09-23 | 2020-12-25 | 创新奇智(南京)科技有限公司 | 样本图像的处理方法及装置、电子设备、存储介质 |
US20210201077A1 (en) * | 2019-12-31 | 2021-07-01 | Plus One Robotics, Inc. | Systems and methods for creating training data |
CN113112504A (zh) * | 2021-04-08 | 2021-07-13 | 浙江大学 | 一种植物点云数据分割方法及系统 |
CN115082795A (zh) * | 2022-07-04 | 2022-09-20 | 梅卡曼德(北京)机器人科技有限公司 | 虚拟图像的生成方法、装置、设备、介质及产品 |
CN115408544A (zh) * | 2022-08-19 | 2022-11-29 | 梅卡曼德(北京)机器人科技有限公司 | 图像数据库构建方法、装置、设备、存储介质及产品 |
- 2023-07-04: international application PCT/CN2023/105730 filed (published as WO2024008081A1), status unknown
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109583509A (zh) * | 2018-12-12 | 2019-04-05 | 南京旷云科技有限公司 | 数据生成方法、装置及电子设备 |
CN111161387A (zh) * | 2019-12-31 | 2020-05-15 | 华东理工大学 | 堆叠场景下合成图像的方法及系统、存储介质、终端设备 |
US20210201077A1 (en) * | 2019-12-31 | 2021-07-01 | Plus One Robotics, Inc. | Systems and methods for creating training data |
CN111784774A (zh) * | 2020-07-06 | 2020-10-16 | 北京京东乾石科技有限公司 | 目标检测方法、装置、计算机可读介质及电子设备 |
CN112132213A (zh) * | 2020-09-23 | 2020-12-25 | 创新奇智(南京)科技有限公司 | 样本图像的处理方法及装置、电子设备、存储介质 |
CN113112504A (zh) * | 2021-04-08 | 2021-07-13 | 浙江大学 | 一种植物点云数据分割方法及系统 |
CN115082795A (zh) * | 2022-07-04 | 2022-09-20 | 梅卡曼德(北京)机器人科技有限公司 | 虚拟图像的生成方法、装置、设备、介质及产品 |
CN115408544A (zh) * | 2022-08-19 | 2022-11-29 | 梅卡曼德(北京)机器人科技有限公司 | 图像数据库构建方法、装置、设备、存储介质及产品 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10854006B2 (en) | AR-enabled labeling using aligned CAD models | |
US20200012888A1 (en) | Image annotating method and electronic device | |
US10339715B2 (en) | Virtual reality system | |
US9639988B2 (en) | Information processing apparatus and computer program product for processing a virtual object | |
US20210383096A1 (en) | Techniques for training machine learning | |
CN107301377B (zh) | 一种基于深度相机的人脸与行人感知系统 | |
CN108307214B (zh) | 用于控制装置的方法和设备 | |
CN105657272A (zh) | 一种终端设备及其拍摄方法 | |
CN104081307A (zh) | 图像处理装置、图像处理方法和程序 | |
CN104079926B (zh) | 一种远程桌面软件的视频性能测试方法 | |
CN111060118B (zh) | 场景地图建立方法、设备及存储介质 | |
JP2013164697A (ja) | 画像処理装置、画像処理方法、プログラム及び画像処理システム | |
CN110275532B (zh) | 机器人的控制方法及装置、视觉设备的控制方法及装置 | |
Voulodimos et al. | A threefold dataset for activity and workflow recognition in complex industrial environments | |
CN104850835A (zh) | 扫描方法和扫描终端 | |
CN107194968A (zh) | 图像的识别跟踪方法、装置、智能终端和可读存储介质 | |
CN111433809B (zh) | 行进路线及空间模型生成方法、装置、系统 | |
CN107016004A (zh) | 图像处理方法及装置 | |
CN115278084A (zh) | 图像处理方法、装置、电子设备及存储介质 | |
CN115278014A (zh) | 一种目标跟踪方法、系统、计算机设备及可读介质 | |
CN104813650B (zh) | 用于捕获和显示图像的方法和装置 | |
WO2024008081A1 (zh) | 图像生成方法和模型训练方法 | |
CN112199997A (zh) | 一种终端及工具处理方法 | |
JP2019148940A (ja) | 学習処理方法、サーバ装置及び反射検知システム | |
CN109981967A (zh) | 用于智能机器人的拍摄方法、装置、终端设备及介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23834848 Country of ref document: EP Kind code of ref document: A1 |