CN114998166A - Image data enhancement device and method - Google Patents


Publication number
CN114998166A
Authority
CN
China
Prior art keywords
images
image
processor
training sample
data enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110223620.1A
Other languages
Chinese (zh)
Inventor
吴俊樟
陈世泽
Current Assignee
Realtek Semiconductor Corp
Original Assignee
Realtek Semiconductor Corp
Priority date
Filing date
Publication date
Application filed by Realtek Semiconductor Corp filed Critical Realtek Semiconductor Corp
Priority claimed from application CN202110223620.1A
Publication of CN114998166A
Legal status: Pending

Classifications

    • G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06N 20/00 - Machine learning
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 2207/10004 - Still image; Photographic image
    • G06T 2207/20081 - Training; Learning

Abstract

The invention provides an image data enhancement device comprising a memory and a processor. The memory stores a plurality of instructions and a plurality of images. The processor is connected to the memory and reads the images, then loads and executes the instructions to: identify, from the images, at least one object-related image that includes at least one object; cut at least one object image out of the at least one object-related image; and superimpose the at least one object image at a plurality of arbitrary positions in the images to generate a plurality of training sample images, which are used for machine learning. An image data enhancement method is also provided.

Description

Image data enhancement device and method
Technical Field
The present invention relates to data enhancement technology, and more particularly, to an apparatus and method for enhancing image data.
Background
In the prior art, when training a machine learning model, the completeness of the training data is paramount; aside from it, the model architecture itself also determines recognition capability. For the same model architecture, the more diverse and complete the training database, the higher the discriminative power (i.e., the accuracy) of the model will typically be. In practice, however, limited manpower and data often make it impossible to collect a diverse and highly complete database. It is therefore important to be able to generate data through data enhancement (DA), and selecting an appropriate DA method effectively improves how well the model exploits the training data.
In general, when training an image recognition model for object recognition, a sufficient number of color images is usually available as training data. At night, however, images are often captured with a night vision device and lack color information. If such images are recognized or detected with a model trained on color images, the recognition accuracy is often poor. Moreover, the number of images of a specific object captured at night is often insufficient (for example, few human-shaped images can be collected at night), which further degrades accuracy. Likewise, when a database stores too few images of a specific object, a recognition model trained on those stored images will also perform poorly.
In summary, how to train a recognition model for images captured at night, and how to overcome an insufficient number of stored images of a specific object, are problems that those skilled in the art wish to solve.
Disclosure of Invention
An embodiment of the invention provides an image data enhancement device, which comprises a memory and a processor. The memory is used for storing a plurality of instructions and a plurality of images. The processor is connected to the memory and is used for reading the images and for loading and executing the instructions to: identify at least one object-related image including at least one object from the images; cut at least one object image out of the at least one object-related image; and superimpose the at least one object image at a plurality of arbitrary positions in the images to generate a plurality of training sample images, and perform machine learning by using the training sample images.
An embodiment of the invention provides an image data enhancement method, which comprises: cutting at least one object image out of at least one object-related image, among a plurality of images, that includes at least one object; and superimposing the at least one object image at a plurality of arbitrary positions in the images to generate a plurality of training sample images, and performing machine learning by using the training sample images.
An embodiment of the invention provides an image data enhancement device, which comprises a memory and a processor. The memory is used for storing a plurality of instructions and a plurality of images. The processor is connected to the memory and is used for reading the images and for loading and executing the instructions to: identify at least one object-related image including at least one object from the images; determine whether the number of the at least one object-related image among the images is not greater than an object image number threshold; when the number of the at least one object-related image is not greater than the object image number threshold, cut at least one object image out of the at least one object-related image; and superimpose the at least one object image at a plurality of arbitrary positions in the images to generate a plurality of training sample images, and perform machine learning by using the training sample images.
An embodiment of the invention provides an image data enhancement method, which comprises: determining whether the number of at least one object-related image including at least one object among a plurality of images is not greater than an object image number threshold; when the number is not greater than the object image number threshold, cutting at least one object image out of the at least one object-related image; and superimposing the at least one object image at a plurality of arbitrary positions in the images to generate a plurality of training sample images, and performing machine learning by using the training sample images.
Based on the above, embodiments of the invention can automatically or semi-automatically crop object images from pre-stored images and randomly superimpose them onto images obtained by a night vision device. This alleviates the prior-art problem of poor recognition of images captured by a night vision device, as well as the poor recognition that results when too few of the stored images contain the specific object.
Drawings
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
FIG. 1 is a block diagram of an image data enhancement device according to an embodiment of the present invention.
Fig. 2 is a flowchart of an image data enhancement method according to some exemplary embodiments of the invention.
Fig. 3A-3B are schematic diagrams of infrared images according to some exemplary embodiments of the invention.
Fig. 4A-4B are schematic diagrams of object-related images according to some exemplary embodiments of the invention.
Fig. 4C is a schematic diagram of a background image according to some exemplary embodiments of the invention.
Fig. 4D-4E are schematic diagrams of training sample images according to some exemplary embodiments of the invention.
Fig. 5A to 5C are schematic diagrams of error images according to other exemplary embodiments of the invention.
FIG. 6 is a flowchart illustrating an image data enhancement method according to further exemplary embodiments of the present invention.
Detailed Description
FIG. 1 is a block diagram of an image data enhancement device according to an embodiment of the present invention. Referring to fig. 1, the image data enhancement device 100 may include a memory 110 and a processor 120. In some embodiments, the image data enhancement device 100 is, for example, an internet-enabled electronic device such as a smart phone, a tablet computer, a notebook computer, or a desktop computer, without particular limitation. The memory 110 can be used for storing a plurality of instructions and a plurality of images, and the processor 120 is connected to the memory 110 and configured to read the images and to load and execute the instructions.
In some embodiments, the memory 110 is, for example, any type of Random Access Memory (RAM), read-only memory (ROM), flash memory (flash memory), hard disk, or the like, or a combination thereof.
In some embodiments, the instructions stored in the memory 110 may include an image recognition module 1101, an image processing module 1103, and a training module 1105. The images stored in the memory 110 may be captured by a night vision device (NVD) or by a general camera. Images captured by a night vision device include, for example, digital night vision images, active infrared night vision images, and thermal imaging images; images captured by a general camera include, for example, grayscale images, color images, and multispectral images.
In some embodiments, the processor 120 is, for example, a Central Processing Unit (CPU), or other programmable general purpose or special purpose microprocessor (microprocessor), Digital Signal Processor (DSP), programmable controller, Application Specific Integrated Circuit (ASIC), or other similar devices or combinations thereof.
Further, the processor 120 may be communicatively coupled to the memory 110, in either a wired or a wireless manner, without particular limitation.
For the wired manner, the processor 120 may establish a wired communication connection using a Universal Serial Bus (USB), RS-232, a universal asynchronous receiver/transmitter (UART), an inter-integrated circuit (I2C) bus, a Serial Peripheral Interface (SPI), a DisplayPort, a Thunderbolt interface, or a Local Area Network (LAN) interface. For the wireless manner, the processor 120 may establish a wireless communication connection using a wireless fidelity (Wi-Fi) module, a Radio Frequency Identification (RFID) module, a Bluetooth module, an infrared module, a near-field communication (NFC) module, or a device-to-device (D2D) module. In this embodiment, the processor 120 can load the instructions from the memory 110 to perform the image data enhancement method according to the following embodiments of the present invention.
Various usage scenarios of the image data enhancement apparatus 100 according to the embodiment of the present invention are described below. Taking images captured by a night vision device as a first example: after a plurality of images are captured by the night vision device, the memory 110 stores them, and the image data enhancement device 100 performs data expansion with the captured images, thereby increasing the data available for training the recognition model.
Fig. 2 is a flowchart of an image data enhancement method according to some exemplary embodiments of the invention. The method of the embodiment shown in fig. 2 is suitable for the image data enhancement apparatus 100 of fig. 1, but not limited thereto. For convenience and clarity, the following description will refer to fig. 1 and fig. 2 together to describe the detailed steps of the image data enhancement method shown in fig. 2 by using the operation relationship between the components in the image data enhancement device 100.
First, in step S201, the processor 120 may recognize at least one object-related image including at least one object from the plurality of images through the image recognition module 1101.
In other words, the processor 120 may read the image recognition module 1101 and the images from the memory 110, and recognize, through the image recognition module 1101, at least one object-related image that includes at least one object.
In some embodiments, the processor 120 may perform step S201 periodically or aperiodically.
It should be noted that the at least one object may be one or more specific objects that the user wants to recognize from the images. The specific object may be any type of object, such as a human figure, a car, or a house, without particular limitation.
In addition, the images stored in the memory 110 may include a plurality of background images in addition to the at least one object-related image, where the background images do not include the at least one object. For images captured by night vision devices, the number of object-related images is usually much smaller than the number of background images (e.g., the background images may outnumber the object-related images tenfold).
For example, fig. 3A-3B are schematic diagrams of infrared images according to some exemplary embodiments of the invention. Referring to fig. 3A, the image of fig. 3A is an infrared image (i.e., the background image) that does not include a human-shaped object. Referring to fig. 3B, the image of fig. 3B is an infrared image (i.e., the above-described object-related image) including an object OBJ (i.e., a human-shaped object).
Referring back to fig. 1 and 2, in some embodiments, the processor 120 may perform object recognition on the images through the image recognition module 1101, and thereby recognize at least one object-related image including at least one object from the images.
In further embodiments, the processor 120 may execute any type of computer vision algorithm for object recognition via the image recognition module 1101. For example, the computer vision algorithm may be a region-based convolutional neural network (R-CNN) algorithm, a Single Shot Detector (SSD) algorithm, a You Only Look Once (YOLO) algorithm, or other similar algorithms or combinations thereof.
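As a loose illustration of step S201 (not the patent's implementation), the partitioning of stored images into object-related and background images can be sketched around any such detector; `detector` below is a hypothetical callable standing in for an R-CNN/SSD/YOLO model:

```python
from typing import Callable, List, Tuple

# A detection is a bounding box (x, y, width, height) in pixel coordinates.
Box = Tuple[int, int, int, int]

def split_by_objects(images: list, detector: Callable[[object], List[Box]]):
    """Partition images into object-related images (at least one detected
    object, kept together with its boxes) and background images (no
    detections), as in step S201."""
    object_related, backgrounds = [], []
    for img in images:
        boxes = detector(img)
        if boxes:
            object_related.append((img, boxes))
        else:
            backgrounds.append(img)
    return object_related, backgrounds
```

The background images collected here are reused later when the object images are superimposed.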
Next, in step S203, the processor 120 can cut at least one object image from the at least one object-related image through the image processing module 1103.
In other words, the processor 120 can further read the image processing module 1103 from the memory 110, perform object segmentation on the at least one object-related image through it, and thereby cut out at least one object image.
In some embodiments, the processor 120 can recognize at least one object position of the objects in the at least one object-related image through the image recognition module 1101, and accordingly cut the at least one object image out of the at least one object-related image according to the at least one object position through the image processing module 1103.
In a further embodiment, the processor 120 may perform object localization on the at least one object-related image through the image recognition module 1101 (e.g., identify the pixel coordinates corresponding to an object in the object-related image), and then cut the at least one object image out of the at least one object position in the at least one object-related image through the image processing module 1103.
In further embodiments, the processor 120 may also execute any of the computer vision algorithms described above via the image recognition module 1101 for object localization.
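A minimal sketch of the cutting in step S203, under the assumption that images are NumPy arrays and an object position is given as an (x, y, width, height) box:

```python
import numpy as np

def crop_object(image: np.ndarray, box) -> np.ndarray:
    """Cut an object image out of an object-related image, given the
    object position as pixel coordinates (x, y, width, height)."""
    x, y, w, h = box
    return image[y:y + h, x:x + w].copy()
```

Real object segmentation may follow an irregular object outline rather than a rectangle; the axis-aligned crop above is only the simplest case.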
For example, fig. 4A-4B are schematic diagrams of object-related images according to some exemplary embodiments of the invention. Referring to fig. 4A and 4B, when the user intends to recognize human-shaped objects, the object position of the object OBJ1 in the object-related image IMG1 and the object position of the object OBJ2 in the object-related image IMG2 can be recognized. The object OBJ1 is then cut out at its object position in IMG1, and the object OBJ2 at its object position in IMG2; the objects OBJ1 and OBJ2 can thus be regarded as object images.
Finally, referring back to fig. 1 and fig. 2, in step S205, the processor 120 may superimpose the at least one object image at a plurality of arbitrary positions in the images through the image processing module 1103 to generate a plurality of training sample images, and perform machine learning with the training sample images through the training module 1105.
In other words, the processor 120 may randomly superimpose the at least one object image onto the images through the image processing module 1103, so that the at least one object image lands at a plurality of arbitrary positions in the images, thereby generating a plurality of training sample images. The processor 120 can then read the training module 1105 from the memory 110 and perform machine learning with the training sample images through it, thereby training a recognition model corresponding to images captured by the night vision device.
It is noted that the processor 120 may use any machine learning algorithm through the training module 1105; the machine learning algorithm is not particularly limited.
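The superimposition of step S205 can be sketched for NumPy grayscale arrays; the `rng` parameter and the returned box are illustrative additions (the box can serve as a label for the generated sample), not part of the patent text:

```python
import random
import numpy as np

def superimpose(background: np.ndarray, obj: np.ndarray, rng=random):
    """Paste an object image onto a copy of an image at an arbitrary
    position, producing one training sample image plus the pasted box."""
    sample = background.copy()
    bh, bw = background.shape[:2]
    oh, ow = obj.shape[:2]
    x = rng.randrange(bw - ow + 1)   # keep the object fully inside the frame
    y = rng.randrange(bh - oh + 1)
    sample[y:y + oh, x:x + ow] = obj
    return sample, (x, y, ow, oh)
```

Calling this once per object image and per target image yields the plurality of training sample images described above.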
In some embodiments, the processor 120 may recognize a plurality of background images from the images through the image recognition module 1101, wherein the background images do not include the objects. The processor 120 can then superimpose the at least one object image at a plurality of arbitrary positions in the background images through the image processing module 1103 to generate the training sample images.
In further embodiments, the processor 120 may also execute any of the above computer vision algorithms via the image recognition module 1101 to recognize the background images among the images.
For example, fig. 4C is a schematic diagram of a background image according to some exemplary embodiments of the invention. Referring to fig. 4C, the background image IMG3 does not include any human-shaped object.
Referring back to fig. 1 and 2, in some embodiments, the processor 120 may randomly select at least one overlay image from the at least one object image through the image processing module 1103, and superimpose the at least one overlay image at a plurality of arbitrary positions in the images to generate the training sample images. In other words, the processor 120 randomly selects at least one of the object images to serve as the at least one overlay image. In other embodiments, the processor 120 may instead superimpose the at least one overlay image at a plurality of arbitrary positions in the background images to generate the training sample images.
In some embodiments, the processor 120 may perform a plurality of geometric transformation processes on the at least one overlay image through the image processing module 1103 to generate a plurality of variation images, and then superimpose the at least one overlay image and the variation images at a plurality of arbitrary positions in the images to generate the training sample images. In other embodiments, the at least one overlay image and the variation images may instead be superimposed at arbitrary positions in the background images.
Note that a geometric transformation process may be, for example, a rotation process, a mirror process, or a scaling process, without particular limitation.
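The three transformations named above can be sketched for single-channel (grayscale) arrays; the integer-factor upscaling via `np.kron` is only one simple stand-in for a real scaling process:

```python
import numpy as np

def geometric_variants(overlay: np.ndarray, scale: int = 2) -> list:
    """Generate variation images from one overlay image: a horizontal
    mirror, a 90-degree rotation, and an integer-factor upscaling
    (each pixel repeated scale x scale times)."""
    mirrored = overlay[:, ::-1]
    rotated = np.rot90(overlay)
    scaled = np.kron(overlay, np.ones((scale, scale), dtype=overlay.dtype))
    return [mirrored, rotated, scaled]
```

The transformations can also be chained, as in the OBJ11 example below where one object is rotated, mirrored, and scaled before being pasted.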
For example, fig. 4D-4E are schematic diagrams of training sample images according to some exemplary embodiments of the invention. Referring to fig. 4A, 4B and 4D, when the user wants to recognize human-shaped objects and cuts the objects OBJ1 and OBJ2 out of the object-related images IMG1 and IMG2, the objects OBJ1 and OBJ2 can serve as object images, and at least one of them can be randomly selected as an overlay image.
Taking the case where both OBJ1 and OBJ2 are selected as overlay images: OBJ2 may be scaled to generate the object OBJ21, OBJ1 may be rotated, mirrored and scaled to generate the object OBJ11, and OBJ21 and OBJ11 may be superimposed at two arbitrary positions in the object-related image IMG1 to generate the training sample image IMG11.
In addition, referring to fig. 4A, 4B and 4E, OBJ2 may also be scaled to generate the object OBJ22, OBJ1 may be mirrored to generate the object OBJ12, and OBJ22 and OBJ12 may be superimposed at two arbitrary positions in the background image IMG3 to generate the training sample image IMG31.
Referring back to fig. 1 and fig. 2, in some embodiments, the processor 120 may determine, through the image recognition module 1101, whether at least one error image exists among the training sample images; if so, the processor 120 may delete the at least one error image through the image processing module 1103.
In further embodiments, an error image may be an image in which at least one object is not superimposed on the ground, an image in which at least one object is upside down, or an image in which at least two objects are superimposed on each other.
For example, fig. 5A-5C are schematic diagrams of error images according to other exemplary embodiments of the invention. Referring to fig. 5A, training sample image IMG32 includes object OBJ23 that is not superimposed on the ground. Referring to fig. 5B, training sample image IMG33 includes object OBJ24 upside down. Referring to fig. 5C, the training sample image IMG34 includes an object OBJ13 and an object OBJ25 superimposed on each other.
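The error-image screening just illustrated can be approximated with simple box geometry. This is only a sketch: `ground_y` is a hypothetical ground-line parameter not named in the patent, and the upside-down check is omitted because it would need orientation metadata:

```python
def boxes_overlap(a, b) -> bool:
    """True if two pasted object boxes (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def is_error_sample(boxes, ground_y: int) -> bool:
    """Flag a training sample as an error image when pasted objects
    overlap each other (fig. 5C), or when an object's bottom edge lies
    above the assumed ground line so it appears to float (fig. 5A)."""
    for i, (x, y, w, h) in enumerate(boxes):
        if y + h < ground_y:
            return True
        for other in boxes[i + 1:]:
            if boxes_overlap((x, y, w, h), other):
                return True
    return False
```

Samples flagged this way would be deleted before training, matching the deletion step described above.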
Through the above steps, the image data enhancement device of the embodiment of the invention can perform data enhancement on images obtained by a night vision device to generate a recognition model corresponding to such images. This avoids the poor recognition accuracy that results when a model trained with color images performs object recognition on images captured by a night vision device.
Referring back to fig. 1, and now taking images captured by a general camera as an example: similarly, after a plurality of images are captured by the general camera, the memory 110 stores them, and the image data enhancement device 100 performs data expansion with the captured images, thereby increasing the data available for training the recognition model.
FIG. 6 is a flowchart illustrating an image data enhancement method according to further exemplary embodiments of the present invention. The method of the embodiment shown in fig. 6 is also applicable to the image data enhancement apparatus 100 of fig. 1, but not limited thereto. For convenience and clarity of description, the following description will refer to fig. 1 and 6 together to describe the detailed steps of the image data enhancement method shown in fig. 6 by using the operation relationship between the components in the image data enhancement device 100.
Fig. 6 differs from fig. 2 in that, after the processor 120 recognizes at least one object-related image including at least one object from the images through the image recognition module 1101 (step S601), the processor 120 performs the determination of step S603 through the image recognition module 1101 to decide whether to use a general data enhancement method. If the general data enhancement method is not used, the determination of step S607 decides whether to continue with the image data enhancement method of fig. 2.
In detail, in step S603, the processor 120 may determine, through the image recognition module 1101, whether the number of the at least one object-related image among the images is not greater than the object image number threshold. If so, the process proceeds to step S605; otherwise, it proceeds to step S607.
Next, in step S605, the processor 120 may execute a general data enhancement method through the image processing module 1103 to generate a plurality of training sample images, and perform machine learning with the training sample images through the training module 1105.
In step S607, the processor 120 may recognize a plurality of background images (images that do not include the objects) from the images through the image recognition module 1101, and determine whether the number of the background images among the images is not less than a background image number threshold. If so, the process proceeds to step S611; otherwise, it proceeds to step S609.
In some embodiments, the object image quantity threshold and the background image quantity threshold may be pre-stored in the memory 110 or received by the processor 120 in real time from a data server (not shown).
Next, in step S609, the processor 120 may collect a plurality of additional images. After the processor 120 collects a plurality of additional images, it may return to step S601.
In some embodiments, the processor 120 may transmit the image request message to the data server to receive a plurality of additional images from the data server, wherein the additional images are different from the plurality of images stored in the memory 110.
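Following the branch directions exactly as stated in this description (the step labels in the comments come from fig. 6; the returned strings are illustrative only), the flow of steps S603 to S609 condenses to a small selector:

```python
def choose_enhancement(n_object_related: int, n_background: int,
                       obj_threshold: int, bg_threshold: int) -> str:
    """Mirror the determinations of fig. 6: step S603 compares the
    object-related image count against the object image number threshold,
    and step S607 compares the background image count against the
    background image number threshold."""
    if n_object_related <= obj_threshold:       # S603: "not greater than"
        return "general data enhancement"       # S605
    if n_background >= bg_threshold:            # S607: "not less than"
        return "cut-and-superimpose (fig. 2)"   # S611
    return "collect additional images"          # S609, then back to S601
```

Each call corresponds to one pass through the flowchart; after additional images are collected, recognition (S601) runs again and the selector is re-evaluated.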
In addition, the remaining steps of the image data enhancement method of fig. 6 are the same as those of the image data enhancement method of fig. 2, and therefore, the description thereof is omitted.
Through the above steps, when the number of stored images including the specific object is insufficient, the image data enhancement device of the embodiment of the invention can perform data enhancement more efficiently and generate a recognition model corresponding to images obtained by a general camera. This avoids the poor recognition accuracy of a model trained only on the stored images.
In summary, the image data enhancement apparatus provided by the invention randomly superimposes object images, cut out of a plurality of images, onto those images or onto background images among them to generate a plurality of training sample images. This solves both the poor recognition accuracy that arises when a model trained with color images performs object recognition on images captured by a night vision device, and the poor accuracy of a recognition model trained on an insufficient set of stored images.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims (10)

1. An image data enhancement device, comprising:
a memory for storing a plurality of instructions and a plurality of images;
a processor, coupled to the memory, for reading the images and loading and executing the instructions to:
identify at least one object-related image including at least one object from the images;
cut at least one object image from the at least one object-related image; and
superimpose the at least one object image at a plurality of arbitrary positions in the images to generate a plurality of training sample images, and perform machine learning by using the training sample images.
2. The image data enhancement device of claim 1, wherein the images are acquired by a night vision device, and the processor is further configured to:
identify at least one object position of the objects in the at least one object-related image; and
cut the at least one object image from the at least one object-related image according to the at least one object position.
3. The image data enhancement device of claim 1, wherein the processor is further configured to:
identifying a plurality of background images from the images, wherein the background images do not include the at least one object; and
superimposing the at least one object image at a plurality of arbitrary positions in the background images to generate the training sample images.
4. The image data enhancement device of claim 1, wherein the processor is further configured to:
randomly selecting at least one overlay image from the at least one object image; and
superimposing the at least one overlay image at the arbitrary positions in the images to generate the training sample images.
5. The image data enhancement device of claim 4, wherein the processor is further configured to:
performing a plurality of geometric transformation processes on the at least one overlay image to generate a plurality of varied images; and
superimposing the at least one overlay image and the varied images at the arbitrary positions in the images to generate the training sample images.
6. An image data enhancement method, comprising:
cutting at least one object image from at least one object-related image, among a plurality of images, that includes at least one object; and
superimposing the at least one object image at a plurality of arbitrary positions in the images to generate a plurality of training sample images, and performing machine learning using the training sample images.
7. The method of claim 6, wherein the images are acquired by a night vision device, and the step of cutting the at least one object image from the at least one object-related image including the at least one object in the images comprises:
identifying at least one object position of the at least one object in the at least one object-related image; and
the object images are cut out from the at least one object-related image according to the at least one object position.
8. An image data enhancement device, comprising:
a memory for storing a plurality of instructions and a plurality of images;
a processor, coupled to the memory, for reading the images and loading and executing the instructions to:
identifying at least one object-related image including at least one object from the images;
determining whether the number of the at least one object-related image in the images is not greater than an object image number threshold;
when the number of the at least one object-related image is not greater than the object image number threshold, cutting at least one object image from the at least one object-related image; and
superimposing the at least one object image at a plurality of arbitrary positions in the images to generate a plurality of training sample images, and performing machine learning using the training sample images.
9. The image data enhancement device of claim 8, wherein the processor is further configured to:
identifying a plurality of background images from the images and determining whether the number of the background images is not less than a background image number threshold, wherein the background images do not include the at least one object;
identifying at least one object position of the at least one object in the at least one object-related image when the number of the background images is not less than the background image number threshold; and
cutting the at least one object image from the at least one object-related image according to the at least one object position.
10. The image data enhancement device of claim 9, wherein the processor is further configured to:
superimposing the at least one object image at a plurality of arbitrary positions in the background images to generate the training sample images.
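As a rough illustration of the flow in claims 5 and 8 through 10, the sketch below gates augmentation on an object-image count threshold and derives extra variants of each cropped patch by geometric transformation (here, horizontal and vertical flips, chosen as an example). All function names, the threshold handling, and the choice of transformations are illustrative assumptions, not taken from the patent.

```python
# Illustrative sketch of the threshold-gated augmentation of claims 8-10
# combined with the geometric variations of claim 5. Images are nested
# lists; helpers and thresholds are hypothetical.
import random

def geometric_variants(patch):
    """Claim 5 analogue: derive varied images; here horizontal/vertical flips."""
    h_flip = [row[::-1] for row in patch]
    v_flip = patch[::-1]
    return [patch, h_flip, v_flip]

def augment_if_scarce(object_patches, backgrounds, obj_threshold, seed=0):
    """Claims 8-9 analogue: synthesize samples only when object images are scarce."""
    if len(object_patches) > obj_threshold:
        return []                     # enough real samples; skip synthesis
    rng = random.Random(seed)
    samples = []
    for patch in object_patches:
        for variant in geometric_variants(patch):
            for bg in backgrounds:
                ph, pw = len(variant), len(variant[0])
                top = rng.randrange(len(bg) - ph + 1)
                left = rng.randrange(len(bg[0]) - pw + 1)
                out = [row[:] for row in bg]   # copy background
                for r in range(ph):
                    out[top + r][left:left + pw] = variant[r]
                samples.append(out)
    return samples
```

Gating on the threshold keeps augmentation cheap when enough genuine object images already exist, while the flip variants multiply the yield from each cropped patch when they do not.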
CN202110223620.1A 2021-03-01 2021-03-01 Image data enhancement device and method Pending CN114998166A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110223620.1A CN114998166A (en) 2021-03-01 2021-03-01 Image data enhancement device and method


Publications (1)

Publication Number Publication Date
CN114998166A (en) 2022-09-02

Family

ID=83018133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110223620.1A Pending CN114998166A (en) 2021-03-01 2021-03-01 Image data enhancement device and method

Country Status (1)

Country Link
CN (1) CN114998166A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination