WO2019132589A1

WO2019132589A1 - Image processing device and method for detecting multiple objects

Info

Publication number: WO2019132589A1
Application number: PCT/KR2018/016862
Authority: WO
Inventors: 김원태; 강신욱; 이명재; 김동민; 김신곤; 김필수; 김기동; 노병필; 문태준
Original assignee: (주)제이엘케이인스펙션; 대한민국(관세청장)
Priority date: 2017-12-29
Filing date: 2018-12-28
Publication date: 2019-07-04
Also published as: KR101932009B1

Abstract

Provided is an image processing method and device for learning a deep learning model to detect multiple objects by using a plurality of images each containing a single object. The image processing method according to the present disclosure comprises: an object area extraction step of receiving a first image containing a first object and a second image containing a second object, and distinguishing between the object and the background of each of the first and second images; an object location information generation step of generating location information of the distinguished first and second objects; an image synthesis step of generating a third image containing the first and second objects on the basis of the location information of the first object and the location information of the second object; and an object detection deep learning model learning step of learning an object detection deep learning model by using the location information of the first object, the location information of the second object, and the third image.

Description

Image processing apparatus and method for multi-object detection

The present disclosure relates to an image processing apparatus and method for multi-object detection. More particularly, the present disclosure relates to an apparatus and method for learning a deep learning model for multi-object detection using a plurality of images each including a single object, and to a computer-readable recording medium storing a program for executing the image processing method of the present disclosure.

Object recognition is processing for recognizing an area recognized as an object in an arbitrary image as one of a predetermined plurality of classes, and an object can mean a specific object in the image.

Deep learning, on the other hand, learns from a very large amount of data and, when new data is input, selects the answer with the highest probability based on the learning result. Because deep learning can operate adaptively according to the image and automatically finds characteristic factors in the data during the learning process of the model, it is increasingly being used in the field of artificial intelligence.

However, in the existing customs and electronic clearance systems, there is a lack of research on more efficient and accurate data analysis using technology such as deep learning for the task of analyzing objects in images.

The technical object of the present disclosure is to provide an image processing apparatus and method for learning an image.

It is another object of the present invention to provide an image processing apparatus and method for learning a deep learning model for detecting multiple objects using a plurality of images including a single object.

According to another aspect of the present invention, there is provided an image processing apparatus and method for generating object position information by distinguishing an object and a background from an image including a single object.

The technical objects to be achieved by the present disclosure are not limited to those mentioned above, and other technical objects not mentioned will be clearly understood by those skilled in the art from the following description.

According to an aspect of the present disclosure, there is provided an image processing apparatus including: an object region extracting unit for receiving a first image including a first object and a second image including a second object, and distinguishing the object from the background in each of the first image and the second image; an object position information generating unit for generating position information of the distinguished first object and second object; an image synthesizer for generating a third image including the first object and the second object based on the position information of the first object and the position information of the second object; and an object detection deep learning model learning unit for learning an object detection deep learning model using the position information of the first object, the position information of the second object, and the third image.

According to another aspect of the present disclosure, there is provided an image processing apparatus including an object region extracting unit for receiving an image including an object and distinguishing the object from a background, wherein the object region extracting unit compares pixel values of the input image with a predetermined threshold value to binarize the pixel values, and groups the binarized pixel values to distinguish the object included in the input image.

According to still another aspect of the present disclosure, there is provided an image processing method including: an object region extraction step of receiving a first image including a first object and a second image including a second object, and distinguishing the object from the background in each of the first image and the second image; an object position information generating step of generating position information of the distinguished first object and second object; an image synthesis step of generating a third image including the first object and the second object based on the position information of the first object and the position information of the second object; and an object detection deep learning model learning step of learning an object detection deep learning model using the position information of the first object, the position information of the second object, and the third image.

According to another aspect of the present disclosure, there is provided an image processing method including an object region extracting step of receiving an image including an object and distinguishing the object from a background, wherein the object region extracting step compares pixel values of the input image with a predetermined threshold value to binarize the pixel values, and groups the binarized pixel values to distinguish the object included in the input image.

According to still another aspect of the present disclosure, a computer-readable recording medium having recorded thereon a program for executing the image processing method of the present disclosure can be provided.

The features briefly summarized above for this disclosure are only exemplary aspects of the detailed description of the disclosure which follow, and are not intended to limit the scope of the disclosure.

According to the present disclosure, an image processing apparatus and method for learning a deep learning model so that a multi-object image can be detected more accurately can be provided.

Also, according to the present disclosure, an image processing apparatus and method for learning a deep learning model for multiple object detection using a plurality of images including a single object can be provided.

Also, according to the present disclosure, an image processing apparatus and method for dividing an object and background from an image including a single object and generating object position information can be provided.

The effects obtainable from the present disclosure are not limited to the effects mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a block diagram showing the configuration of an image processing apparatus according to one embodiment of the present disclosure.

FIG. 2 is a diagram for explaining a process of distinguishing an object and a background in an image including a single object and generating position information of the object, according to an embodiment of the present disclosure.

FIG. 3 is a diagram illustrating a process of generating a multi-object image using two images each including a single object, according to an embodiment of the present disclosure.

FIG. 4 is a diagram for explaining an embodiment of a convolutional neural network for generating a multi-channel feature map.

FIG. 5 is a diagram illustrating a process of learning a convolutional neural network using a multi-object image according to an embodiment of the present disclosure.

FIG. 6 is a diagram for explaining a process of analyzing an actual image using an image processing apparatus according to an embodiment of the present disclosure.

FIG. 7 is a diagram for explaining an image processing method according to an embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings, which will be easily understood by those skilled in the art. However, the present disclosure may be embodied in many different forms and is not limited to the embodiments described herein.

In the following description of the embodiments of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure rather unclear. Parts not related to the description of the present disclosure in the drawings are omitted, and like parts are denoted by similar reference numerals.

In the present disclosure, when an element is referred to as being "connected", "coupled", or "joined" to another element, this includes not only a direct connection but also an indirect connection in which a further element is present between them. Also, when an element is referred to as "comprising" or "having" another element, this means that it may further include other elements rather than excluding them, unless specifically stated otherwise.

In the present disclosure, the terms first, second, etc. are used only for the purpose of distinguishing one element from another, and do not limit the order or importance of elements unless specifically stated otherwise. Thus, within the scope of this disclosure, a first component in one embodiment may be referred to as a second component in another embodiment, and similarly a second component in one embodiment may be referred to as a first component in another embodiment.

In the present disclosure, the components that are distinguished from each other are intended to clearly illustrate each feature and do not necessarily mean that components are separate. That is, a plurality of components may be integrated into one hardware or software unit, or a single component may be distributed into a plurality of hardware or software units. Thus, unless otherwise noted, such integrated or distributed embodiments are also included within the scope of this disclosure.

In the present disclosure, the components described in the various embodiments are not necessarily essential components, and some may be optional components. Thus, embodiments consisting of a subset of the components described in one embodiment are also included within the scope of the present disclosure. Also, embodiments that include other elements in addition to the elements described in the various embodiments are also included in the scope of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings.

Referring to FIG. 1, the image processing apparatus 100 may include an object region extraction unit 110, an object position information generation unit 120, an image synthesis unit 130, and/or an object detection deep learning model learning unit 140. However, FIG. 1 shows only some of the components necessary for explaining the present embodiment, and the components included in the image processing apparatus 100 are not limited to this example. For example, two or more constituent units may be implemented as one constituent unit, and an operation performed in one constituent unit may be divided and executed in two or more constituent units. Also, some of the constituent units may be omitted or additional constituent units may be added.

The image processing apparatus 100 according to an exemplary embodiment may receive a first image including a first object and a second image including a second object, distinguish the object from the background in each of the first image and the second image, generate position information of the distinguished first object and second object, generate a third image including the first object and the second object based on the position information of the first object and the position information of the second object, and learn the object detection deep learning model using the position information of the first object, the position information of the second object, and the third image.

Referring to FIG. 1, the input image 150 may be an image including a single object. For example, the input image 150 may be an image of cargo containing one object. Also, for example, the input image 150 may be an X-ray image of the cargo taken by an X-ray reading device. The image may be a raw image taken by an X-ray imaging device or an image in any form (format) for storing or transmitting the image. The image may also be obtained by capturing the image data that the X-ray reading device transmits to an output device such as a monitor and converting it into data.

The object region extracting unit 110 may receive the image 150 including a single object and may divide the received image into an object and a background. The object refers to a specific object in the image, and the background refers to the part of the image excluding the object. The object region extracting unit 110 according to an embodiment compares the pixel values of the input image 150 with a predetermined threshold value to binarize the pixel values, and groups the binarized pixel values to distinguish the object included in the input image. The specific process of extracting an object will be described later with reference to FIG. 2.

The object position information generation unit 120 can determine the position of the object extracted by the object region extraction unit 110. For example, the object position information generation unit 120 may specify a bounding box surrounding the object region and generate position information of the object distinguished by the object region extraction unit 110 based on the specified bounding box. The specific process of generating the position information of the object will be described in more detail with reference to FIG. 2.

FIG. 2 is a diagram for explaining a process of distinguishing an object and a background in an image including a single object and generating position information of the object, according to an embodiment of the present disclosure. The object region extraction unit 200 and the object position information generation unit 260 of FIG. 2 may be embodiments of the object region extraction unit 110 and the object position information generation unit 120 of FIG. 1, respectively. The input image 210 may be the input image 150 described with reference to FIG. 1 and may be, for example, an image of cargo containing the bag 212 as a single object. The object region extracting unit 200 may first perform a cropping operation on the input image 210 including the bag 212, roughly cutting away the surrounding region around the bag 212 to acquire the cropped image 220. Then, the object region extracting unit 200 may obtain the binarized image 230 by comparing the pixel values of the cropped image 220 with a predetermined threshold value (thresholding) to binarize the pixel values. The object region extracting unit 200 may then obtain the grouped image 240 by grouping (clustering) adjacent pixels in the binarized image 230 in order to select the portion corresponding to the object. Finally, the object region extracting unit 200 may perform labeling and hole filling operations on the grouped image 240, determine the largest pixel group as the region 252 for the object, and determine the remainder as the region 254 for the background.
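The thresholding, grouping, labeling, and hole filling steps described for FIG. 2 can be sketched as follows. This is a minimal illustration in pure Python, not the implementation of the disclosure. It assumes a grayscale image stored as a list of rows, assumes the object is darker than the background (as is typical for X-ray images), and uses 4-connectivity for grouping; the function names `binarize`, `label_groups`, and `extract_object_region` are hypothetical, and the cropping step is omitted.

```python
from collections import deque


def binarize(image, threshold):
    """Return a 0/1 mask: 1 where the pixel is darker than the threshold (assumed object)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def label_groups(mask):
    """Group 4-connected pixels whose mask value is 1; return a list of pixel groups."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    groups = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, group = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            groups.append(group)
    return groups


def extract_object_region(image, threshold):
    """Return a 0/1 mask containing only the largest pixel group, with its holes filled."""
    h, w = len(image), len(image[0])
    region = [[0] * w for _ in range(h)]
    groups = label_groups(binarize(image, threshold))
    if not groups:
        return region
    for y, x in max(groups, key=len):
        region[y][x] = 1
    # Hole filling: a background group that never touches the image border is a hole.
    inverse = [[1 - v for v in row] for row in region]
    for group in label_groups(inverse):
        if not any(y in (0, h - 1) or x in (0, w - 1) for y, x in group):
            for y, x in group:
                region[y][x] = 1
    return region


image = [
    [255, 255, 255, 255, 255, 255],
    [255, 10, 10, 10, 255, 255],
    [255, 10, 200, 10, 255, 20],
    [255, 10, 10, 10, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
region = extract_object_region(image, threshold=128)
```

In this toy input, the ring of dark pixels is kept as the object, the bright pixel enclosed by the ring is filled in, and the isolated dark pixel at the right edge is treated as background because it belongs to a smaller group.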

The object position information generation unit 260 can determine the position of the object in the input image 210 using the information about the object region extracted by the object region extraction unit 200. For example, the object position information generation unit 260 may specify a bounding box surrounding the object region and generate position information of the object distinguished by the object region extraction unit 200 based on the specified bounding box. Referring to FIG. 2, the object position information generating unit 260 may specify a rectangular box 262 surrounding the bag 212 and acquire position information of the bag 212 based on the specified rectangular box. For example, the position information of the bag 212 may be the position information of the four vertices forming the rectangular box 262, but is not limited thereto. For example, the position information may be represented by the coordinates (x, y) of one vertex of the rectangular box 262 and the width and height of the rectangular box. The coordinates (x, y) of the one vertex may be the coordinates of the upper left vertex of the rectangular box 262. The coordinates (x, y) of the vertex can be specified relative to the coordinates (0, 0) of the upper left vertex of the input image 210.
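A bounding box in the (x, y, width, height) form described above can be derived from an object mask as in the following sketch. The function names and the `offset` parameter are hypothetical; `offset` stands for the upper left corner of the cropped image within the input image, so that the result is expressed relative to (0, 0) of the input image. Vertex coordinates are taken as pixel-inclusive, which is an assumption of this sketch.

```python
def bounding_box(mask, offset=(0, 0)):
    """Return (x, y, width, height) of the nonzero pixels in a 0/1 mask, or None if empty.

    offset is the (x, y) position of the mask's upper left pixel in the input image.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x, y = cols[0], rows[0]
    return (x + offset[0], y + offset[1], cols[-1] - x + 1, rows[-1] - y + 1)


def box_vertices(box):
    """Return the four pixel-inclusive vertices of an (x, y, width, height) box."""
    x, y, w, h = box
    return [(x, y), (x + w - 1, y), (x, y + h - 1), (x + w - 1, y + h - 1)]


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(mask)
shifted = bounding_box(mask, offset=(10, 20))
```

Both representations mentioned in the text carry the same information; the (x, y, width, height) form is simply more compact than listing four vertices.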

According to an embodiment of the present disclosure, since the position information of an object included in an image can be generated automatically, a human inspector is spared the trouble of manually entering the position information of the object for each image used for artificial intelligence learning.

Referring again to FIG. 1, the image synthesis unit 130 can generate a multi-object image using a plurality of single object images for which the position information of the object has been acquired through the object region extraction unit 110 and the object position information generation unit 120. For example, the first image including the first object and the second image including the second object may each be passed through the object region extracting unit 110 and the object position information generating unit 120 to obtain the position information of the first object and the position information of the second object, and the image synthesis unit 130 can generate a third image including the first object and the second object based on the obtained position information of the first object and position information of the second object. The process of generating a multi-object image will be described in detail with reference to FIG. 3.

FIG. 3 is a diagram illustrating a process of generating a multi-object image using two images each including a single object, according to an embodiment of the present disclosure. The image combining unit 300 of FIG. 3 is an embodiment of the image synthesis unit 130 of FIG. 1. Referring to FIG. 3, the image combining unit 300 may use the first single object image 310, the second single object image 320, and the position information of the objects of the first single object image 310 and the second single object image 320 obtained through the object region extracting unit and the object position information generating unit, to obtain a multi-object image 340 in which the first single object image 310 and the second single object image 320 are combined, together with position information 350 for the objects included in the multi-object image 340. The image combining unit 300 may also use the image 330 of the background separated from the objects when combining the first single object image 310 and the second single object image 320.
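The synthesis step can be illustrated with the sketch below, which pastes single-object crops onto a background image and records a bounding box for each pasted object. The source does not specify how pixel values are merged or where objects are placed; here the placement coordinates are supplied by the caller, and overlapping pixels keep the darker value, which is only an assumption loosely motivated by X-ray attenuation. The function name `synthesize` and its argument layout are hypothetical.

```python
def synthesize(background, objects):
    """Paste single-object crops onto a copy of the background image.

    objects is a list of (crop, mask, (x, y)) tuples, where crop and mask have the
    same size, mask marks object pixels with 1, and (x, y) is the upper left
    placement in the composite. The caller must ensure each crop fits inside the
    background. Returns (composite, boxes) with one (x, y, width, height) per object.
    """
    composite = [row[:] for row in background]
    boxes = []
    for crop, mask, (x0, y0) in objects:
        for r, row in enumerate(crop):
            for c, px in enumerate(row):
                if mask[r][c]:
                    # Assumption: where objects overlap, keep the darker pixel value.
                    composite[y0 + r][x0 + c] = min(composite[y0 + r][x0 + c], px)
        boxes.append((x0, y0, len(crop[0]), len(crop)))
    return composite, boxes


background = [[255] * 6 for _ in range(4)]
first = ([[50, 50], [50, 50]], [[1, 1], [1, 1]], (0, 0))
second = ([[100, 100, 100], [100, 100, 100]], [[1, 1, 1], [1, 1, 1]], (1, 1))
composite, boxes = synthesize(background, [first, second])
```

The returned pair corresponds to the multi-object image and the position information of its objects, which together form one training sample for the object detection model.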

Referring again to FIG. 1, the object detection deep learning model learning unit 140 may learn an object detection deep learning model using the position information of the first object, the position information of the second object, and the third image. For example, the object detection deep learning model learning unit 140 can learn an artificial neural network model. The position information of the first object, the position information of the second object, and the third image may be used for learning a convolutional neural network model. The convolutional neural network model will be described in more detail with reference to FIGS. 4 and 5.

The convolutional neural network of the present disclosure may be used to extract "features" such as edges, line colors, etc. from input data (images) and may include multiple layers. Each layer can receive input data and process it to generate output data. The convolutional neural network can output, as output data, the feature map generated by convolving the input image or the input feature map with filter kernels. The initial layers of the convolutional neural network may operate to extract low-level features such as edges or gradients from the input. The subsequent layers can extract progressively more complex features such as eyes, a nose, and so on. Image processing based on the convolutional neural network can be applied to various fields, for example, image processing apparatuses for object recognition, image reconstruction, semantic segmentation, scene recognition, and the like.

Referring to FIG. 4, the input image 410 may be processed through the convolutional neural network 400 to output a feature map image. The output feature map image can be utilized in various fields as described above.

The convolutional neural network 400 may process the input through a plurality of layers 420, 430, and 440, and each layer may output multi-channel feature map images 425 and 435. The plurality of layers 420, 430, and 440 according to an exemplary embodiment may extract features of an image by applying a filter of a predetermined size from the upper left to the lower right of the input data. For example, the plurality of layers 420, 430, and 440 multiply the upper-left NxM pixels of the input data by weights and map the result to a neuron at the upper left of the feature map. In this case, the weights to be multiplied are also NxM in size. NxM may be, for example, 3x3, but is not limited thereto. Thereafter, in the same manner, the plurality of layers 420, 430, and 440 scan the input data from left to right and from top to bottom in steps of k pixels, multiplying by the weights and mapping the results to neurons of the feature map. Here, k is the stride by which the filter moves when performing the convolution, and can be set appropriately to adjust the size of the output data. For example, k may be 1. The NxM weights are called a filter or filter kernel. That is, the process of applying the filter in the plurality of layers 420, 430, and 440 is a process of performing a convolution operation with the filter kernel, and the extracted result is called a "feature map" or "feature map image". In addition, a layer on which the convolution operation is performed may be referred to as a convolution layer.
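The filter scan described above can be sketched for a single-channel input and a single filter kernel as follows. This is a minimal illustration without padding, bias, or activation; as in most deep learning frameworks, the kernel is applied without flipping. The function name `convolve2d` is hypothetical.

```python
def convolve2d(image, kernel, stride=1):
    """Slide an N x M kernel over the image with the given stride and return the feature map."""
    n, m = len(kernel), len(kernel[0])
    out_h = (len(image) - n) // stride + 1
    out_w = (len(image[0]) - m) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Multiply the N x M window by the weights and sum into one output neuron.
            row.append(sum(image[top + a][left + b] * kernel[a][b]
                           for a in range(n) for b in range(m)))
        feature_map.append(row)
    return feature_map


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
ones_3x3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
ones_2x2 = [[1, 1], [1, 1]]
stride_one = convolve2d(image, ones_3x3, stride=1)
stride_two = convolve2d(image, ones_2x2, stride=2)
```

With a 4x4 input, a 3x3 kernel at stride 1 produces a 2x2 feature map, and a 2x2 kernel at stride 2 also produces a 2x2 feature map, showing how the stride k controls the size of the output data.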

The term "multi-channel feature map" refers to a set of feature maps corresponding to a plurality of channels, and may be, for example, a plurality of image data. The multi-channel feature maps may be inputs at any layer of the convolutional neural network, and may be outputs of feature map computations such as convolution operations. According to one embodiment, the multi-channel feature maps 425 and 435 are generated by a plurality of layers 420, 430, and 440, also referred to as "feature extraction layers" or "convolutional layers". Each layer may sequentially receive the multi-channel feature maps generated in the previous layer and generate the next multi-channel feature maps as output. Finally, the L-th (L is an integer) layer 440 receives the multi-channel feature maps generated in the (L-1)-th layer (not shown) and generates multi-channel feature maps (not shown).

Referring to FIG. 4, the feature maps 425 having K1 channels are the outputs of the feature map operation 420 at layer 1 for the input image 410, and are also the inputs to the feature map operation 430 at layer 2. Likewise, the feature maps 435 having K2 channels are the outputs of the feature map operation 430 at layer 2 for the input feature maps 425, and are also the inputs to the feature map operation (not shown) at layer 3.

Referring to FIG. 4, the multi-channel feature maps 425 generated in the first layer 420 include feature maps corresponding to K1 (K1 is an integer) channels. Also, the multi-channel feature maps 435 generated in the second layer 430 include feature maps corresponding to K2 (K2 is an integer) channels. Here, K1 and K2, which represent the number of channels, may correspond to the number of filter kernels used in the first layer 420 and the second layer 430, respectively. That is, the number of multi-channel feature maps generated in the Mth layer (M is an integer equal to or greater than 1 and equal to or smaller than L-1) may be equal to the number of filter kernels used in the Mth layer.
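The relationship between the filter kernels and the shape of a layer's output can be written out as below. This is a sketch for a standard convolution without padding, which the source does not state explicitly. The symbols H_in, W_in, H_out, W_out, and C_out are introduced here for illustration; N x M is the kernel size, k is the stride, and K_ℓ is the number of filter kernels in layer ℓ (written with ℓ rather than M to avoid a clash with the kernel width).

```latex
H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} - N}{k} \right\rfloor + 1, \qquad
W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} - M}{k} \right\rfloor + 1, \qquad
C_{\text{out}} = K_{\ell}
```

For example, a 7x7 input processed by K1 = 16 kernels of size 3x3 with stride k = 1 would yield 16 feature maps of size 5x5, since (7 - 3)/1 + 1 = 5.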

FIG. 5 is a diagram illustrating a process of learning a convolutional neural network using a multi-object image according to an embodiment of the present disclosure. The object detection deep learning model learning unit 500 of FIG. 5 is an embodiment of the object detection deep learning model learning unit 140 of FIG. 1. Referring to FIG. 5, a multi-object image 510 synthesized from single object images, together with the position information of the objects, may be used as the data necessary for learning. The object detection deep learning model learning unit 500 can learn the convolutional neural network 520 by projecting the position information of each of the single objects together with the multi-object image 510. According to one embodiment, when there are a plurality of objects in cargo passing through the X-ray scanner of an electronic clearance system, an X-ray image in which the objects overlap may be obtained. According to this disclosure, because the artificial neural network is learned using the shape of each object together with the position information of the objects in the image, more accurate detection results can be obtained even when objects overlap.

The image processing apparatus 600 of FIG. 6 is an embodiment of the image processing apparatus 100 of FIG. 1. The operations of the object region extracting unit 604, the object position information generating unit 606, the image synthesizing unit 608, and the object detection deep learning model learning unit 610 included in the image processing apparatus 600 of FIG. 6 are the same as those of the object region extracting unit 110, the object position information generating unit 120, the image synthesizing unit 130, and the object detection deep learning model learning unit 140 included in the image processing apparatus 100 of FIG. 1. Accordingly, the image processing apparatus 600 can learn an artificial neural network model by performing the operations of the object region extracting unit 604, the object position information generating unit 606, the image synthesizing unit 608, and the object detection deep learning model learning unit 610 on a plurality of single object images 602. The object detecting apparatus 620 can detect each object in an image 622 including multiple objects in a real environment, using the artificial neural network model learned in the image processing apparatus 600. According to one embodiment, when the present disclosure is applied to an electronic clearance system, the image processing apparatus 600 of the present disclosure can generate a new image containing multiple objects based on single object regions extracted from X-ray images. The object detection apparatus 620 can also find the regions in which multiple objects are present in cargo passing through the X-ray scanner. Therefore, by automatically extracting the positions of objects in the X-ray image, the image inspection work of a human inspector can be made easier, and information such as the extracted objects and the quantity of objects in the cargo can further be used for tasks such as comparison with computerized information.

In step S700, the first image including the first object and the second image including the second object may be input, and the object and the background may be distinguished for each of the first image and the second image. For example, a pixel value of an input image may be compared with a predetermined threshold value to binarize the pixel value, and binarized pixel values may be grouped to distinguish objects included in the input image.

In step S710, location information of the first object and the second object may be generated. For example, a rectangular box surrounding the object area may be specified, and position information of the object classified in step S700 may be generated based on the specified rectangular box.

In step S720, a third image including the first object and the second object may be generated based on the position information of the first object and the position information of the second object. For example, the third image including the first object and the second object may be generated based on the position information of the first object and the position information of the second object obtained in step S710.

In step S730, the object detection deep learning model can be learned using the position information of the first object, the position information of the second object, and the third image. For example, a convolutional neural network model can be learned. To learn the convolutional neural network model, the position information of the first object and the position information of the second object generated in step S710 and the third image generated in step S720 may be used.

In the embodiment described with reference to FIGS. 1 to 7, an example of receiving an image including a single object and separating an object and a background has been described. However, the present invention is not limited thereto, and the input image may be an image including two or more objects. In this case, it is possible to distinguish two or more objects and backgrounds from the input image, and generate position information for each of the two or more objects. In this case, in the description with reference to FIG. 2, when a plurality of pixel groups are formed, it can be determined that not only the pixel groups formed in the largest shape but also the other pixel groups are regions for the objects. The process of generating the position information of each determined object is the same as described for the image including one object.
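For the multi-object case described above, each pixel group can be given its own bounding box instead of keeping only the largest group. The sketch below assumes a 0/1 mask produced by thresholding and 4-connectivity; the function name `object_boxes` and the `min_pixels` noise filter are hypothetical and not part of the source.

```python
def object_boxes(mask, min_pixels=1):
    """Return one (x, y, width, height) box per 4-connected pixel group in a 0/1 mask.

    Groups with fewer than min_pixels pixels are skipped (a hypothetical noise filter).
    Boxes are returned in the order the groups are met in a top-to-bottom,
    left-to-right scan.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            stack, pixels = [(r, c)], []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(pixels) >= min_pixels:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(xs), min(ys),
                              max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes


mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1],
]
all_boxes = object_boxes(mask)
large_boxes = object_boxes(mask, min_pixels=2)
```

Each returned box has the same (x, y, width, height) form as the position information for a single object, so the later synthesis and learning steps can treat the objects uniformly.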

Also, in the above-described embodiment, it has been described that the third image is generated based on the two single object images and the position information of the respective objects. However, the present invention is not limited to this, and a third image may be generated using two or more single object images and position information of each object. That is, the image processing method and apparatus according to the present disclosure can generate a third image based on two or more images each including one or more objects and position information of each object.

The deep learning based model of the present disclosure may include, but is not limited to, at least one of a fully convolutional neural network, a convolutional neural network, a recurrent neural network, a restricted Boltzmann machine (RBM), and a deep belief neural network (DBN). Alternatively, a machine learning method other than deep learning may be included, or a hybrid model combining deep learning and machine learning may be used. For example, features of an image may be extracted by applying a deep learning based model, and a machine learning based model may be applied when the image is classified or recognized based on the extracted features. The machine learning based model may include, but is not limited to, a support vector machine (SVM), AdaBoost, and the like.

Although the exemplary methods of the present disclosure are represented as a series of operations for clarity of explanation, this is not intended to limit the order in which the steps are performed, and if necessary, the steps may be performed simultaneously or in a different order. In order to implement the method according to the present disclosure, other steps may be included in addition to the illustrated steps, some steps may be omitted while the remaining steps are included, or some steps may be omitted while additional other steps are included.

The various embodiments of the disclosure are not intended to be all-inclusive and are intended to illustrate representative aspects of the disclosure, and the features described in the various embodiments may be applied independently or in a combination of two or more.

In addition, various embodiments of the present disclosure may be implemented by hardware, firmware, software, or a combination thereof. In the case of hardware implementation, the embodiments may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), general processors, controllers, microcontrollers, microprocessors, and the like.

The scope of the present disclosure includes software or machine-executable instructions (e.g., an operating system, applications, firmware, programs, and the like) that cause operations according to the methods of the various embodiments to be executed on a device or computer, as well as a non-transitory computer-readable medium in which such software or instructions are stored and which is executable on the device or computer.

The present invention can be used to process images containing multiple objects.

Claims

1. An image processing apparatus comprising:

an object region extracting unit that receives a first image including a first object and a second image including a second object, and separates the object from the background in each of the first image and the second image;

an object position information generating unit that generates position information of the first object and the second object;

an image synthesizer that generates a third image including the first object and the second object based on the position information of the first object and the position information of the second object; and

an object detection deep learning model learning unit that trains an object detection deep learning model using the position information of the first object, the position information of the second object, and the third image.
2. The image processing apparatus according to claim 1,

wherein the object region extracting unit

binarizes pixel values of an input image by comparing each pixel value with a predetermined threshold value, and

distinguishes the objects included in the input image by grouping the binarized pixel values.
3. The image processing apparatus according to claim 1,

wherein the object position information generating unit

specifies a rectangular box surrounding the separated object, and

generates position information of the separated object based on the specified rectangular box.
4. The image processing apparatus according to claim 1,

wherein the object detection deep learning model includes a convolutional neural network (CNN), and

the object detection deep learning model learning unit

generates a feature map of the convolutional neural network by projecting the position information of the first object and the position information of the second object together when learning the third image.
5. An image processing apparatus comprising an object region extracting unit that receives an image including an object and separates the object from the background,

wherein the object region extracting unit

binarizes pixel values of the input image by comparing each pixel value with a predetermined threshold value, and

distinguishes the objects included in the input image by grouping the binarized pixel values.
6. The image processing apparatus according to claim 5,

further comprising an object position information generating unit that generates position information of the separated object,

wherein the object position information generating unit

specifies a rectangular box surrounding the separated object, and

generates position information of the separated object based on the specified rectangular box.
7. An image processing method comprising:

an object region extracting step of receiving a first image including a first object and a second image including a second object, and separating the object from the background in each of the first image and the second image;

an object position information generating step of generating position information of the first object and the second object;

an image synthesis step of generating a third image including the first object and the second object based on the position information of the first object and the position information of the second object; and

an object detection deep learning model learning step of training an object detection deep learning model using the position information of the first object, the position information of the second object, and the third image.
8. The method of claim 7,

wherein the object region extracting step comprises:

binarizing pixel values of an input image by comparing each pixel value with a predetermined threshold value; and

grouping the binarized pixel values to distinguish the objects included in the input image.
9. The method of claim 7,

wherein the object position information generating step comprises:

specifying a rectangular box surrounding the separated object; and

generating position information of the separated object based on the specified rectangular box.
10. The method of claim 7,

wherein the object detection deep learning model includes a convolutional neural network (CNN), and

the object detection deep learning model learning step comprises:

generating a feature map of the convolutional neural network by projecting the position information of the first object and the position information of the second object together when learning the third image.
11. An image processing method comprising an object region extracting step of receiving an image including an object and separating the object from the background,

wherein the object region extracting step comprises:

binarizing pixel values of the input image by comparing each pixel value with a predetermined threshold value; and

grouping the binarized pixel values to distinguish the objects included in the input image.
12. The method of claim 11,

further comprising an object position information generating step of generating position information of the separated object,

wherein the object position information generating step comprises:

specifying a rectangular box surrounding the separated object; and

generating position information of the separated object based on the specified rectangular box.
13. A computer-readable recording medium storing a program,

wherein the program performs:

an object region extracting step of receiving a first image including a first object and a second image including a second object, and separating the object from the background in each of the first image and the second image;

an object position information generating step of generating position information of the first object and the second object;

an image synthesis step of generating a third image including the first object and the second object based on the position information of the first object and the position information of the second object; and

an object detection deep learning model learning step of training an object detection deep learning model using the position information of the first object, the position information of the second object, and the third image.
14. A computer-readable recording medium storing a program,

wherein the program performs:

an object region extracting step of receiving an image including an object and separating the object from the background,

wherein the object region extracting step comprises:

binarizing pixel values of the input image by comparing each pixel value with a predetermined threshold value; and

grouping the binarized pixel values to distinguish the objects included in the input image.