WO2022124673A1

WO2022124673A1 - Device and method for measuring volume of object in receptacle on basis of camera image using machine learning model

Info

Publication number: WO2022124673A1
Application number: PCT/KR2021/017807
Authority: WO
Inventors: 김영호; 이동엽
Original assignee: 주식회사 제로클래스랩
Priority date: 2020-12-11
Filing date: 2021-11-30
Publication date: 2022-06-16
Also published as: KR20220083347A; KR102597692B1

Abstract

An object volume measurement method is provided comprising the steps of: receiving an input image; detecting a standard receptacle area corresponding to a predetermined standard size of a receptacle, from the input image; recognizing a receptacle wall surface area corresponding to the wall surfaces of the standard receptacle and an object area corresponding to an object received inside the standard receptacle, from the input image by means of a first machine learning model; calculating the ratio of the pixels of the object area relative to the pixels of the total area comprising the receptacle wall surface area and object area; and generating a volume measurement value of the object on the basis of the pixel ratio.

Description

Apparatus and method for measuring the volume of an object in a container based on a captured image using a machine learning model

Embodiments of the present disclosure relate to an apparatus and method for measuring the volume of an object in a container based on a captured image using a machine learning model.

Logistics systems, factories, and similar facilities handle a large number of goods, so considerable resources are devoted to accurate inventory measurement. In the conventional method, inventory is tracked by recording the goods that enter and leave storage. However, checking incoming and outgoing goods requires a great deal of manpower and money, which makes it difficult to quickly determine the inventory of a large quantity of goods. Commercially available inventory tracking technology uses RFID, weight sensors, and the like. Such technology is difficult to deploy widely because an RFID tag must be attached to each item or a weight sensor must be installed on each shelf, which incurs additional cost.

Summary

Embodiments of the present disclosure provide a method, an apparatus, and a computer program for measuring the volume of an object using image data.

According to an aspect of an embodiment of the present disclosure, there is provided an object volume measurement method including: receiving an input image; detecting a standard container area corresponding to a predefined standard container from the input image; recognizing, from the input image using a first machine learning model, a container wall area corresponding to the wall surface of the standard container and an object area corresponding to an object contained in the standard container; calculating a pixel ratio of the object area among pixels of the entire area including the container wall area and the object area; and generating a volume measurement value of the object based on the pixel ratio.

Further, according to an embodiment of the present disclosure, the object volume measurement method further includes removing a background area, which excludes the standard container area, from the input image, and the recognizing of the container wall area and the object area is performed using the input image from which the background area has been removed.

In addition, according to an embodiment of the present disclosure, removing the background region may include defining the standard container region as a region of interest; defining a region excluding the region of interest from the input image as the background region; and generating a first output image in which the background area is displayed as a single pixel value.

In addition, according to an embodiment of the present disclosure, the first machine learning model corresponds to a semantic segmentation model, and the recognizing of the container wall area and the object area includes recognizing the container wall area and the object area using the semantic segmentation model, wherein the object volume measurement method further includes converting the container wall area into a first pixel value and converting the object area into a second pixel value.
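The conversion of recognized areas into fixed pixel values can be illustrated with a minimal sketch. The class labels (0, 1, 2) and the output pixel values (0, 128, 255) are assumptions chosen for illustration; the disclosure does not specify them, and the segmentation model itself is outside the sketch.

```python
# Hypothetical class labels produced by a segmentation model.
BACKGROUND, WALL, OBJECT = 0, 1, 2

# Hypothetical output values: the "first pixel value" marks the container
# wall and the "second pixel value" marks the object.
PIXEL_VALUES = {BACKGROUND: 0, WALL: 128, OBJECT: 255}


def labels_to_pixels(label_map):
    """Convert a 2-D list of class labels into a 2-D list of pixel values."""
    return [[PIXEL_VALUES[label] for label in row] for row in label_map]


labels = [
    [0, 1, 1, 0],
    [0, 2, 2, 0],
]
print(labels_to_pixels(labels))
```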

In addition, according to an embodiment of the present disclosure, the standard container is a partially open polyhedral container, and the calculating of the pixel ratio includes: recognizing a plurality of wall areas corresponding to the respective wall surfaces of the standard container; dividing the object area into a plurality of sub object areas respectively corresponding to the plurality of wall areas; calculating a weighted object area pixel count by applying a weight corresponding to each wall surface to the pixel count of each of the plurality of sub object areas; and calculating the pixel ratio using the weighted object area pixel count and the number of pixels in the container wall area.

In addition, according to an embodiment of the present disclosure, the standard container is a rectangular parallelepiped container whose top and front surfaces are open, and the input image is obtained by photographing the standard container obliquely from the side of the open top and front surfaces.

In addition, according to an embodiment of the present disclosure, the weights of the wall surfaces of the rectangular parallelepiped standard container may increase in the order of the two side surfaces, the front, and the lower surface.
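The weighted pixel ratio can be sketched as follows. The weight values and the exact way the weighted object pixel count is combined with the wall pixel count are assumptions; the disclosure states only that a per-wall weight is applied and that the weights increase from the side surfaces to the front to the lower surface. Here the ratio is assumed to be the weighted object count divided by the sum of the weighted object count and the wall count.

```python
# Hypothetical weights; only their ordering (side < front < bottom)
# comes from the text.
WALL_WEIGHTS = {"side": 1.0, "front": 1.5, "bottom": 2.0}


def weighted_pixel_ratio(sub_object_pixels, wall_pixels, weights=WALL_WEIGHTS):
    """Return the weighted object pixel ratio in the range [0, 1].

    sub_object_pixels maps a wall name to the number of object pixels
    assigned to that wall; wall_pixels is the number of visible wall pixels.
    """
    weighted_object = sum(
        weights[wall] * count for wall, count in sub_object_pixels.items()
    )
    total = weighted_object + wall_pixels
    if total == 0:
        return 0.0  # nothing was segmented
    return weighted_object / total


ratio = weighted_pixel_ratio({"side": 100, "front": 200, "bottom": 300}, 1000)
print(f"{ratio:.3f}")  # 0.500
```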

In addition, according to an embodiment of the present disclosure, the volume measurement value of the object may be defined as a percentage of the total volume of the standard container.

In addition, according to an embodiment of the present disclosure, the input image is a moving picture including a plurality of frames, the object volume measurement method further includes extracting a frame in which the standard container area is detected, and the detecting of the standard container area may include using the extracted frame as the input image.

According to another aspect of an embodiment of the present disclosure, there is provided an object volume measurement apparatus including: an input interface for receiving an input image; a memory storing at least one instruction; at least one processor executing the at least one instruction; and an output interface, wherein the at least one processor, by executing the at least one instruction, detects a standard container area corresponding to a predefined standard container from the input image, recognizes, from the input image using a first machine learning model, a container wall area corresponding to the wall surface of the standard container and an object area corresponding to an object contained in the standard container, calculates a pixel ratio of the object area among pixels of the entire area including the container wall area and the object area, generates a volume measurement value of the object based on the pixel ratio, and outputs the volume measurement value through the output interface.

According to another aspect of an embodiment of the present disclosure, there is provided a computer program recorded on a recording medium that, when executed by a processor, performs an object volume measurement method, the object volume measurement method including: detecting, from an input image, a standard container area corresponding to a predefined standard container; recognizing, from the input image using a first machine learning model, a container wall area corresponding to the wall surface of the standard container and an object area corresponding to an object contained in the standard container; calculating a pixel ratio of the object area among pixels of the entire area including the container wall area and the object area; and generating a volume measurement value of the object based on the pixel ratio.

According to embodiments of the present disclosure, it is possible to provide a method, an apparatus, and a computer program for measuring the volume of an object by using image data.

FIG. 1 is a diagram illustrating a system for measuring object volume according to an embodiment of the present disclosure.

FIG. 2 is a view showing the structure of an object volume measurement apparatus according to an embodiment of the present disclosure.

FIG. 3 is a flowchart illustrating a method for measuring the volume of an object according to an embodiment of the present disclosure.

FIG. 4 is a view showing a standard container according to an embodiment of the present disclosure.

FIG. 5 is a diagram illustrating the placement of a visual tag according to an embodiment of the present disclosure.

FIG. 6 is a diagram illustrating a visual tag according to an embodiment of the present disclosure.

FIG. 7 is a flowchart illustrating a method for measuring the volume of an object according to an embodiment of the present disclosure.

FIG. 8 is a diagram illustrating a process of detecting a standard container area from a black-and-white scale image, according to an embodiment of the present disclosure.

FIG. 9 is a diagram illustrating a background removal process according to an embodiment of the present disclosure.

FIG. 10 is a diagram illustrating a process of generating a first output image from a background removal image according to an embodiment of the present disclosure.

FIG. 11 is a diagram illustrating a process of calculating the volume of an object contained in a standard container, according to an embodiment of the present disclosure.

FIG. 12 is a diagram illustrating an output of a first machine learning model according to an embodiment of the present disclosure.

FIG. 13 is a diagram illustrating a structure of a processor according to another embodiment of the present disclosure.

FIG. 14 is a diagram illustrating the structure of a CNN model according to an embodiment of the present disclosure.

FIG. 15 is a diagram illustrating image data and a first output image of a standard container area according to an embodiment of the present disclosure.

This specification clarifies the scope of the claims of the present disclosure and describes the principles of the embodiments so that those of ordinary skill in the art to which the embodiments pertain can practice them. The disclosed embodiments may be implemented in various forms.

Like reference numerals refer to like elements throughout. This specification does not describe all elements of the embodiments, and content that is general in the technical field to which the embodiments pertain or that overlaps between embodiments is omitted. As used herein, the term 'part' (or 'portion') may be implemented in software or hardware, and according to embodiments, a plurality of 'parts' may be implemented as one element (unit), or one 'part' may include a plurality of elements. Hereinafter, embodiments of the present disclosure and their principle of operation are described with reference to the accompanying drawings.

The object volume measurement system 10 according to an embodiment of the present disclosure photographs the object 112 contained in the standard container 110 using the camera 120, and the object volume measurement apparatus 100 measures the object volume from the captured input image. The object volume measurement system 10 may be used in a distribution warehouse, a factory, and the like. For example, standard containers 110 containing objects 112 are placed on the shelves of a warehouse, and the object volume measurement system 10 photographs the standard containers 110 while moving the camera 120 and measures, from the input image, the volume of the object 112 contained in each standard container 110. The camera 120 may be mounted on a predetermined moving means. For example, the camera 120 and the object volume measurement apparatus 100 may be implemented in the form of a mobile robot that measures the volume of the object in each standard container 110 while moving along a predetermined path in the warehouse.

The object 112 is contained in a predetermined standard container 110 . The standard container 110 is a container having a predefined shape, size, and color. The standard container 110 may be defined as one or more types. The object 112 is an object subject to volume measurement. The object volume means the volume occupied by the object 112 contained in one standard container 110 .

The camera 120 photographs the object 112 contained in the standard container 110 . The camera 120 includes a lens, a shutter, and an image pickup device. The camera 120 captures an image, and outputs the captured input image to the object volume measurement apparatus 100 .

The object volume measuring apparatus 100 may be implemented in the form of an electronic device including a processor and a memory, for example, in the form of a smart phone, a tablet PC, a notebook computer, or a wearable device. According to one embodiment, the object volume measuring apparatus 100 may be implemented in the form of a cloud server.

According to one embodiment, the object volume measuring apparatus 100 may be implemented as one device including the camera 120 .

According to another embodiment, the object volume measuring apparatus 100 may receive an input image from the external camera 120 . The object volume measuring apparatus 100 may receive an input image through a communication unit or a predetermined input interface. According to an embodiment, the camera 120 may correspond to a closed circuit television (CCTV) camera.

The object volume measurement apparatus 100 includes an input interface 210 , a processor 220 , an output interface 230 , and a memory 240 .

The input interface 210 receives an input image captured by at least one camera that photographs a standard container. According to an embodiment, the object volume measurement apparatus 100 may include a built-in camera. In this case, the input interface 210 receives the input image from the camera built into the object volume measurement apparatus 100. According to another embodiment, the object volume measurement apparatus 100 may be connected to a camera disposed outside the apparatus and receive the input image through the input interface 210. In this case, the camera photographs the standard container and transmits the captured image data to the object volume measurement apparatus 100. The camera is positioned with its field of view (FOV) set to capture the standard container. According to an embodiment, the camera may correspond to an existing CCTV camera.

The input interface 210 may correspond to an input device of a predetermined standard for receiving image data from a camera or a communication unit. The input interface 210 transmits the input image data to the processor 220 or the memory 240 . The input image data corresponds to the input image. The processor 220 may read the input image stored in the memory 240 .

The processor 220 controls the overall operation of the object volume measuring apparatus 100 . The processor 220 may be implemented with one or more processors. The processor 220 may execute an instruction or a command stored in the memory to perform a predetermined operation.

The processor 220 detects the standard container area corresponding to the standard container from the input image. The processor 220 may detect the standard container area corresponding to the standard container from the input image using a method of detecting a visual tag or a method using an object detection algorithm such as You Only Look Once (YOLO).

According to an embodiment, the processor 220 obtains a container identifier corresponding to the detected standard container. According to one embodiment, the container identifier may be obtained using a visual tag. According to another embodiment, the container identifier may be obtained by recognizing the container identifier described in the standard container using a character recognition algorithm or a pattern recognition algorithm.

In addition, the processor 220 may acquire object information, which is information about the object contained in a standard container, based on the container identifier. The processor 220 may further include a memory and store the object information corresponding to each container identifier in that memory. The object information may include at least one of a product name, a product category, a manufacturer, a seller, a serial number, an expiration date, an active ingredient, and a storage method, or a combination thereof. The processor 220 may acquire the object information corresponding to the obtained container identifier by looking up the identifier in the object information stored in the memory. As another example, the processor 220 may acquire the object information corresponding to the container identifier from an external database such as a cloud server.
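As a minimal sketch of the lookup described above, object information can be held in a table keyed by container identifier. The identifiers, field names, and values below are hypothetical examples, not data from the disclosure.

```python
# Hypothetical in-memory table keyed by container identifier.
OBJECT_INFO = {
    "C-001": {"product_name": "bolt M6", "category": "hardware"},
    "C-002": {"product_name": "nut M6", "category": "hardware"},
}


def lookup_object_info(container_id, table=OBJECT_INFO):
    """Return stored object information, or None if the identifier is unknown."""
    return table.get(container_id)


print(lookup_object_info("C-001"))
```

An external database could replace the dictionary without changing the calling code, provided it exposes the same identifier-to-record lookup.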

In addition, the processor 220 recognizes the container wall area corresponding to the wall surface of the standard container and the object area corresponding to the object from the image of the standard container area. The processor 220 may recognize the container wall area and the object area using the first machine learning model.

Next, the processor 220 calculates the number of pixels in each of the container wall area and the object area, and calculates the pixel ratio of the object area. The processor 220 calculates the number of pixels in the entire area including the container wall area and the object area. The pixel ratio of the object area corresponds to the ratio of the number of pixels of the object area to the number of pixels of the entire area. The pixel ratio can be defined as a percentage.

Next, the processor 220 generates the object volume measurement value based on the pixel ratio of the object area. According to an embodiment, the processor 220 may define the pixel ratio as a measurement value of the object volume. According to another embodiment, the processor 220 may define a value obtained by multiplying a pixel ratio by a predetermined reference value as the object volume measurement value.
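The pixel ratio and the resulting volume measurement value can be sketched as follows, assuming the recognized areas are available as a 2-D label map. The label values (1 for wall, 2 for object) and the default reference value of 100 are assumptions for illustration.

```python
WALL, OBJECT = 1, 2  # hypothetical label values in the segmented image


def pixel_ratio(label_map):
    """Ratio of object pixels to (wall + object) pixels, in [0, 1]."""
    wall = sum(row.count(WALL) for row in label_map)
    obj = sum(row.count(OBJECT) for row in label_map)
    total = wall + obj
    return obj / total if total else 0.0


def volume_measurement(label_map, reference_value=100.0):
    """Scale the pixel ratio by a reference value (100 gives a percentage)."""
    return pixel_ratio(label_map) * reference_value


segmented = [
    [0, 1, 1, 0],
    [0, 1, 2, 0],
    [0, 2, 2, 0],
]
print(volume_measurement(segmented))  # 3 object pixels out of 6 -> 50.0
```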

The output interface 230 outputs the volume measurement value generated by the processor 220 . The output interface 230 may correspond to, for example, a display, an audio speaker, or a communication unit.

According to an embodiment, the output interface 230 outputs the container identifier and the object volume value together. According to another embodiment, the output interface 230 outputs the container identifier, the object information, and the object volume value together.

The memory 240 may store data and commands necessary for the operation of the object volume measurement apparatus 100. The memory 240 may be implemented as at least one of a volatile storage medium and a non-volatile storage medium, or a combination thereof, and may be implemented with various types of storage media. The memory 240 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (e.g., SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. According to an embodiment, the memory 240 may correspond to a cloud storage space. For example, the memory 240 may be implemented through a cloud service.

The memory 240 may store an input image, a container region image, and an intermediate output image.

Each step of the method for measuring the volume of an object according to an embodiment of the present disclosure may be performed by various types of electronic devices including a processor. The present disclosure focuses on an embodiment in which the object volume measurement apparatus 100 performs the object volume measurement method. Therefore, the embodiments described for the object volume measurement apparatus 100 are applicable to the embodiments of the object volume measurement method, and conversely, the embodiments described for the object volume measurement method are applicable to the embodiments of the object volume measurement apparatus 100. The object volume measurement method according to the disclosed embodiments is not limited to being performed by the object volume measurement apparatus 100 disclosed herein and may be performed by various types of electronic devices.

In step S302, the object volume measurement apparatus 100 receives an input image captured by a camera.

Next, in step S304, the object volume measurement apparatus 100 detects a standard container area from the input image. According to an embodiment, the object volume measuring apparatus 100 may detect a visual tag from an input image, and detect a standard container area based on the detected visual tag. According to another embodiment, the object volume measuring apparatus 100 may detect a standard container region from an input image using a machine learning model.

Next, in step S306 , the object volume measuring apparatus 100 recognizes the container wall area corresponding to the wall surface of the standard container and the object area corresponding to the object from the image of the standard container area. The object volume measuring apparatus 100 may recognize the container wall area and the object area using the first machine learning model.

Next, in step S308 , the object volume measuring apparatus 100 calculates the number of pixels in each of the container wall area and the object area, and calculates the pixel ratio of the object area. The object volume measuring apparatus 100 calculates the number of pixels in the entire area including the container wall area and the object area. The pixel ratio of the object area corresponds to the ratio of the number of pixels of the object area to the number of pixels of the entire area. The pixel ratio can be defined as a percentage.

Next, in step S310 , the object volume measurement apparatus 100 generates an object volume measurement value based on the pixel ratio of the object area. According to an embodiment, the object volume measurement apparatus 100 may define a pixel ratio as an object volume measurement value. According to another embodiment, the object volume measurement apparatus 100 may define a value obtained by multiplying a pixel ratio by a predetermined reference value as the object volume measurement value.

The standard containers 410, 420, and 430 according to an embodiment of the present disclosure include one or more types. Each type of standard container may have a different shape, size, color, and the like. The upper surface and the front of the standard containers 410, 420, and 430 may be open so that the contained object is visible when the containers are photographed by a camera. In addition, the standard containers 410, 420, and 430 may be of a plurality of types having different sizes, and the size may be defined by width, depth, and height.

The standard containers 410, 420, and 430 may have a polyhedral shape. For example, the standard containers 410, 420, and 430 may have a rectangular parallelepiped shape as shown in the drawing. The standard containers 410, 420, and 430 may have a shape in which the upper surface and the front of the rectangular parallelepiped are open.

The standard containers 410, 420, and 430 may have identifier areas 440a, 440b, and 440c, which indicate the container identifier, in a predetermined area of the front surface photographed by the camera. The identifier areas 440a, 440b, and 440c represent container identifier information as visual information. According to an embodiment, a visual tag is disposed in each of the identifier areas 440a, 440b, and 440c. The visual tag may include, for example, an ArUco marker, a Quick Response (QR) code, a barcode, and the like. The visual tag may include container identifier information corresponding to the standard container, object information, and the like. According to another embodiment, the identifier areas 440a, 440b, and 440c may be expressed as characters, patterns, symbols, or the like, which may represent the container identifier information, object information, and the like corresponding to the standard container.

According to an embodiment of the present disclosure, the container identification tag 510 is disposed in the identifier area on the front of the standard container 110. The container identification tag 510 corresponds to the visual tag. The container identification tag 510 includes identifier information and object information of the standard container 110. For example, the identifier information indicates the identification number of the standard container 110. The object information may correspond to, for example, a category (e.g., food, clothing, miscellaneous goods) of the object contained in the standard container 110, a product model, a product name, and the like.

When the input image is a moving picture and a new container identification tag 510 is detected in a frame, the object volume measurement apparatus 100 defines a new standard container area. According to an embodiment, the object volume measurement apparatus 100 may perform steps S304, S306, S308, and S310 upon detecting the new container identification tag. According to an embodiment of the present disclosure, the object volume measurement system 10 may take stock of the entire warehouse by sequentially photographing the standard containers in the warehouse while moving the camera.

According to an embodiment, the object volume measurement apparatus 100 detects the container identification tag 510 in the input image corresponding to the video and stores it. For example, the object volume measurement apparatus 100 may capture 60 frames per second and store the identification number of the container identification tag 510 identified in each captured image. Next, when the object volume measurement apparatus 100 detects, in a new frame, an identification number that does not overlap with the stored identification numbers, it stores the newly detected identification number and performs steps S304, S306, S308, and S310 for the new standard container. The object volume measurement apparatus 100 repeats this process over the input image and acquires the volume measurement value of each standard container.
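The handling of repeated identification numbers across frames can be sketched as follows. Tag detection itself is outside the sketch: each frame is assumed to be already reduced to the list of identification numbers found in it, and the function name and data layout are hypothetical.

```python
def new_container_ids(frames_tag_ids):
    """Yield (frame_index, tag_id) the first time each tag id is seen.

    frames_tag_ids is an iterable in which each item lists the tag ids
    detected in one frame.
    """
    seen = set()
    for index, tag_ids in enumerate(frames_tag_ids):
        for tag_id in tag_ids:
            if tag_id not in seen:
                seen.add(tag_id)
                yield index, tag_id


frames = [[7], [7], [7, 12], [12], [3]]
print(list(new_container_ids(frames)))  # [(0, 7), (2, 12), (4, 3)]
```

Each yielded pair marks a frame on which the remaining measurement steps would be run for a newly seen container.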

According to an embodiment of the present disclosure, the visual tags 610a and 610b may be implemented as ArUco markers as shown in FIG. 6. An ArUco marker is a type of two-dimensional visual marker and consists of a two-dimensional bit pattern of size n*n and a black border area surrounding it. The black border area improves the recognition rate of the marker. The inner two-dimensional bit pattern is composed of a combination of white cells and black cells and represents predetermined information. According to an embodiment of the present disclosure, the container identification tag 510 may be implemented in the form of an ArUco marker as shown in FIG. 6.
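The layout of such a marker, an inner bit pattern enclosed by a black border, can be illustrated with a simplified sketch. This is not an ArUco decoder: real ArUco detection also locates the marker in the image, corrects perspective and rotation, and matches the bits against a predefined dictionary. The cell encoding (0 for black, 1 for white) and the function names are assumptions.

```python
def has_black_border(cells):
    """True if the grid is square and every outer cell is black (0)."""
    size = len(cells)
    if any(len(row) != size for row in cells):
        return False
    edge = cells[0] + cells[-1]
    edge += [row[0] for row in cells] + [row[-1] for row in cells]
    return all(cell == 0 for cell in edge)


def inner_bits(cells):
    """Return the n*n bit pattern enclosed by the border as a list of rows."""
    return [row[1:-1] for row in cells[1:-1]]


marker = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(has_black_border(marker), inner_bits(marker))
```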

According to an embodiment of the present disclosure, the object volume measurement method may perform additional image processing in addition to the processing described with reference to FIG. 3 in the process of calculating a volume measurement value from an input image.

First, the object volume measurement apparatus 100 receives an input image captured by a camera in step S702.

Next, in step S704, the object volume measurement apparatus 100 converts the input image into a black-and-white scale. Using a predetermined reference value, the object volume measurement apparatus 100 converts the input image into a black-and-white scale image in which each pixel takes one of two values, corresponding to black and white.
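A minimal sketch of this conversion is shown below, assuming the input has already been reduced to grayscale values in the range 0 to 255. The reference value of 128 and the output values 0 and 255 are assumptions; the disclosure says only that a predetermined reference value is used.

```python
def to_black_and_white(gray_image, threshold=128):
    """Map each grayscale value (0-255) to 0 (black) or 255 (white)."""
    return [
        [255 if value >= threshold else 0 for value in row]
        for row in gray_image
    ]


print(to_black_and_white([[0, 127, 128, 255]]))  # [[0, 0, 255, 255]]
```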

Next, in step S706, the object volume measurement apparatus 100 detects the standard container area from the black-and-white scale image. According to an embodiment, the object volume measuring apparatus 100 may detect a visual tag from an input image, and detect a standard container area based on the detected visual tag. According to another embodiment, the object volume measuring apparatus 100 may detect a standard container region from an input image using a machine learning model.

According to an embodiment, the object volume measurement apparatus 100 detects a standard container area from a black-and-white scale image using the YOLO model. A process of detecting the standard container area will be described with reference to FIG. 8 .

The object volume measurement apparatus 100 inputs the black-and-white scale image 810 into the object recognition model 820. The object recognition model 820 is, for example, a You Only Look Once (YOLO) model. The object recognition model 820 may correspond to a machine learning model including a plurality of nodes and layers. The object recognition model 820 may include a YOLO model trained to detect the standard container 110 in the input image and to output the position and area of the standard container 110 and the probability that the detected area corresponds to the standard container 110.

YOLO is a deep learning object detection model based on a convolutional neural network (CNN). When YOLO recognizes, in a photo, an object it was trained on in advance, it outputs four pieces of positional information: the x-coordinate, y-coordinate, width, and height of the recognized object. Based on these four pieces of positional information, the recognized object can be displayed as a rectangle 832 together with text 834 indicating what kind of object it is.

According to an embodiment of the present disclosure, a single YOLO model capable of recognizing the standard container 110 is built. This model does not recognize other objects and is used only to find the standard container 110 in the input black-and-white scale image 810 and obtain the four pieces of positional information.

The object recognition model 820 using the YOLO model outputs the above-described four pieces of location information. According to an embodiment, the object recognition model 820 outputs an object recognition image 830 in which four pieces of location information are overlaid on a black-and-white scale image 810 . The object recognition image 830 includes a box 832 indicating a standard container area and an indicator 834 indicating a standard container. The indicator 834 indicating the standard container may include a probability that the area corresponding to the box 832 corresponds to the standard container 110 .

According to an embodiment of the present disclosure, the object recognition model 820 may crop the black-and-white scale image 810 so that it includes only the standard container area corresponding to the standard container 110, and may generate and output the resulting standard container area image.
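Cropping with the four pieces of positional information can be sketched as follows. The sketch assumes the x- and y-coordinates denote the box center in pixel units, which is a common YOLO convention but is not stated in the disclosure; the function name is hypothetical.

```python
def crop_to_box(image, x_center, y_center, width, height):
    """Return the part of a 2-D image covered by a center-format box."""
    rows, cols = len(image), len(image[0])
    # Clamp the box to the image bounds before slicing.
    left = max(0, int(x_center - width / 2))
    top = max(0, int(y_center - height / 2))
    right = min(cols, int(x_center + width / 2))
    bottom = min(rows, int(y_center + height / 2))
    return [row[left:right] for row in image[top:bottom]]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
print(crop_to_box(image, 2, 2, 2, 2))  # [[5, 6], [9, 10]]
```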

Referring again to FIG. 7, the next step will be described.

Next, in step S708, the object volume measurement apparatus 100 performs background removal. The background removal process will be described with reference to FIG. 9.

The object volume measurement apparatus 100 inputs the object recognition image 830 to the background removal module 910 . The background removal module 910 generates and outputs the background removal image 920 in which the background is removed from the object recognition image 830 except for the area corresponding to the standard container 110 .

According to an embodiment, the background removal module 910 uses a graph-cut algorithm, which is one of the representative background removal algorithms. To remove the background of a photo, the foreground or object of interest must be selected first; this selection is referred to as the ROI (Region of Interest). The four pieces of positional information received from YOLO are used to construct this ROI, and the graph-cut algorithm labels the pixels outside and inside the ROI as foreground or background. The clustering operation of the basic graph-cut algorithm is used for this labeling: the pixels clustered as foreground keep their color, and the pixels clustered as background are all changed to black.
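The ROI initialization part of this step can be illustrated with a short sketch. It does not implement graph-cut itself: it only shows how a rectangular ROI built from the four positional values separates "certainly background" pixels, which are set to black, from the pixels a graph-cut algorithm would go on to refine. The function name `mask_outside_roi` and the top-left interpretation of (x, y) are assumptions made for illustration.

```python
def mask_outside_roi(image, x, y, width, height, background=0):
    """Return a copy of `image` with every pixel outside the ROI set to `background`.

    Simplification: pixels inside the ROI are kept unchanged, whereas a real
    graph-cut step would further split them into foreground and background.
    """
    result = []
    for r, row in enumerate(image):
        new_row = []
        for c, value in enumerate(row):
            inside = (y <= r < y + height) and (x <= c < x + width)
            new_row.append(value if inside else background)
        result.append(new_row)
    return result


image = [
    [5, 5, 5, 5],
    [5, 7, 8, 5],
    [5, 9, 6, 5],
]
masked = mask_outside_roi(image, x=1, y=1, width=2, height=2)
print(masked)
```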

The background removal module 910 outputs a background removal image 920 . According to an embodiment, the background removal image 920 may include a box 832 and an indicator 834 generated by the object recognition model 820 .

Referring again to FIG. 7, the next step will be described.

Next, in step S710 , the object volume measuring apparatus 100 recognizes the container wall area corresponding to the wall surface of the standard container and the object area corresponding to the object from the background removal image 920 . The object volume measuring apparatus 100 may recognize the container wall area and the object area using the first machine learning model. The object volume measuring apparatus 100 generates a first output image in which the container wall area and the object area are recognized. The operation of step S710 will be described in detail with reference to FIG. 10 .

The object volume measuring apparatus 100 generates a first output image 1030 in which the object area and the container area are separated by inputting the background removal image 1010 to the first machine learning model 1020 .

The background removal image 1010 is an image in which the box 832 and the indicator 834 are removed from the background removal image 920 described with reference to FIG. 9 . The first machine learning model 1020 receives the background removal image 1010 and recognizes and classifies an object from the background removal image 1010 .

The first machine learning model 1020 includes a semantic segmentation model. The output of the Semantic Segmentation model is an image in which the object inside the standard container 110 and the wall of the container are displayed in different colors.

A semantic segmentation model is a machine learning model that divides an image into meaningful units by predicting which class each pixel of the image belongs to. According to an embodiment of the present disclosure, the semantic segmentation model defines the object inside the standard container 110 and the container wall as classes, and classifies each pixel of its input image as either the object or the container wall.

The semantic segmentation model uses a convolutional neural network (CNN) model. The semantic segmentation model can generate and output a segmentation map indicating the predicted class of each pixel.
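As an illustration of how a segmentation map follows from per-pixel class predictions, the sketch below takes a grid of per-class scores and keeps the highest-scoring class for each pixel. The class indices `WALL` and `OBJECT`, the score layout, and the function name are hypothetical; the scores would come from the CNN, which is not modeled here.

```python
WALL, OBJECT = 0, 1  # hypothetical class indices


def to_segmentation_map(scores):
    """scores[r][c] is a list of per-class scores for one pixel.

    Returns a map of the same height and width holding the index of the
    highest-scoring class for each pixel.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


scores = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.4, 0.6], [0.7, 0.3]],
]
seg_map = to_segmentation_map(scores)
print(seg_map)
```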

The semantic segmentation model generates a first output image 1030 in which an object region and a container wall region are displayed in different colors.

Referring again to FIG. 7, the next step will be described.

Next, in step S712 , the object volume measuring apparatus 100 calculates the number of pixels in each of the container wall area and the object area, and calculates the pixel ratio of the object area. The object volume measuring apparatus 100 calculates the number of pixels in each of the container wall area and the object area from the first output image 1030 . The object volume measuring apparatus 100 calculates the number of pixels in the entire area including the container wall area and the object area. The pixel ratio of the object area corresponds to the ratio of the number of pixels of the object area to the number of pixels of the entire area. The pixel ratio can be defined as a percentage.

According to an embodiment of the present disclosure, the object volume measuring apparatus 100 may calculate the number of pixels in each of the container wall area and the object area by applying different weights to the respective container wall surfaces. A process of calculating the volume of an object according to an embodiment of the present disclosure will be described in detail with reference to FIG. 11 .

The final result of the semantic segmentation model is the first output image 1030, in which the object inside the standard container 110 and the container wall are displayed in two colors. The number of pixels corresponding to each color is determined from the first output image 1030. If the number of pixels in the container wall area is a and the number of pixels in the object area is b, the most basic pixel ratio of the object area can be calculated through Equation 1.

[Equation 1]

$$R = \frac{b}{a + b} \times 100$$

R = percentage of pixels in the object area

b = total number of pixels in the object area

a = total number of pixels in the container wall area
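The basic pixel ratio, defined in the text as the object-area pixel count divided by the combined count of container wall and object pixels, can be sketched as follows. The label values and the function name `pixel_ratio` are hypothetical, and a separate background label is assumed for the pixels blacked out during background removal.

```python
WALL, OBJECT, BACKGROUND = 0, 1, 2  # hypothetical label values


def pixel_ratio(seg_map):
    """Percentage of object pixels among wall + object pixels."""
    a = sum(row.count(WALL) for row in seg_map)    # container wall pixels
    b = sum(row.count(OBJECT) for row in seg_map)  # object pixels
    if a + b == 0:
        raise ValueError("no container wall or object pixels found")
    return 100.0 * b / (a + b)


seg_map = [
    [2, 0, 0, 2],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
ratio = pixel_ratio(seg_map)
print(ratio)
```

Background pixels are excluded from both counts, so the ratio depends only on the container interior.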

According to an embodiment of the present disclosure, the pixels of the container wall region are divided into a total of four regions 1121, 1122, 1123, and 1124, and a different region weight is given to each of the regions 1121, 1122, 1123, and 1124. The region weight has a value between 0 and 1. According to an embodiment of the present disclosure, the weight of each wall surface of the rectangular parallelepiped standard container 110 may increase in the order of both sides, the front side, and the bottom side. The region weight is used to correct for shadows appearing in the input image. When the standard container 110 is photographed obliquely from the front with the camera while the object covers the lower surface, the most shadows occur on both sides, the next most on the front side, and no shadow is formed on the lower surface. Accordingly, an embodiment of the present disclosure increases the weight of each wall surface in the order of both sides, the front side, and the bottom side to reflect these shadow characteristics.

For example, assume that the number of pixels b of the object area inside the standard container 110 is 5000. Among the 5000, the number of pixels belonging to the first area 1121 is 800, the number of pixels belonging to the second area 1122 is 700, the number of pixels belonging to the third area 1123 is 950, and the number of pixels belonging to the fourth area 1124 is 2450. The region weight of the first region 1121 and the third region 1123 is defined as 0.7, the region weight of the second region 1122 as 0.8, and the region weight of the fourth region 1124 as 1. The number of pixels in the first region 1121 is multiplied by 0.7, the number of pixels in the second region 1122 by 0.8, the number of pixels in the third region 1123 by 0.7, and the number of pixels in the fourth region 1124 by 1, and the four weighted pixel counts are then summed. According to this example, the sum of the weighted pixel counts is 4340. In this case, the pixel ratio computed from the weighted pixel count reflects an adjustment of approximately 13% relative to the pixel ratio of the object area calculated using Equation 1 above.

The reason for applying the region weight will be described with reference to FIG. 12 .

An area weight is needed because a shadow is generated by light reflection of the standard container 110 itself. When an image is input to the semantic segmentation model according to an embodiment of the present disclosure, an error caused by a shadow occurs in the first output image 1210 in which each pixel is classified. For example, in the first output image 1210 of FIG. 12, regions 1220, 1222, and 1224 correspond to errors caused by shadows.

Regions 1220, 1222, and 1224 partially include a portion corresponding to the container wall area of the standard container 110, and include pixels incorrectly recognized as the object area because of shadows.

In the semantic segmentation model, a shadow cast by the object inside the standard container 110 is labeled as part of the same object. According to an embodiment of the present disclosure, the volume calculation module 1110 assigns shadow weights to a total of four regions. The area weight is set lowest for the sides where shadows occur the most, that is, the first area 1121 and the third area 1123. The weight of the second area 1122, which has less shadow than the first area 1121 and the third area 1123, is set higher than that of the first area 1121 and the third area 1123. A weight of 1 is applied to the fourth area 1124, which is not affected by shadows, so that its number of pixels is reflected as it is.

The volume calculation module 1110 receives the number and positions of the pixels in the object area as inputs and divides the object area into four areas. The volume calculation module 1110 may use a beta function that outputs the number of pixels of the object area in each of the four areas from the first output image. The number of pixels of the object area in each area is then multiplied by the weight of that area, and the sum of the weighted pixel counts of the object area is output.
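The per-area counting and weighting described above can be sketched as follows. The text does not specify how each pixel is assigned to one of the four areas, so this sketch assumes a precomputed `region_map` that labels every pixel position with a region identifier; that map, the label value `OBJECT`, and both function names are hypothetical.

```python
from collections import Counter

OBJECT = 1  # hypothetical label value for object pixels


def count_object_pixels_by_region(seg_map, region_map):
    """Count object pixels per region, given a same-sized map of region ids."""
    counts = Counter()
    for seg_row, region_row in zip(seg_map, region_map):
        for label, region in zip(seg_row, region_row):
            if label == OBJECT:
                counts[region] += 1
    return counts


def weighted_object_pixels(counts, weights):
    """Sum of per-region object pixel counts, each multiplied by its weight."""
    return sum(counts[region] * weights[region] for region in counts)


seg_map = [
    [1, 1, 0],
    [1, 1, 1],
]
region_map = [
    [1121, 1122, 1123],
    [1121, 1124, 1124],
]
weights = {1121: 0.7, 1122: 0.8, 1123: 0.7, 1124: 1.0}

counts = count_object_pixels_by_region(seg_map, region_map)
weighted = weighted_object_pixels(counts, weights)
print(dict(counts), weighted)
```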

The volume calculation module 1110 calculates the pixel ratio of the weighted object area by using Equation (2).

[Equation 2]

$$R = \frac{c + d + e + f}{a + c + d + e + f} \times 100$$

R = percentage of pixels in the object area

c = weighted number of pixels of the object area in the first area

d = weighted number of pixels of the object area in the second area

e = weighted number of pixels of the object area in the third area

f = weighted number of pixels of the object area in the fourth area

a = number of pixels in the container wall area
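A weighted pixel ratio built from these symbols can be sketched as below. The sketch assumes that the denominator is the container wall pixel count plus the weighted object pixel count, which is one reading of the symbol list above rather than a confirmed form of Equation 2. The function name and the input values are hypothetical and are not taken from the worked example in the text.

```python
def weighted_pixel_ratio(region_counts, region_weights, wall_pixels):
    """Percentage of weighted object pixels among wall + weighted object pixels.

    Assumption: denominator = wall_pixels + weighted object pixel sum.
    """
    weighted = sum(n * w for n, w in zip(region_counts, region_weights))
    total = wall_pixels + weighted
    if total == 0:
        raise ValueError("no container wall or object pixels found")
    return 100.0 * weighted / total


# Hypothetical counts for the first to fourth areas, with the weights
# 0.7, 0.8, 0.7 and 1 described in the text.
ratio = weighted_pixel_ratio(
    region_counts=[100, 100, 100, 200],
    region_weights=[0.7, 0.8, 0.7, 1.0],
    wall_pixels=580,
)
print(round(ratio, 2))
```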

For example, assume that the number of pixels b of the object area inside the standard container 110 is 5000. Among the 5000, the number of pixels belonging to the first area 1121 is 800, the number of pixels belonging to the second area 1122 is 700, the number of pixels belonging to the third area 1123 is 950, and the number of pixels belonging to the fourth area 1124 is 2450. The value of a, which is the number of pixels in the wall area, is 5200. The region weight of the first region 1121 and the third region 1123 is defined as 0.7, the region weight of the second region 1122 as 0.8, and the region weight of the fourth region 1124 as 1. In this case, when the pixel ratio of the object area is calculated using Equation 2, it is calculated as in Equation 3.

[Equation 3]

In this case, the R value is 45.59%.

The next step will be described with reference to FIG. 7 again.

Next, in step S714 , the object volume measurement apparatus 100 generates a volume measurement value based on the pixel ratio of the object area. According to an embodiment, the object volume measurement apparatus 100 may define a pixel ratio of the object area as a volume measurement value. The pixel ratio of the object area may correspond to the R value of Equation 1 or the R value of Equation 2 described above. According to another embodiment, the object volume measurement apparatus 100 may define a value obtained by multiplying the pixel ratio of the object area by a predetermined reference value as the object volume measurement value. According to an embodiment of the present disclosure, the predetermined reference value may correspond to a volume value when the standard container 110 is filled with an object.
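The two embodiments of step S714 can be sketched together: the pixel ratio is either reported directly as the volume measurement value or scaled by a reference volume corresponding to a full standard container. The function name `volume_from_ratio`, the parameter `full_volume`, and the numeric values are hypothetical.

```python
def volume_from_ratio(pixel_ratio_percent, full_volume=None):
    """Turn an object-area pixel ratio (in percent) into a volume measurement.

    With no reference volume, the ratio itself is the measurement value.
    Otherwise the ratio scales the volume of a completely filled container.
    """
    if not 0.0 <= pixel_ratio_percent <= 100.0:
        raise ValueError("pixel ratio must be between 0 and 100")
    if full_volume is None:
        return pixel_ratio_percent
    return full_volume * pixel_ratio_percent / 100.0


as_percentage = volume_from_ratio(40.0)
as_volume = volume_from_ratio(40.0, full_volume=50.0)  # e.g. a 50-unit container
print(as_percentage, as_volume)
```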

The processor 220 includes a black-and-white conversion module 1310 , an object recognition model 820 , a background removal module 910 , a first machine learning model 1020 , and a volume calculation module 1110 . Each block in the processor 220 corresponds to a software module, a hardware module, or a combination of a software module and a hardware module. Therefore, the embodiment of the present disclosure is not limited by the structure of each block in the processor 220 , and each block in the processor 220 may be combined with each other, or one block may be divided into a plurality of blocks.

Since the operation of each module of FIG. 13 is similar to the operation of each step described with reference to FIG. 7, the operation of each module is described only briefly here to avoid duplicate description. The operation of the device described with reference to FIG. 7 may also be applied to each module of FIG. 13.

The black-and-white conversion module 1310 generates a black-and-white scale image by converting the input image 810 into a black-and-white scale. Because processing an image with a machine learning model requires a large amount of computation, the black-and-white conversion module 1310 converts the input image 810 into a black-and-white scale to reduce that amount. With the black-and-white input image, the processor 220 can process one channel instead of the three channels R, G, and B, which reduces the processing load.
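The reduction from three channels to one can be sketched as follows. The text does not specify the conversion formula, so the luma weights 0.299, 0.587, and 0.114 (the ITU-R BT.601 coefficients) are an assumption, as is the function name `to_grayscale`.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of single intensity values.

    Assumption: BT.601 luma weights; the source does not name a formula.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(rgb)
print(gray)
```

Each output pixel holds one value instead of three, which is the source of the reduced processing load described above.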

Next, the object recognition model 820 detects the standard container area from the black-and-white input image. According to an embodiment, the object recognition model 820 may include a YOLO model. The object recognition model 820 generates an object recognition image 830 corresponding to the standard container area and outputs it to the background removal module 910.

The background removal module 910 generates a background removal image 920 in which a background is removed from the object recognition image 830 except for an area corresponding to the standard container 110 . The background removal module 910 outputs the background removal image 920 to the first machine learning model 1020 .

The first machine learning model 1020 receives the background removed image 920 , and recognizes and classifies an object from the background removed image 920 . The first machine learning model 1020 includes a semantic segmentation model. The output of the Semantic Segmentation model is an image in which the object inside the standard container 110 and the wall of the container are displayed in different colors. The first machine learning model 1020 outputs a first output image 1030 in which the object area and the container wall area are displayed in different colors.

To use semantic segmentation effectively, the first machine learning model 1020 may be trained using the TensorFlow API. For example, the first machine learning model 1020 is trained on training data that uses the background-removed image 920 as input data and image data in which the object has been recognized and classified as output data. The input data and the output data may have a predetermined size, for example 250*250 or 128*128. Any engine or data augmentation algorithm may be used to generate the training data.

The volume calculation module 1110 receives the first output image 1030 from the first machine learning model 1020 . The volume calculation module 1110 generates and outputs a volume calculation value from the first output image 1030 in the manner described above with reference to FIG. 11 .

According to an embodiment of the present disclosure, the first machine learning model 1020 includes a deep artificial neural network with a CNN structure. The CNN structure includes a convolutional block and a fully connected layer. The convolutional block performs feature extraction and includes a convolution layer, an activation layer, and a pooling layer. The convolutional block extracts features from the input vector. The fully connected layer is placed after the convolutional block and generates an output vector from the features extracted by the convolutional block. The fully connected layer is computed by connecting all nodes between layers.
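The layer order described above (convolution, activation, pooling, then a fully connected layer) can be illustrated with a tiny forward pass in plain Python. This is a toy sketch with hand-picked weights, not the trained first machine learning model 1020; all function names, the ReLU activation, the 2x2 max pooling, and the numeric values are assumptions chosen for illustration.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, as used in CNN convolution layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Activation layer: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Pooling layer: keep the maximum of each non-overlapping 2x2 window."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def fully_connected(features, weights, biases):
    """Every output node is connected to every input feature."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, biases)]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]

pooled = max_pool2x2(relu(conv2d(image, kernel)))    # feature extraction
features = [v for row in pooled for v in row]        # flatten to a vector
output = fully_connected(features, [[1, 1, 1, 1], [0, 1, 0, -1]], [0.5, 0.0])
print(pooled, output)
```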

The first machine learning model 1020 may be trained on training data based on a model including a CNN structure.

The image data 1510 of the standard container area may be generated such that the standard container is disposed in the center. According to an embodiment, the side 1512 of the standard container may be recognized and displayed on the image data 1510 of the standard container area.

The first output image 1520 classifies object types and indicates regions corresponding to each object type with the same pixel value or pattern.

Meanwhile, the disclosed embodiments may be implemented in the form of a computer-readable recording medium storing instructions and data executable by a computer. The instructions may be stored in the form of program code, and when executed by the processor, a predetermined program module may be generated to perform a predetermined operation. Further, the instruction, when executed by a processor, may perform certain operations of the disclosed embodiments.

As described above, the disclosed embodiments have been described with reference to the accompanying drawings. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in other forms than the disclosed embodiments without changing the technical spirit or essential features of the present invention. The disclosed embodiments are illustrative and should not be construed as limiting.

Claims

1. An object volume measurement method comprising:

receiving an input image;

detecting a standard container area corresponding to a predefined standard container from the input image;

recognizing a container wall area corresponding to the wall surface of the standard container and an object area corresponding to an object contained in the standard container using a first machine learning model from the input image;

calculating a pixel ratio of the object area among pixels of the entire area including the container wall area and the object area; and

generating a volume measurement value of the object based on the pixel ratio.
2. The method of claim 1,

The object volume measurement method further comprises removing a background area excluding the standard container area from the input image,

The step of recognizing the container wall area and the object area is performed using an input image from which the background area has been removed.
3. The method of claim 2,

The step of removing the background area comprises:

defining the standard container region as a region of interest;

defining a region excluding the region of interest from the input image as the background region; and

generating a first output image in which the background area is displayed as a single pixel value.
4. The method of claim 1,

The first machine learning model corresponds to a semantic segmentation model,

Recognizing the container wall area and the object area comprises:

Recognizing the container wall area and the object area using the semantic segmentation model,

The method for measuring object volume further includes converting the container wall area into a first pixel value and converting the object area into a second pixel value.
5. The method of claim 1,

The standard container is a standard container in the form of a partially open polyhedron,

Calculating the pixel ratio comprises:

recognizing a plurality of wall areas corresponding to each wall surface of the standard container;

dividing the object area into a plurality of sub object areas respectively corresponding to the plurality of wall areas;

calculating a weighted object area pixel count by applying a weight corresponding to each wall surface to the pixel number of each of the plurality of sub object areas; and

calculating the pixel ratio by using the number of pixels in the weighted object area and the number of pixels in the container wall area.
6. The method of claim 5,

The standard container is a rectangular parallelepiped standard container with open top and front surfaces,

The input image is an image obtained by photographing the standard container obliquely from the side of the open top surface and the open front surface.
7. The method of claim 6,

The weight of each wall surface of the rectangular parallelepiped standard container increases in the order of both sides, the front, and the lower surface of the standard container.
8. The method of claim 1,

wherein the volume measurement value of the object is defined as a percentage of the total volume of the standard container.
9. The method of claim 1,

The input image is a video including a plurality of frames,

The object volume measurement method further comprises extracting a frame in which the standard container area is detected,

The detecting of the standard container area includes using the extracted frame as the input image.
10. An object volume measurement apparatus comprising:

an input interface for receiving an input image;

a memory storing at least one instruction;

at least one processor executing the at least one instruction; and

an output interface;

wherein the at least one processor, by executing the at least one instruction,

detects a standard container area corresponding to a predefined standard container from the input image,

recognizes a container wall area corresponding to the wall surface of the standard container and an object area corresponding to an object contained in the standard container from the input image using a first machine learning model,

calculates a pixel ratio of the object area among pixels of the entire area including the container wall area and the object area,

generates a volume measurement value of the object based on the pixel ratio, and

outputs the volume measurement value of the object through the output interface.
11. A computer program recorded on a recording medium for performing an object volume measurement method when executed by a processor, wherein the object volume measurement method comprises:

detecting a standard container area corresponding to a predefined standard container from the input image;

recognizing a container wall area corresponding to the wall surface of the standard container and an object area corresponding to an object contained in the standard container using a first machine learning model from the input image;

calculating a pixel ratio of the object area among pixels of the entire area including the container wall area and the object area; and

generating a volume measurement value of the object based on the pixel ratio.