WO2021182343A1 - Learning data creation device, method, program, learning data, and machine learning device - Google Patents

Learning data creation device, method, program, learning data, and machine learning device Download PDF

Info

Publication number
WO2021182343A1
WO2021182343A1 (PCT/JP2021/008789)
Authority
WO
WIPO (PCT)
Prior art keywords
image
learning
drug
correct answer
data
Prior art date
Application number
PCT/JP2021/008789
Other languages
French (fr)
Japanese (ja)
Inventor
一央 岩見
Original Assignee
FUJIFILM Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJIFILM Corporation
Priority to JP2022507149A (patent JP7531578B2)
Publication of WO2021182343A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • The present invention relates to a learning data creation device, method, program, learning data, and machine learning device, and particularly to a technique for efficiently creating large amounts of learning data.
  • In Patent Document 1, a visual inspection device has been proposed that learns from a large number of teaching data stored in a teaching file, recognizes patterns, and determines defects.
  • This visual inspection device includes a teaching data generation device that, for specific teaching data of which only a few samples exist among the teaching data in the teaching file, generates new teaching data by transforming that specific teaching data. By adding the teaching data generated by the teaching data generation device to the teaching file and learning from it, defects can be inspected even when only a small number of data are available.
  • The teaching data generation device generates new teaching data by applying affine transformations, including enlargement, reduction, and rotation of the image, and attribute conversions, including brightness, contrast, and edge strength.
  • To machine-learn a learning model that recognizes the region of an object, it is necessary to create a large number of pairs of an image of the object and correct answer data (area information indicating the area of the object), and to train the learning model using a learning data set consisting of those pairs.
  • This type of correct answer data is created by displaying the captured image on a display and filling in the image of the object pixel by pixel while viewing the displayed image, so creating the correct answer data is troublesome and time-consuming.
  • The visual inspection apparatus described in Patent Document 1 uses a camera to image printed matter or an object on a substrate (paper, film, metal, etc.), recognizes print defects from the captured image, and classifies the type of defect ("holes", "stains", "convexities", "streaks", etc.).
  • Since the teaching data generation device transforms one piece of data (image data) of which only a few samples exist to newly generate a plurality of teaching data, the correct answer data corresponding to the plurality of teaching data generated by transforming the same image data is data indicating the same type of defect.
  • That is, Patent Document 1 does not describe the problem that creating correct answer data for teaching data (teaching images) takes time and effort, and does not disclose a technique for solving that problem.
  • The present invention has been made in view of such circumstances, and an object of the present invention is to provide a learning data creation device, method, program, learning data, and machine learning device capable of efficiently creating learning data for machine learning a learning model that recognizes an object region.
  • In order to achieve the above object, the invention according to the first aspect is a learning data creation device including a processor and a memory, in which the processor creates learning data for machine learning. The processor performs: an acquisition process of acquiring an image of an object; a learning image generation process of moving the acquired image of the object to generate a learning image; a correct answer data generation process of generating second area information corresponding to the area of the object in the generated learning image and using the generated second area information as correct answer data for the learning image; and storage control of storing the pair of the generated learning image and the correct answer data in the memory as learning data.
  • According to the first aspect, a learning image is generated by moving an image of an object. Further, second area information corresponding to the area of the object in the generated learning image is generated, and the generated second area information is used as the correct answer data for the learning image. Since the correct answer data can be generated by the correct answer data generation process of the processor, creating the correct answer data requires no manual time and effort.
  • Preferably, the acquisition process of the processor acquires first area information corresponding to the area of the object, and the correct answer data generation process generates the second area information based on the acquired first area information.
  • Preferably, the first area information is area information in which the area of the object is manually set, area information in which the area of the object is automatically extracted by image processing, or area information in which the area of the object is automatically extracted by image processing and then manually adjusted.
  • Preferably, the correct answer data includes at least one of a correct image corresponding to the area of the object, bounding box information surrounding the area of the object with a rectangle, and edge information indicating the edge of the area of the object.
  • the correct image includes a mask image.
  • Preferably, the learning image generation process generates a learning image by translating, rotating, inverting, or scaling the image of the object, and the correct answer data generation process generates the second area information in a corresponding manner.
  • The learning image and the correct answer data may be generated synchronously at the same time, or one of them may be generated first and then the other.
  • Preferably, the learning image generation process synthesizes two or more images obtained by translating, rotating, inverting, or scaling the image of the object to create a learning image, and the correct answer data is generated by translating, rotating, inverting, or scaling the first region information corresponding to each of the two or more images in the same way as the image of the object. As a result, a learning image composed of images of a plurality of objects, together with its correct answer data, can be generated.
  • In the learning data creation device, in the learning image generation process, when a learning image including images of a plurality of objects is generated, it is preferable to generate a learning image in which all or some of the images of the plurality of objects are in contact with each other at points or lines.
  • The correct answer data preferably includes an edge image showing only the portions where all or some of the images of the plurality of objects are in contact with each other at points or lines.
  • Because such an edge image can be included, this learning data is useful for separating the portions where the images of a plurality of objects contact each other at points or lines.
  • At least a part of the object is transparent.
  • An image of an object that is at least partially transparent is more difficult to extract than an image of an entirely opaque object, and less training data exists for it. Therefore, generating training data for an image of an at least partially transparent object is particularly effective.
  • Preferably, when the learning image generation process by the processor generates a learning image including images of a plurality of objects, it moves the images of objects other than the transparent object image. This is because, for an object that is at least partially transparent, the positional relationship with the illumination light changes, so an image of the object moved to an arbitrary position differs from an image actually photographed with the transparent object placed at that position.
  • Preferably, the learning image generation process moves the image of the object within a threshold value to generate a learning image.
  • A constraint (threshold value) is set on the movement of the image of the transparent object, and the image of the transparent object is moved within the threshold value to generate a learning image.
  • A learning image generated by moving the image of the transparent object within the threshold value does not involve a significant change in the positional relationship with the illumination light, and as a result matches or substantially matches an image of the transparent object actually photographed at that position.
  • the movement preferably includes either parallel movement or rotational movement.
  • The invention according to the thirteenth aspect is learning data composed of a pair of a learning image generated by moving an image of an object and correct answer data having second area information indicating the area of the object in the learning image.
  • the machine learning device includes a learning model and a learning control unit that uses the above learning data to perform machine learning of the learning model.
  • the learning model is preferably composed of a convolutional neural network.
  • The invention according to the 16th aspect is a learning data creation method in which a processor creates learning data for machine learning by performing the processing of each of the following steps: a step of acquiring an image of an object, a step of moving the acquired image of the object to generate a learning image, a step of generating second area information corresponding to the area of the object in the generated learning image as correct answer data for the learning image, and a step of storing the pair of the generated learning image and the correct answer data as learning data.
  • Preferably, the step of acquiring includes a step of acquiring first area information corresponding to the area of the object, and the step of generating the correct answer data generates the second area information based on the acquired first area information.
  • Preferably, the correct answer data includes at least one of a correct image corresponding to the area of the object, bounding box information surrounding the area of the object with a rectangle, and edge information indicating the edge of the area of the object.
  • In the learning data creation method, in the step of generating a learning image, when images of a plurality of objects are arranged, it is preferable that all or some of the images of the plurality of objects are brought into contact with each other at points or lines.
  • Preferably, the correct answer data includes an edge image showing only the portions where the images of the plurality of objects contact each other at points or lines.
  • In the learning data creation method, it is preferable that at least a part of the object is transparent.
  • Preferably, when generating a learning image including images of a plurality of objects, the step of generating a learning image moves the images of objects other than the transparent object image.
  • The invention according to the 23rd aspect is a program that causes a computer to realize a function of acquiring an image of an object, a function of moving the acquired image of the object to generate a learning image, a function of generating second area information corresponding to the area of the object in the generated learning image as correct answer data for the learning image, and a function of storing the pair of the generated learning image and the correct answer data as learning data.
  • Preferably, the program further realizes a function of acquiring first area information corresponding to the area of the object, and the function of generating the correct answer data generates the second area information based on the acquired first area information.
  • FIG. 1 is a diagram showing a captured image input to the trained learning model and an output result desired to be acquired from the learning model.
  • FIG. 2 is a diagram showing an example of learning data.
  • FIG. 3 is a conceptual diagram showing image processing when correct answer data is automatically created.
  • FIG. 4 is a conceptual diagram showing a method of mass-producing learning data by simulation.
  • FIG. 5 is a diagram showing a first embodiment in which learning data is created by simulation.
  • FIG. 6 is a diagram showing a mode in which one photographed image is generated from two photographed images.
  • FIG. 7 is a diagram showing a second embodiment in which learning data is created by simulation.
  • FIG. 8 is a block diagram showing an example of the hardware configuration of the learning data creation device according to the present invention.
  • FIG. 9 is a plan view showing a drug package in which a plurality of drugs are packaged.
  • FIG. 10 is a block diagram showing a schematic configuration of the photographing apparatus shown in FIG.
  • FIG. 11 is a plan view showing a schematic configuration of the photographing apparatus.
  • FIG. 12 is a side view showing a schematic configuration of the photographing apparatus.
  • FIG. 13 is a block diagram showing an embodiment of the learning data creation device according to the present invention.
  • FIG. 14 is a diagram showing an example of a captured image acquired by the image acquisition unit and first region information, acquired by the first region information acquisition unit, indicating the region of a drug in the captured image.
  • FIG. 15 is a diagram showing an example of learning data generated from the captured image and the mask image shown in FIG. 14.
  • FIG. 16 is a diagram showing an example of an edge image showing only the portions where a plurality of drugs contact each other at points or lines.
  • FIG. 17 is a diagram showing an example of a photographed image including a plurality of transparent agents and a plurality of opaque agents.
  • FIG. 18 is a diagram used to explain the lens effect of the transparent agent.
  • FIG. 19 is a diagram used to illustrate the limitation of movement of the transparent drug.
  • FIG. 20 is a block diagram showing an embodiment of the machine learning device according to the present invention.
  • FIG. 21 is a flowchart showing an embodiment of the learning data creation method according to the present invention.
  • FIG. 1 is a diagram showing an image input to the trained learning model and an output result desired to be acquired from the learning model.
  • FIG. 1(A) is a captured image of an object (a drug in this example), and FIG. 1(B) is the output result that a trained learning model (hereinafter referred to as a "trained model") is desired to output when the image shown in FIG. 1(A) is input.
  • The output result of the trained model is an inference result inferring the region of the drug (drug region) shown in FIG. 1(A); in this example, it is a mask image in which the drug region and the background region are segmented.
  • the inference result is not limited to the mask image, and for example, a bounding box surrounding the drug region with a rectangular frame, coordinates of two diagonal points of the bounding box, or a combination thereof can be considered.
  • FIG. 2 is a diagram showing an example of learning data.
  • In FIG. 2, the left side shows a drug image (learning image), the right side shows a correct image (correct answer data) for the drug image, and each pair of the left drug image and the right correct image constitutes the learning data.
  • the correct image on the right side shown in FIG. 2 is a mask image that distinguishes the region of each drug from the background.
  • The learning data requires images of the objects (drugs) shown on the left side of FIGS. 2(A) to 2(C), but for some drugs, such as new drugs, only a small number of drugs exist, so a problem is that not many images can be collected.
  • To create a correct answer image for a learning image (for example, an image of a drug), it is common for the image of the drug to be displayed on a display and for the user to fill in the area of the drug pixel by pixel while viewing the displayed image.
  • When the correct answer image is created automatically, it can be obtained by calculating the position and rotation angle of the drug by, for example, template matching.
  • FIG. 3 is a conceptual diagram showing image processing when a correct image is automatically created.
  • First, a template image Itpl, which is an image showing the drug, is prepared.
  • If the shape of the drug is not circular, it is preferable to prepare a plurality of template images Itpl, one for each rotation angle to be searched.
  • the captured image ITP and the correct image may be superimposed and displayed on the display, and if there is an error in the mask image, the user may correct the mask image on a pixel-by-pixel basis.
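  • As a reference, the following is a minimal sketch of how the position and rotation angle of a drug might be computed by template matching, assuming OpenCV is available; the function name, image variables, and angular step are illustrative and not part of the embodiment.

```python
# Illustrative only: locate a drug by template matching over rotated templates.
import cv2

def find_drug_by_template(captured_gray, template_gray, angle_step=10):
    """Return (top_left, angle, score) of the best match (hypothetical helper)."""
    best_loc, best_angle, best_score = None, 0, -1.0
    h, w = template_gray.shape
    center = (w / 2, h / 2)
    for angle in range(0, 360, angle_step):
        rot = cv2.getRotationMatrix2D(center, angle, 1.0)       # rotate the template
        tpl = cv2.warpAffine(template_gray, rot, (w, h))
        res = cv2.matchTemplate(captured_gray, tpl, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(res)
        if max_val > best_score:
            best_loc, best_angle, best_score = max_loc, angle, max_val
    return best_loc, best_angle, best_score
```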
  • FIG. 4 is a conceptual diagram showing a method of mass-producing learning data by simulation.
  • Rotational movement refers to rotating an image of an object around a certain point and moving it to another position.
  • rotational movement refers to the case of rotating an object around a certain point (for example, the center of gravity), and hereinafter, “rotational movement” is simply referred to as "rotation”.
  • By such movement, learning data can be mass-produced. Further, creating the learning data by such a method is simpler than, for example, creating the correct answer image after a new learning image has been created.
  • FIG. 5 is a diagram showing a first embodiment in which learning data is created by simulation.
  • FIG. 5A is a diagram showing a pair of a photographed image of a drug as an object and a mask image manually or automatically generated based on the photographed image.
  • the present invention creates learning data by simulation from a pair of a photographed image and a mask image (inflates the learning data).
  • FIG. 5 (B) is a diagram showing a pair of a photographed image and a mask image in which the photographed image and the mask image shown in FIG. 5 (A) are inverted, respectively.
  • the inverted (left-right inverted) mask image on the right side shown in FIG. 5 (B) is a mask image showing the region of the drug in the inverted photographed image on the left side shown in FIG. 5 (B). Therefore, the inverted captured image can be used as a new learning image, and the inverted mask image can be used as correct answer data for the newly generated learning image.
  • new learning data consisting of the pair of the learning image and the mask image shown in FIG. 5 (B) is created.
  • the image inversion is not limited to horizontal inversion, but also includes vertical inversion.
  • the image on the left may be created first, and the image on the right may be created by detecting the region of the drug image from the image.
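  • A minimal sketch of the inversion of FIG. 5(B), assuming NumPy arrays in which the captured image and the mask image are flipped in the same way; the names are illustrative.

```python
# Illustrative only: flip the captured image and its mask image in the same way.
import numpy as np

def flip_pair(image, mask, horizontal=True):
    axis = 1 if horizontal else 0        # 1: left-right inversion, 0: up-down inversion
    return np.flip(image, axis=axis), np.flip(mask, axis=axis)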
  • FIG. 5 (C) is a diagram showing an image obtained by adding the images shown in FIGS. 5 (A) and 5 (B).
  • The photographed image on the left side of FIG. 5(C) can be created by synthesizing the photographed image shown in FIG. 5(A) and the inverted photographed image shown in FIG. 5(B). That is, the captured image shown in FIG. 5(C) can be created by pasting the image of the drug region cut out from the inverted captured image of FIG. 5(B) (the drug image) onto the captured image shown in FIG. 5(A).
  • The drug image can be cut out from the inverted photographed image by a process of cutting out the drug region based on the inverted mask image of FIG. 5(B).
  • the method of synthesizing two or more drug images is not limited to the method of using a mask image as described above.
  • If a background image containing only the background, with no drug photographed, is prepared, only the drug images are extracted from FIGS. 5(A) and 5(B) and each extracted drug image is combined with the background image, whereby the captured image (learning image) shown in FIG. 5(C) can be generated. Further, in the case of captured images whose background is black (pixel value of zero), a learning image containing each drug image can be generated simply by adding the captured images.
  • the mask image on the right side shown in FIG. 5 (C) can be created by adding the mask image shown in FIG. 5 (A) and the inverted mask image shown in FIG. 5 (B).
  • In this case, the pixel value of the drug region of the inverted mask image in FIG. 5(C) is set to, for example, "0.5", and the background pixel value to "0" before the addition, so that the pixel values of the two drug regions in the generated mask image differ from each other.
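  • The compositing and mask addition described above might be sketched as follows, assuming single-channel images normalized to [0, 1]; all names, and the instance value of 0.5, follow the example above, but the code itself is only an illustration.

```python
# Illustrative only: paste the drug region of pair B onto image A and add the masks,
# giving the pasted drug region a different value for instance separation.
import numpy as np

def composite_pair(image_a, mask_a, image_b, mask_b, second_value=0.5):
    new_image = image_a.copy()
    new_image[mask_b > 0] = image_b[mask_b > 0]   # cut out drug B and paste it onto A
    new_mask = mask_a.astype(float).copy()
    new_mask[mask_b > 0] = second_value           # distinct pixel value per instance
    return new_image, new_mask
```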
  • In this way, the two training data shown in FIGS. 5(B) and 5(C) can be created from one training data consisting of the pair of the captured image and the mask image shown in FIG. 5(A).
  • In the above example, the captured image and the mask image shown in FIG. 5(A) are inverted to create the new pair of captured image (learning image) and mask image shown in FIG. 5(B), but the method is not limited to this; the captured image and the mask image shown in FIG. 5(A) may be translated, rotated, or scaled in synchronization to create training data consisting of a new pair of learning image and mask image.
  • When a margin is generated in the background by translating, rotating, or reducing the captured image and the mask image, it is preferable to fill the margin with the same pixel value as the background.
  • In the above example, a new photographed image and mask image are created from a photographed image and a mask image in which one drug is photographed, but a new photographed image and mask image may also be created from a plurality of photographed images and mask images in which a plurality of different drugs are photographed separately, or from a photographed image and mask image in which a plurality of different drugs are photographed simultaneously.
  • FIG. 6 is a diagram showing a mode in which one learning image is generated from two captured images.
  • In FIG. 6(A), a case is shown in which a new learning image having four drug images is generated from two captured images in which two objects (drugs) are each captured.
  • In FIG. 6(B), as in FIG. 6(A), one learning image is generated from the two captured images, but it is generated using three of the four drug images in the two captured images.
  • That is, the newly generated learning image does not have to use all the drug images in the two captured images.
  • The plurality of drug images in one generated learning image may include drug images that have not been subjected to operations such as translation, rotation, or scaling (drug images that are not moved).
  • A mask image corresponding to the newly generated learning image is also generated, and the pair of the newly generated learning image and the mask image becomes new learning data.
  • FIG. 7 is a diagram showing a second embodiment in which learning data is created by simulation.
  • FIG. 7(A) is a diagram showing a pair of a photographed image obtained by photographing a drug and a mask image manually or automatically generated based on the photographed image, and is the same as the pair shown in FIG. 5(A).
  • FIG. 7B is a diagram showing drug regions cut out from the photographed image and the mask image shown in FIG. 7A, respectively.
  • the area within the rectangular frame surrounding the drug area is defined as the area for cutting out the image (cutout area). Since the drug region is known from the mask image, the image in the rectangular frame surrounding the drug region can be cut out based on the mask image.
  • FIG. 7 (C) shows an image of a cut-out area cut out from the photographed image and the mask image shown in FIG. 7 (A), respectively.
  • The drug image can be cut out from the captured image by a process (drug image acquisition process) of cutting out the drug region from the captured image shown in FIG. 7(A) based on the mask image shown in FIG. 7(A). Since the mask image shown in FIG. 7(A) has information indicating the drug region (first region information), the drug region (hereinafter referred to as the "drug mask image") can also be cut out from the mask image. Further, the drug image acquisition process may include a process of reading an already cut-out image from a memory or the like.
  • FIG. 7 (D) is a diagram showing a new photographed image and a mask image created by pasting the cut out drug image and drug mask image at an arbitrary position and an arbitrary rotation angle.
  • The captured image and mask image shown in FIG. 7(D) become training data composed of a new pair of learning image and correct answer data created from the captured image and mask image shown in FIG. 7(A) by the above image processing.
  • FIG. 7 (E) is a diagram showing a new photographed image (learning image) and a mask image created by pasting the cut out drug image and drug mask image at an arbitrary position and an arbitrary rotation angle.
  • In FIG. 7(E), a plurality of drug images are created so as to be in contact with each other at points or lines.
  • The mask image on the left side shown in FIG. 7(E) is preferably image-processed so that the drug regions do not come into contact with each other. Since the contact points between the drug regions are known, contact can be avoided by replacing the contact points with the background color.
  • When the drugs that are in contact with each other at points or lines are the same drug, it is preferable to make the pixel values of the drug regions different for instance separation. In this case, since each drug region in the mask image can be recognized by the difference in pixel value, it is not necessary to replace the portions where the drug regions contact with the background color.
  • In this way, a large amount of learning data can be created based on the photographed image in which the drug was photographed and the first area information (mask image) indicating the area of the drug in the photographed image.
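  • As an illustration of the second embodiment (FIG. 7), the following sketch cuts the drug region out of a captured image and its mask, rotates it, and pastes it at an arbitrary position; it assumes grayscale NumPy arrays and SciPy, assumes the pasted region stays inside the image, and all names are hypothetical.

```python
# Illustrative only: cut out the drug region, rotate it, and paste it elsewhere.
import numpy as np
from scipy.ndimage import rotate

def cut_rotate_paste(image, mask, background, angle, top_left):
    ys, xs = np.where(mask > 0)                          # bounding box of the drug region
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    drug = image[y0:y1, x0:x1] * (mask[y0:y1, x0:x1] > 0)
    drug_mask = (mask[y0:y1, x0:x1] > 0).astype(float)
    drug = rotate(drug, angle, reshape=True, order=1)    # rotate drug image and drug mask
    drug_mask = rotate(drug_mask, angle, reshape=True, order=0)
    new_image = background.copy()
    new_mask = np.zeros_like(mask, dtype=float)
    ty, tx = top_left
    h, w = drug.shape
    sel = drug_mask > 0
    new_image[ty:ty + h, tx:tx + w][sel] = drug[sel]     # paste at the chosen position
    new_mask[ty:ty + h, tx:tx + w][sel] = 1.0
    return new_image, new_mask
```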
  • FIG. 8 is a block diagram showing an example of the hardware configuration of the learning data creation device according to the present invention.
  • The learning data creation device 1 shown in FIG. 8 can be configured by, for example, a computer, and is mainly composed of an image acquisition unit 22, a CPU (Central Processing Unit) 24, an operation unit 25, a RAM (Random Access Memory) 26, a ROM (Read Only Memory) 27, a memory 28, and a display unit 29.
  • the image acquisition unit 22 acquires a photographed image in which the drug is photographed by the photographing device 10 from the photographing device 10.
  • the drug photographed by the imaging device 10 is, for example, a drug for one dose or an arbitrary drug, which may be contained in a drug package or may not be contained in the drug package.
  • FIG. 9 is a plan view showing a drug package in which a plurality of drugs are packaged.
  • the drug package TP shown in FIG. 9 is a package in which a plurality of drugs to be taken at one time are stored in a transparent package and packed one by one.
  • the drug package TPs are connected in a band shape as shown in FIGS. 11 and 12, and have a cut line that enables each drug package TP to be separated.
  • In this example, six drugs T are packaged in one package.
  • FIG. 10 is a block diagram showing a schematic configuration of the photographing apparatus shown in FIG.
  • the imaging device 10 shown in FIG. 10 includes two cameras 12A and 12B for photographing the drug, two lighting devices 16A and 16B for illuminating the drug, and a photographing control unit 13.
  • 11 and 12 are a plan view and a side view showing a schematic configuration of the photographing apparatus, respectively.
  • the medicine package TP is placed on a transparent stage 14 installed horizontally (xy plane).
  • the cameras 12A and 12B are arranged so as to face each other with the stage 14 in the direction orthogonal to the stage 14 (z direction).
  • the camera 12A faces the surface of the medicine package TP and photographs the medicine package TP from above.
  • the camera 12B faces the back surface of the medicine package TP and photographs the medicine package TP from below.
  • a lighting device 16A is provided on the side of the camera 12A and a lighting device 16B is provided on the side of the camera 12B with the stage 14 in between.
  • the lighting device 16A is arranged above the stage 14 and illuminates the medicine package TP placed on the stage 14 from above.
  • the illuminating device 16A has four light emitting units 16A1 to 16A4 arranged radially, and irradiates the illuminating light from four orthogonal directions. The light emission of each light emitting unit 16A1 to 16A4 is individually controlled.
  • the lighting device 16B is arranged below the stage 14 and illuminates the medicine package TP placed on the stage 14 from below.
  • the illuminating device 16B has four light emitting units 16B1 to 16B4 arranged radially like the illuminating device 16A, and irradiates the illuminating light from four orthogonal directions. The light emission of each light emitting unit 16B1 to 16B4 is individually controlled.
  • Shooting is done as follows. First, the medicine package TP is photographed from above using the camera 12A. At the time of shooting, the light emitting units 16A1 to 16A4 of the lighting device 16A are made to emit light sequentially to take four images, and then the light emitting units 16A1 to 16A4 are made to emit light simultaneously to take one image. Next, the light emitting units 16B1 to 16B4 of the lower lighting device 16B are made to emit light simultaneously, a reflector (not shown) is inserted, the medicine package TP is illuminated from below via the reflector, and the medicine package TP is photographed from above using the camera 12A.
  • The four images taken by sequentially emitting light from the light emitting units 16A1 to 16A4 have different illumination directions, and when there is a marking (unevenness) on the surface of the drug, the shadows cast by the marking appear differently. These four captured images are used to generate an engraved image that emphasizes the engraving on the surface side of the drug T.
  • The one image taken by simultaneously emitting light from the light emitting units 16A1 to 16A4 is an image without uneven brightness, and is used, for example, when cutting out the image (drug image) of the surface side of the drug T; it is also the photographed image on which the engraved image is superimposed.
  • The image in which the medicine package TP is illuminated from below via the reflector and photographed from above using the camera 12A is the photographed image used when recognizing the regions of the plurality of drugs T.
  • The medicine package TP is also photographed from below. The light emitting units 16B1 to 16B4 of the lighting device 16B are made to emit light sequentially to take four images, and then the light emitting units 16B1 to 16B4 are made to emit light simultaneously to take one image.
  • The four captured images are used to generate an engraved image emphasizing the engraving on the back surface side of the drug T, and the one image taken by simultaneously emitting light from the light emitting units 16B1 to 16B4 is an image without uneven brightness; it is, for example, the photographed image used when cutting out the drug image of the back surface side of the drug T, and the image on which the engraved image is superimposed.
  • The imaging control unit 13 shown in FIG. 10 controls the cameras 12A and 12B and the lighting devices 16A and 16B, and performs 11 shots for one medicine package TP (6 shots with the camera 12A and 5 shots with the camera 12B).
  • The photographing is performed in a dark room, and the only light applied to the medicine package TP at the time of photographing is the illumination light from the lighting device 16A or the lighting device 16B. Therefore, among the 11 captured images taken as described above, in the image in which the medicine package TP is illuminated from below via the reflector and photographed from above using the camera 12A, the background is the color of the light source (white), and the area of each drug T is shielded from light and appears black. On the other hand, in the other 10 captured images, the background is black and the region of each drug has the color of the drug.
  • However, when the whole drug is transparent (or semi-transparent), or a part of it is transparent (for example, capsules are partially transparent drugs), light is transmitted through the area of the drug, so that it does not appear black like an opaque drug.
  • The learning data creation device 1 creates learning data for machine learning a learning model that infers drugs from a captured image in which drugs are captured (in particular, infers the region of each drug T present in the captured image).
  • The image acquisition unit 22 of the learning data creation device 1 preferably acquires, among the 11 captured images captured by the imaging device 10, the captured image used when recognizing the regions of the plurality of drugs T (that is, the photographed image obtained by illuminating the medicine package TP from below via the reflector and photographing the medicine package TP from above using the camera 12A).
  • The memory 28 is a storage unit for storing learning data, and is, for example, a non-volatile memory such as a hard disk device or a flash memory.
  • The CPU 24 uses the RAM 26 as a work area, and executes the various processes of this apparatus by executing various programs, including a learning data creation program, stored in the ROM 27 or the memory 28.
  • the operation unit 25 includes a keyboard and a pointing device (mouse, etc.), and is a part for inputting various information and instructions by the user's operation.
  • the display unit 29 displays a screen required for operation on the operation unit 25, functions as a part that realizes a GUI (Graphical User Interface), and can display a captured image or the like.
  • FIG. 13 is a block diagram showing an embodiment of the learning data creation device according to the present invention.
  • FIG. 13 is a functional block diagram showing the functions executed by the hardware configuration of the learning data creation device 1 shown in FIG. 8; the learning data creation device 1 shown in FIG. 13 includes a processor 2 and the memory 28.
  • the processor 2 is composed of the image acquisition unit 22, the CPU 24, the RAM 26, the ROM 27, the memory 28, and the like shown in FIG. 8, and performs various processes shown below.
  • the processor 2 functions as an acquisition unit 20, a learning image generation unit 30, a correct answer data generation unit 32, and a memory control unit 34.
  • the acquisition unit 20 includes an image acquisition unit 22 and a first area information acquisition unit 23.
  • the image acquisition unit 22 acquires the photographed image ITP obtained by photographing the drug T from the photographing device 10 as described above (performs the acquisition process of the photographed image).
  • the first area information acquisition unit 23 acquires information (first area information) indicating the area of the drug in the captured image ITP acquired by the image acquisition unit 22.
  • This first area information is correct answer data for the inference result inferred by the learning model when the captured image is used as an input image for machine learning of the learning model.
  • The first region information, which is the correct answer data, preferably includes at least one of a correct answer image (for example, a mask image) showing the region of the drug in the captured image, bounding box information surrounding the region of the drug with a rectangle, and edge information indicating the edge of the region of the drug.
  • FIG. 14 is a diagram showing an example of the captured image acquired by the image acquisition unit and the first region information, acquired by the first region information acquisition unit, showing the region of the drug in the captured image.
  • the photographed image ITP shown in FIG. 14 (A) is an image in which the drug package TP is illuminated from below via a reflector and the drug package TP (see FIG. 9) is photographed from above using the camera 12A.
  • Six drugs T1 to T6 are packaged in this drug package TP.
  • the agents T1 to T3 shown in FIG. 14A are opaque agents that block the illumination light from below, and thus are photographed in black. Since the drug T4 is a transparent drug, the illumination light from below is transmitted and the image is taken in white.
  • the agents T5 and T6 are capsules of the same type, and because part of the illumination light from below leaks, they are partially photographed in white.
  • FIG. 14B is first region information showing regions of each drug T1 to T6 in the captured image ITP, and is a mask image IM in this example.
  • For example, the captured image ITP is displayed on the display unit 29, and while viewing the captured image ITP displayed on the display unit 29, the user fills in the regions of the respective drugs T1 to T6 using a pointing device such as a mouse.
  • A binarized mask image IM can then be created by setting the pixel value of each filled drug T1 to T6 to "1" and the pixel value of the background area to "0".
  • Since the capsule-shaped drugs T5 and T6 are of the same type, it is preferable that the pixel values of the regions of the two drugs T5 and T6 differ for instance separation. For example, the pixel value in the region of the drug T5 can be set to "1" and the pixel value in the region of the drug T6 to "0.5".
  • In this example, the mask image IM, which is the first area information, is area information generated when the user manually sets the areas of the respective drugs T1 to T6 in the captured image ITP using the pointing device, but the present invention is not limited to this; the mask image may be generated by automatically extracting the drug region in the captured image by image processing, or by automatically extracting the drug region by image processing and then manually adjusting it.
  • The learning image generation unit 30 receives the captured image ITP of the drug from the image acquisition unit 22 and moves the drug in the input photographed image ITP to generate learning images (IA, IB, IC, ...). That is, the learning image generation unit 30 performs learning image generation processing for generating a plurality of learning images (IA, IB, IC, ...) based on the captured image ITP.
  • The movement of the drug captured in the captured image ITP may be performed by the user specifying the position and rotation of the drug image with the pointing device, or by inverting or adding the captured images as described with reference to FIG. 5. Further, the drug may be moved by randomly determining the position and rotation of the drug image using random numbers; in this case, it is necessary to prevent the drug images from overlapping.
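  • One hypothetical way to determine random, non-overlapping positions for the drug images is sketched below; for brevity the overlap check uses the unrotated drug masks, and all names are illustrative.

```python
# Illustrative only: choose random, non-overlapping placements for drug images.
import random
import numpy as np

def random_layout(drug_masks, canvas_shape, max_tries=100):
    occupied = np.zeros(canvas_shape, dtype=bool)
    placements = []
    for m in drug_masks:
        h, w = m.shape
        for _ in range(max_tries):
            ty = random.randint(0, canvas_shape[0] - h)
            tx = random.randint(0, canvas_shape[1] - w)
            angle = random.uniform(0, 360)
            if not (occupied[ty:ty + h, tx:tx + w] & (m > 0)).any():
                occupied[ty:ty + h, tx:tx + w] |= (m > 0)    # reserve the region
                placements.append(((ty, tx), angle))
                break
    return placements
```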
  • The correct answer data generation unit 32 receives the mask image IM, which is the first area information, from the first area information acquisition unit 23, and generates a plurality of correct answer data (Ia, Ib, Ic, ...) from the input mask image IM and the plurality of learning images (IA, IB, IC, ...). That is, the correct answer data generation unit 32 performs correct answer data generation processing that generates, based on the mask image IM, second area information indicating the area of the drug in each of the plurality of learning images (IA, IB, IC, ...), and uses the generated second area information as the plurality of correct answer data (Ia, Ib, Ic, ...) corresponding to the respective learning images.
  • The plurality of learning images (IA, IB, IC, ...) and the plurality of correct answer data (Ia, Ib, Ic, ...) can be generated, as in the first embodiment of creating learning data by simulation, from the photographed image obtained by photographing the drug and the first area information (for example, a mask image) indicating the area of the drug in the photographed image, by inverting, translating, rotating, or scaling the photographed image and the mask image in synchronization with each other, or by translating, rotating, scaling, or pasting the drug image and the drug mask image cut out from the photographed image and the mask image.
  • The storage control unit 34 receives the learning images (IA, IB, IC, ...) generated by the learning image generation unit 30 and the correct answer data (Ia, Ib, Ic, ...) generated by the correct answer data generation unit 32, and stores the corresponding pairs (learning image IA and correct answer data Ia, learning image IB and correct answer data Ib, learning image IC and correct answer data Ic, ...) in the memory 28 as learning data.
  • The pair of the photographed image ITP and the mask image IM input to the learning image generation unit 30 and the correct answer data generation unit 32 is also stored in the memory 28 as learning data.
  • FIG. 15 is a diagram showing an example of learning data generated from the photographed image and the mask image shown in FIG. 14.
  • FIG. 15(A) shows training data consisting of a pair of the learning image IA and the correct answer data (mask image) Ia, and FIG. 15(B) shows training data consisting of a pair of the learning image IB and the mask image Ib.
  • In the learning image IA of FIG. 15(A), the capsule-shaped drugs T5 and T6 are in contact along a line, and the drugs T2, T3, and T4 are in contact with each other at points.
  • In the mask image Ia corresponding to the learning image IA, by making the pixel values of the regions of the drugs T5 and T6, which are the same drug, different from each other, instance separation of the regions of the drugs T5 and T6 is possible, and the boundary between the drugs T5 and T6 that are in contact along a line can also be distinguished.
  • Further, in the mask image Ia, the portions where the drugs T2, T3, and T4 contact each other at points are given the same pixel value as the background, so that the drugs T2, T3, and T4 do not contact each other and the regions of the drugs T2, T3, and T4 are clearly delineated.
  • In the learning image IB of FIG. 15(B), the capsule-shaped drugs T5 and T6 are in contact along a line, and the drug T6 and the drug T3 are in contact at a point.
  • In the mask image Ib corresponding to the learning image IB, the pixel values of the regions of the drugs T5 and T6, which are the same drug, are made different (for example, the pixel value of the region of the drug T6 is set to "0.5"), which makes it possible to separate the instances of the regions of the drugs T5 and T6, and to distinguish the boundary between the drugs T5 and T6 in contact along a line and the boundary between the drug T6 and the drug T3 in contact at a point.
  • The learning data shown in FIG. 15 is only an example; a large amount of learning data can be created by arranging the drug images showing the drugs T1 to T6 in combinations of translation, rotation, and the like, and arranging the drug mask images showing the regions of the drugs T1 to T6 in the same way.
  • In the case of the transparent drug T4, the illumination light from below is transmitted and it is photographed white, but the way the light is transmitted changes depending on the position and angle of the drug T4. That is, the drug image of the transparent drug T4 is an image whose brightness distribution and the like differ depending on the position and angle of the transparent drug T4 within the imaging region.
  • Therefore, when generating a learning image, it is preferable not to move the transparent drug image and to move the images of the drugs other than the transparent drug image.
  • In the above example, the mask image is generated as the correct answer data, but edge information (an edge image) for each drug image, indicating the edge of the area of the drug image, can also be used as the correct answer data.
  • When the drugs are in contact with each other at points or lines, it is preferable to replace the portions of contact with the background color and separate the edge image for each drug.
  • Alternatively, an edge image showing only the points or lines of contact may be generated as correct answer data.
  • FIG. 16 is a diagram showing an example of an edge image showing only the portions where a plurality of drugs contact each other at points or lines.
  • The edge image IE shown in FIG. 16 is an image showing only the locations E1 and E2 at which two or more of the plurality of drugs T1 to T6 are in contact with each other at a point or line, and is the image shown by the solid lines in FIG. 16.
  • The regions shown by the dotted lines in FIG. 16 indicate the regions where the plurality of drugs T1 to T6 are present.
  • The edge image of the portion E1, which is contact along a line, is an image of the location where the capsule-shaped drugs T5 and T6 contact along a line, and the edge image of the portion E2, which is contact at a point, is an image of the location where the three drugs T2 to T4 contact each other at a point.
  • The correct answer data generation unit 32 shown in FIG. 13 can automatically create, for the learning image generated by the learning image generation unit 30, an edge image (correct answer data) showing only the points or lines of contact.
  • The edge image IE shown in FIG. 16 may be used as correct answer data corresponding to the learning image IA shown in FIG. 15(A). That is, the pair of the learning image IA shown in FIG. 15(A) and the edge image IE shown in FIG. 16 constitutes learning data.
  • Such learning data can be used to machine-learn a learning model that takes as an input image a drug image obtained by photographing drugs in contact at points or lines and outputs, as an inference result, an edge image of only the portions in contact at points or lines.
  • The edge image (inference result) of only the points or lines of contact can be used, for example, together with a drug image obtained by photographing a plurality of drugs in contact at points or lines, to separate the drugs at their points or lines of contact.
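  • One possible way to derive such a contact edge image from an instance-labelled mask (0 for background, 1, 2, ... for drug instances) is sketched below, assuming SciPy; this is an illustration, not the processing actually used by the correct answer data generation unit 32.

```python
# Illustrative only: edge image of the points or lines where drug regions touch.
import numpy as np
from scipy.ndimage import binary_dilation

def contact_edge_image(instance_mask):
    contact = np.zeros(instance_mask.shape, dtype=bool)
    labels = [v for v in np.unique(instance_mask) if v != 0]
    for a in labels:
        grown_a = binary_dilation(instance_mask == a)        # grow instance a by one pixel
        for b in labels:
            if b <= a:
                continue
            contact |= grown_a & (instance_mask == b)        # pixels of b that a touches
    return contact.astype(np.uint8)
```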
  • FIG. 17 is a diagram showing an example of a photographed image containing a plurality of transparent agents and a plurality of opaque agents.
  • The photographed image shown in FIG. 17(A) is an image obtained by illuminating a medicine package in which a plurality of transparent drugs and a plurality of opaque drugs are packaged from the upper lighting device 16A (light emitting units 16A1 to 16A4) and photographing the medicine package from above using the camera 12A.
  • The photographed image shown in FIG. 17(B) is an image obtained by illuminating the same medicine package from the lower lighting device 16B (light emitting units 16B1 to 16B4) via a reflector and photographing the medicine package from above using the camera 12A.
  • The drug image shown in FIG. 17(B) contains only silhouette information (the portions photographed in black in FIG. 17), which is suitable for acquiring edge information of the drug image.
  • FIG. 18 is a diagram used to explain the lens effect of the transparent drug, and is an image taken by the same method as the drug image shown in FIG. 17 (B).
  • The edge information of the drug image of an opaque drug is not affected by the relative position between the opaque drug and the lighting device 16B (light emitting units 16B1 to 16B4), and is uniform. Therefore, by cutting out the drug image of the opaque drug from the photographed image and translating, rotating, or otherwise manipulating each cut-out drug image, it is possible to generate a captured image equivalent to a photographed image in which a drug actually placed at that position is photographed. Further, since a large amount of edge information of opaque drug images can be obtained, the number of pixels required for photographing can be greatly reduced by removing the texture information on the surface of the drug image of the opaque drug.
  • On the other hand, a transparent drug placed horizontally and a transparent drug placed vertically have significantly different edge information, and even when transparent drugs are placed in the same orientation, the edge information differs depending on the position.
  • FIG. 19 is a diagram used to explain the limitation of movement of the transparent drug.
  • Since the edge information of the transparent drug image included in the photographed image shown in FIG. 19(B) differs depending on the position of the illumination and the position and orientation of the transparent drug, if a new learning image is generated by moving the image of the transparent drug cut out from the photographed image without any restriction, a learning image in a situation different from the actual one is generated.
  • Therefore, the position and orientation of the transparent drug before cut-out are utilized; for example, the cut-out image is pasted at the same position and orientation as before cut-out. For example, the image of the transparent drug cut out from the captured image of FIG. 19(B) can be pasted at the position (a) of FIG. 19(A) when generating a new learning image.
  • Ideally, the learning images would cover all variations (position, orientation, etc.), but this is practically impossible. Therefore, when moving a transparent drug to be pasted, the amount of translation and/or the amount of rotation from the position and orientation of the transparent drug before cut-out is limited to within respective threshold values.
  • For example, the threshold of the amount of translation is set to n pixels and the threshold of the amount of rotation to m degrees, and translation and/or rotation of the transparent drug exceeding these thresholds is restricted.
  • The edge information of the transparent drug, and how it changes, depend on the shooting environment (illumination position, camera position, shooting angle of view, etc.) and on the shape and size of the transparent drug. Therefore, it is preferable to set the threshold of the translation amount and the threshold of the rotation amount by simulating the translation and/or rotation within a range in which the edge information of the transparent drug can be regarded as hardly changing.
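  • A minimal sketch of such a movement limit, in which the translation and rotation of a transparent drug are clipped to thresholds of n pixels and m degrees, is shown below; the threshold values and names are illustrative assumptions.

```python
# Illustrative only: clip translation and rotation of a transparent drug to thresholds.
def limit_transparent_move(dx, dy, dtheta, n_pixels=5, m_degrees=10):
    clip = lambda v, lim: max(-lim, min(lim, v))
    return clip(dx, n_pixels), clip(dy, n_pixels), clip(dtheta, m_degrees)
```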
  • FIG. 20 is a block diagram showing an embodiment of the machine learning device according to the present invention.
  • The machine learning device 50 shown in FIG. 20 is composed of a learning model 52 (here, a convolutional neural network (CNN), which is one type of learning model), a loss value calculation unit 54, and a parameter control unit 56.
  • This machine learning device 50 machine-learns the CNN 52 using the learning data created by the learning data creation device 1 shown in FIG. 13 and stored in the memory 28.
  • The CNN 52 is a part that infers the region of the drug shown in an input image when a captured image of the drug is used as the input image; it has a multi-layer structure and holds a plurality of weight parameters. The weight parameters include the filter coefficients of the filters, called kernels, used for the convolution operations in the convolution layers.
  • The CNN 52 can change from an untrained learning model to a trained learning model by updating the weight parameters from their initial values to optimum values.
  • The CNN 52 includes an input layer 52A, an intermediate layer 52B having a plurality of sets each composed of a convolution layer and a pooling layer, and an output layer 52C, and each layer has a structure in which a plurality of "nodes" are connected by "edges".
  • a learning image to be learned is input to the input layer 52A as an input image.
  • the learning image is a learning image in the learning data (learning data consisting of a pair of the learning image and the correct answer data) stored in the memory 28.
  • the intermediate layer 52B has a plurality of sets including a convolution layer and a pooling layer as one set, and is a portion for extracting features from an image input from the input layer 52A.
  • the convolution layer filters nearby nodes in the previous layer (performs a convolution operation using the filter) and acquires a "feature map".
  • the pooling layer reduces the feature map output from the convolution layer to a new feature map.
  • the "convolution layer” plays a role of feature extraction such as edge extraction from an image, and the “pooling layer” plays a role of imparting robustness so that the extracted features are not affected by translation or the like.
  • The intermediate layer 52B is not limited to sets of one convolution layer and one pooling layer; it may also include consecutive convolution layers, activation processing by an activation function, and normalization layers.
  • the output layer 52C is a part that outputs a feature map showing the features extracted by the intermediate layer 52B. Further, in the trained CNN 52, the output layer 52C outputs, for example, an inference result in which the drug region or the like shown in the input image is region-classified (segmented) in pixel units or in units of several pixels as a group.
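  • For reference, a minimal PyTorch sketch of a CNN of the kind described, repeated convolution and pooling sets followed by per-pixel drug/background classification, is shown below; it is an illustrative stand-in, not the network of the embodiment.

```python
# Illustrative only: a small CNN with convolution + pooling sets and a per-pixel output.
import torch.nn as nn

class DrugSegmenter(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(      # intermediate layer: convolution + pooling sets
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(          # output layer: per-pixel class scores
            nn.Conv2d(32, num_classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        return self.head(self.features(x))
```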
  • Arbitrary initial values are set for the coefficient and offset value of the filter applied to each convolution layer of CNN52 before learning.
  • Of the loss value calculation unit 54 and the parameter control unit 56, which function as the learning control unit, the loss value calculation unit 54 compares the feature map output from the output layer 52C of the CNN 52 with the correct answer data for the input image (learning image), namely the mask image (the mask image read from the memory 28 corresponding to the learning image), and calculates the error between the two (the loss value, which is the value of the loss function).
  • As a method of calculating the loss value, for example, softmax cross entropy or sigmoid can be considered.
  • the parameter control unit 56 adjusts the weight parameter of the CNN 52 by the back-propagation method based on the loss value calculated by the loss value calculation unit 54.
  • In the error back-propagation method, the error is back-propagated in order from the final layer, the stochastic gradient descent method is applied in each layer, and the parameter updates are repeated until the error converges.
  • This weight parameter adjustment process is repeated, and learning is repeated until the difference between the output of CNN52 and the mask image which is the correct answer data becomes small.
  • the machine learning device 50 repeats machine learning using the learning data stored in the memory 28, so that the CNN 52 becomes a trained model.
  • When an unknown input image (a captured image of the drug) is input, the trained CNN 52 outputs an inference result such as a mask image showing the region of the drug in the captured image.
  • As the learning model, an R-CNN (Regions with Convolutional Neural Networks) can also be applied.
  • In R-CNN, bounding boxes of different sizes are slid over the captured image ITP to detect the bounding-box region that contains the drug. Then, the edge of the drug is detected by evaluating only the image portion inside the bounding box (extracting CNN features).
  • Besides R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, an SVM (Support Vector Machine), and the like can be used.
  • the inference result of the trained model constructed in this way can be used, for example, when an image of each drug is cut out from a photographed image of a drug package in which a plurality of drugs are packaged.
  • The cut-out image of each drug is used when auditing and identifying each drug contained in the drug package.
  • the memory 28 stores a lot of learning data created by simulation based on the photographed image obtained by photographing the drug and the correct answer data indicating the region of the drug in the photographed image.
  • The photographed image is preferably an image of a drug handled by the pharmacy itself.
  • FIG. 21 is a flowchart showing an embodiment of the learning data creation method according to the present invention.
  • each step shown in FIG. 21 is performed by, for example, the processor 2 of the learning data creation device 1 shown in FIG.
  • the image acquisition unit 22 acquires a photographed image ITP (for example, the photographed image ITP shown in FIG. 14A) obtained by photographing the drug from the photographing apparatus 10 (step S10).
  • The photographed image ITP shown in FIG. 14(A) is an image in which the drug package is illuminated from below via a reflector and photographed from above; however, the photographed image is not limited to one taken in this way. Further, the drug to be photographed does not have to be contained in a drug package, and the number of drugs may be one.
  • The first region information acquisition unit 23 acquires a mask image IM (for example, the mask image IM shown in FIG. 14(B)) as the first region information indicating the region of the drug in the captured image acquired by the image acquisition unit 22 (step S12).
  • The mask image IM is generated manually or automatically based on the captured image ITP and stored in the memory 28 or the like.
  • The learning image generation unit 30 moves the drugs T1 to T6 in the captured image ITP acquired in step S10 to generate a learning image (step S14).
  • The learning image can be generated by image processing in which the drug image showing each drug is translated, inverted, rotated, or scaled. If a drug to be moved is a transparent drug, it is not moved, or its movement is restricted to within the threshold value.
  • The correct answer data generation unit 32 generates correct answer data (a mask image) corresponding to the learning image generated in step S14, based on the mask image IM acquired in step S12 (step S16). That is, in step S16, the region of each drug in the mask image IM is arranged in the same manner as each drug in the learning image, second region information indicating the region of each arranged drug is generated, and the generated second region information is used as the correct answer data (mask image) for the learning image.
  • the storage control unit 34 stores the pair of the learning image generated in step S14 and the mask image generated in step S16 in the memory 28 as learning data (step S18).
  • FIGS. 15(A) and 15(B) show an example of learning data composed of a pair of a learning image and a mask image generated as described above and stored in the memory 28.
  • The processor 2 determines whether or not to end the generation of the learning data (step S20). For example, it can be determined that the generation is to be ended when the user inputs an instruction to end the generation of learning data, or when the creation of a predetermined number of learning data items from one pair of captured image ITP and mask image has been completed.
  • When it is determined that the generation of the learning data has not been completed ("No"), the process returns to steps S14 and S16, and the next learning data is created in steps S14 to S20.
  • the image of the object is an image obtained by photographing the object, but the present invention is not limited to this, and includes, for example, an image created by CAD (computer-aided design) data of the object.
  • A drug has been described as an example of the object, but the object is not limited to this; for example, industrial products including medical devices and their parts, agricultural products, and microorganisms photographed with a microscope or the like are also included.
  • The hardware structure of the processing units that execute various processes in the learning data creation device, such as the CPU 24, is realized by the various processors shown below.
  • The various processors include a CPU (Central Processing Unit), which is a general-purpose processor that executes software (programs) and functions as various processing units; a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), whose circuit configuration can be changed after manufacture; and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed exclusively to execute specific processing.
  • One processing unit may be composed of one of these various processors, or may be composed of two or more processors of the same type or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). Further, a plurality of processing units may be configured by one processor. As a first example of configuring a plurality of processing units with one processor, as represented by a computer such as a client or a server, one processor is configured by a combination of one or more CPUs and software, and this processor functions as a plurality of processing units.
  • As a second example, as represented by a System on Chip (SoC), a processor that realizes the functions of the entire system including a plurality of processing units with a single IC (Integrated Circuit) chip is used.
  • In this way, the various processing units are configured using one or more of the above-mentioned various processors as their hardware structure.
  • More specifically, the hardware structure of these various processors is circuitry in which circuit elements such as semiconductor elements are combined.
  • the present invention includes a learning data creation program that realizes various functions as a learning data creation device according to the present invention by being installed on a computer, and a recording medium on which this learning data creation program is recorded.
  • Reference signs: 1: learning data creation device; 2: processor; 10: imaging device; 12A, 12B: camera; 13: imaging control unit; 14: stage; 16A, 16B: lighting device; 16A1 to 16A4, 16B1 to 16B4: light emitting unit; 20: acquisition unit; 22: image acquisition unit; 23: first region information acquisition unit; 24: CPU; 25: operation unit; 26: RAM; 27: ROM; 28: memory; 29: display unit; 30: learning image generation unit; 32: correct answer data generation unit; 34: storage control unit; 50: machine learning device; 52: learning model (CNN); 52A: input layer; 52B: intermediate layer; 52C: output layer; 54: loss value calculation unit; 56: parameter control unit; IA, IB, IC: learning images; IE: edge image; IM, Ia, Ib, Ic: mask images (correct answer data); ITP: photographed image; Itpl: template image; S10 to S20: steps; T, T1 to T6: drug; TP: drug package
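As a reference for how the learning control described above fits together (comparing the CNN output with the correct-answer mask via softmax cross entropy and updating the weights by error back-propagation with stochastic gradient descent), the following is a minimal sketch assuming PyTorch; the class and function names are illustrative and not part of the embodiment.

```python
# Minimal sketch (assumption: PyTorch). "SimpleSegNet" is an illustrative stand-in for the CNN 52.
import torch
import torch.nn as nn

class SimpleSegNet(nn.Module):
    """Convolution + pooling sets followed by per-pixel classification (background / drug)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Conv2d(32, num_classes, 1)
        # input height/width are assumed divisible by 4 so the output matches the mask size
        self.upsample = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)

    def forward(self, x):
        return self.upsample(self.classifier(self.features(x)))  # per-pixel class scores

def train(model, loader, epochs=10, lr=1e-3):
    criterion = nn.CrossEntropyLoss()                        # softmax cross entropy (loss function)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # stochastic gradient descent
    for _ in range(epochs):
        for image, mask in loader:                           # learning image / correct-answer mask pair
            optimizer.zero_grad()
            loss = criterion(model(image), mask)             # loss value between output and mask
            loss.backward()                                  # error back-propagation from the final layer
            optimizer.step()                                 # update the weight parameters
```

In practice the learning model would be a deeper segmentation network (for example, one of the Mask R-CNN family mentioned above), but the loop structure of comparing the output with the correct-answer mask and back-propagating the loss is the same.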

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention includes a learning image generation process that acquires a photographed image in which a subject (medicine) is photographed and generates a learning image by moving an image of the medicine extracted from the photographed image, and a correct answer data generation process that generates, on the basis of a mask image indicating the region of the medicine in the photographed image, second region information indicating the region of the medicine in the generated learning image and defines the generated second region information as correct answer data for the learning image. Each generated pair of a learning image and its correct answer data is stored as learning data in a memory.

Description

Learning data creation device, method, program, learning data, and machine learning device
 本発明は学習データ作成装置、方法、プログラム、学習データ及び機械学習装置に係り、特に多数の学習データを効率よく作成する技術に関する。 The present invention relates to a learning data creation device, a method, a program, a learning data, and a machine learning device, and particularly relates to a technique for efficiently creating a large number of learning data.
 従来、教示ファイルに格納されている多数の教示データに基づいて学習し、パターン認識して欠陥判定をする外観検査装置が提案されている(特許文献1)。 Conventionally, a visual inspection device has been proposed that learns based on a large number of teaching data stored in a teaching file, recognizes a pattern, and determines a defect (Patent Document 1).
 この外観検査装置は、教示ファイル中の多数の教示データのうち、データ数の少ない特定の教示データについては、その特定の教示データを変形して新たな教示データを生成する教示データ生成装置を備え、教示データ生成装置により生成された教示データを教示ファイルに補充して学習することで、データ数の少ない欠陥の検査を可能にしている。 This visual inspection device includes a teaching data generation device that generates new teaching data by transforming the specific teaching data for a specific teaching data having a small number of data among a large number of teaching data in the teaching file. By supplementing the teaching data generated by the teaching data generator to the teaching file and learning, it is possible to inspect defects with a small number of data.
 また、教示データ生成装置は、生成すべき教示データが画像データであるときは、画像の拡大、縮小、回転を含むアフィン変換と、明るさ、コントラスト、エッジ強度を含む属性変換を行うことにより、新たな教示データを生成している。 Further, when the teaching data to be generated is image data, the teaching data generation device performs affine transformation including enlargement, reduction, and rotation of the image, and attribute conversion including brightness, contrast, and edge strength. New teaching data is being generated.
特開2006-48370号公報Japanese Unexamined Patent Publication No. 2006-48370
 ところで、対象物が撮影された撮影画像からその撮影画像内の対象物の領域を、学習済みの学習モデルにより精度よく認識するためには、対象物の画像と対象物の領域を示す領域情報(正解データ)とのペアを多数作成し、多数のペアからなる学習データセットにより学習モデルを機械学習させる必要がある。 By the way, in order to accurately recognize the area of the object in the photographed image from the photographed image of the object by the trained learning model, the area information indicating the image of the object and the area of the object (area information indicating the area of the object) It is necessary to create a large number of pairs with the correct answer data) and machine-learn the learning model using a learning data set consisting of a large number of pairs.
 従来のこの種の正解データは、撮影画像をディスプレイに表示し、ディスプレイに表示された撮影画像を見ながら対象物の画像をユーザが画素単位で塗り潰して作成しており、正解データの作成に手間と時間がかかるという問題がある。 Conventionally, this type of correct answer data is created by displaying the captured image on the display and filling the image of the object pixel by pixel while viewing the captured image displayed on the display, which is troublesome to create the correct answer data. There is a problem that it takes time.
 一方、特許文献1に記載の外観検査装置は、カメラを用いて印刷物や無地面(紙、フィルム、金属など)の対象物を撮像し、撮像した画像から印刷欠陥を認識し、欠陥の種類(「穴」、「しみ」、「凸」、「すじ」など)を分別するものである。 On the other hand, the visual inspection apparatus described in Patent Document 1 uses a camera to image a printed matter or an object on the ground (paper, film, metal, etc.), recognizes a print defect from the captured image, and recognizes the type of defect (defect type (paper, film, metal, etc.)). It separates "holes", "stains", "convex", "streaks", etc.).
 したがって、教示データ生成装置により、データ数の少ない一のデータ(画像データ)を変形して新たに複数の教示データを生成する場合、同じ画像データを変形して生成した複数の教示データに対応する正解データは、同一の欠陥の種類を示すデータになる。即ち、特許文献1には、教示データ(教示画像)に対する正解データの作成に手間と時間がかかるという課題の記載がなく、それを解決する技術も開示されていない。 Therefore, when the teaching data generator transforms one data (image data) having a small number of data to newly generate a plurality of teaching data, it corresponds to the plurality of teaching data generated by transforming the same image data. The correct answer data is data indicating the same type of defect. That is, Patent Document 1 does not describe the problem that it takes time and effort to create correct answer data for teaching data (teaching image), and does not disclose a technique for solving the problem.
 本発明はこのような事情に鑑みてなされたもので、対象物の領域を認識する学習モデルを機械学習させるための学習データを効率よく作成することができる学習データ作成装置、方法、プログラム、学習データ及び機械学習装置を提供することを目的とする。 The present invention has been made in view of such circumstances, and is a learning data creation device, method, program, and learning that can efficiently create learning data for machine learning a learning model that recognizes an object region. The purpose is to provide data and machine learning equipment.
 上記目的を達成するために第1態様に係る発明は、プロセッサと、メモリとを備え、プロセッサが機械学習用の学習データを作成する学習データ作成装置であって、プロセッサは、対象物の画像を取得する取得処理と、取得した対象物の画像を移動させて学習用画像を生成する学習用画像生成処理と、生成した学習用画像における対象物の領域に対応する第2領域情報を生成し、生成した第2領域情報を学習用画像に対する正解データとする正解データ生成処理と、生成した学習用画像と正解データとのペアを、学習データとしてメモリに記憶させる記憶制御と、を行う。 The invention according to the first aspect in order to achieve the above object is a learning data creation device including a processor and a memory, in which the processor creates learning data for machine learning, and the processor creates an image of an object. The acquisition process to be acquired, the learning image generation process to move the acquired image of the object to generate the learning image, and the second area information corresponding to the area of the object in the generated learning image are generated. The correct answer data generation process in which the generated second area information is used as the correct answer data for the learning image, and the storage control for storing the pair of the generated learning image and the correct answer data in the memory as the learning data are performed.
 本発明の第1態様によれば、対象物の画像を移動させることで学習用画像を生成する。また、生成した学習用画像における対象物の領域に対応する第2領域情報を生成し、生成した第2領域情報を学習用画像に対する正解データとする。この正解データの生成は、プロセッサによる正解データ生成処理により行うことができるため、正解データの作成に手間と時間を要しない。 According to the first aspect of the present invention, a learning image is generated by moving an image of an object. Further, the second area information corresponding to the area of the object in the generated learning image is generated, and the generated second area information is used as the correct answer data for the learning image. Since the correct answer data can be generated by the correct answer data generation process by the processor, it does not require time and effort to create the correct answer data.
 このようにして生成した学習用画像と正解データとのペアを学習データとすることで、多くの学習データを生成すること(水増しすること)ができる。 By using the pair of the learning image and the correct answer data generated in this way as the learning data, a lot of learning data can be generated (inflated).
 本発明の第2態様に係る学習データ作成装置において、プロセッサの取得処理は、対象物の領域に対応する第1領域情報を取得し、正解データ生成処理は、取得した第1領域情報に基づいて第2領域情報を生成することが好ましい。 In the learning data creation device according to the second aspect of the present invention, the processor acquisition process acquires the first area information corresponding to the area of the object, and the correct answer data generation process is based on the acquired first area information. It is preferable to generate the second region information.
 本発明の第3態様に係る学習データ作成装置において、第1領域情報は、対象物の領域を手動で設定した領域情報、対象物の領域を画像処理により自動で抽出した領域情報、又は対象物の領域を画像処理により自動で抽出し、かつ手動で調整された領域情報であることが好ましい。 In the learning data creation device according to the third aspect of the present invention, the first area information is the area information in which the area of the object is manually set, the area information in which the area of the object is automatically extracted by image processing, or the object. It is preferable that the area is automatically extracted by image processing and the area information is manually adjusted.
 本発明の第4態様に係る学習データ作成装置において、正解データは、対象物の領域に対応する正解画像、対象物の領域を矩形で囲むバウンディングボックス情報、及び対象物の領域のエッジを示すエッジ情報のうちの少なくとも1つを含むことが好ましい。尚、正解画像は、マスク画像を含む。 In the learning data creation device according to the fourth aspect of the present invention, the correct answer data includes the correct image corresponding to the area of the object, the bounding box information surrounding the area of the object with a rectangle, and the edge indicating the edge of the area of the object. It is preferable to include at least one of the information. The correct image includes a mask image.
 本発明の第5態様に係る学習データ作成装置において、学習用画像生成処理は、対象物の画像を平行移動、回転移動、反転、又は拡縮させて学習用画像を生成し、正解データ生成処理は、第1領域情報を対象物の画像に対応して平行移動、回転移動、反転、又は拡縮させて正解データを生成することが好ましい。学習用画像の生成と正解データの生成とは、同期して同時に生成してもよいし、学習用画像及び正解データのうちのいずれか一方を生成してから他方を生成してもよい。 In the learning data creating apparatus according to the fifth aspect of the present invention, the learning image generation process generates a learning image by translating, rotating, reversing, or scaling the image of the object, and the correct answer data generation process is performed. , It is preferable to generate correct answer data by translating, rotating, reversing, or scaling the first region information corresponding to the image of the object. The generation of the learning image and the generation of the correct answer data may be synchronously generated at the same time, or one of the learning image and the correct answer data may be generated and then the other may be generated.
 本発明の第6態様に係る学習データ作成装置において、学習用画像生成処理は、対象物の画像を平行移動、回転移動、反転、又は拡縮させた2以上の画像を合成して学習用画像を生成し、正解データ生成処理は、2以上の画像の各々に対応する第1領域情報を対象物の画像に対応して平行移動、回転移動、反転、又は拡縮させて正解データを生成することが好ましい。これにより、複数の対象物の画像からなる学習用画像とその正解データを生成することができる。 In the learning data creation device according to the sixth aspect of the present invention, the learning image generation process synthesizes two or more images obtained by translating, rotating, reversing, or scaling the image of the object to create a learning image. In the correct answer data generation process, the correct answer data can be generated by translating, rotating, reversing, or scaling the first region information corresponding to each of the two or more images according to the image of the object. preferable. As a result, it is possible to generate a learning image composed of images of a plurality of objects and correct answer data thereof.
 本発明の第7態様に係る学習データ作成装置において、学習用画像生成処理は、複数の対象物の画像を含む学習用画像を生成する際に、複数の対象物の画像の全部又は一部が点又は線で接触する学習用画像を生成することが好ましい。 In the learning data creation device according to the seventh aspect of the present invention, in the learning image generation process, when a learning image including an image of a plurality of objects is generated, all or a part of the images of the plurality of objects is used. It is preferable to generate a learning image that makes contact with points or lines.
 本発明の第8態様に係る学習データ作成装置において、正解データは、複数の対象物の画像の全部又は一部が点又は線で接触する箇所のみを示すエッジ画像を含むことが好ましい。複数の対象物の画像の全部又は一部が点又は線で接触する学習用画像に対する正解データとして、複数の対象物の画像の点又は線で接触する箇所のみを示すエッジ画像を含めることができる。この学習データは、複数の対象物の画像の点又は線で接触する箇所の分離に有用なものとなる。 In the learning data creation device according to the eighth aspect of the present invention, the correct answer data preferably includes an edge image showing only a portion where all or a part of the images of a plurality of objects are in contact with each other by points or lines. As correct answer data for a learning image in which all or a part of images of a plurality of objects are in contact with points or lines, an edge image showing only a portion of the images of a plurality of objects in contact with points or lines can be included. .. This learning data is useful for separating the points of contact with points or lines in the images of a plurality of objects.
 In the learning data creation device according to the ninth aspect of the present invention, it is preferable that at least a part of the object is transparent. An image of an object that is at least partially transparent is more difficult to extract than an image of an entirely opaque object, and less learning data exists for it, so generating learning data for images of at least partially transparent objects is particularly effective.
 In the learning data creation device according to the tenth aspect of the present invention, when the learning image generation process by the processor generates a learning image including images of a plurality of objects, it is preferable to move the images of objects other than the transparent object. This is because, for an image of an object that is at least partially transparent, an image obtained by arbitrarily moving that object's image differs, owing to the positional relationship with the illumination light, from an image obtained by actually placing the transparent object at the same position and photographing it.
 In the learning data creation device according to the eleventh aspect of the present invention, it is preferable that at least a part of the object is transparent and that the learning image generation process moves the image of the object within a threshold value to generate the learning image.
 本発明の第11態様によれば、透明な対象物の画像の移動に制約(閾値)を設け、透明な対象物の画像を閾値以内で移動させて学習用画像を生成する。透明な対象物の画像を閾値以内で移動させて生成される学習用画像は、照明光との位置関係が大幅に変化したものにならず、その結果、その位置で実際に撮影される透明な対象物の画像と一致し、又は略一致したものとなる。 According to the eleventh aspect of the present invention, a constraint (threshold value) is set for the movement of the image of the transparent object, and the image of the transparent object is moved within the threshold value to generate a learning image. The learning image generated by moving the image of the transparent object within the threshold value does not have a significant change in the positional relationship with the illumination light, and as a result, the transparent image actually taken at that position is transparent. It matches or substantially matches the image of the object.
 本発明の第12態様に係る学習データ作成装置において、移動は、平行移動及び回転移動のいずれか一方を含むことが好ましい。 In the learning data creation device according to the twelfth aspect of the present invention, the movement preferably includes either parallel movement or rotational movement.
 The invention according to the thirteenth aspect is learning data composed of a pair of a learning image generated by moving an image of an object, and correct answer data having second region information indicating the region of the object in the learning image.
 本発明の第14態様に係る機械学習装置は、学習モデルと、上記の学習データを使用し、学習モデルを機械学習させる学習制御部と、を備える。 The machine learning device according to the 14th aspect of the present invention includes a learning model and a learning control unit that uses the above learning data to perform machine learning of the learning model.
 本発明の第15態様に係る機械学習装置において、学習モデルは、畳み込みニューラルネットワークで構成されることが好ましい。 In the machine learning device according to the fifteenth aspect of the present invention, the learning model is preferably composed of a convolutional neural network.
 第16態様に係る発明は、プロセッサが、以下の各ステップの処理を行うことにより機械学習用の学習データを作成する学習データ作成方法であって、対象物の画像を取得するステップと、取得した対象物の画像を移動させて学習用画像を生成するステップと、生成した学習用画像における対象物の領域に対応する第2領域情報を生成し、生成した第2領域情報を学習用画像に対する正解データとするステップと、生成した学習用画像と正解データとのペアを、学習データとしてメモリに記憶させるステップと、を含む。 The invention according to the 16th aspect is a learning data creation method in which a processor creates learning data for machine learning by performing the processing of each of the following steps, the step of acquiring an image of an object and the acquisition. The step of moving the image of the object to generate the learning image, the second area information corresponding to the area of the object in the generated learning image is generated, and the generated second area information is the correct answer to the learning image. It includes a step of making data and a step of storing a pair of a generated learning image and correct answer data in a memory as learning data.
 本発明の第17態様に係る学習データ作成方法において、対象物の領域に対応する第1領域情報を取得するステップを含み、正解データを生成するステップは、取得した第1領域情報に基づいて第2領域情報を生成することが好ましい。 In the learning data creation method according to the 17th aspect of the present invention, the step of generating correct answer data includes the step of acquiring the first area information corresponding to the area of the object, and the step of generating the correct answer data is the first based on the acquired first area information. It is preferable to generate two-region information.
 本発明の第18態様に係る学習データ作成方法において、正解データは、対象物の領域に対応する正解画像、対象物の領域を矩形で囲むバウンディングボックス情報、及び対象物の領域のエッジを示すエッジ情報のうちの少なくとも1つを含むことが好ましい。 In the learning data creation method according to the eighteenth aspect of the present invention, the correct answer data includes the correct image corresponding to the area of the object, the bounding box information surrounding the area of the object with a rectangle, and the edge indicating the edge of the area of the object. It is preferable to include at least one of the information.
 本発明の第19態様に係る学習データ作成方法において、学習用画像を生成するステップは、複数の対象物の画像を配置する際に、複数の対象物の画像の全部又は一部を点又は線で接触させることが好ましい。 In the learning data creation method according to the nineteenth aspect of the present invention, in the step of generating a learning image, when arranging images of a plurality of objects, all or a part of the images of the plurality of objects is pointed or lined. It is preferable to make contact with.
 本発明の第20態様に係る学習データ作成方法において、正解データは、複数の対象物の画像の点又は線で接触する箇所のみを示すエッジ画像を含むことが好ましい。 In the learning data creation method according to the twentieth aspect of the present invention, it is preferable that the correct answer data includes an edge image showing only the points or lines of the images of a plurality of objects that come into contact with each other.
 本発明の第21態様に係る学習データ作成方法において、対象物は、少なくとも一部が透明であることが好ましい。 In the learning data creation method according to the 21st aspect of the present invention, it is preferable that at least a part of the object is transparent.
 本発明の第22態様に係る学習データ作成方法において、学習用画像を生成するステップは、複数の対象物の画像を含む学習用画像を生成する際に、透明な対象物の画像以外の対象物の画像を移動させることが好ましい。 In the learning data creation method according to the 22nd aspect of the present invention, the step of generating a learning image is an object other than a transparent object image when generating a learning image including an image of a plurality of objects. It is preferable to move the image of.
 第23態様に係る発明は、対象物の画像を取得する機能と、取得した対象物の画像を移動させて学習用画像を生成する機能と、生成した学習用画像における対象物の領域に対応する第2領域情報を生成し、生成した第2領域情報を学習用画像に対する正解データとする機能と、生成した学習用画像と正解データとのペアを、学習データとしてメモリに記憶させる機能と、をコンピュータにより実現させる学習データ作成プログラムである。 The invention according to the 23rd aspect corresponds to a function of acquiring an image of an object, a function of moving the acquired image of the object to generate a learning image, and a region of the object in the generated learning image. A function of generating second area information and using the generated second area information as correct answer data for a learning image, and a function of storing a pair of the generated learning image and correct answer data in a memory as learning data. It is a learning data creation program realized by a computer.
 本発明の第24態様に係る学習データ作成プログラムにおいて、対象物の領域に対応する第1領域情報を取得する機能を含み、正解データを生成する機能は、取得した第1領域情報に基づいて第2領域情報を生成することが好ましい。 In the learning data creation program according to the 24th aspect of the present invention, the function of acquiring the first area information corresponding to the area of the object and the function of generating the correct answer data is the first based on the acquired first area information. It is preferable to generate two-region information.
 本発明によれば、対象物の領域を認識する学習モデルを機械学習させるための学習データを効率よく作成することができる。 According to the present invention, it is possible to efficiently create learning data for machine learning a learning model that recognizes an object area.
FIG. 1 is a diagram showing a captured image input to the trained learning model and an output result desired to be acquired from the learning model.
FIG. 2 is a diagram showing an example of learning data.
FIG. 3 is a conceptual diagram showing image processing when correct answer data is automatically created.
FIG. 4 is a conceptual diagram showing a method of mass-producing learning data by simulation.
FIG. 5 is a diagram showing a first embodiment in which learning data is created by simulation.
FIG. 6 is a diagram showing a mode in which one photographed image is generated from two photographed images.
FIG. 7 is a diagram showing a second embodiment in which learning data is created by simulation.
FIG. 8 is a block diagram showing an example of the hardware configuration of the learning data creation device according to the present invention.
FIG. 9 is a plan view showing a drug package in which a plurality of drugs are packaged.
FIG. 10 is a block diagram showing a schematic configuration of the photographing apparatus shown in FIG. 8.
FIG. 11 is a plan view showing a schematic configuration of the photographing apparatus.
FIG. 12 is a side view showing a schematic configuration of the photographing apparatus.
FIG. 13 is a block diagram showing an embodiment of the learning data creation device according to the present invention.
FIG. 14 is a diagram showing an example of a captured image acquired by the image acquisition unit and of first region information indicating the region of the drug in the captured image acquired by the first region information acquisition unit.
FIG. 15 is a diagram showing an example of learning data generated from the captured image and the mask image shown in FIG. 14.
FIG. 16 is a diagram showing an example of an edge image showing only the portions where a plurality of drugs are in contact at points or lines.
FIG. 17 is a diagram showing an example of a photographed image including a plurality of transparent drugs and a plurality of opaque drugs.
FIG. 18 is a diagram used to explain the lens effect of a transparent drug.
FIG. 19 is a diagram used to explain the restriction on the movement of a transparent drug.
FIG. 20 is a block diagram showing an embodiment of the machine learning device according to the present invention.
FIG. 21 is a flowchart showing an embodiment of the learning data creation method according to the present invention.
 以下、添付図面に従って本発明に係る学習データ作成装置、方法、プログラム、学習データ及び機械学習装置の好ましい実施形態について説明する。 Hereinafter, preferred embodiments of the learning data creation device, method, program, learning data, and machine learning device according to the present invention will be described with reference to the accompanying drawings.
 [本発明の概要]
 図1は、学習済みの学習モデルに入力される画像と学習モデルから取得したい出力結果とを示す図である。
[Outline of the present invention]
FIG. 1 is a diagram showing an image input to the trained learning model and an output result desired to be acquired from the learning model.
 図1(A)は、対象物(本例では、薬剤)を撮影した画像であり、図1(B)は、学習済みの学習モデル(以下、「学習済みモデル」という)に図1(A)に示した画像を入力した場合に、学習済みモデルが出力して欲しい出力結果である。 FIG. 1 (A) is an image of an object (drug in this example) taken, and FIG. 1 (B) is shown in FIG. 1 (A) in a trained learning model (hereinafter referred to as “learned model”). This is the output result that the trained model wants to output when the image shown in) is input.
 学習済みモデルの出力結果は、図1(A)に示した薬剤の領域(薬剤領域)を推論した推論結果であり、本例では、薬剤領域と背景領域とを領域分類したマスク画像である。尚、推論結果は、マスク画像に限らず、例えば、薬剤領域を矩形の枠で囲むバウンディングボックス、又はバウンディングボックスの対角の2点の座標、又はこれらの組み合わせが考えられる。 The output result of the trained model is an inference result inferring the drug region (drug region) shown in FIG. 1 (A), and in this example, it is a mask image in which the drug region and the background region are classified into regions. The inference result is not limited to the mask image, and for example, a bounding box surrounding the drug region with a rectangular frame, coordinates of two diagonal points of the bounding box, or a combination thereof can be considered.
 学習済みモデルにより、任意の入力画像から所望の出力結果(推論結果)を得るためには、未学習の学習モデルを機械学習させるための学習データを大量に準備する必要がある。 In order to obtain a desired output result (inference result) from an arbitrary input image by the trained model, it is necessary to prepare a large amount of training data for machine learning the unlearned learning model.
 図2は、学習データの一例を示す図である。 FIG. 2 is a diagram showing an example of learning data.
 In FIGS. 2(A) to 2(C), the left side is an image of drugs (a learning image) and the right side is the correct answer image (correct answer data) for that image; each pair of the left and right images constitutes one item of learning data. The correct answer images on the right side of FIG. 2 are mask images that distinguish the region of each drug from the background.
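Purely as an illustration of how such image/correct-answer pairs might be held for training, the following is a minimal sketch assuming PyTorch and Pillow; the directory layout and file-naming convention are hypothetical.

```python
# Minimal sketch (assumptions: PyTorch, Pillow; image and mask files share names in two folders).
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class DrugSegmentationPairs(Dataset):
    """Each sample is a (learning image, correct-answer mask) pair, as in FIG. 2."""
    def __init__(self, image_dir, mask_dir):
        self.image_paths = sorted(Path(image_dir).glob("*.png"))
        self.mask_paths = sorted(Path(mask_dir).glob("*.png"))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, i):
        image = np.asarray(Image.open(self.image_paths[i]).convert("RGB"), dtype=np.float32) / 255.0
        mask = np.asarray(Image.open(self.mask_paths[i]).convert("L"), dtype=np.int64)
        return torch.from_numpy(image).permute(2, 0, 1), torch.from_numpy(mask)
```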
 Basically, the learning data requires images of the object (drugs) such as those on the left side of FIGS. 2(A) to 2(C); however, some drugs, such as new drugs, exist only in small numbers, so there is a problem that many images cannot be collected.
 学習用画像(例えば、薬剤の画像)に対する正解画像の作成は、薬剤の画像をディスプレイに表示させ、ディスプレイに表示された画像を見ながら薬剤の領域をユーザが画素単位で塗り潰して作成するのが一般的である。 To create a correct answer image for a learning image (for example, an image of a drug), the image of the drug is displayed on the display, and the user fills the area of the drug pixel by pixel while viewing the image displayed on the display. It is common.
 また、正解画像を自動で作成する場合、例えば、テンプレートマッチングにより薬剤の位置、回転角を計算することで求めることができる。 In addition, when the correct answer image is automatically created, it can be obtained by calculating the position and rotation angle of the drug by template matching, for example.
 図3は、正解画像を自動で作成する場合の画像処理を示す概念図である。 FIG. 3 is a conceptual diagram showing image processing when a correct image is automatically created.
 薬剤を撮影した画像(撮影画像)ITPに対して、その薬剤を示す画像であるテンプレート画像Itplを用意する。薬剤の形状が円形でない場合には、探索する回転角毎の複数のテンプレート画像Itplを用意することが好ましい。 For the image (photographed image) ITP obtained by photographing the drug, a template image Itpl which is an image showing the drug is prepared. When the shape of the drug is not circular, it is preferable to prepare a plurality of template images Itpl for each rotation angle to be searched.
 Then, by searching the captured image ITP for the position and the rotation angle of the template image Itpl that give the highest correlation (template matching), correct answer data indicating the region of the drug in the captured image ITP can be created based on the position of the template image Itpl and its rotation angle at the point of highest correlation.
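A minimal sketch of this rotation-aware template matching, assuming OpenCV and NumPy, is given below; the angle step and the matching score are illustrative choices, not values fixed by the embodiment.

```python
# Minimal sketch (assumptions: OpenCV, NumPy; grayscale images; the 10-degree step is illustrative).
import cv2
import numpy as np

def find_drug_by_template(captured, template, angle_step=10):
    """Return (top_left, angle, score) of the best-matching rotated template."""
    best = (None, 0, -1.0)
    h, w = template.shape[:2]
    center = (w / 2.0, h / 2.0)
    for angle in range(0, 360, angle_step):
        rot = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(template, rot, (w, h))      # corners may be clipped for long tablets
        result = cv2.matchTemplate(captured, rotated, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val > best[2]:
            best = (max_loc, angle, max_val)
    return best
```

From the detected position and rotation angle, the drug region can be drawn at that pose to serve as the correct answer data for the captured image.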
 また、撮影画像ITPと正解画像(例えば、マスク画像)とを重ね合わせてディスプレイに表示し、マスク画像に誤差がある場合には、ユーザがマスク画像を画素単位で修正するようにしてもよい。 Further, the captured image ITP and the correct image (for example, a mask image) may be superimposed and displayed on the display, and if there is an error in the mask image, the user may correct the mask image on a pixel-by-pixel basis.
 [シミュレーションにより学習データを量産する方法]
 図4は、シミュレーションにより学習データを量産する方法を示す概念図である。
[How to mass-produce learning data by simulation]
FIG. 4 is a conceptual diagram showing a method of mass-producing learning data by simulation.
 まず、対象物の画像(学習用画像)と対象物の領域を示す正解画像とのペアを準備する。 First, prepare a pair of an image of the object (learning image) and a correct image showing the area of the object.
 続いて、学習用画像と正解画像とのペアを、同調させて平行移動(シフト)、回転移動、又は反転させ、あるいは同調して移動させた画像と移動前の画像とを合成(コピー・アンド・ペースト)し、新たな学習用画像と正解画像とのペアからなる学習データを作成する。回転移動とは、一定点を中心にして対象物の画像を回転させ、他の位置に移動させることをいう。本例では、回転移動は、対象物の一定点(例えば、重心)を中心にして回転させる場合をいい、以下、「回転移動」を、単に「回転」という。 Then, the pair of the learning image and the correct image is synchronized and translated (shifted), rotated, or inverted, or the image that is moved in synchronization and the image before the movement are combined (copy and paste). -Paste) to create training data consisting of a new pair of learning image and correct image. Rotational movement refers to rotating an image of an object around a certain point and moving it to another position. In this example, rotational movement refers to the case of rotating an object around a certain point (for example, the center of gravity), and hereinafter, "rotational movement" is simply referred to as "rotation".
 これらの操作を繰り返すことにより、学習データを量産することができる。また、このような方法による学習データの作成は、例えば、新たな学習用画像の作成後に正解画像を作成する場合に比べて処理が簡単である。 By repeating these operations, learning data can be mass-produced. Further, the creation of the learning data by such a method is simpler than, for example, the case where the correct answer image is created after the creation of a new learning image.
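The synchronized shift, rotation, or inversion of an image and its mask described above can be sketched as follows, assuming OpenCV and NumPy; the function name and parameter values are illustrative.

```python
# Minimal sketch (assumptions: OpenCV, NumPy). The same geometric operation is applied to the
# learning image and to its mask so that the pair stays consistent.
import cv2

def shift_rotate_pair(image, mask, dx=0, dy=0, angle=0.0, flip=False):
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    m[0, 2] += dx
    m[1, 2] += dy
    # nearest-neighbour keeps the mask labels intact; borders are filled with the background value (0)
    new_image = cv2.warpAffine(image, m, (w, h), flags=cv2.INTER_LINEAR, borderValue=0)
    new_mask = cv2.warpAffine(mask, m, (w, h), flags=cv2.INTER_NEAREST, borderValue=0)
    if flip:
        new_image, new_mask = cv2.flip(new_image, 1), cv2.flip(new_mask, 1)
    return new_image, new_mask
```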
 <シミュレーションにより学習データを作成する第1実施形態>
 図5は、シミュレーションにより学習データを作成する第1実施形態を示す図である。
<First embodiment for creating learning data by simulation>
FIG. 5 is a diagram showing a first embodiment in which learning data is created by simulation.
 図5(A)は、対象物である薬剤を撮影した撮影画像と、その撮影画像に基づいて手動又は自動で生成したマスク画像とのペアを示す図である。 FIG. 5A is a diagram showing a pair of a photographed image of a drug as an object and a mask image manually or automatically generated based on the photographed image.
 本発明は、撮影画像とマスク画像とのペアから、シミュレーションにより学習データを作成する(学習データを水増しする)。 The present invention creates learning data by simulation from a pair of a photographed image and a mask image (inflates the learning data).
 図5(B)は、図5(A)に示した撮影画像とマスク画像とをそれぞれ反転した撮影画像及びマスク画像のペアを示す図である。 FIG. 5 (B) is a diagram showing a pair of a photographed image and a mask image in which the photographed image and the mask image shown in FIG. 5 (A) are inverted, respectively.
 図5(B)に示す右側の反転(左右反転)されたマスク画像は、図5(B)に示す左側の反転された撮影画像における薬剤の領域を示すマスク画像となる。したがって、反転された撮影画像は、新たな学習用画像とすることができ、反転されたマスク画像は、新たに生成された学習用画像に対する正解データとすることができる。 The inverted (left-right inverted) mask image on the right side shown in FIG. 5 (B) is a mask image showing the region of the drug in the inverted photographed image on the left side shown in FIG. 5 (B). Therefore, the inverted captured image can be used as a new learning image, and the inverted mask image can be used as correct answer data for the newly generated learning image.
 即ち、図5(A)に示した撮影画像とマスク画像とを、同期して反転させることで、図5(B)に示す学習用画像とマスク画像のペアからなる新たな学習データを作成することができる。尚、画像の反転は、左右反転に限らず、上下反転も含む。また、先に左の画像を作っておき、そこから薬剤画像の領域を検出することによって、右の画像を作成するようにしてもよい。 That is, by synchronizing and inverting the captured image and the mask image shown in FIG. 5 (A), new learning data consisting of the pair of the learning image and the mask image shown in FIG. 5 (B) is created. be able to. The image inversion is not limited to horizontal inversion, but also includes vertical inversion. Alternatively, the image on the left may be created first, and the image on the right may be created by detecting the region of the drug image from the image.
 図5(C)は、図5(A)及び図5(B)に示した画像を加算した画像を示す図である。 FIG. 5 (C) is a diagram showing an image obtained by adding the images shown in FIGS. 5 (A) and 5 (B).
 The photographed image on the left side of FIG. 5(C) can be created by combining the photographed image shown in FIG. 5(A) with the inverted photographed image shown in FIG. 5(B). That is, the photographed image shown in FIG. 5(C) can be created by pasting, onto the photographed image shown in FIG. 5(A), an image (drug image) obtained by cutting out the drug region from the inverted photographed image shown in FIG. 5(B). The cutting out of the drug image from the inverted photographed image can be performed by a process of cutting out the drug region from the photographed image shown in FIG. 5(A), based on the inverted mask image of FIG. 5(B).
 また、2以上の薬剤画像を合成する方法は、上記のようにマスク画像を使用する方法に限らない。例えば、薬剤が撮影されていない背景のみの背景画像を使用して、図5(A)及び図5(B)からそれぞれ薬剤画像のみを抽出し、抽出した各薬剤画像を背景画像に合成することで、図5(C)に示した撮影画像(学習用画像)を生成することができる。更に、背景が黒(画素値がゼロ)となるように撮影された撮影画像の場合、各撮影画像を加算することで各薬剤画像を有する学習用画像を生成することができる。 Further, the method of synthesizing two or more drug images is not limited to the method of using a mask image as described above. For example, using a background image of only the background in which the drug has not been photographed, only the drug image is extracted from FIGS. 5 (A) and 5 (B), and each extracted drug image is combined with the background image. Therefore, the captured image (learning image) shown in FIG. 5C can be generated. Further, in the case of a captured image captured so that the background is black (pixel value is zero), a learning image having each drug image can be generated by adding each captured image.
 On the other hand, the mask image on the right side of FIG. 5(C) can be created by adding the mask image shown in FIG. 5(A) and the inverted mask image shown in FIG. 5(B). When the two mask images are added, for instance separation, the pixel value of the drug region in the inverted mask image of FIG. 5(C) is set to, for example, "0.5" and the pixel value of the background to "0" in this example, so that the two drug regions in the generated mask image have different pixel values.
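A minimal sketch of this copy-and-paste composition and of the instance separation by pixel value, assuming NumPy, images of the same size, and a black background, is given below; the value 0.5 follows the example above and the function name is illustrative.

```python
# Minimal sketch (assumption: NumPy; all images share the same size, background pixels are 0).
import numpy as np

def paste_drug(base_image, base_mask, drug_image, drug_mask, instance_value=0.5):
    region = drug_mask > 0                      # pixels belonging to the moved drug
    out_image = base_image.copy()
    out_image[region] = drug_image[region]      # copy-and-paste only the drug pixels
    out_mask = base_mask.astype(np.float32)
    out_mask[region] = instance_value           # e.g. 0.5 so the two drug regions differ
    return out_image, out_mask
```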
 このようにして、図5(A)に示した撮影画像とマスク画像のペアからなる1つの学習データから、図5(B)及び(C)に示した2つの学習データを作成することができる。 In this way, the two training data shown in FIGS. 5 (B) and 5 (C) can be created from one training data consisting of the pair of the captured image and the mask image shown in FIG. 5 (A). ..
 また、上記の第1実施形態では、図5(A)に示した撮影画像及びマスク画像をそれぞれ反転し、図5(B)に示した新たな撮影画像(学習用画像)及びマスク画像のペアからなる学習データを作成するようにしたが、これに限らず、図5(A)に示した撮影画像及びマスク画像をそれぞれ同期して平行移動、回転、又は拡縮させて、新たな学習用画像及びマスク画像のペアからなる学習データを作成してもよい。尚、撮影画像及びマスク画像をそれぞれ同期して平行移動、回転、又は縮小させることで、背景に余白が生じる場合には、背景と同様な画素値で余白を埋めることが好ましい。 Further, in the first embodiment described above, the captured image and the mask image shown in FIG. 5 (A) are inverted, and the new captured image (learning image) and the mask image pair shown in FIG. 5 (B) are inverted. The learning data consisting of the above is not limited to this, but the captured image and the mask image shown in FIG. 5 (A) are synchronizedly moved, rotated, or scaled in parallel to create a new learning image. And training data consisting of a pair of mask images may be created. When a margin is generated in the background by translating, rotating, or reducing the captured image and the mask image, respectively, it is preferable to fill the margin with the same pixel value as the background.
 更に、上記の第1実施形態では、1つの薬剤が撮影された撮影画像及びマスク画像から新たな撮影画像及びマスク画像を作成するが、複数の異なる薬剤が別々に撮影された複数の撮影画像及びマスク画像、又は複数の異なる薬剤が同時に撮影された撮影画像及びマスク画像から、新たな撮影画像及びマスク画像を作成するようにしてもよい。 Further, in the first embodiment described above, a new photographed image and a mask image are created from the photographed image and the mask image in which one drug is photographed, but a plurality of photographed images and a plurality of photographed images in which a plurality of different agents are separately photographed and A new photographed image and a mask image may be created from the mask image or the photographed image and the mask image in which a plurality of different agents are simultaneously photographed.
 図6は、2枚の撮影画像から1枚の学習用画像を生成する態様を示す図である。 FIG. 6 is a diagram showing a mode in which one learning image is generated from two captured images.
 図6(A)に示す例では、2つの対象物(薬剤)がそれぞれ撮影された2枚の撮影画像から、4つの薬剤画像を有する新たな1枚の学習用画像を生成する場合に関して示している。 In the example shown in FIG. 6 (A), a case where a new learning image having four drug images is generated from two captured images in which two objects (drugs) are captured is shown. There is.
 2枚の撮影画像から4つの薬剤画像を切り出し、切り出した薬剤画像をそれぞれ平行移動、回転、又は拡縮して合成することで、4つの薬剤画像を含む新たな学習用画像を生成している。 Four drug images are cut out from the two captured images, and the cut out drug images are translated, rotated, or scaled and combined to generate a new learning image including the four drug images.
 図6(B)に示す例では、図6(A)と同様に2枚の撮影画像から1枚の学習用画像を生成する場合に関して示しているが、2枚の撮影画像内の4つの薬剤画像のうちの3つの薬剤画像を使用して1枚の学習用画像を生成している。このように、新たに生成される学習用画像は、2枚の撮影画像内の全ての薬剤画像を使用しなくてもよい。また、生成される1枚の学習用画像内の複数の薬剤画像には、平行移動、回転、又は拡縮等の操作が行われていない薬剤画像(移動しない薬剤画像)が含まれていてもよい。 In the example shown in FIG. 6 (B), the case where one learning image is generated from the two captured images is shown as in FIG. 6 (A), but the four agents in the two captured images are shown. One learning image is generated using three drug images out of the images. As described above, the newly generated learning image does not have to use all the drug images in the two captured images. In addition, the plurality of drug images in one generated learning image may include drug images that have not been subjected to operations such as translation, rotation, or scaling (drug images that do not move). ..
 尚、上記のようにして新たな学習用画像を生成する場合、新たに生成される学習用画像に対応するマスク画像も生成され、新たに生成される学習用画像及びマスク画像のペアが新たな学習データとなる。 When a new learning image is generated as described above, a mask image corresponding to the newly generated learning image is also generated, and a new pair of the newly generated learning image and the mask image is generated. It becomes learning data.
 <シミュレーションにより学習データを作成する第2実施形態>
 図7は、シミュレーションにより学習データを作成する第2実施形態を示す図である。
<Second embodiment for creating learning data by simulation>
FIG. 7 is a diagram showing a second embodiment in which learning data is created by simulation.
 図7(A)は、薬剤を撮影した撮影画像と、その撮影画像に基づいて手動又は自動で生成したマスク画像とのペアを示す図であり、図5(A)に示したペアと同一である。 FIG. 7A is a diagram showing a pair of a photographed image obtained by photographing the drug and a mask image manually or automatically generated based on the photographed image, and is the same as the pair shown in FIG. 5A. be.
 図7(B)は、それぞれ図7(A)に示した撮影画像及びマスク画像からそれぞれ切り出す薬剤領域を示す図である。 FIG. 7B is a diagram showing drug regions cut out from the photographed image and the mask image shown in FIG. 7A, respectively.
 本例では、薬剤領域を囲む矩形の枠内の領域を、画像を切り出す領域(切出領域)としている。尚、マスク画像により薬剤領域は既知であるため、マスク画像に基づいて薬剤領域を囲む矩形の枠内の画像を切り出すことができる。 In this example, the area within the rectangular frame surrounding the drug area is defined as the area for cutting out the image (cutout area). Since the drug region is known from the mask image, the image in the rectangular frame surrounding the drug region can be cut out based on the mask image.
 FIG. 7(C) shows the images of the cut-out regions cut out from the photographed image and the mask image shown in FIG. 7(A), respectively. The drug image can be cut out from the photographed image by a process (drug image acquisition process) that cuts out the drug region from the photographed image shown in FIG. 7(A) based on the mask image shown in FIG. 7(A). Since the mask image shown in FIG. 7(A) has information indicating the drug region (first region information), the drug region (hereinafter referred to as the "drug mask image") can be cut out from the mask image. The drug image acquisition process may also include a process of reading an already cut-out image from a memory or the like.
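A minimal sketch of cutting out the rectangular region enclosing the drug, assuming NumPy, is given below; it simply takes the bounding rectangle of the non-zero mask pixels.

```python
# Minimal sketch (assumption: NumPy; the mask contains at least one drug pixel).
import numpy as np

def crop_drug_region(captured, mask):
    ys, xs = np.nonzero(mask)                   # the drug region is known from the mask
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    drug_image = captured[top:bottom, left:right].copy()
    drug_mask = mask[top:bottom, left:right].copy()
    return drug_image, drug_mask                # cut-out drug image and drug mask image
```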
 図7(D)は、切り出された薬剤画像及び薬剤マスク画像を任意の位置及び任意の回転角で貼り付けて作成した、新たな撮影画像及びマスク画像を示す図である。 FIG. 7 (D) is a diagram showing a new photographed image and a mask image created by pasting the cut out drug image and drug mask image at an arbitrary position and an arbitrary rotation angle.
 図7(D)に示す撮影画像及びマスク画像は、図7(A)に示した撮影画像及びマスク画像から上記の画像処理により作成した、新たな学習用画像及び正解データのペアからなる学習データとなる。 The captured image and mask image shown in FIG. 7 (D) are training data composed of a new pair of learning image and correct answer data created by the above image processing from the captured image and mask image shown in FIG. 7 (A). It becomes.
 図7(E)は、切り出された薬剤画像及び薬剤マスク画像を任意の位置及び任意に回転角で貼り付けて作成した、新たな撮影画像(学習用画像)及びマスク画像を示す図であり、特に複数の薬剤画像が点又は線で接触するように作成されている。 FIG. 7 (E) is a diagram showing a new photographed image (learning image) and a mask image created by pasting the cut out drug image and drug mask image at an arbitrary position and an arbitrary rotation angle. In particular, a plurality of drug images are created so as to be in contact with each other by dots or lines.
 学習モデルにおける推論結果を向上させるためには、薬剤同士が点又は線で接触している状態の学習データを大量に作成する必要がある。薬剤同士が点又は線で接触している撮影画像から、各薬剤の領域を精度よく推論するのは、各薬剤が接触せずに孤立している場合に比べて難しいからである。 In order to improve the inference result in the learning model, it is necessary to create a large amount of learning data in which the drugs are in contact with each other by points or lines. This is because it is more difficult to accurately infer the region of each drug from the photographed image in which the drugs are in contact with each other by a point or a line, as compared with the case where each drug is isolated without contact.
 図7(E)に示す左側のマスク画像は、各薬剤領域が接しないように画像処理することが好ましい。各薬剤領域が接触する箇所は既知であるため、その接触する箇所を背景色に置換することで、各薬剤領域が接触しないようにできる。 The mask image on the left side shown in FIG. 7 (E) is preferably image-processed so that the drug regions do not come into contact with each other. Since the contact points of the drug regions are known, the contact points can be prevented from contacting each other by replacing the contact points with a background color.
 また、薬剤同士が点又は線で接触する各薬剤が同一薬剤の場合、インスタンス分離のために、薬剤領域の画素値を異ならせることが好ましい。この場合、マスク画像における各薬剤領域は、その画素値の違いで認識できるため、薬剤領域が接触する箇所を背景色に置換しなくてもよい。 Further, when each drug in which the drugs are in contact with each other by a point or a line is the same drug, it is preferable to make the pixel value of the drug region different for instance separation. In this case, since each drug region in the mask image can be recognized by the difference in the pixel value, it is not necessary to replace the portion where the drug region contacts with the background color.
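Finding the contact points or lines between two pasted drug regions, and replacing those contact pixels with the background, can be sketched as follows, assuming OpenCV and NumPy; the dilation-based test is an illustrative way of detecting touching pixels, not a method fixed by the embodiment.

```python
# Minimal sketch (assumptions: OpenCV, NumPy; masks are single-channel with background 0).
import cv2
import numpy as np

def contact_pixels(mask_a, mask_b):
    """Binary image of pixels where the two drug regions touch at a point or a line."""
    kernel = np.ones((3, 3), np.uint8)
    grown_a = cv2.dilate((mask_a > 0).astype(np.uint8), kernel)
    grown_b = cv2.dilate((mask_b > 0).astype(np.uint8), kernel)
    return grown_a & grown_b                    # usable as the contact-only edge image

def separate_touching_regions(mask, contact):
    """Replace the contact pixels with the background value so the regions do not touch in the mask."""
    out = mask.copy()
    out[contact > 0] = 0
    return out
```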
 以上のようにして、薬剤が撮影された撮影画像とその撮影画像内の薬剤の領域を示す第1領域情報(マスク画像)とを元に、多くの学習データを作成することができる。 As described above, a lot of learning data can be created based on the photographed image in which the drug was photographed and the first area information (mask image) indicating the area of the drug in the photographed image.
 [学習データ作成装置の構成]
 図8は、本発明に係る学習データ作成装置のハードウェア構成の一例を示すブロック図である。
[Configuration of learning data creation device]
FIG. 8 is a block diagram showing an example of the hardware configuration of the learning data creation device according to the present invention.
 図8に示す学習データ作成装置1は、例えば、コンピュータにより構成することができ、主として画像取得部22、CPU(Central Processing Unit)24、操作部25、RAM(Random Access Memory)26、ROM(Read Only Memory)27、メモリ28及び表示部29から構成されている。 The learning data creation device 1 shown in FIG. 8 can be configured by, for example, a computer, and is mainly composed of an image acquisition unit 22, a CPU (Central Processing Unit) 24, an operation unit 25, a RAM (Random Access Memory) 26, and a ROM (Read). It is composed of Only Memory) 27, a memory 28, and a display unit 29.
 画像取得部22は、撮影装置10により薬剤が撮影された撮影画像を、撮影装置10から取得する。 The image acquisition unit 22 acquires a photographed image in which the drug is photographed by the photographing device 10 from the photographing device 10.
 撮影装置10により撮影される薬剤は、例えば、服用1回分の薬剤、又は任意の薬剤であり、薬包に入っているものでもよいし、薬包に入っていないものでもよい。 The drug photographed by the imaging device 10 is, for example, a drug for one dose or an arbitrary drug, which may be contained in a drug package or may not be contained in the drug package.
 図9は、複数の薬剤が一包化された薬包を示す平面図である。 FIG. 9 is a plan view showing a drug package in which a plurality of drugs are packaged.
 図9に示す薬包TPは、1回に服用される複数の薬剤が透明な包に収納され、一包ずつパッキングされたものである。薬包TPは、図11及び図12に示すように帯状に連結されており、各薬包TPを切り離し可能にする切取線が入っている。尚、図9に示す薬包TPには、6個の薬剤Tが一包化されている。 The drug package TP shown in FIG. 9 is a package in which a plurality of drugs to be taken at one time are stored in a transparent package and packed one by one. The drug package TPs are connected in a band shape as shown in FIGS. 11 and 12, and have a cut line that enables each drug package TP to be separated. In the drug package TP shown in FIG. 9, six drug Ts are packaged in one package.
 図10は、図8に示した撮影装置の概略構成を示すブロック図である。 FIG. 10 is a block diagram showing a schematic configuration of the photographing apparatus shown in FIG.
 図10に示す撮影装置10は、薬剤を撮影する2台のカメラ12A、12Bと、薬剤を照明する2台の照明装置16A,16Bと、撮影制御部13とから構成されている。 The imaging device 10 shown in FIG. 10 includes two cameras 12A and 12B for photographing the drug, two lighting devices 16A and 16B for illuminating the drug, and a photographing control unit 13.
 図11及び図12は、それぞれ撮影装置の概略構成を示す平面図及び側面図である。 11 and 12 are a plan view and a side view showing a schematic configuration of the photographing apparatus, respectively.
 薬包TPは、水平(x-y平面)に設置された透明なステージ14の上に載置される。 The medicine package TP is placed on a transparent stage 14 installed horizontally (xy plane).
 カメラ12A、12Bは、ステージ14と直交する方向(z方向)に、ステージ14を挟んで互いに対向して配置される。カメラ12Aは、薬包TPの表面に正対し、薬包TPを上方から撮影する。カメラ12Bは、薬包TPの裏面に正対し、薬包TPを下方から撮影する。 The cameras 12A and 12B are arranged so as to face each other with the stage 14 in the direction orthogonal to the stage 14 (z direction). The camera 12A faces the surface of the medicine package TP and photographs the medicine package TP from above. The camera 12B faces the back surface of the medicine package TP and photographs the medicine package TP from below.
 ステージ14を挟んで、カメラ12Aの側には、照明装置16Aが備えられ、カメラ12Bの側には、照明装置16Bが備えられる。 A lighting device 16A is provided on the side of the camera 12A and a lighting device 16B is provided on the side of the camera 12B with the stage 14 in between.
 照明装置16Aは、ステージ14の上方に配置され、ステージ14に載置された薬包TPを上方から照明する。照明装置16Aは、放射状に配置された4つの発光部16A1~16A4を有し、直交する4方向から照明光を照射する。各発光部16A1~16A4の発光は、個別に制御される。 The lighting device 16A is arranged above the stage 14 and illuminates the medicine package TP placed on the stage 14 from above. The illuminating device 16A has four light emitting units 16A1 to 16A4 arranged radially, and irradiates the illuminating light from four orthogonal directions. The light emission of each light emitting unit 16A1 to 16A4 is individually controlled.
 照明装置16Bは、ステージ14の下方に配置され、ステージ14に載置された薬包TPを下方から照明する。照明装置16Bは、照明装置16Aと同様に放射状に配置された4つの発光部16B1~16B4を有し、直交する4方向から照明光を照射する。各発光部16B1~16B4の発光は、個別に制御される。 The lighting device 16B is arranged below the stage 14 and illuminates the medicine package TP placed on the stage 14 from below. The illuminating device 16B has four light emitting units 16B1 to 16B4 arranged radially like the illuminating device 16A, and irradiates the illuminating light from four orthogonal directions. The light emission of each light emitting unit 16B1 to 16B4 is individually controlled.
 Imaging is performed as follows. First, the camera 12A photographs the medicine package TP from above. During imaging, the light emitting units 16A1 to 16A4 of the lighting device 16A are made to emit light one after another to capture four images, and then all of the light emitting units 16A1 to 16A4 are made to emit light simultaneously to capture one more image. Next, the light emitting units 16B1 to 16B4 of the lower lighting device 16B are made to emit light simultaneously, a reflector (not shown) is inserted, the medicine package TP is illuminated from below through the reflector, and the camera 12A photographs the medicine package TP from above.
 The four images captured while the light emitting units 16A1 to 16A4 emit light in turn each have a different illumination direction, so that when the surface of a drug carries an engraved marking (unevenness), the shadows cast by the marking appear differently in each image. These four captured images are used to generate an engraved-mark image that emphasizes the marking on the front side of the drug T.
 The single image captured while the light emitting units 16A1 to 16A4 emit light simultaneously has no uneven brightness; it is used, for example, when cutting out an image of the front side of the drug T (a drug image), and it is also the captured image on which the engraved-mark image is superimposed.
 The image obtained by illuminating the medicine package TP from below through the reflector and photographing it from above with the camera 12A is the captured image used for recognizing the regions of the individual drugs T.
 Next, the camera 12B photographs the medicine package TP from below. During imaging, the light emitting units 16B1 to 16B4 of the lighting device 16B are made to emit light one after another to capture four images, and then all of the light emitting units 16B1 to 16B4 are made to emit light simultaneously to capture one more image.
 These four captured images are used to generate an engraved-mark image that emphasizes the marking on the back side of the drug T, and the single image captured while the light emitting units 16B1 to 16B4 emit light simultaneously has no uneven brightness; it is used, for example, when cutting out a drug image of the back side of the drug T, and it is also the captured image on which the engraved-mark image is superimposed.
 The imaging control unit 13 shown in FIG. 10 controls the cameras 12A and 12B and the lighting devices 16A and 16B so that eleven shots are taken for one medicine package TP (six shots with the camera 12A and five shots with the camera 12B).
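 The eleven-shot sequence described above can be summarized as a simple control table. The sketch below is a minimal illustration in Python; the names `set_lighting`, `set_reflector`, and `capture` are hypothetical driver calls introduced only for illustration and are not part of any actual device API.

```python
# Hedged sketch of the 11-shot sequence for one medicine package.
# "A1".."A4" and "B1".."B4" stand for light emitting units 16A1-16A4 and 16B1-16B4.

SHOTS = [
    ("12A", ["A1"]), ("12A", ["A2"]), ("12A", ["A3"]), ("12A", ["A4"]),  # directional shots for engraving emphasis
    ("12A", ["A1", "A2", "A3", "A4"]),                                   # evenly lit front image
    ("12A", ["B1", "B2", "B3", "B4"]),                                   # backlit via reflector, for region recognition
    ("12B", ["B1"]), ("12B", ["B2"]), ("12B", ["B3"]), ("12B", ["B4"]),  # directional shots, back side
    ("12B", ["B1", "B2", "B3", "B4"]),                                   # evenly lit back image
]

def photograph_package(device):
    """Capture the eleven images of one medicine package and return them in order."""
    images = []
    for index, (camera, units) in enumerate(SHOTS):
        device.set_lighting(units)             # turn on only the listed light emitting units
        device.set_reflector(index == 5)       # reflector inserted only for the backlit shot
        images.append(device.capture(camera))  # one captured image per shot
    return images
```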
 In addition, imaging is performed in a darkroom state, so the only light striking the medicine package TP during imaging is the illumination light from the lighting device 16A or the lighting device 16B. Therefore, among the eleven captured images described above, in the image obtained by illuminating the medicine package TP from below through the reflector and photographing it from above with the camera 12A, the background takes on the color of the light source (white) while the region of each drug T blocks the light and appears black. In the other ten captured images, on the other hand, the background is black and the region of each drug has the color of the drug.
 Note that even in the image obtained by illuminating the medicine package TP from below through the reflector and photographing it from above with the camera 12A, a transparent drug whose whole body is transparent (translucent), or a capsule in which powdered or granular medicine is filled into a partly or wholly transparent capsule (a partly transparent drug), does not appear completely black like an opaque drug, because light passes through the drug region.
 Returning to FIG. 8, the learning data creation device 1 creates learning data for machine learning a learning model that infers drugs from a captured image in which the drugs are photographed (in particular, infers the region of each drug T present in the captured image).
 Accordingly, the image acquisition unit 22 of the learning data creation device 1 preferably acquires, out of the eleven images captured by the imaging device 10, the captured image used for recognizing the regions of the individual drugs T (that is, the image obtained by illuminating the medicine package TP from below through the reflector and photographing the medicine package TP from above with the camera 12A).
 メモリ28は、学習データを記憶する記憶部分であり、例えば、ハードディスク装置、フラッシュメモリ等の不揮発性メモリである。 The memory 28 is a storage portion for storing learning data, and is, for example, a non-volatile memory such as a hard disk device or a flash memory.
 CPU24は、RAM26を作業領域とし、ROM27又はメモリ28に記憶された学習データ作成プログラムを含む各種のプログラムを使用し、プログラムを実行することで本装置の各種の処理を実行する。 The CPU 24 uses the RAM 26 as a work area, uses various programs including a learning data creation program stored in the ROM 27 or the memory 28, and executes various processes of the present apparatus by executing the programs.
 操作部25は、キーボード、ポインティングデバイス(マウス等)を含み、ユーザの操作により各種の情報や指示を入力する部分である。 The operation unit 25 includes a keyboard and a pointing device (mouse, etc.), and is a part for inputting various information and instructions by the user's operation.
 表示部29は、操作部25での操作に必要な画面を表示し、GUI(Graphical User Interface)を実現する部分として機能し、また、撮影画像等を表示することができる。 The display unit 29 displays a screen required for operation on the operation unit 25, functions as a part that realizes a GUI (Graphical User Interface), and can display a captured image or the like.
 [学習データ作成装置の実施形態]
 図13は、本発明に係る学習データ作成装置の実施形態を示すブロック図である。
[Embodiment of learning data creation device]
FIG. 13 is a block diagram showing an embodiment of the learning data creation device according to the present invention.
 FIG. 13 is a functional block diagram showing the functions executed by the hardware configuration of the learning data creation device 1 shown in FIG. 8; the learning data creation device 1 includes a processor 2 and a memory 28.
 プロセッサ2は、図8に示した画像取得部22、CPU24、RAM26、ROM27、及びメモリ28等から構成され、以下に示す各種の処理を行う。 The processor 2 is composed of the image acquisition unit 22, the CPU 24, the RAM 26, the ROM 27, the memory 28, and the like shown in FIG. 8, and performs various processes shown below.
 The processor 2 functions as an acquisition unit 20, a learning image generation unit 30, a correct answer data generation unit 32, and a storage control unit 34.
 取得部20は、画像取得部22及び第1領域情報取得部23を備えている。 The acquisition unit 20 includes an image acquisition unit 22 and a first area information acquisition unit 23.
 画像取得部22は、前述したように撮影装置10から薬剤Tを撮影した撮影画像ITPを取得する(撮影画像の取得処理を行う)。 The image acquisition unit 22 acquires the photographed image ITP obtained by photographing the drug T from the photographing device 10 as described above (performs the acquisition process of the photographed image).
 The first area information acquisition unit 23 acquires information (first area information) indicating the region of the drug in the captured image ITP acquired by the image acquisition unit 22. When the captured image is used as an input image for machine learning of the learning model, this first area information is the correct answer data for the inference result inferred by the learning model. The first area information serving as the correct answer data preferably includes at least one of a correct answer image (for example, a mask image) showing the region of the drug in the captured image, bounding box information enclosing the region of the drug in a rectangle, and edge information indicating the edge of the region of the drug.
 図14は、画像取得部が取得する撮影画像及び第1領域情報取得部が取得する撮影画像内の薬剤の領域を示す第1領域情報の一例を示す図である。 FIG. 14 is a diagram showing an example of the first region information showing the captured image acquired by the image acquisition unit and the region of the drug in the captured image acquired by the first region information acquisition unit.
 図14(A)に示す撮影画像ITPは、リフレクタを介して薬包TPを下方から照明し、カメラ12Aを用いて上方から薬包TP(図9参照)を撮影した画像である。この薬包TPには、6個の薬剤T1~T6が一包化されている。 The photographed image ITP shown in FIG. 14 (A) is an image in which the drug package TP is illuminated from below via a reflector and the drug package TP (see FIG. 9) is photographed from above using the camera 12A. Six drugs T1 to T6 are packaged in this drug package TP.
 図14(A)に示す薬剤T1~T3は、下方からの照明光を遮光する不透明な薬剤であるため、黒く撮影されている。薬剤T4は、透明薬剤であるため、下方からの照明光が透過して白く撮影されている。薬剤T5、T6は、同一種類のカプセル剤であり、下方からの照明光の一部が漏れるため、部分的に僅かに白く撮影されている。 The agents T1 to T3 shown in FIG. 14A are opaque agents that block the illumination light from below, and thus are photographed in black. Since the drug T4 is a transparent drug, the illumination light from below is transmitted and the image is taken in white. The agents T5 and T6 are capsules of the same type, and because part of the illumination light from below leaks, they are partially photographed in white.
 図14(B)は、撮影画像ITP内の各薬剤T1~T6の領域を示す第1領域情報であり、本例ではマスク画像IMである。 FIG. 14B is first region information showing regions of each drug T1 to T6 in the captured image ITP, and is a mask image IM in this example.
 The mask image IM can be created, for example, by displaying the captured image ITP on the display unit 29 and, while viewing the displayed image, having the user fill in the region of each of the drugs T1 to T6 pixel by pixel with a pointing device such as a mouse. For example, a binarized mask image IM can be created by setting the pixel value of each filled drug region T1 to T6 to "1" and the pixel value of the background region to "0".
 Although the capsule-shaped drugs T5 and T6 are of the same type, it is preferable to give the regions of the two drugs T5 and T6 different pixel values so that their instances can be separated. For example, the pixel value of the region of the drug T5 can be set to "1" and the pixel value of the region of the drug T6 to "0.5".
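 As a rough illustration, an instance-aware mask of this kind can be assembled from per-drug binary masks. The sketch below is a minimal example in Python/NumPy under the assumption that each manually painted region is available as a separate boolean array; the variable names are illustrative only.

```python
import numpy as np

def build_mask(drug_masks, instance_values, height, width):
    """Combine per-drug boolean masks into a single mask image.

    drug_masks      : list of HxW boolean arrays, one per drug (e.g. T1..T6)
    instance_values : pixel value per drug; same-type touching drugs such as
                      T5/T6 are given different values (e.g. 1.0 and 0.5) so
                      their instances stay separable, the others simply 1.0.
    """
    mask = np.zeros((height, width), dtype=np.float32)  # background = 0
    for m, v in zip(drug_masks, instance_values):
        mask[m] = v
    return mask

# Example corresponding to FIG. 14(B): T1..T5 -> 1.0, T6 -> 0.5
# mask_im = build_mask(masks_t1_to_t6, [1.0, 1.0, 1.0, 1.0, 1.0, 0.5], 1024, 1024)
```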
 In the example above, the mask image IM serving as the first area information is area information generated by the user manually setting the region of each of the drugs T1 to T6 in the captured image ITP with a pointing device. However, the first area information is not limited to this; it may be generated by automatically extracting the drug regions in the captured image by image processing, or by automatically extracting the drug regions by image processing and then adjusting them manually.
 Returning to FIG. 13, the learning image generation unit 30 receives from the image acquisition unit 22 the captured image ITP in which the drugs are photographed, and generates learning images (IA, IB, IC, ...) by moving the drugs within the input captured image ITP. That is, the learning image generation unit 30 performs learning image generation processing that generates a plurality of learning images (IA, IB, IC, ...) based on the captured image ITP.
 The drugs photographed in the captured image ITP may be moved by having the user specify the position and rotation of each drug image with a pointing device, or by flipping, adding, or otherwise combining the captured image as described with reference to FIG. 5. Alternatively, the drugs may be moved by randomly determining the position and rotation of each drug image using random numbers. In this case, the drug images must be placed so that they do not overlap.
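 When positions are drawn at random, the overlap check mentioned above can be made directly on the drug masks. The sketch below is a minimal illustration in Python/NumPy, assuming the drug patches have already been cut out (and, if desired, rotated or flipped) from the captured image; it is not the embodiment's actual implementation.

```python
import numpy as np

def place_randomly(drug_crops, drug_masks, canvas_shape, rng, max_tries=100):
    """Randomly place pre-cut drug patches on an empty canvas without overlaps.

    drug_crops : list of hxwx3 image patches cut out of the captured image
    drug_masks : matching hxw boolean masks of each drug inside its patch
    Returns the composed learning image and the per-drug masks in canvas coordinates.
    """
    canvas = np.zeros(canvas_shape, dtype=np.float32)        # e.g. (H, W, 3)
    occupied = np.zeros(canvas_shape[:2], dtype=bool)
    placed_masks = []
    for crop, mask in zip(drug_crops, drug_masks):
        h, w = mask.shape
        for _ in range(max_tries):
            y = int(rng.integers(0, canvas_shape[0] - h + 1))
            x = int(rng.integers(0, canvas_shape[1] - w + 1))
            if not (occupied[y:y + h, x:x + w] & mask).any():  # reject overlapping placements
                canvas[y:y + h, x:x + w][mask] = crop[mask]
                occupied[y:y + h, x:x + w] |= mask
                full = np.zeros(canvas_shape[:2], dtype=bool)
                full[y:y + h, x:x + w] = mask
                placed_masks.append(full)
                break
    return canvas, placed_masks
```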
 The correct answer data generation unit 32 receives the mask image IM, which is the first area information, from the first area information acquisition unit 23, and generates from the input mask image IM a plurality of correct answer data (Ia, Ib, Ic, ...) corresponding to the plurality of learning images (IA, IB, IC, ...). That is, the correct answer data generation unit 32 performs correct answer data generation processing that generates, based on the mask image IM, second area information indicating the drug regions in the plurality of learning images (IA, IB, IC, ...) and uses the generated second area information as the correct answer data (Ia, Ib, Ic, ...) for the respective learning images (IA, IB, IC, ...).
 The plurality of learning images (IA, IB, IC, ...) and the plurality of correct answer data (Ia, Ib, Ic, ...) can be generated, as described for the first and second embodiments in which learning data are created by simulation, by using a captured image of the drugs and first area information (for example, a mask image) indicating the drug regions in that captured image, and either flipping, translating, rotating, or scaling the captured image and the mask image in synchronization with each other, or cutting drug images and drug mask images out of the captured image and the mask image and pasting them after translation, rotation, or scaling.
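 The key point is that whatever geometric operation is applied to the captured image must be applied identically to its mask. A minimal sketch in Python/NumPy is given below, using only flips and 90-degree rotations so that no interpolation library is needed; it is an illustration of the idea, not the embodiment's actual implementation.

```python
import numpy as np

def synchronized_augment(image, mask, rng):
    """Apply the same random flip / 90-degree rotation to a captured image and its mask.

    image : HxWx3 captured image (e.g. ITP)
    mask  : HxW mask image holding the correct-answer region values (e.g. IM)
    Returns one (learning image, correct answer mask) pair.
    """
    k = int(rng.integers(0, 4))                 # number of 90-degree rotations
    img = np.rot90(image, k, axes=(0, 1)).copy()
    msk = np.rot90(mask, k, axes=(0, 1)).copy()
    if rng.random() < 0.5:                      # horizontal flip, applied to both
        img, msk = img[:, ::-1].copy(), msk[:, ::-1].copy()
    if rng.random() < 0.5:                      # vertical flip, applied to both
        img, msk = img[::-1].copy(), msk[::-1].copy()
    return img, msk

# Usage: pairs = [synchronized_augment(itp, im, np.random.default_rng(i)) for i in range(10)]
```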
 The storage control unit 34 receives the learning images (IA, IB, IC, ...) generated by the learning image generation unit 30 and the correct answer data (Ia, Ib, Ic, ...) generated by the correct answer data generation unit 32, and stores the corresponding pairs (learning image IA and correct answer data Ia, learning image IB and correct answer data Ib, learning image IC and correct answer data Ic, ...) in the memory 28 as learning data.
 In this way, a large amount of learning data is stored and accumulated in the memory 28. Although not shown in FIG. 8, it is preferable to also store in the memory 28, as learning data, the pair of the captured image ITP and the mask image IM that are input to the learning image generation unit 30 and the correct answer data generation unit 32.
 図15は、図14に示した撮影画像及びマスク画像から生成した学習データの一例を示す図である。 FIG. 15 is a diagram showing an example of learning data generated from the photographed image and the mask image shown in FIG.
 FIG. 15(A) shows learning data consisting of a pair of a learning image IA and correct answer data (mask image) Ia, and FIG. 15(B) shows learning data consisting of a pair of a learning image IB and a mask image Ib.
 In the learning image IA shown in FIG. 15(A), the capsule-shaped drugs T5 and T6 touch along a line, and the drugs T2, T3, and T4 touch one another at points. In the mask image Ia corresponding to this learning image IA, the regions of the drugs T5 and T6, which are the same drug, are given different pixel values; this makes instance separation of the regions of T5 and T6 possible and also makes the boundary between the line-touching drugs T5 and T6 distinguishable.
 In addition, in the mask image Ia, the locations at which the drugs T2, T3, and T4 touch one another are set to the same pixel value as the background, so that the drugs T2, T3, and T4 no longer touch in the mask and the region of each of them is made clear.
 In the learning image IB shown in FIG. 15(B), the capsule-shaped drugs T5 and T6 touch along a line, and the drug T6 and the drug T3 touch at a point. In the mask image Ib corresponding to this learning image IB, the regions of the drugs T5 and T6, which are the same drug, are given different pixel values (for example, the pixel value of the region of the drug T6 is set to "0.5"); this makes instance separation of the regions of T5 and T6 possible and makes the boundary between the line-touching drugs T5 and T6, as well as the boundary between the point-touching drugs T6 and T3, distinguishable.
 The learning data shown in FIG. 15 are only an example; by arranging the drug images of the drugs T1 to T6 with various combinations of translation, rotation, and the like, and arranging the drug mask images showing the regions of the drugs T1 to T6 in the same way, a large amount of learning data can be created.
 この場合、複数の薬剤画像の一部又は全部が点又は線で接触するように配置して学習データを生成することが好ましい。このような学習データにより機械学習された学習済み学習モデルが、点又は線で接触する薬剤を撮影した撮影画像を入力画像とする場合に、各薬剤の領域を正しく推論するためである。 In this case, it is preferable to generate learning data by arranging a part or all of the plurality of drug images so as to contact each other with points or lines. This is because the trained learning model machine-learned by such training data correctly infers the region of each drug when the photographed image obtained by photographing the drug in contact with a point or a line is used as an input image.
 When a transparent drug such as the drug T4 appears in the captured image ITP shown in FIG. 14(A), the illumination light from below passes through it and it is photographed white, but how the illumination light passes through changes with the position and angle of the drug T4. That is, the drug image of the transparent drug T4 has a brightness distribution and the like that differ depending on the position and angle of the transparent drug T4 within the imaging area.
 Therefore, when learning images are generated by moving drug images in a captured image that contains a transparent drug among a plurality of drugs, it is preferable to move only the images of the drugs other than the transparent drug and to leave the image of the transparent drug where it is.
 In this example the correct answer data is generated as a mask image, but it may instead be edge information (an edge image) for each drug image indicating the edge of the region of that drug image. When drugs touch one another at points or along lines, it is preferable to replace the touching locations with the background color so that the edge images of the individual drugs are separated from one another.
 更に、薬剤同士が点又は線で接触する場合には、点又は線で接触する箇所のみを示すエッジ画像を、正解データとして生成してもよい。 Furthermore, when the drugs come into contact with each other by points or lines, an edge image showing only the points or lines of contact may be generated as correct answer data.
 図16は、複数の薬剤の点又は線で接触する箇所のみを示すエッジ画像の一例を示す図である。 FIG. 16 is a diagram showing an example of an edge image showing only a portion of contact with a plurality of drug points or lines.
 The edge image IE shown in FIG. 16 is an image showing only the locations E1 and E2 at which two or more of the drugs T1 to T6 touch at a point or along a line, and it is drawn with solid lines in FIG. 16. The regions drawn with dotted lines in FIG. 16 indicate the regions where the drugs T1 to T6 are present.
 The edge image at the line-contact location E1 is an image of the location where the capsule-shaped drugs T5 and T6 touch along a line, and the edge image at the point-contact location E2 is an image of the location where the three drugs T2 to T4 touch one another at points.
 Since the placement of each drug image in a learning image is known, the locations at which two or more of the drugs touch at points or along lines are also known. Therefore, the correct answer data generation unit 32 shown in FIG. 13 can automatically create, for a learning image generated by the learning image generation unit 30, an edge image (correct answer data) showing only the point- or line-contact locations.
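 One way to derive such a contact-only edge image from the known placement is to dilate each placed drug mask slightly and keep only the pixels where two or more dilated masks meet. The sketch below is a minimal illustration in Python/NumPy under that assumption; it is not taken from the embodiment itself.

```python
import numpy as np
from scipy import ndimage

def contact_edge_image(instance_masks, dilation=1):
    """Return a boolean image that is True only where two or more drugs touch.

    instance_masks : list of HxW boolean masks, one per placed drug
    dilation       : how far (in pixels) each mask is grown before testing overlap
    """
    grown = [ndimage.binary_dilation(m, iterations=dilation) for m in instance_masks]
    count = np.sum(np.stack(grown, axis=0), axis=0)   # how many grown masks cover each pixel
    return count >= 2                                  # contact points / lines only
```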
 The edge image IE shown in FIG. 16 can be used as the correct answer data corresponding to the learning image IA shown in FIG. 15(A). That is, learning data can be formed from the pair of the learning image IA shown in FIG. 15(A) and the edge image IE shown in FIG. 16.
 Such learning data can be used to machine-learn a learning model that takes as its input image a drug image in which drugs touch at points or along lines, and outputs as its inference result an edge image of only the point- or line-contact locations.
 The edge image of only the point- or line-contact locations (the inference result) can in turn be used in a learning model that takes as its input image (a multi-channel input image) both a drug image in which a plurality of drugs touch at points or along lines and the edge image of only those contact locations, and that infers the regions of the individual drugs. According to this learning model, information on the point- or line-contact locations is supplied in addition to the input image, so the region of each drug can be inferred more accurately.
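 Feeding the contact-only edge image alongside the photographed image simply means stacking it as an extra channel. A minimal sketch, assuming the image is HxWx3 and the edge map is HxW:

```python
import numpy as np

def make_multichannel_input(drug_image, contact_edges):
    """Stack a captured drug image and a contact-only edge map into one input array.

    drug_image    : HxWx3 float array (e.g. values in [0, 1])
    contact_edges : HxW boolean array marking point/line contact locations
    Returns an HxWx4 array usable as a multi-channel input image.
    """
    edge_channel = contact_edges.astype(np.float32)[..., None]  # HxWx1
    return np.concatenate([drug_image.astype(np.float32), edge_channel], axis=-1)
```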
 図17は、複数の透明薬剤及び複数の不透明な薬剤を含む撮影画像の一例を示す図である。 FIG. 17 is a diagram showing an example of a photographed image containing a plurality of transparent agents and a plurality of opaque agents.
 The captured image shown in FIG. 17(A) is an image of a medicine package containing a plurality of transparent drugs and a plurality of opaque drugs, illuminated from above by the upper lighting device 16A (light emitting units 16A1 to 16A4) as shown in FIG. 12 and photographed from above with the upper camera 12A.
 The captured image shown in FIG. 17(B) is an image of the same medicine package, illuminated from below by the lower lighting device 16B (light emitting units 16B1 to 16B4) through the reflector and photographed from above with the camera 12A.
 For opaque drugs, the drug images in FIG. 17(B) all carry only silhouette information (they appear black in FIG. 17), which makes this image well suited to acquiring the edge information of the drug images.
 図18は、透明薬剤のレンズ効果を説明するために用いた図であり、図17(B)に示した薬剤画像と同様の方法で撮影された画像である。 FIG. 18 is a diagram used to explain the lens effect of the transparent drug, and is an image taken by the same method as the drug image shown in FIG. 17 (B).
 As shown in FIG. 18, every opaque drug appears as a black image carrying only silhouette information. The edge information of an opaque drug's image is therefore not affected by the relative position between the drug and the lighting device 16B (light emitting units 16B1 to 16B4), and the drug has uniform edge information wherever it is placed. Accordingly, by cutting the drug image of each opaque drug out of the captured image and translating, rotating, or otherwise moving it, a captured image equivalent to one in which the drug was actually photographed at that position can be generated. Moreover, because abundant edge information can be obtained for opaque drugs, the number of pixels required for imaging can be greatly reduced by discarding the surface texture information of the opaque drug images.
 一方、図18において、バウンディングボックスにより囲まれた透明薬剤の場合、不透明な薬剤と同様に移動させることはできない。透過光リッチな撮影環境下で撮影される透明薬剤の画像は、透明薬剤自体のレンズ効果により、照明装置16Bと透明薬剤との相対位置関係によって、エッジ情報が大きく変化するからである。 On the other hand, in FIG. 18, in the case of the transparent drug surrounded by the bounding box, it cannot be moved in the same manner as the opaque drug. This is because the edge information of the image of the transparent agent taken in an imaging environment rich in transmitted light changes greatly depending on the relative positional relationship between the lighting device 16B and the transparent agent due to the lens effect of the transparent agent itself.
 The four transparent drugs shown in FIG. 18 are drugs in which transmitted light accounts for a large proportion of the light passing through the drug (for example, drugs for which the ratio of the signal of the transmitting portion to the signal of the non-transmitting portion is 5:1 or more), and they are capsules of the same type having a capsule shape.
 In FIG. 18, the horizontally oriented transparent drug and the vertically oriented transparent drug have markedly different edge information, and even transparent drugs oriented in the same direction have edge information that differs depending on the position at which they are placed.
 図19は、透明薬剤の移動の制限を説明するために用いた図である。 FIG. 19 is a diagram used to explain the limitation of movement of the transparent drug.
 Because the edge information of the transparent drug images contained in the captured image shown in FIG. 19(B) depends on the position of the illumination and on the position and orientation of the transparent drug, moving a transparent drug image cut out of the captured image without any restriction to generate a new learning image would produce a learning image depicting a situation different from reality.
 Therefore, when generating learning images that contain a transparent drug, the information on the transparent drug before it was cut out is referred to. In the present embodiment, for example, when a cut-out transparent drug image is pasted, prior information on the position and orientation of the transparent drug before cutting is utilized; for example, the image is pasted at the same position and in the same orientation as before cutting.
 In FIG. 19, the transparent drug image cut out of the captured image of FIG. 19(B) may be pasted at the position (a) of FIG. 19(A) when a new learning image is generated, but it must not be pasted at the position (b). Ideally the learning images would cover every variation (position, orientation, and so on), but this is not realistic. Accordingly, when a transparent drug to be pasted is moved, the amount of translation and/or rotation from the position and orientation it had before cutting is limited to within respective threshold values.
 例えば、平行移動量の閾値はnピクセル、回転量の閾値はm度として設定し、これらの閾値を超える透明薬剤の平行移動、及び/又は回転移動を制限する。 For example, the threshold of the amount of parallel movement is set to n pixels, the threshold of the amount of rotation is set to m degrees, and the parallel movement and / or rotational movement of the transparent drug exceeding these thresholds is restricted.
 When a transparent drug is translated and/or rotated, the way its edge information changes depends on the imaging environment (the positions of the illumination and the camera, the angle of view, and so on) and on the shape and size of the transparent drug. It is therefore preferable to set the translation threshold and the rotation threshold by determining, through simulation, the amounts of translation and/or rotation within which the edge information of the transparent drug can be regarded as essentially unchanged.
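 A sketch of how such a constraint might be enforced during placement is shown below (Python); the thresholds correspond to the n pixels and m degrees obtained from the simulation described above, and all names are illustrative.

```python
import math

def is_allowed_move(orig_pos, orig_angle, new_pos, new_angle,
                    max_shift_px, max_rot_deg, transparent):
    """Decide whether moving a cut-out drug image to a new pose is permitted.

    Opaque drugs may be moved freely; transparent drugs are restricted to stay
    within max_shift_px pixels and max_rot_deg degrees of their original pose,
    because their appearance changes with position under transmitted light.
    """
    if not transparent:
        return True
    shift = math.hypot(new_pos[0] - orig_pos[0], new_pos[1] - orig_pos[1])
    rot = abs((new_angle - orig_angle + 180.0) % 360.0 - 180.0)  # wrap to [-180, 180)
    return shift <= max_shift_px and rot <= max_rot_deg
```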
 [機械学習装置]
 図20は、本発明に係る機械学習装置の実施形態を示すブロック図である。
[Machine learning device]
FIG. 20 is a block diagram showing an embodiment of the machine learning device according to the present invention.
 The machine learning device 50 shown in FIG. 20 is composed of a learning model 52 (here a convolutional neural network (CNN), which is one kind of learning model), a loss value calculation unit 54, and a parameter control unit 56.
 この機械学習装置50は、図13に示した学習データ作成装置1により作成され、メモリ28に記憶された学習データを使用し、CNN52を機械学習させる。 This machine learning device 50 is created by the learning data creating device 1 shown in FIG. 13, and uses the learning data stored in the memory 28 to machine-learn the CNN 52.
 The CNN 52 is the part that, when a captured image of drugs is given as an input image, infers the regions of the drugs shown in that input image; it has a multi-layer structure and holds a plurality of weight parameters. The weight parameters include, for example, the filter coefficients of the filters, called kernels, used for the convolution operations in the convolution layers.
 CNN52は、重みパラメータが初期値から最適値に更新されることで、未学習の学習モデルから学習済みの学習モデルに変化しうる。 CNN52 can change from an unlearned learning model to a learned learning model by updating the weight parameter from the initial value to the optimum value.
 The CNN 52 includes an input layer 52A, an intermediate layer 52B having a plurality of sets each composed of a convolution layer and a pooling layer, and an output layer 52C; each layer has a structure in which a plurality of "nodes" are connected by "edges".
 入力層52Aには、学習対象である学習用画像が入力画像として入力される。学習用画像は、メモリ28に記憶されている学習データ(学習用画像と正解データとのペアからなる学習データ)における学習用画像である。 A learning image to be learned is input to the input layer 52A as an input image. The learning image is a learning image in the learning data (learning data consisting of a pair of the learning image and the correct answer data) stored in the memory 28.
 中間層52Bは、畳み込み層とプーリング層とを1セットとする複数セットを有し、入力層52Aから入力した画像から特徴を抽出する部分である。畳み込み層は、前の層で近くにあるノードにフィルタ処理し(フィルタを使用した畳み込み演算を行い)、「特徴マップ」を取得する。プーリング層は、畳み込み層から出力された特徴マップを縮小して新たな特徴マップとする。「畳み込み層」は、画像からのエッジ抽出等の特徴抽出の役割を担い、「プーリング層」は抽出された特徴が、平行移動などによる影響を受けないようにロバスト性を与える役割を担う。 The intermediate layer 52B has a plurality of sets including a convolution layer and a pooling layer as one set, and is a portion for extracting features from an image input from the input layer 52A. The convolution layer filters nearby nodes in the previous layer (performs a convolution operation using the filter) and acquires a "feature map". The pooling layer reduces the feature map output from the convolution layer to a new feature map. The "convolution layer" plays a role of feature extraction such as edge extraction from an image, and the "pooling layer" plays a role of imparting robustness so that the extracted features are not affected by translation or the like.
 尚、中間層52Bには、畳み込み層とプーリング層とを1セットとする場合に限らず、畳み込み層が連続する場合や活性化関数による活性化プロセス、正規化層も含まれ得る。 The intermediate layer 52B is not limited to the case where the convolution layer and the pooling layer are set as one set, but may also include the case where the convolution layers are continuous, the activation process by the activation function, and the normalization layer.
 出力層52Cは、中間層52Bにより抽出された特徴を示す特徴マップを出力する部分である。また、出力層52Cは、学習済みCNN52では、例えば、入力画像に写っている薬剤領域等をピクセル単位、もしくはいくつかのピクセルを一塊にした単位で領域分類(セグメンテーション)した推論結果を出力する。 The output layer 52C is a part that outputs a feature map showing the features extracted by the intermediate layer 52B. Further, in the trained CNN 52, the output layer 52C outputs, for example, an inference result in which the drug region or the like shown in the input image is region-classified (segmented) in pixel units or in units of several pixels as a group.
 学習前のCNN52の各畳み込み層に適用されるフィルタの係数やオフセット値は、任意の初期値がセットされる。 Arbitrary initial values are set for the coefficient and offset value of the filter applied to each convolution layer of CNN52 before learning.
 Of the loss value calculation unit 54 and the parameter control unit 56, which function as a learning control unit, the loss value calculation unit 54 compares the feature map output from the output layer 52C of the CNN 52 with the mask image that is the correct answer data for the input image (the mask image read out of the memory 28 in correspondence with the learning image), and computes the error between them (the loss value, i.e. the value of the loss function). Possible methods for computing the loss value include, for example, softmax cross entropy and sigmoid.
 パラメータ制御部56は、損失値算出部54により算出された損失値を元に、誤差逆伝播法によりCNN52の重みパラメータを調整する。誤差逆伝播法では、誤差を最終レイヤから順に逆伝播させ、各レイヤにおいて確率的勾配降下法を行い、誤差が収束するまでパラメータの更新を繰り返す。 The parameter control unit 56 adjusts the weight parameter of the CNN 52 by the back-propagation method based on the loss value calculated by the loss value calculation unit 54. In the error back-propagation method, the error is back-propagated in order from the final layer, the stochastic gradient descent method is performed in each layer, and the parameter update is repeated until the error converges.
 この重みパラメータの調整処理を繰り返し行い、CNN52の出力と正解データであるマスク画像との差が小さくなるまで繰り返し学習を行う。 This weight parameter adjustment process is repeated, and learning is repeated until the difference between the output of CNN52 and the mask image which is the correct answer data becomes small.
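 As a rough illustration of this learning control, the sketch below shows a minimal segmentation training loop in PyTorch with a deliberately tiny CNN; it is only a schematic stand-in for the CNN 52, the loss value calculation unit 54, and the parameter control unit 56, not the actual model of the embodiment.

```python
import torch
import torch.nn as nn

# Tiny stand-in for CNN 52: 3-channel learning image in, 1-channel mask logits out.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)
criterion = nn.BCEWithLogitsLoss()                         # loss value calculation unit 54
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # parameter control unit 56

def train_epoch(loader):
    """loader yields (learning image, correct-answer mask) pairs stored in the memory."""
    for images, masks in loader:            # images: Nx3xHxW float, masks: Nx1xHxW in [0, 1]
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, masks)     # error between inference result and correct answer data
        loss.backward()                     # error backpropagation
        optimizer.step()                    # weight parameter update
```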
 機械学習装置50は、メモリ28に記憶された学習データを使用した機械学習を繰り返すことで、CNN52が学習済みモデルとなる。学習済みのCNN52は、未知の入力画像(薬剤を撮影した撮影画像)を入力すると、撮影画像内の薬剤の領域を示すマスク画像等の推論結果を出力する。 The machine learning device 50 repeats machine learning using the learning data stored in the memory 28, so that the CNN 52 becomes a trained model. When the trained CNN 52 inputs an unknown input image (captured image obtained by photographing the drug), the trained CNN 52 outputs an inference result such as a mask image showing a region of the drug in the captured image.
 尚、CNN52としては、R-CNN(Regions with Convolutional Neural Networks)を適用することができる。R-CNNでは、撮影画像ITP内において、大きさを変えたバウンディングボックスをスライドさせ、薬剤が入るバウンディングボックスの領域を検出する。そして、バウンディングボックスの中の画像部分だけを評価(CNN特徴量を抽出)することで、薬剤のエッジを検出する。また、R-CNNに代えて、Fast R-CNN、Faster R-CNN、Mask R-CNN、SVM(Support vector machine)等を使用することができる。 As CNN52, R-CNN (Regions with Convolutional Neural Networks) can be applied. In R-CNN, the bounding box of different sizes is slid in the captured image ITP to detect the area of the bounding box in which the drug enters. Then, the edge of the drug is detected by evaluating only the image portion in the bounding box (extracting the CNN feature amount). Further, instead of R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, SVM (Support vector machine) and the like can be used.
 このようにして構成される学習済みモデルの推論結果は、例えば、複数の薬剤が一包化された薬包を撮影した撮影画像から、各薬剤の画像を切り出す場合に使用することができる。尚、切り出された各薬剤の画像は、薬包に入っている各薬剤の監査・鑑別を行う場合に使用される。 The inference result of the trained model constructed in this way can be used, for example, when an image of each drug is cut out from a photographed image of a drug package in which a plurality of drugs are packaged. The image of each drug cut out is used when auditing / distinguishing each drug contained in the drug package.
 As described above, the memory 28 stores a large amount of learning data created by simulation from a captured image of drugs and the correct answer data indicating the drug regions in that captured image; the captured image is preferably an image of drugs handled by the user's own pharmacy. This is because, by creating learning data from captured images of the drugs handled by the pharmacy and building a learning model from that learning data, the learning model can be used effectively when the drugs handled by that pharmacy are audited and identified.
 [学習データ作成方法]
 図21は、本発明に係る学習データ作成方法の実施形態を示すフローチャートである。
[Learning data creation method]
FIG. 21 is a flowchart showing an embodiment of the learning data creation method according to the present invention.
 図21に示す各ステップの処理は、例えば、図13に示した学習データ作成装置1のプロセッサ2により行われる。 The processing of each step shown in FIG. 21 is performed by, for example, the processor 2 of the learning data creation device 1 shown in FIG.
 In FIG. 21, the image acquisition unit 22 acquires from the imaging device 10 a captured image ITP in which drugs are photographed (for example, the captured image ITP shown in FIG. 14(A)) (step S10). The captured image ITP shown in FIG. 14(A) is an image obtained by illuminating the medicine package from below through the reflector and photographing the package from above; however, the captured image of the drugs is not limited to one taken in this way. The photographed drugs also need not be contained in a medicine package, and the number of drugs may be one.
 The first area information acquisition unit 23 acquires a mask image IM (for example, the mask image IM shown in FIG. 14(B)) as the first area information indicating the drug regions in the captured image acquired by the image acquisition unit 22 (step S12). The mask image IM is generated manually or automatically from the captured image ITP and stored in the memory 28 or the like.
 続いて、学習用画像生成部30は、ステップS10で取得する撮影画像ITPから薬剤T1~T6を移動させて学習用画像を生成する(ステップS14)。学習用画像の生成は、各薬剤を示す薬剤画像を平行移動、反転、回転、又は拡縮させる画像処理により行うことができる。尚、移動させる薬剤が透明薬剤の場合、移動させず、又は閾値以内に移動を制限する。 Subsequently, the learning image generation unit 30 moves the agents T1 to T6 from the captured image ITP acquired in step S10 to generate a learning image (step S14). The learning image can be generated by image processing in which a drug image showing each drug is translated, inverted, rotated, or scaled. If the drug to be moved is a transparent drug, it is not moved or the movement is restricted within the threshold value.
 The correct answer data generation unit 32 also generates, based on the mask image IM acquired in step S12, the correct answer data (mask image) corresponding to the learning image generated in step S14 (step S16). That is, in step S16, image processing is performed in which the region of each drug in the mask image IM is arranged in the same way as the corresponding drug in the learning image, second area information indicating the regions of the arranged drugs is generated, and the generated second area information is used as the correct answer data (mask image) for the learning image.
 記憶制御部34は、ステップS14で生成した学習用画像とステップS16で生成したマスク画像とのペアを学習データとしてメモリ28に記憶させる(ステップS18)。図15(A)及び図15(B)は、上記のようにして生成され、メモリ28に記憶される学習用画像とマスク画像のペアからなる学習データの一例を示す。 The storage control unit 34 stores the pair of the learning image generated in step S14 and the mask image generated in step S16 in the memory 28 as learning data (step S18). 15 (A) and 15 (B) show an example of learning data composed of a pair of a learning image and a mask image generated as described above and stored in the memory 28.
 Subsequently, the processor 2 determines whether to end the generation of learning data (step S20). For example, generation can be judged to be finished when the user inputs an instruction to end learning data generation, or when a preset number of learning data have been created from one pair of a captured image ITP and a mask image.
 学習データの生成を終了していないと判別されると(「No」の場合)、ステップS14、ステップS16に戻り、ステップS14~ステップS20により次の学習データを作成する。 When it is determined that the generation of the learning data has not been completed (in the case of "No"), the process returns to step S14 and step S16, and the next learning data is created in steps S14 to S20.
 学習データの生成を終了すると判別されると(「Yes」の場合)、ステップS10、ステップS12で取得した撮影画像ITP,マスク画像IMに基づく学習データの作成を終了させる。 When it is determined that the generation of the training data is completed (in the case of "Yes"), the creation of the training data based on the captured image ITP and the mask image IM acquired in steps S10 and S12 is completed.
 尚、ステップS10、ステップS12において、別の撮影画像ITP,マスク画像IMが取得される場合には、その撮影画像ITP,マスク画像IMに基づく複数の学習データの作成が行われることは言うまでもない。 Needless to say, when another captured image ITP and mask image IM are acquired in steps S10 and S12, a plurality of learning data based on the captured image ITP and mask image IM are created.
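 The loop of steps S10 to S20 can be summarized as a short driver routine. The sketch below is a hedged illustration in Python; `synchronized_augment` refers to the earlier sketch, and `save_pair` and the target count are placeholders introduced only for this example.

```python
import numpy as np

def create_learning_data(itp, im, n_pairs, save_pair, seed=0):
    """Steps S10-S20 in miniature: from one captured image ITP and its mask IM,
    generate and store n_pairs of (learning image, correct answer mask)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_pairs):                              # loop ends at the preset count (S20)
        img, msk = synchronized_augment(itp, im, rng)     # S14 and S16: move image and mask together
        save_pair(img, msk)                               # S18: store the pair in the memory as learning data
```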
 [その他]
 本実施形態では、対象物の画像は対象物を撮影した画像であるが、これに限らず、例えば、対象物のCAD(computer-aided design)データにより作成された画像等を含む。また、本実施形態では、対象物として薬剤を例に説明したが、対象物はこれに限らず、例えば、医療用器具を含む工業製品やその部品、農産物、あるいは顕微鏡等で撮影される微生物を含む。
[others]
In the present embodiment the image of the object is a photographed image of the object, but the image is not limited to this and includes, for example, an image created from CAD (computer-aided design) data of the object. Also, although a drug has been used as the example of the object in the present embodiment, the object is not limited to drugs and includes, for example, industrial products such as medical instruments and their parts, agricultural products, and microorganisms photographed with a microscope or the like.
 The hardware structure of the processing units that execute the various kinds of processing in the learning data creation device according to the present invention, such as the CPU 24, is realized by various processors such as the following: a CPU (Central Processing Unit), which is a general-purpose processor that executes software (a program) to function as various processing units; a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), which is a processor whose circuit configuration can be changed after manufacture; and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed exclusively for executing specific processing.
 One processing unit may be composed of one of these various processors, or of two or more processors of the same type or of different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). A plurality of processing units may also be implemented by a single processor. As examples of configuring a plurality of processing units with one processor, first, as typified by computers such as clients and servers, one processor may be composed of a combination of one or more CPUs and software, and this processor may function as the plurality of processing units. Second, as typified by a system on chip (SoC), a processor may be used that realizes the functions of an entire system including the plurality of processing units with a single IC (Integrated Circuit) chip. In this way, the various processing units are configured, as their hardware structure, using one or more of the various processors described above.
 これらの各種のプロセッサのハードウェア的な構造は、より具体的には、半導体素子などの回路素子を組み合わせた電気回路(circuitry)である。 More specifically, the hardware structure of these various processors is an electric circuit (circuitry) that combines circuit elements such as semiconductor elements.
 また、本発明は、コンピュータにインストールされることにより、本発明に係る学習データ作成装置として各種の機能を実現させる学習データ作成プログラム、及びこの学習データ作成プログラムが記録された記録媒体を含む。 Further, the present invention includes a learning data creation program that realizes various functions as a learning data creation device according to the present invention by being installed on a computer, and a recording medium on which this learning data creation program is recorded.
 更に、本発明は上述した実施形態に限定されず、本発明の精神を逸脱しない範囲で種々の変形が可能であることは言うまでもない。 Furthermore, it goes without saying that the present invention is not limited to the above-described embodiment, and various modifications can be made without departing from the spirit of the present invention.
1 learning data creation device
2 processor
10 imaging device
12A, 12B camera
13 imaging control unit
14 stage
16A, 16B lighting device
16A1 to 16A4, 16B1 to 16B4 light emitting unit
20 acquisition unit
22 image acquisition unit
23 first area information acquisition unit
24 CPU
25 operation unit
26 RAM
27 ROM
28 memory
29 display unit
30 learning image generation unit
32 correct answer data generation unit
34 storage control unit
50 machine learning device
52 learning model (CNN)
52A input layer
52B intermediate layer
52C output layer
54 loss value calculation unit
56 parameter control unit
IA, IB, IC learning image
IE edge image
IM, Ia, Ib, Ic mask image (correct answer data)
ITP captured image
Itpl template image
S10 to S20 steps
T, T1 to T6 drug
TP drug package

Claims (26)

  1.  プロセッサと、メモリとを備え、前記プロセッサが機械学習用の学習データを作成する学習データ作成装置であって、
     前記プロセッサは、
     対象物の画像を取得する取得処理と、
     前記取得した前記対象物の画像を移動させて学習用画像を生成する学習用画像生成処理と、
     前記生成した前記学習用画像における前記対象物の領域に対応する第2領域情報を生成し、前記生成した前記第2領域情報を前記学習用画像に対する正解データとする正解データ生成処理と、
     前記生成した学習用画像と前記正解データとのペアを、学習データとして前記メモリに記憶させる記憶制御と、
     を行う学習データ作成装置。
    A learning data creation device including a processor and a memory, wherein the processor creates learning data for machine learning.
    The processor
    The acquisition process to acquire the image of the object and
    A learning image generation process for moving the acquired image of the object to generate a learning image, and
    A correct answer data generation process that generates second region information corresponding to the region of the object in the generated learning image and uses the generated second region information as correct answer data for the learning image.
    A storage control for storing a pair of the generated learning image and the correct answer data in the memory as learning data, and
    A learning data creation device in which the processor performs the above processing.
  2.  前記プロセッサの前記取得処理は、前記対象物の領域に対応する第1領域情報を取得し、
     前記正解データ生成処理は、前記取得した前記第1領域情報に基づいて前記第2領域情報を生成する、
     請求項1に記載の学習データ作成装置。
    The acquisition process of the processor acquires the first area information corresponding to the area of the object, and obtains the first area information.
    The correct answer data generation process generates the second area information based on the acquired first area information.
    The learning data creation device according to claim 1.
  3.  前記第1領域情報は、前記対象物の領域を手動で設定した領域情報、前記対象物の領域を画像処理により自動で抽出した領域情報、又は前記対象物の領域を画像処理により自動で抽出し、かつ手動で調整された領域情報である、
     請求項2に記載の学習データ作成装置。
    The first area information is area information in which the region of the object is set manually, area information in which the region of the object is automatically extracted by image processing, or area information in which the region of the object is automatically extracted by image processing and then adjusted manually.
    The learning data creation device according to claim 2.
  4.  前記正解データは、前記対象物の領域に対応する正解画像、前記対象物の領域を矩形で囲むバウンディングボックス情報、及び前記対象物の領域のエッジを示すエッジ情報のうちの少なくとも1つを含む、
     請求項2又は3に記載の学習データ作成装置。
    The correct answer data includes at least one of a correct image corresponding to the region of the object, bounding box information surrounding the region of the object with a rectangle, and edge information indicating an edge of the region of the object.
    The learning data creation device according to claim 2 or 3.
  5.  前記学習用画像生成処理は、前記対象物の画像を平行移動、回転移動、反転、又は拡縮させて前記学習用画像を生成し、
     前記正解データ生成処理は、前記第1領域情報を前記対象物の画像に対応して平行移動、回転移動、反転、又は拡縮させて前記正解データを生成する、
     請求項2から4のいずれか1項に記載の学習データ作成装置。
    The learning image generation process generates the learning image by translating, rotating, reversing, or scaling the image of the object.
    The correct answer data generation process generates the correct answer data by translating, rotating, reversing, or scaling the first region information corresponding to the image of the object.
    The learning data creation device according to any one of claims 2 to 4.
  6.  前記学習用画像生成処理は、前記対象物の画像を平行移動、回転移動、反転、又は拡縮させた2以上の画像を合成して前記学習用画像を生成し、
     前記正解データ生成処理は、前記2以上の画像の各々に対応する前記第1領域情報を前記対象物の画像に対応して平行移動、回転移動、反転、又は拡縮させて前記正解データを生成する、
     請求項2から5のいずれか1項に記載の学習データ作成装置。
    The learning image generation process generates the learning image by synthesizing two or more images obtained by translating, rotating, reversing, or scaling the image of the object.
    The correct answer data generation process generates the correct answer data by translating, rotating, reversing, or scaling the first region information corresponding to each of the two or more images in accordance with the image of the object.
    The learning data creation device according to any one of claims 2 to 5.
  7.  前記学習用画像生成処理は、複数の対象物の画像を含む前記学習用画像を生成する際に、前記複数の対象物の画像の全部又は一部が点又は線で接触する前記学習用画像を生成する、
     請求項1から6のいずれか1項に記載の学習データ作成装置。
    The learning image generation process, when generating a learning image including images of a plurality of objects, generates the learning image such that all or some of the images of the plurality of objects are in contact at points or along lines.
    The learning data creation device according to any one of claims 1 to 6.
  8.  前記正解データは、前記複数の対象物の画像の全部又は一部が点又は線で接触する箇所のみを示すエッジ画像を含む、
     請求項7に記載の学習データ作成装置。
    The correct answer data includes an edge image showing only a part where all or a part of the images of the plurality of objects contact with a point or a line.
    The learning data creation device according to claim 7.
  9.  前記対象物は、少なくとも一部が透明である、
     請求項1から8のいずれか1項に記載の学習データ作成装置。
    The object is at least partially transparent,
    The learning data creation device according to any one of claims 1 to 8.
  10.  前記プロセッサによる前記学習用画像生成処理は、複数の前記対象物の画像を含む前記学習用画像を生成する際に、前記透明な対象物の画像以外の対象物の画像を移動させる、
     請求項9に記載の学習データ作成装置。
    When generating the learning image including images of a plurality of the objects, the learning image generation process by the processor moves the images of objects other than the image of the transparent object.
    The learning data creation device according to claim 9.
  11.  前記対象物は、少なくとも一部が透明であり、
     前記学習用画像生成処理は、前記対象物の画像を閾値以内で移動させて前記学習用画像を生成する、
     請求項1から8のいずれか1項に記載の学習データ作成装置。
    The object is at least partially transparent, and
    The learning image generation process generates the learning image by moving the image of the object within a threshold value.
    The learning data creation device according to any one of claims 1 to 8.
  12.  前記移動は、平行移動及び回転移動のいずれか一方を含む、
     請求項11に記載の学習データ作成装置。
    The movement includes either translation or rotation.
    The learning data creation device according to claim 11.
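    A minimal sketch of the bounded movement described in claims 11 and 12, using only the Python standard library: a translation and rotation are sampled within fixed thresholds before being applied to the (partially transparent) object image. The threshold values and the function name sample_constrained_move are illustrative assumptions.

```python
import random

def sample_constrained_move(max_shift_px: float = 3.0, max_rot_deg: float = 5.0):
    """Sample a translation and rotation bounded by thresholds.

    Small movements are intended for objects with transparent parts, where a
    large move would make the background seen through the object implausible.
    """
    dx = random.uniform(-max_shift_px, max_shift_px)
    dy = random.uniform(-max_shift_px, max_shift_px)
    angle_deg = random.uniform(-max_rot_deg, max_rot_deg)
    return dx, dy, angle_deg

print(sample_constrained_move())
```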
  13.  対象物の画像を移動させて生成した学習用画像と、
     前記学習用画像における前記対象物の領域を示す第2領域情報を有する正解データと、
     のペアからなる、学習データ。
    Learning data consisting of a pair of:
    a learning image generated by moving an image of an object, and
    correct answer data having second area information indicating the area of the object in the learning image.
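    The learning data of this claim can be represented, for example, as a simple in-memory pair of arrays; the sketch below, assuming NumPy, is one hypothetical layout (the names LearningSample, learning_image, and correct_mask are not from the publication).

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LearningSample:
    """One unit of learning data: a learning image and its correct answer."""
    learning_image: np.ndarray   # image generated by moving the object image
    correct_mask: np.ndarray     # second area information, same spatial size

# A learning data set is then simply a collection of such pairs.
dataset = [
    LearningSample(np.zeros((32, 32), np.uint8), np.zeros((32, 32), np.uint8)),
]
```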
  14.  学習モデルと、
     請求項13に記載の学習データを使用し、前記学習モデルを機械学習させる学習制御部と、
     を備えた機械学習装置。
    A machine learning device comprising:
    a learning model, and
    a learning control unit that trains the learning model by machine learning using the learning data according to claim 13.
  15.  前記学習モデルは、畳み込みニューラルネットワークで構成される、
     請求項14に記載の機械学習装置。
    The learning model is composed of a convolutional neural network.
    The machine learning device according to claim 14.
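    For illustration only, the following sketch assumes PyTorch is available and runs one training step of a deliberately tiny fully convolutional network on random stand-in data, with the second area information used as a per-pixel target. It is not the publication's model or training procedure; the layer sizes, loss, and optimizer are arbitrary choices.

```python
import torch
import torch.nn as nn

# A tiny fully convolutional network standing in for the learning model;
# it outputs one per-pixel logit for the object area.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# One training step on random stand-in data (learning image / correct mask pairs).
images = torch.rand(4, 1, 64, 64)
masks = (torch.rand(4, 1, 64, 64) > 0.5).float()

optimizer.zero_grad()
loss = loss_fn(model(images), masks)
loss.backward()
optimizer.step()
print(float(loss))
```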
  16.  プロセッサが、以下の各ステップの処理を行うことにより機械学習用の学習データを作成する学習データ作成方法であって、
     対象物の画像を取得するステップと、
     前記取得した前記対象物の画像を移動させて学習用画像を生成するステップと、
     前記生成した学習用画像における前記対象物の領域に対応する第2領域情報を生成し、前記生成した前記第2領域情報を前記学習用画像に対する正解データとするステップと、
     前記生成した学習用画像と前記正解データとのペアを、学習データとしてメモリに記憶させるステップと、
     を含む学習データ作成方法。
    A learning data creation method in which a processor creates learning data for machine learning by performing each of the following steps, the method including:
    a step of acquiring an image of an object;
    a step of moving the acquired image of the object to generate a learning image;
    a step of generating second region information corresponding to the region of the object in the generated learning image and using the generated second region information as correct answer data for the learning image; and
    a step of storing the pair of the generated learning image and the correct answer data in a memory as learning data.
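    Tying the four steps together, a hypothetical end-to-end sketch in Python/NumPy: acquire an object image and its first area mask, move them identically, treat the moved mask as the correct answer data, and store each pair in memory. The function name create_learning_data and the particular movements (90-degree rotations and flips) are illustrative; the claim also covers translation and other movements.

```python
import numpy as np

def create_learning_data(object_image: np.ndarray, first_mask: np.ndarray,
                         n_samples: int = 4):
    """Acquire -> move -> derive correct data -> store, for a few samples."""
    memory = []                                  # stand-in for the memory of the claim
    for k in range(n_samples):
        img = np.rot90(object_image, k)          # move the object image
        msk = np.rot90(first_mask, k)            # the mask follows the same move
        if k % 2 == 1:                           # occasionally add a flip as well
            img, msk = np.fliplr(img), np.fliplr(msk)
        memory.append((img, msk))                # (learning image, correct answer data)
    return memory

# Demo on a synthetic object image and its first area mask.
obj_img = np.zeros((8, 8), dtype=np.uint8); obj_img[1:4, 2:6] = 200
obj_msk = (obj_img > 0).astype(np.uint8)
pairs = create_learning_data(obj_img, obj_msk)
print(len(pairs))
```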
  17.  前記対象物の領域に対応する第1領域情報を取得するステップを含み、
     前記正解データを生成するステップは、前記取得した前記第1領域情報に基づいて前記第2領域情報を生成する、
     請求項16に記載の学習データ作成方法。
    The method includes a step of acquiring first area information corresponding to the area of the object, and
    The step of generating the correct answer data generates the second area information based on the acquired first area information.
    The learning data creation method according to claim 16.
  18.  前記正解データは、前記対象物の領域に対応する正解画像、前記対象物の領域を矩形で囲むバウンディングボックス情報、及び前記対象物の領域のエッジを示すエッジ情報のうちの少なくとも1つを含む、
     請求項16又は17に記載の学習データ作成方法。
    The correct answer data includes at least one of a correct image corresponding to the region of the object, bounding box information surrounding the region of the object with a rectangle, and edge information indicating an edge of the region of the object.
    The learning data creation method according to claim 16 or 17.
  19.  前記学習用画像を生成するステップは、複数の前記対象物の画像を配置する際に、前記複数の対象物の画像の全部又は一部を点又は線で接触させる、
     請求項16から18のいずれか1項に記載の学習データ作成方法。
    In the step of generating the learning image, when images of a plurality of the objects are arranged, all or some of the images of the plurality of objects are brought into contact at a point or a line.
    The learning data creation method according to any one of claims 16 to 18.
  20.  前記正解データは、前記複数の対象物の画像の前記点又は線で接触する箇所のみを示すエッジ画像を含む、
     請求項19に記載の学習データ作成方法。
    The correct answer data includes an edge image showing only the portions where the images of the plurality of objects are in contact at the point or line.
    The learning data creation method according to claim 19.
  21.  前記対象物は、少なくとも一部が透明である、
     請求項16から20のいずれか1項に記載の学習データ作成方法。
    The object is at least partially transparent,
    The learning data creation method according to any one of claims 16 to 20.
  22.  前記学習用画像を生成するステップは、複数の対象物の画像を含む学習用画像を生成する際に、前記透明な対象物の画像以外の対象物の画像を移動させる、
     請求項21に記載の学習データ作成方法。
    In the step of generating the learning image, when a learning image including images of a plurality of objects is generated, the images of objects other than the image of the transparent object are moved.
    The learning data creation method according to claim 21.
  23.  対象物の画像を取得する機能と、
     前記取得した前記対象物の画像を移動させて学習用画像を生成する機能と、
     前記生成した学習用画像における前記対象物の領域に対応する第2領域情報を生成し、前記生成した前記第2領域情報を前記学習用画像に対する正解データとする機能と、
     前記生成した学習用画像と前記正解データとのペアを、学習データとしてメモリに記憶させる機能と、
     をコンピュータにより実現させる学習データ作成プログラム。
    A learning data creation program that causes a computer to realize:
    a function of acquiring an image of an object;
    a function of moving the acquired image of the object to generate a learning image;
    a function of generating second region information corresponding to the region of the object in the generated learning image and using the generated second region information as correct answer data for the learning image; and
    a function of storing the pair of the generated learning image and the correct answer data in a memory as learning data.
  24.  前記対象物の領域に対応する第1領域情報を取得する機能を含み、
     前記正解データを生成する機能は、前記取得した前記第1領域情報に基づいて前記第2領域情報を生成する、
     請求項23に記載の学習データ作成プログラム。
    The program includes a function of acquiring first area information corresponding to the area of the object, and
    The function of generating the correct answer data generates the second area information based on the acquired first area information.
    The learning data creation program according to claim 23.
  25.  非一時的かつコンピュータ読取可能な記録媒体であって、請求項23に記載のプログラムが記録された記録媒体。 A non-transitory, computer-readable recording medium on which the program according to claim 23 is recorded.
  26.  非一時的かつコンピュータ読取可能な記録媒体であって、請求項24に記載のプログラムが記録された記録媒体。 A non-transitory, computer-readable recording medium on which the program according to claim 24 is recorded.
PCT/JP2021/008789 2020-03-13 2021-03-05 Learning data creation device, method, program, learning data, and machine learning device WO2021182343A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022507149A JP7531578B2 (en) 2020-03-13 2021-03-05 Learning data creation device, method, program, and recording medium

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2020044140 2020-03-13
JP2020-044140 2020-03-13
JP2020116457 2020-07-06
JP2020-116457 2020-07-06

Publications (1)

Publication Number Publication Date
WO2021182343A1 true WO2021182343A1 (en) 2021-09-16

Family

ID=77671698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/008789 WO2021182343A1 (en) 2020-03-13 2021-03-05 Learning data creation device, method, program, learning data, and machine learning device

Country Status (2)

Country Link
JP (1) JP7531578B2 (en)
WO (1) WO2021182343A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6015112B2 (en) 2012-05-11 2016-10-26 株式会社ニコン Cell evaluation apparatus, cell evaluation method and program
JP6984908B2 (en) * 2017-03-03 2021-12-22 国立大学法人 筑波大学 Target tracking device
JP6441980B2 (en) * 2017-03-29 2018-12-19 三菱電機インフォメーションシステムズ株式会社 Method, computer and program for generating teacher images
JP6977513B2 (en) * 2017-12-01 2021-12-08 コニカミノルタ株式会社 Machine learning methods and equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019046269A (en) * 2017-09-04 2019-03-22 株式会社Soat Machine learning training data generation
JP2019212106A (en) * 2018-06-06 2019-12-12 日本電信電話株式会社 Area extraction model learning device, area extraction model learning method, and program
JP2020014799A (en) * 2018-07-27 2020-01-30 コニカミノルタ株式会社 X-ray image object recognition system

Also Published As

Publication number Publication date
JP7531578B2 (en) 2024-08-09
JPWO2021182343A1 (en) 2021-09-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21768422

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022507149

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21768422

Country of ref document: EP

Kind code of ref document: A1