WO2021182343A1 - Learning data creation device, method, program, learning data, and machine learning device - Google Patents

Learning data creation device, method, program, learning data, and machine learning device Download PDF

Info

Publication number
WO2021182343A1
WO2021182343A1 (PCT/JP2021/008789)
Authority
WO
WIPO (PCT)
Prior art keywords
image
learning
drug
correct answer
data
Prior art date
Application number
PCT/JP2021/008789
Other languages
French (fr)
Japanese (ja)
Inventor
一央 岩見
Original Assignee
FUJIFILM Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FUJIFILM Corporation
Priority to JP2022507149A (patent JP7531578B2)
Publication of WO2021182343A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • The present invention relates to a learning data creation device, method, program, learning data, and machine learning device, and particularly to a technique for efficiently creating large amounts of learning data.
  • In Patent Document 1, a visual inspection device has been proposed that learns from a large number of teaching data stored in a teaching file, recognizes patterns, and determines defects.
  • This visual inspection device includes a teaching data generation device that, for specific teaching data of which only a few samples exist among the teaching data in the teaching file, generates new teaching data by transforming that specific teaching data. By adding the teaching data generated by the teaching data generation device to the teaching file and learning from it, defects can be inspected even when only a small number of data are available.
  • The teaching data generation device generates new teaching data by applying affine transformations, including enlargement, reduction, and rotation of the image, and attribute conversions, including brightness, contrast, and edge strength.
  • To machine-learn a learning model that recognizes the region of an object, it is necessary to create a large number of pairs of an image of the object and correct answer data (area information indicating the area of the object), and to train the learning model using a learning data set consisting of those pairs.
  • This type of correct answer data is created by displaying the captured image on a display and filling in the image of the object pixel by pixel while viewing the displayed image, so creating the correct answer data is troublesome and time-consuming.
  • The visual inspection apparatus described in Patent Document 1 uses a camera to image printed matter or an object on a substrate (paper, film, metal, etc.), recognizes print defects from the captured image, and classifies the type of defect ("holes", "stains", "convexities", "streaks", etc.).
  • Since the teaching data generation device transforms one piece of data (image data) of which only a few samples exist to newly generate a plurality of teaching data, the correct answer data corresponding to the plurality of teaching data generated by transforming the same image data is data indicating the same type of defect.
  • That is, Patent Document 1 does not describe the problem that creating correct answer data for teaching data (teaching images) takes time and effort, and does not disclose a technique for solving that problem.
  • The present invention has been made in view of such circumstances, and an object of the present invention is to provide a learning data creation device, method, program, learning data, and machine learning device capable of efficiently creating learning data for machine learning a learning model that recognizes an object region.
  • In order to achieve the above object, the invention according to the first aspect is a learning data creation device including a processor and a memory, in which the processor creates learning data for machine learning. The processor performs: an acquisition process of acquiring an image of an object; a learning image generation process of moving the acquired image of the object to generate a learning image; a correct answer data generation process of generating second area information corresponding to the area of the object in the generated learning image and using the generated second area information as correct answer data for the learning image; and storage control of storing the pair of the generated learning image and the correct answer data in the memory as learning data.
  • According to the first aspect, a learning image is generated by moving an image of an object. Further, second area information corresponding to the area of the object in the generated learning image is generated, and the generated second area information is used as the correct answer data for the learning image. Since the correct answer data can be generated by the correct answer data generation process of the processor, creating the correct answer data requires no manual time and effort.
  • Preferably, the acquisition process of the processor acquires first area information corresponding to the area of the object, and the correct answer data generation process generates the second area information based on the acquired first area information.
  • Preferably, the first area information is area information in which the area of the object is manually set, area information in which the area of the object is automatically extracted by image processing, or area information in which the area of the object is automatically extracted by image processing and then manually adjusted.
  • Preferably, the correct answer data includes at least one of a correct image corresponding to the area of the object, bounding box information surrounding the area of the object with a rectangle, and edge information indicating the edge of the area of the object.
  • the correct image includes a mask image.
  • Preferably, the learning image generation process generates a learning image by translating, rotating, inverting, or scaling the image of the object, and the correct answer data generation process generates the second area information in a corresponding manner.
  • The learning image and the correct answer data may be generated synchronously at the same time, or one of them may be generated first and then the other.
  • Preferably, the learning image generation process synthesizes two or more images obtained by translating, rotating, inverting, or scaling the image of the object to create a learning image, and the correct answer data is generated by translating, rotating, inverting, or scaling the first region information corresponding to each of the two or more images in the same way as the image of the object. As a result, a learning image composed of images of a plurality of objects, together with its correct answer data, can be generated.
  • In the learning data creation device, in the learning image generation process, when a learning image including images of a plurality of objects is generated, it is preferable to generate a learning image in which all or some of the images of the plurality of objects are in contact with each other at points or lines.
  • The correct answer data preferably includes an edge image showing only the portions where all or some of the images of the plurality of objects are in contact with each other at points or lines.
  • Because such an edge image can be included, this learning data is useful for separating the portions where the images of a plurality of objects contact each other at points or lines.
  • At least a part of the object is transparent.
  • An image of an object that is at least partially transparent is more difficult to extract than an image of an entirely opaque object, and less training data exists for it. Therefore, generating training data for an image of an at least partially transparent object is particularly effective.
  • Preferably, when the learning image generation process by the processor generates a learning image including images of a plurality of objects, it moves the images of objects other than the transparent object image. This is because, for an object that is at least partially transparent, the positional relationship with the illumination light changes, so an image of the object moved to an arbitrary position differs from an image actually photographed with the transparent object placed at that position.
  • Preferably, the learning image generation process moves the image of the object within a threshold value to generate a learning image.
  • A constraint (threshold value) is set on the movement of the image of the transparent object, and the image of the transparent object is moved within the threshold value to generate a learning image.
  • A learning image generated by moving the image of the transparent object within the threshold value does not involve a significant change in the positional relationship with the illumination light, and as a result matches or substantially matches an image of the transparent object actually photographed at that position.
  • the movement preferably includes either parallel movement or rotational movement.
  • The invention according to the thirteenth aspect is learning data composed of a pair of a learning image generated by moving an image of an object and correct answer data having second area information indicating the area of the object in the learning image.
  • the machine learning device includes a learning model and a learning control unit that uses the above learning data to perform machine learning of the learning model.
  • the learning model is preferably composed of a convolutional neural network.
  • The invention according to the 16th aspect is a learning data creation method in which a processor creates learning data for machine learning by performing the processing of each of the following steps: a step of acquiring an image of an object, a step of moving the acquired image of the object to generate a learning image, a step of generating second area information corresponding to the area of the object in the generated learning image as correct answer data for the learning image, and a step of storing the pair of the generated learning image and the correct answer data as learning data.
  • Preferably, the step of acquiring includes a step of acquiring first area information corresponding to the area of the object, and the step of generating the correct answer data generates the second area information based on the acquired first area information.
  • Preferably, the correct answer data includes at least one of a correct image corresponding to the area of the object, bounding box information surrounding the area of the object with a rectangle, and edge information indicating the edge of the area of the object.
  • In the learning data creation method, in the step of generating a learning image, when images of a plurality of objects are arranged, it is preferable that all or some of the images of the plurality of objects are brought into contact with each other at points or lines.
  • Preferably, the correct answer data includes an edge image showing only the portions where the images of the plurality of objects contact each other at points or lines.
  • In the learning data creation method, it is preferable that at least a part of the object is transparent.
  • Preferably, when generating a learning image including images of a plurality of objects, the step of generating a learning image moves the images of objects other than the transparent object image.
  • The invention according to the 23rd aspect is a program that causes a computer to realize a function of acquiring an image of an object, a function of moving the acquired image of the object to generate a learning image, a function of generating second area information corresponding to the area of the object in the generated learning image as correct answer data for the learning image, and a function of storing the pair of the generated learning image and the correct answer data as learning data.
  • Preferably, the program further realizes a function of acquiring first area information corresponding to the area of the object, and the function of generating the correct answer data generates the second area information based on the acquired first area information.
  • FIG. 1 is a diagram showing a captured image input to the trained learning model and an output result desired to be acquired from the learning model.
  • FIG. 2 is a diagram showing an example of learning data.
  • FIG. 3 is a conceptual diagram showing image processing when correct answer data is automatically created.
  • FIG. 4 is a conceptual diagram showing a method of mass-producing learning data by simulation.
  • FIG. 5 is a diagram showing a first embodiment in which learning data is created by simulation.
  • FIG. 6 is a diagram showing a mode in which one photographed image is generated from two photographed images.
  • FIG. 7 is a diagram showing a second embodiment in which learning data is created by simulation.
  • FIG. 8 is a block diagram showing an example of the hardware configuration of the learning data creation device according to the present invention.
  • FIG. 9 is a plan view showing a drug package in which a plurality of drugs are packaged.
  • FIG. 10 is a block diagram showing a schematic configuration of the photographing apparatus shown in FIG.
  • FIG. 11 is a plan view showing a schematic configuration of the photographing apparatus.
  • FIG. 12 is a side view showing a schematic configuration of the photographing apparatus.
  • FIG. 13 is a block diagram showing an embodiment of the learning data creation device according to the present invention.
  • FIG. 14 is a diagram showing an example of a captured image acquired by the image acquisition unit and first region information, acquired by the first region information acquisition unit, indicating the region of a drug in the captured image.
  • FIG. 15 is a diagram showing an example of learning data generated from the captured image and the mask image shown in FIG. 14.
  • FIG. 16 is a diagram showing an example of an edge image showing only the portions where a plurality of drugs contact each other at points or lines.
  • FIG. 17 is a diagram showing an example of a photographed image including a plurality of transparent agents and a plurality of opaque agents.
  • FIG. 18 is a diagram used to explain the lens effect of the transparent agent.
  • FIG. 19 is a diagram used to illustrate the limitation of movement of the transparent drug.
  • FIG. 20 is a block diagram showing an embodiment of the machine learning device according to the present invention.
  • FIG. 21 is a flowchart showing an embodiment of the learning data creation method according to the present invention.
  • FIG. 1 is a diagram showing an image input to the trained learning model and an output result desired to be acquired from the learning model.
  • FIG. 1(A) is a captured image of an object (a drug in this example), and FIG. 1(B) is the output result that a trained learning model (hereinafter referred to as a "trained model") is desired to output when the image shown in FIG. 1(A) is input.
  • The output result of the trained model is an inference result inferring the region of the drug (drug region) shown in FIG. 1(A); in this example, it is a mask image in which the drug region and the background region are segmented.
  • the inference result is not limited to the mask image, and for example, a bounding box surrounding the drug region with a rectangular frame, coordinates of two diagonal points of the bounding box, or a combination thereof can be considered.
  • FIG. 2 is a diagram showing an example of learning data.
  • In FIG. 2, the left side shows a drug image (learning image), the right side shows a correct image (correct answer data) for the drug image, and each pair of the left drug image and the right correct image constitutes the learning data.
  • the correct image on the right side shown in FIG. 2 is a mask image that distinguishes the region of each drug from the background.
  • The learning data requires images of the objects (drugs) shown on the left side of FIGS. 2(A) to 2(C), but for some drugs, such as new drugs, only a small number of drugs exist, so a problem is that not many images can be collected.
  • To create a correct answer image for a learning image (for example, an image of a drug), it is common for the image of the drug to be displayed on a display and for the user to fill in the area of the drug pixel by pixel while viewing the displayed image.
  • When the correct answer image is created automatically, it can be obtained by calculating the position and rotation angle of the drug by, for example, template matching.
  • FIG. 3 is a conceptual diagram showing image processing when a correct image is automatically created.
  • First, a template image Itpl, which is an image showing the drug, is prepared.
  • If the shape of the drug is not circular, it is preferable to prepare a plurality of template images Itpl, one for each rotation angle to be searched.
  • the captured image ITP and the correct image may be superimposed and displayed on the display, and if there is an error in the mask image, the user may correct the mask image on a pixel-by-pixel basis.
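  • As a reference, the following is a minimal sketch of how the position and rotation angle of a drug might be computed by template matching, assuming OpenCV is available; the function name, image variables, and angular step are illustrative and not part of the embodiment.

```python
# Illustrative only: locate a drug by template matching over rotated templates.
import cv2

def find_drug_by_template(captured_gray, template_gray, angle_step=10):
    """Return (top_left, angle, score) of the best match (hypothetical helper)."""
    best_loc, best_angle, best_score = None, 0, -1.0
    h, w = template_gray.shape
    center = (w / 2, h / 2)
    for angle in range(0, 360, angle_step):
        rot = cv2.getRotationMatrix2D(center, angle, 1.0)       # rotate the template
        tpl = cv2.warpAffine(template_gray, rot, (w, h))
        res = cv2.matchTemplate(captured_gray, tpl, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(res)
        if max_val > best_score:
            best_loc, best_angle, best_score = max_loc, angle, max_val
    return best_loc, best_angle, best_score
```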
  • FIG. 4 is a conceptual diagram showing a method of mass-producing learning data by simulation.
  • Rotational movement refers to rotating an image of an object around a certain point and moving it to another position.
  • rotational movement refers to the case of rotating an object around a certain point (for example, the center of gravity), and hereinafter, “rotational movement” is simply referred to as "rotation”.
  • By such movement, learning data can be mass-produced. Further, creating the learning data by such a method is simpler than, for example, creating the correct answer image after a new learning image has been created.
  • FIG. 5 is a diagram showing a first embodiment in which learning data is created by simulation.
  • FIG. 5A is a diagram showing a pair of a photographed image of a drug as an object and a mask image manually or automatically generated based on the photographed image.
  • the present invention creates learning data by simulation from a pair of a photographed image and a mask image (inflates the learning data).
  • FIG. 5 (B) is a diagram showing a pair of a photographed image and a mask image in which the photographed image and the mask image shown in FIG. 5 (A) are inverted, respectively.
  • the inverted (left-right inverted) mask image on the right side shown in FIG. 5 (B) is a mask image showing the region of the drug in the inverted photographed image on the left side shown in FIG. 5 (B). Therefore, the inverted captured image can be used as a new learning image, and the inverted mask image can be used as correct answer data for the newly generated learning image.
  • new learning data consisting of the pair of the learning image and the mask image shown in FIG. 5 (B) is created.
  • the image inversion is not limited to horizontal inversion, but also includes vertical inversion.
  • the image on the left may be created first, and the image on the right may be created by detecting the region of the drug image from the image.
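  • A minimal sketch of the inversion of FIG. 5(B), assuming NumPy arrays in which the captured image and the mask image are flipped in the same way; the names are illustrative.

```python
# Illustrative only: flip the captured image and its mask image in the same way.
import numpy as np

def flip_pair(image, mask, horizontal=True):
    axis = 1 if horizontal else 0        # 1: left-right inversion, 0: up-down inversion
    return np.flip(image, axis=axis), np.flip(mask, axis=axis)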
  • FIG. 5 (C) is a diagram showing an image obtained by adding the images shown in FIGS. 5 (A) and 5 (B).
  • The photographed image on the left side of FIG. 5(C) can be created by synthesizing the photographed image shown in FIG. 5(A) and the inverted photographed image shown in FIG. 5(B). That is, the captured image shown in FIG. 5(C) can be created by pasting the image of the drug region cut out from the inverted captured image of FIG. 5(B) (the drug image) onto the captured image shown in FIG. 5(A).
  • The drug image can be cut out from the inverted photographed image by a process of cutting out the drug region based on the inverted mask image of FIG. 5(B).
  • the method of synthesizing two or more drug images is not limited to the method of using a mask image as described above.
  • If a background image containing only the background, with no drug photographed, is prepared, only the drug images are extracted from FIGS. 5(A) and 5(B) and each extracted drug image is combined with the background image, whereby the captured image (learning image) shown in FIG. 5(C) can be generated. Further, in the case of captured images whose background is black (pixel value of zero), a learning image containing each drug image can be generated simply by adding the captured images.
  • the mask image on the right side shown in FIG. 5 (C) can be created by adding the mask image shown in FIG. 5 (A) and the inverted mask image shown in FIG. 5 (B).
  • In this case, the pixel value of the drug region of the inverted mask image in FIG. 5(C) is set to, for example, "0.5", and the background pixel value to "0" before the addition, so that the pixel values of the two drug regions in the generated mask image differ from each other.
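  • The compositing and mask addition described above might be sketched as follows, assuming single-channel images normalized to [0, 1]; all names, and the instance value of 0.5, follow the example above, but the code itself is only an illustration.

```python
# Illustrative only: paste the drug region of pair B onto image A and add the masks,
# giving the pasted drug region a different value for instance separation.
import numpy as np

def composite_pair(image_a, mask_a, image_b, mask_b, second_value=0.5):
    new_image = image_a.copy()
    new_image[mask_b > 0] = image_b[mask_b > 0]   # cut out drug B and paste it onto A
    new_mask = mask_a.astype(float).copy()
    new_mask[mask_b > 0] = second_value           # distinct pixel value per instance
    return new_image, new_mask
```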
  • In this way, the two training data shown in FIGS. 5(B) and 5(C) can be created from one training data consisting of the pair of the captured image and the mask image shown in FIG. 5(A).
  • In the above example, the captured image and the mask image shown in FIG. 5(A) are inverted to create the new pair of captured image (learning image) and mask image shown in FIG. 5(B), but the method is not limited to this; the captured image and the mask image shown in FIG. 5(A) may be translated, rotated, or scaled in synchronization to create training data consisting of a new pair of learning image and mask image.
  • When a margin is generated in the background by translating, rotating, or reducing the captured image and the mask image, it is preferable to fill the margin with the same pixel value as the background.
  • In the above example, a new photographed image and mask image are created from a photographed image and a mask image in which one drug is photographed, but a new photographed image and mask image may also be created from a plurality of photographed images and mask images in which a plurality of different drugs are photographed separately, or from a photographed image and mask image in which a plurality of different drugs are photographed simultaneously.
  • FIG. 6 is a diagram showing a mode in which one learning image is generated from two captured images.
  • In FIG. 6(A), a case is shown in which a new learning image having four drug images is generated from two captured images in which two objects (drugs) are each captured.
  • In FIG. 6(B), as in FIG. 6(A), one learning image is generated from the two captured images, but it is generated using three of the four drug images in the two captured images.
  • That is, the newly generated learning image does not have to use all the drug images in the two captured images.
  • The plurality of drug images in one generated learning image may include drug images that have not been subjected to operations such as translation, rotation, or scaling (drug images that are not moved).
  • A mask image corresponding to the newly generated learning image is also generated, and the pair of the newly generated learning image and the mask image becomes new learning data.
  • FIG. 7 is a diagram showing a second embodiment in which learning data is created by simulation.
  • FIG. 7(A) is a diagram showing a pair of a photographed image obtained by photographing a drug and a mask image manually or automatically generated based on the photographed image, and is the same as the pair shown in FIG. 5(A).
  • FIG. 7B is a diagram showing drug regions cut out from the photographed image and the mask image shown in FIG. 7A, respectively.
  • the area within the rectangular frame surrounding the drug area is defined as the area for cutting out the image (cutout area). Since the drug region is known from the mask image, the image in the rectangular frame surrounding the drug region can be cut out based on the mask image.
  • FIG. 7 (C) shows an image of a cut-out area cut out from the photographed image and the mask image shown in FIG. 7 (A), respectively.
  • The drug image can be cut out from the captured image by a process (drug image acquisition process) of cutting out the drug region from the captured image shown in FIG. 7(A) based on the mask image shown in FIG. 7(A). Since the mask image shown in FIG. 7(A) has information indicating the drug region (first region information), the drug region (hereinafter referred to as the "drug mask image") can also be cut out from the mask image. Further, the drug image acquisition process may include a process of reading an already cut-out image from a memory or the like.
  • FIG. 7 (D) is a diagram showing a new photographed image and a mask image created by pasting the cut out drug image and drug mask image at an arbitrary position and an arbitrary rotation angle.
  • The captured image and mask image shown in FIG. 7(D) become training data composed of a new pair of learning image and correct answer data created from the captured image and mask image shown in FIG. 7(A) by the above image processing.
  • FIG. 7 (E) is a diagram showing a new photographed image (learning image) and a mask image created by pasting the cut out drug image and drug mask image at an arbitrary position and an arbitrary rotation angle.
  • In FIG. 7(E), a plurality of drug images are created so as to be in contact with each other at points or lines.
  • The mask image on the left side shown in FIG. 7(E) is preferably image-processed so that the drug regions do not come into contact with each other. Since the contact points between the drug regions are known, contact can be avoided by replacing the contact points with the background color.
  • When the drugs that are in contact with each other at points or lines are the same drug, it is preferable to make the pixel values of the drug regions different for instance separation. In this case, since each drug region in the mask image can be recognized by the difference in pixel value, it is not necessary to replace the portions where the drug regions contact with the background color.
  • In this way, a large amount of learning data can be created based on the photographed image in which the drug was photographed and the first area information (mask image) indicating the area of the drug in the photographed image.
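  • As an illustration of the second embodiment (FIG. 7), the following sketch cuts the drug region out of a captured image and its mask, rotates it, and pastes it at an arbitrary position; it assumes grayscale NumPy arrays and SciPy, assumes the pasted region stays inside the image, and all names are hypothetical.

```python
# Illustrative only: cut out the drug region, rotate it, and paste it elsewhere.
import numpy as np
from scipy.ndimage import rotate

def cut_rotate_paste(image, mask, background, angle, top_left):
    ys, xs = np.where(mask > 0)                          # bounding box of the drug region
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    drug = image[y0:y1, x0:x1] * (mask[y0:y1, x0:x1] > 0)
    drug_mask = (mask[y0:y1, x0:x1] > 0).astype(float)
    drug = rotate(drug, angle, reshape=True, order=1)    # rotate drug image and drug mask
    drug_mask = rotate(drug_mask, angle, reshape=True, order=0)
    new_image = background.copy()
    new_mask = np.zeros_like(mask, dtype=float)
    ty, tx = top_left
    h, w = drug.shape
    sel = drug_mask > 0
    new_image[ty:ty + h, tx:tx + w][sel] = drug[sel]     # paste at the chosen position
    new_mask[ty:ty + h, tx:tx + w][sel] = 1.0
    return new_image, new_mask
```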
  • FIG. 8 is a block diagram showing an example of the hardware configuration of the learning data creation device according to the present invention.
  • The learning data creation device 1 shown in FIG. 8 can be configured by, for example, a computer, and is mainly composed of an image acquisition unit 22, a CPU (Central Processing Unit) 24, an operation unit 25, a RAM (Random Access Memory) 26, a ROM (Read Only Memory) 27, a memory 28, and a display unit 29.
  • the image acquisition unit 22 acquires a photographed image in which the drug is photographed by the photographing device 10 from the photographing device 10.
  • the drug photographed by the imaging device 10 is, for example, a drug for one dose or an arbitrary drug, which may be contained in a drug package or may not be contained in the drug package.
  • FIG. 9 is a plan view showing a drug package in which a plurality of drugs are packaged.
  • the drug package TP shown in FIG. 9 is a package in which a plurality of drugs to be taken at one time are stored in a transparent package and packed one by one.
  • the drug package TPs are connected in a band shape as shown in FIGS. 11 and 12, and have a cut line that enables each drug package TP to be separated.
  • In this example, six drugs T are packaged in one package.
  • FIG. 10 is a block diagram showing a schematic configuration of the photographing apparatus shown in FIG.
  • the imaging device 10 shown in FIG. 10 includes two cameras 12A and 12B for photographing the drug, two lighting devices 16A and 16B for illuminating the drug, and a photographing control unit 13.
  • 11 and 12 are a plan view and a side view showing a schematic configuration of the photographing apparatus, respectively.
  • the medicine package TP is placed on a transparent stage 14 installed horizontally (xy plane).
  • the cameras 12A and 12B are arranged so as to face each other with the stage 14 in the direction orthogonal to the stage 14 (z direction).
  • the camera 12A faces the surface of the medicine package TP and photographs the medicine package TP from above.
  • the camera 12B faces the back surface of the medicine package TP and photographs the medicine package TP from below.
  • a lighting device 16A is provided on the side of the camera 12A and a lighting device 16B is provided on the side of the camera 12B with the stage 14 in between.
  • the lighting device 16A is arranged above the stage 14 and illuminates the medicine package TP placed on the stage 14 from above.
  • the illuminating device 16A has four light emitting units 16A1 to 16A4 arranged radially, and irradiates the illuminating light from four orthogonal directions. The light emission of each light emitting unit 16A1 to 16A4 is individually controlled.
  • the lighting device 16B is arranged below the stage 14 and illuminates the medicine package TP placed on the stage 14 from below.
  • the illuminating device 16B has four light emitting units 16B1 to 16B4 arranged radially like the illuminating device 16A, and irradiates the illuminating light from four orthogonal directions. The light emission of each light emitting unit 16B1 to 16B4 is individually controlled.
  • Shooting is done as follows. First, the medicine package TP is photographed from above using the camera 12A. At the time of shooting, the light emitting units 16A1 to 16A4 of the lighting device 16A are made to emit light sequentially to take four images, and then the light emitting units 16A1 to 16A4 are made to emit light simultaneously to take one image. Next, the light emitting units 16B1 to 16B4 of the lower lighting device 16B are made to emit light simultaneously, a reflector (not shown) is inserted, the medicine package TP is illuminated from below via the reflector, and the medicine package TP is photographed from above using the camera 12A.
  • The four images taken by sequentially emitting light from the light emitting units 16A1 to 16A4 have different illumination directions, and when there is a marking (unevenness) on the surface of the drug, the shadows cast by the marking appear differently. These four captured images are used to generate an engraved image that emphasizes the engraving on the surface side of the drug T.
  • The one image taken by simultaneously emitting light from the light emitting units 16A1 to 16A4 is an image without uneven brightness, and is used, for example, when cutting out the image (drug image) of the surface side of the drug T; it is also the photographed image on which the engraved image is superimposed.
  • The image in which the medicine package TP is illuminated from below via the reflector and photographed from above using the camera 12A is the photographed image used when recognizing the regions of the plurality of drugs T.
  • The medicine package TP is also photographed from below. The light emitting units 16B1 to 16B4 of the lighting device 16B are made to emit light sequentially to take four images, and then the light emitting units 16B1 to 16B4 are made to emit light simultaneously to take one image.
  • The four captured images are used to generate an engraved image emphasizing the engraving on the back surface side of the drug T, and the one image taken by simultaneously emitting light from the light emitting units 16B1 to 16B4 is an image without uneven brightness; it is, for example, the photographed image used when cutting out the drug image of the back surface side of the drug T, and the image on which the engraved image is superimposed.
  • The imaging control unit 13 shown in FIG. 10 controls the cameras 12A and 12B and the lighting devices 16A and 16B, and performs 11 shots for one medicine package TP (6 shots with the camera 12A and 5 shots with the camera 12B).
  • The photographing is performed in a dark room, and the only light applied to the medicine package TP at the time of photographing is the illumination light from the lighting device 16A or the lighting device 16B. Therefore, among the 11 captured images taken as described above, in the image in which the medicine package TP is illuminated from below via the reflector and photographed from above using the camera 12A, the background is the color of the light source (white), and the area of each drug T is shielded from light and appears black. On the other hand, in the other 10 captured images, the background is black and the region of each drug has the color of the drug.
  • However, when the whole drug is transparent (or semi-transparent), or a part of it is transparent (for example, capsules are partially transparent drugs), light is transmitted through the area of the drug, so that it does not appear black like an opaque drug.
  • The learning data creation device 1 creates learning data for machine learning a learning model that infers drugs from a captured image in which drugs are captured (in particular, infers the region of each drug T present in the captured image).
  • The image acquisition unit 22 of the learning data creation device 1 preferably acquires, among the 11 captured images captured by the imaging device 10, the captured image used when recognizing the regions of the plurality of drugs T (that is, the photographed image obtained by illuminating the medicine package TP from below via the reflector and photographing the medicine package TP from above using the camera 12A).
  • The memory 28 is a storage unit for storing learning data, and is, for example, a non-volatile memory such as a hard disk device or a flash memory.
  • The CPU 24 uses the RAM 26 as a work area, and executes the various processes of this apparatus by executing various programs, including a learning data creation program, stored in the ROM 27 or the memory 28.
  • the operation unit 25 includes a keyboard and a pointing device (mouse, etc.), and is a part for inputting various information and instructions by the user's operation.
  • the display unit 29 displays a screen required for operation on the operation unit 25, functions as a part that realizes a GUI (Graphical User Interface), and can display a captured image or the like.
  • FIG. 13 is a block diagram showing an embodiment of the learning data creation device according to the present invention.
  • FIG. 13 is a functional block diagram showing the functions executed by the hardware configuration of the learning data creation device 1 shown in FIG. 8; the learning data creation device 1 shown in FIG. 13 includes a processor 2 and the memory 28.
  • the processor 2 is composed of the image acquisition unit 22, the CPU 24, the RAM 26, the ROM 27, the memory 28, and the like shown in FIG. 8, and performs various processes shown below.
  • the processor 2 functions as an acquisition unit 20, a learning image generation unit 30, a correct answer data generation unit 32, and a memory control unit 34.
  • the acquisition unit 20 includes an image acquisition unit 22 and a first area information acquisition unit 23.
  • the image acquisition unit 22 acquires the photographed image ITP obtained by photographing the drug T from the photographing device 10 as described above (performs the acquisition process of the photographed image).
  • the first area information acquisition unit 23 acquires information (first area information) indicating the area of the drug in the captured image ITP acquired by the image acquisition unit 22.
  • This first area information is correct answer data for the inference result inferred by the learning model when the captured image is used as an input image for machine learning of the learning model.
  • The first region information, which is the correct answer data, preferably includes at least one of a correct answer image (for example, a mask image) showing the region of the drug in the captured image, bounding box information surrounding the region of the drug with a rectangle, and edge information indicating the edge of the region of the drug.
  • FIG. 14 is a diagram showing an example of the captured image acquired by the image acquisition unit and the first region information, acquired by the first region information acquisition unit, showing the region of the drug in the captured image.
  • the photographed image ITP shown in FIG. 14 (A) is an image in which the drug package TP is illuminated from below via a reflector and the drug package TP (see FIG. 9) is photographed from above using the camera 12A.
  • Six drugs T1 to T6 are packaged in this drug package TP.
  • the agents T1 to T3 shown in FIG. 14A are opaque agents that block the illumination light from below, and thus are photographed in black. Since the drug T4 is a transparent drug, the illumination light from below is transmitted and the image is taken in white.
  • the agents T5 and T6 are capsules of the same type, and because part of the illumination light from below leaks, they are partially photographed in white.
  • FIG. 14B is first region information showing regions of each drug T1 to T6 in the captured image ITP, and is a mask image IM in this example.
  • For example, the captured image ITP is displayed on the display unit 29, and while viewing the captured image ITP displayed on the display unit 29, the user fills in the regions of the respective drugs T1 to T6 using a pointing device such as a mouse.
  • A binarized mask image IM can then be created by setting the pixel value of each filled drug T1 to T6 to "1" and the pixel value of the background area to "0".
  • Since the capsule-shaped drugs T5 and T6 are of the same type, it is preferable that the pixel values of the regions of the two drugs T5 and T6 differ for instance separation. For example, the pixel value in the region of the drug T5 can be set to "1" and the pixel value in the region of the drug T6 to "0.5".
  • In this example, the mask image IM, which is the first area information, is area information generated when the user manually sets the areas of the respective drugs T1 to T6 in the captured image ITP using the pointing device, but the present invention is not limited to this; the mask image may be generated by automatically extracting the drug region in the captured image by image processing, or by automatically extracting the drug region by image processing and then manually adjusting it.
  • The learning image generation unit 30 receives the captured image ITP of the drug from the image acquisition unit 22 and moves the drug in the input photographed image ITP to generate learning images (IA, IB, IC, ...). That is, the learning image generation unit 30 performs learning image generation processing for generating a plurality of learning images (IA, IB, IC, ...) based on the captured image ITP.
  • The movement of the drug captured in the captured image ITP may be performed by the user specifying the position and rotation of the drug image with the pointing device, or by inverting or adding the captured images as described with reference to FIG. 5. Further, the drug may be moved by randomly determining the position and rotation of the drug image using random numbers; in this case, it is necessary to prevent the drug images from overlapping.
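  • One hypothetical way to determine random, non-overlapping positions for the drug images is sketched below; for brevity the overlap check uses the unrotated drug masks, and all names are illustrative.

```python
# Illustrative only: choose random, non-overlapping placements for drug images.
import random
import numpy as np

def random_layout(drug_masks, canvas_shape, max_tries=100):
    occupied = np.zeros(canvas_shape, dtype=bool)
    placements = []
    for m in drug_masks:
        h, w = m.shape
        for _ in range(max_tries):
            ty = random.randint(0, canvas_shape[0] - h)
            tx = random.randint(0, canvas_shape[1] - w)
            angle = random.uniform(0, 360)
            if not (occupied[ty:ty + h, tx:tx + w] & (m > 0)).any():
                occupied[ty:ty + h, tx:tx + w] |= (m > 0)    # reserve the region
                placements.append(((ty, tx), angle))
                break
    return placements
```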
  • The correct answer data generation unit 32 receives the mask image IM, which is the first area information, from the first area information acquisition unit 23, and generates a plurality of correct answer data (Ia, Ib, Ic, ...) from the input mask image IM and the plurality of learning images (IA, IB, IC, ...). That is, the correct answer data generation unit 32 performs correct answer data generation processing that generates, based on the mask image IM, second area information indicating the area of the drug in each of the plurality of learning images (IA, IB, IC, ...), and uses the generated second area information as the plurality of correct answer data (Ia, Ib, Ic, ...) corresponding to the respective learning images.
  • The plurality of learning images (IA, IB, IC, ...) and the plurality of correct answer data (Ia, Ib, Ic, ...) can be generated, as in the first embodiment of creating learning data by simulation, from the photographed image obtained by photographing the drug and the first area information (for example, a mask image) indicating the area of the drug in the photographed image, by inverting, translating, rotating, or scaling the photographed image and the mask image in synchronization with each other, or by translating, rotating, scaling, or pasting the drug image and the drug mask image cut out from the photographed image and the mask image.
  • The storage control unit 34 receives the learning images (IA, IB, IC, ...) generated by the learning image generation unit 30 and the correct answer data (Ia, Ib, Ic, ...) generated by the correct answer data generation unit 32, and stores the corresponding pairs (learning image IA and correct answer data Ia, learning image IB and correct answer data Ib, learning image IC and correct answer data Ic, ...) in the memory 28 as learning data.
  • The pair of the photographed image ITP and the mask image IM input to the learning image generation unit 30 and the correct answer data generation unit 32 is also stored in the memory 28 as learning data.
  • FIG. 15 is a diagram showing an example of learning data generated from the photographed image and the mask image shown in FIG. 14.
  • FIG. 15(A) shows training data consisting of a pair of the learning image IA and the correct answer data (mask image) Ia, and FIG. 15(B) shows training data consisting of a pair of the learning image IB and the mask image Ib.
  • In the learning image IA of FIG. 15(A), the capsule-shaped drugs T5 and T6 are in contact along a line, and the drugs T2, T3, and T4 are in contact with each other at points.
  • In the mask image Ia corresponding to the learning image IA, by making the pixel values of the regions of the drugs T5 and T6, which are the same drug, different from each other, instance separation of the regions of the drugs T5 and T6 is possible, and the boundary between the drugs T5 and T6 that are in contact along a line can also be distinguished.
  • Further, in the mask image Ia, the portions where the drugs T2, T3, and T4 contact each other at points are given the same pixel value as the background, so that the drugs T2, T3, and T4 do not contact each other and the regions of the drugs T2, T3, and T4 are clearly delineated.
  • In the learning image IB of FIG. 15(B), the capsule-shaped drugs T5 and T6 are in contact along a line, and the drug T6 and the drug T3 are in contact at a point.
  • In the mask image Ib corresponding to the learning image IB, the pixel values of the regions of the drugs T5 and T6, which are the same drug, are made different (for example, the pixel value of the region of the drug T6 is set to "0.5"), which makes it possible to separate the instances of the regions of the drugs T5 and T6, and to distinguish the boundary between the drugs T5 and T6 in contact along a line and the boundary between the drug T6 and the drug T3 in contact at a point.
  • The learning data shown in FIG. 15 is only an example; a large amount of learning data can be created by arranging the drug images showing the drugs T1 to T6 in combinations of translation, rotation, and the like, and arranging the drug mask images showing the regions of the drugs T1 to T6 in the same way.
  • In the case of the transparent drug T4, the illumination light from below is transmitted and it is photographed white, but the way the light is transmitted changes depending on the position and angle of the drug T4. That is, the drug image of the transparent drug T4 is an image whose brightness distribution and the like differ depending on the position and angle of the transparent drug T4 within the imaging region.
  • Therefore, when generating a learning image, it is preferable not to move the transparent drug image and to move the images of the drugs other than the transparent drug image.
  • In the above example, the mask image is generated as the correct answer data, but edge information (an edge image) for each drug image, indicating the edge of the area of the drug image, can also be used as the correct answer data.
  • When the drugs are in contact with each other at points or lines, it is preferable to replace the portions of contact with the background color and separate the edge image for each drug.
  • Alternatively, an edge image showing only the points or lines of contact may be generated as correct answer data.
  • FIG. 16 is a diagram showing an example of an edge image showing only the portions where a plurality of drugs contact each other at points or lines.
  • The edge image IE shown in FIG. 16 is an image showing only the locations E1 and E2 at which two or more of the plurality of drugs T1 to T6 are in contact with each other at a point or line, and is the image shown by the solid lines in FIG. 16.
  • The regions shown by the dotted lines in FIG. 16 indicate the regions where the plurality of drugs T1 to T6 are present.
  • The edge image of the portion E1, which is contact along a line, is an image of the location where the capsule-shaped drugs T5 and T6 contact along a line, and the edge image of the portion E2, which is contact at a point, is an image of the location where the three drugs T2 to T4 contact each other at a point.
  • The correct answer data generation unit 32 shown in FIG. 13 can automatically create, for the learning image generated by the learning image generation unit 30, an edge image (correct answer data) showing only the points or lines of contact.
  • The edge image IE shown in FIG. 16 may be used as correct answer data corresponding to the learning image IA shown in FIG. 15(A). That is, the pair of the learning image IA shown in FIG. 15(A) and the edge image IE shown in FIG. 16 constitutes learning data.
  • Such learning data can be used to machine-learn a learning model that takes as an input image a drug image obtained by photographing drugs in contact at points or lines and outputs, as an inference result, an edge image of only the portions in contact at points or lines.
  • The edge image (inference result) of only the points or lines of contact can be used, for example, together with a drug image obtained by photographing a plurality of drugs in contact at points or lines, to separate the drugs at their points or lines of contact.
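  • One possible way to derive such a contact edge image from an instance-labelled mask (0 for background, 1, 2, ... for drug instances) is sketched below, assuming SciPy; this is an illustration, not the processing actually used by the correct answer data generation unit 32.

```python
# Illustrative only: edge image of the points or lines where drug regions touch.
import numpy as np
from scipy.ndimage import binary_dilation

def contact_edge_image(instance_mask):
    contact = np.zeros(instance_mask.shape, dtype=bool)
    labels = [v for v in np.unique(instance_mask) if v != 0]
    for a in labels:
        grown_a = binary_dilation(instance_mask == a)        # grow instance a by one pixel
        for b in labels:
            if b <= a:
                continue
            contact |= grown_a & (instance_mask == b)        # pixels of b that a touches
    return contact.astype(np.uint8)
```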
  • FIG. 17 is a diagram showing an example of a photographed image containing a plurality of transparent agents and a plurality of opaque agents.
  • The photographed image shown in FIG. 17(A) is an image obtained by illuminating a medicine package in which a plurality of transparent drugs and a plurality of opaque drugs are packaged from the upper lighting device 16A (light emitting units 16A1 to 16A4) and photographing the medicine package from above using the camera 12A.
  • The photographed image shown in FIG. 17(B) is an image obtained by illuminating the same medicine package from the lower lighting device 16B (light emitting units 16B1 to 16B4) via a reflector and photographing the medicine package from above using the camera 12A.
  • The drug image shown in FIG. 17(B) contains only silhouette information (the portions photographed in black in FIG. 17), which is suitable for acquiring edge information of the drug image.
  • FIG. 18 is a diagram used to explain the lens effect of the transparent drug, and is an image taken by the same method as the drug image shown in FIG. 17 (B).
  • The edge information of the drug image of an opaque drug is not affected by the relative position between the opaque drug and the lighting device 16B (light emitting units 16B1 to 16B4), and is uniform. Therefore, by cutting out the drug image of the opaque drug from the photographed image and translating, rotating, or otherwise manipulating each cut-out drug image, it is possible to generate a captured image equivalent to a photographed image in which a drug actually placed at that position is photographed. Further, since a large amount of edge information of opaque drug images can be obtained, the number of pixels required for photographing can be greatly reduced by removing the texture information on the surface of the drug image of the opaque drug.
  • On the other hand, a transparent drug placed horizontally and a transparent drug placed vertically have significantly different edge information, and even when transparent drugs are placed in the same orientation, the edge information differs depending on the position.
  • FIG. 19 is a diagram used to explain the limitation of movement of the transparent drug.
  • Since the edge information of the transparent drug image included in the photographed image shown in FIG. 19(B) differs depending on the position of the illumination and the position and orientation of the transparent drug, if a new learning image is generated by moving the image of the transparent drug cut out from the photographed image without any restriction, a learning image in a situation different from the actual one is generated.
  • Therefore, the position and orientation of the transparent drug before cut-out are utilized; for example, the cut-out image is pasted at the same position and orientation as before cut-out. For example, the image of the transparent drug cut out from the captured image of FIG. 19(B) can be pasted at the position (a) of FIG. 19(A) when generating a new learning image.
  • Ideally, the learning images would cover all variations (position, orientation, etc.), but this is practically impossible. Therefore, when moving a transparent drug to be pasted, the amount of translation and/or the amount of rotation from the position and orientation of the transparent drug before cut-out is limited to within respective threshold values.
  • For example, the threshold of the amount of translation is set to n pixels and the threshold of the amount of rotation to m degrees, and translation and/or rotation of the transparent drug exceeding these thresholds is restricted.
  • The edge information of the transparent drug, and how it changes, depend on the shooting environment (illumination position, camera position, shooting angle of view, etc.) and on the shape and size of the transparent drug. Therefore, it is preferable to set the threshold of the translation amount and the threshold of the rotation amount by simulating the translation and/or rotation within a range in which the edge information of the transparent drug can be regarded as hardly changing.
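  • A minimal sketch of such a movement limit, in which the translation and rotation of a transparent drug are clipped to thresholds of n pixels and m degrees, is shown below; the threshold values and names are illustrative assumptions.

```python
# Illustrative only: clip translation and rotation of a transparent drug to thresholds.
def limit_transparent_move(dx, dy, dtheta, n_pixels=5, m_degrees=10):
    clip = lambda v, lim: max(-lim, min(lim, v))
    return clip(dx, n_pixels), clip(dy, n_pixels), clip(dtheta, m_degrees)
```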
  • FIG. 20 is a block diagram showing an embodiment of the machine learning device according to the present invention.
  • The machine learning device 50 shown in FIG. 20 is composed of a learning model 52 (here, a convolutional neural network (CNN), which is one type of learning model), a loss value calculation unit 54, and a parameter control unit 56.
  • This machine learning device 50 machine-learns the CNN 52 using the learning data created by the learning data creation device 1 shown in FIG. 13 and stored in the memory 28.
  • The CNN 52 is a part that infers the region of the drug shown in an input image when a captured image of the drug is used as the input image; it has a multi-layer structure and holds a plurality of weight parameters. The weight parameters include the filter coefficients of the filters, called kernels, used for the convolution operations in the convolution layers.
  • The CNN 52 can change from an untrained learning model to a trained learning model by updating the weight parameters from their initial values to optimum values.
  • The CNN 52 includes an input layer 52A, an intermediate layer 52B having a plurality of sets each composed of a convolution layer and a pooling layer, and an output layer 52C, and each layer has a structure in which a plurality of "nodes" are connected by "edges".
  • a learning image to be learned is input to the input layer 52A as an input image.
  • the learning image is a learning image in the learning data (learning data consisting of a pair of the learning image and the correct answer data) stored in the memory 28.
  • the intermediate layer 52B has a plurality of sets including a convolution layer and a pooling layer as one set, and is a portion for extracting features from an image input from the input layer 52A.
  • the convolution layer filters nearby nodes in the previous layer (performs a convolution operation using the filter) and acquires a "feature map".
  • the pooling layer reduces the feature map output from the convolution layer to a new feature map.
  • the "convolution layer” plays a role of feature extraction such as edge extraction from an image, and the “pooling layer” plays a role of imparting robustness so that the extracted features are not affected by translation or the like.
  • The intermediate layer 52B is not limited to sets of one convolution layer and one pooling layer; it may also include consecutive convolution layers, activation processing by an activation function, and normalization layers.
  • the output layer 52C is a part that outputs a feature map showing the features extracted by the intermediate layer 52B. Further, in the trained CNN 52, the output layer 52C outputs, for example, an inference result in which the drug region or the like shown in the input image is region-classified (segmented) in pixel units or in units of several pixels as a group.
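  • For reference, a minimal PyTorch sketch of a CNN of the kind described, repeated convolution and pooling sets followed by per-pixel drug/background classification, is shown below; it is an illustrative stand-in, not the network of the embodiment.

```python
# Illustrative only: a small CNN with convolution + pooling sets and a per-pixel output.
import torch.nn as nn

class DrugSegmenter(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(      # intermediate layer: convolution + pooling sets
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(          # output layer: per-pixel class scores
            nn.Conv2d(32, num_classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )

    def forward(self, x):
        return self.head(self.features(x))
```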
  • Arbitrary initial values are set for the coefficient and offset value of the filter applied to each convolution layer of CNN52 before learning.
  • Of the loss value calculation unit 54 and the parameter control unit 56, which function as the learning control unit, the loss value calculation unit 54 compares the feature map output from the output layer 52C of the CNN 52 with the correct answer data for the input image (learning image), namely the mask image (the mask image read from the memory 28 corresponding to the learning image), and calculates the error between the two (the loss value, which is the value of the loss function).
  • As a method of calculating the loss value, for example, softmax cross entropy or sigmoid can be considered.
  • the parameter control unit 56 adjusts the weight parameter of the CNN 52 by the back-propagation method based on the loss value calculated by the loss value calculation unit 54.
  • In the error back-propagation method, the error is back-propagated in order from the final layer, the stochastic gradient descent method is applied in each layer, and the parameter updates are repeated until the error converges.
  • This weight parameter adjustment process is repeated, and learning is repeated until the difference between the output of CNN52 and the mask image which is the correct answer data becomes small.
  • the machine learning device 50 repeats machine learning using the learning data stored in the memory 28, so that the CNN 52 becomes a trained model.
  • When an unknown input image (a captured image of the drug) is input, the trained CNN 52 outputs an inference result such as a mask image showing the region of the drug in the captured image.
  • As the learning model, an R-CNN (Regions with Convolutional Neural Networks) can also be applied.
  • In R-CNN, bounding boxes of different sizes are slid over the captured image ITP to detect the bounding-box region that contains the drug. Then, the edge of the drug is detected by evaluating only the image portion inside the bounding box (extracting CNN features).
  • Besides R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, an SVM (Support Vector Machine), and the like can be used.
  • the inference result of the trained model constructed in this way can be used, for example, when an image of each drug is cut out from a photographed image of a drug package in which a plurality of drugs are packaged.
  • The cut-out image of each drug is used when auditing and identifying each drug contained in the drug package.
  • the memory 28 stores a lot of learning data created by simulation based on the photographed image obtained by photographing the drug and the correct answer data indicating the region of the drug in the photographed image.
  • The photographed image is preferably an image of a drug handled by the pharmacy itself.
  • FIG. 21 is a flowchart showing an embodiment of the learning data creation method according to the present invention.
  • each step shown in FIG. 21 is performed by, for example, the processor 2 of the learning data creation device 1 shown in FIG.
  • the image acquisition unit 22 acquires a photographed image ITP (for example, the photographed image ITP shown in FIG. 14A) obtained by photographing the drug from the photographing apparatus 10 (step S10).
  • The photographed image ITP shown in FIG. 14(A) is an image in which the drug package is illuminated from below via a reflector and photographed from above; however, the photographed image is not limited to one taken in this way. Further, the drug to be photographed does not have to be contained in a drug package, and the number of drugs may be one.
  • The first region information acquisition unit 23 acquires a mask image IM (for example, the mask image IM shown in FIG. 14(B)) as the first region information indicating the region of the drug in the captured image acquired by the image acquisition unit 22 (step S12).
  • The mask image IM is generated manually or automatically based on the captured image ITP and stored in the memory 28 or the like.
  • The learning image generation unit 30 moves the drugs T1 to T6 in the captured image ITP acquired in step S10 to generate a learning image (step S14).
  • The learning image can be generated by image processing in which the drug image showing each drug is translated, inverted, rotated, or scaled. If a drug to be moved is a transparent drug, it is not moved, or its movement is restricted to within the threshold value.
  • The correct answer data generation unit 32 generates correct answer data (a mask image) corresponding to the learning image generated in step S14, based on the mask image IM acquired in step S12 (step S16). That is, in step S16, the region of each drug in the mask image IM is arranged in the same manner as each drug in the learning image, second region information indicating the region of each arranged drug is generated, and the generated second region information is used as the correct answer data (mask image) for the learning image.
  • the storage control unit 34 stores the pair of the learning image generated in step S14 and the mask image generated in step S16 in the memory 28 as learning data (step S18).
  • FIGS. 15(A) and 15(B) show an example of learning data composed of a pair of a learning image and a mask image generated as described above and stored in the memory 28.
  • The processor 2 determines whether or not to end the generation of the learning data (step S20). For example, it can be determined that the generation is to be ended when the user inputs an instruction to end the generation of learning data, or when the creation of a predetermined number of learning data items from one pair of captured image ITP and mask image has been completed.
  • When it is determined that the generation of the learning data has not been completed ("No"), the process returns to steps S14 and S16, and the next learning data is created in steps S14 to S20.
  • the image of the object is an image obtained by photographing the object, but the present invention is not limited to this, and includes, for example, an image created by CAD (computer-aided design) data of the object.
  • A drug has been described as an example of the object, but the object is not limited to this; for example, industrial products including medical devices and their parts, agricultural products, and microorganisms photographed with a microscope or the like are also included.
  • The hardware structure of the processing units that execute various processes in the learning data creation device, such as the CPU 24, is realized by the various processors shown below.
  • The various processors include a CPU (Central Processing Unit), which is a general-purpose processor that executes software (programs) and functions as various processing units; a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), whose circuit configuration can be changed after manufacture; and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed exclusively to execute specific processing.
  • One processing unit may be composed of one of these various processors, or may be composed of two or more processors of the same type or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). Further, a plurality of processing units may be configured by one processor. As a first example of configuring a plurality of processing units with one processor, as represented by a computer such as a client or a server, one processor is configured by a combination of one or more CPUs and software, and this processor functions as a plurality of processing units.
  • As a second example, as represented by a System on Chip (SoC), a processor that realizes the functions of the entire system including a plurality of processing units with a single IC (Integrated Circuit) chip is used.
  • In this way, the various processing units are configured using one or more of the above-mentioned various processors as their hardware structure.
  • More specifically, the hardware structure of these various processors is circuitry in which circuit elements such as semiconductor elements are combined.
  • the present invention includes a learning data creation program that realizes various functions as a learning data creation device according to the present invention by being installed on a computer, and a recording medium on which this learning data creation program is recorded.
  • Reference signs: 1: learning data creation device; 2: processor; 10: imaging device; 12A, 12B: camera; 13: imaging control unit; 14: stage; 16A, 16B: lighting device; 16A1 to 16A4, 16B1 to 16B4: light emitting unit; 20: acquisition unit; 22: image acquisition unit; 23: first region information acquisition unit; 24: CPU; 25: operation unit; 26: RAM; 27: ROM; 28: memory; 29: display unit; 30: learning image generation unit; 32: correct answer data generation unit; 34: storage control unit; 50: machine learning device; 52: learning model (CNN); 52A: input layer; 52B: intermediate layer; 52C: output layer; 54: loss value calculation unit; 56: parameter control unit; IA, IB, IC: learning images; IE: edge image; IM, Ia, Ib, Ic: mask images (correct answer data); ITP: photographed image; Itpl: template image; S10 to S20: steps; T, T1 to T6: drug; TP: drug package
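As a reference for how the learning control described above fits together (comparing the CNN output with the correct-answer mask via softmax cross entropy and updating the weights by error back-propagation with stochastic gradient descent), the following is a minimal sketch assuming PyTorch; the class and function names are illustrative and not part of the embodiment.

```python
# Minimal sketch (assumption: PyTorch). "SimpleSegNet" is an illustrative stand-in for the CNN 52.
import torch
import torch.nn as nn

class SimpleSegNet(nn.Module):
    """Convolution + pooling sets followed by per-pixel classification (background / drug)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Conv2d(32, num_classes, 1)
        # input height/width are assumed divisible by 4 so the output matches the mask size
        self.upsample = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)

    def forward(self, x):
        return self.upsample(self.classifier(self.features(x)))  # per-pixel class scores

def train(model, loader, epochs=10, lr=1e-3):
    criterion = nn.CrossEntropyLoss()                        # softmax cross entropy (loss function)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # stochastic gradient descent
    for _ in range(epochs):
        for image, mask in loader:                           # learning image / correct-answer mask pair
            optimizer.zero_grad()
            loss = criterion(model(image), mask)             # loss value between output and mask
            loss.backward()                                  # error back-propagation from the final layer
            optimizer.step()                                 # update the weight parameters
```

In practice the learning model would be a deeper segmentation network (for example, one of the Mask R-CNN family mentioned above), but the loop structure of comparing the output with the correct-answer mask and back-propagating the loss is the same.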

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention includes a learning image generation process that acquires a photographed image in which a subject (medicine) is photographed and generates a learning image by moving an image of the medicine extracted from the photographed image, and a correct answer data generation process that generates, on the basis of a mask image indicating the region of the medicine in the photographed image, second region information indicating the region of the medicine in the generated learning image and defines the generated second region information as correct answer data for the learning image. Each generated pair of a learning image and its correct answer data is stored as learning data in a memory.

Description

Learning data creation device, method, program, learning data, and machine learning device
 本発明は学習データ作成装置、方法、プログラム、学習データ及び機械学習装置に係り、特に多数の学習データを効率よく作成する技術に関する。 The present invention relates to a learning data creation device, a method, a program, a learning data, and a machine learning device, and particularly relates to a technique for efficiently creating a large number of learning data.
 従来、教示ファイルに格納されている多数の教示データに基づいて学習し、パターン認識して欠陥判定をする外観検査装置が提案されている(特許文献1)。 Conventionally, a visual inspection device has been proposed that learns based on a large number of teaching data stored in a teaching file, recognizes a pattern, and determines a defect (Patent Document 1).
 この外観検査装置は、教示ファイル中の多数の教示データのうち、データ数の少ない特定の教示データについては、その特定の教示データを変形して新たな教示データを生成する教示データ生成装置を備え、教示データ生成装置により生成された教示データを教示ファイルに補充して学習することで、データ数の少ない欠陥の検査を可能にしている。 This visual inspection device includes a teaching data generation device that generates new teaching data by transforming the specific teaching data for a specific teaching data having a small number of data among a large number of teaching data in the teaching file. By supplementing the teaching data generated by the teaching data generator to the teaching file and learning, it is possible to inspect defects with a small number of data.
 また、教示データ生成装置は、生成すべき教示データが画像データであるときは、画像の拡大、縮小、回転を含むアフィン変換と、明るさ、コントラスト、エッジ強度を含む属性変換を行うことにより、新たな教示データを生成している。 Further, when the teaching data to be generated is image data, the teaching data generation device performs affine transformation including enlargement, reduction, and rotation of the image, and attribute conversion including brightness, contrast, and edge strength. New teaching data is being generated.
特開2006-48370号公報Japanese Unexamined Patent Publication No. 2006-48370
 ところで、対象物が撮影された撮影画像からその撮影画像内の対象物の領域を、学習済みの学習モデルにより精度よく認識するためには、対象物の画像と対象物の領域を示す領域情報(正解データ)とのペアを多数作成し、多数のペアからなる学習データセットにより学習モデルを機械学習させる必要がある。 By the way, in order to accurately recognize the area of the object in the photographed image from the photographed image of the object by the trained learning model, the area information indicating the image of the object and the area of the object (area information indicating the area of the object) It is necessary to create a large number of pairs with the correct answer data) and machine-learn the learning model using a learning data set consisting of a large number of pairs.
 従来のこの種の正解データは、撮影画像をディスプレイに表示し、ディスプレイに表示された撮影画像を見ながら対象物の画像をユーザが画素単位で塗り潰して作成しており、正解データの作成に手間と時間がかかるという問題がある。 Conventionally, this type of correct answer data is created by displaying the captured image on the display and filling the image of the object pixel by pixel while viewing the captured image displayed on the display, which is troublesome to create the correct answer data. There is a problem that it takes time.
 一方、特許文献1に記載の外観検査装置は、カメラを用いて印刷物や無地面(紙、フィルム、金属など)の対象物を撮像し、撮像した画像から印刷欠陥を認識し、欠陥の種類(「穴」、「しみ」、「凸」、「すじ」など)を分別するものである。 On the other hand, the visual inspection apparatus described in Patent Document 1 uses a camera to image a printed matter or an object on the ground (paper, film, metal, etc.), recognizes a print defect from the captured image, and recognizes the type of defect (defect type (paper, film, metal, etc.)). It separates "holes", "stains", "convex", "streaks", etc.).
 したがって、教示データ生成装置により、データ数の少ない一のデータ(画像データ)を変形して新たに複数の教示データを生成する場合、同じ画像データを変形して生成した複数の教示データに対応する正解データは、同一の欠陥の種類を示すデータになる。即ち、特許文献1には、教示データ(教示画像)に対する正解データの作成に手間と時間がかかるという課題の記載がなく、それを解決する技術も開示されていない。 Therefore, when the teaching data generator transforms one data (image data) having a small number of data to newly generate a plurality of teaching data, it corresponds to the plurality of teaching data generated by transforming the same image data. The correct answer data is data indicating the same type of defect. That is, Patent Document 1 does not describe the problem that it takes time and effort to create correct answer data for teaching data (teaching image), and does not disclose a technique for solving the problem.
 本発明はこのような事情に鑑みてなされたもので、対象物の領域を認識する学習モデルを機械学習させるための学習データを効率よく作成することができる学習データ作成装置、方法、プログラム、学習データ及び機械学習装置を提供することを目的とする。 The present invention has been made in view of such circumstances, and is a learning data creation device, method, program, and learning that can efficiently create learning data for machine learning a learning model that recognizes an object region. The purpose is to provide data and machine learning equipment.
 上記目的を達成するために第1態様に係る発明は、プロセッサと、メモリとを備え、プロセッサが機械学習用の学習データを作成する学習データ作成装置であって、プロセッサは、対象物の画像を取得する取得処理と、取得した対象物の画像を移動させて学習用画像を生成する学習用画像生成処理と、生成した学習用画像における対象物の領域に対応する第2領域情報を生成し、生成した第2領域情報を学習用画像に対する正解データとする正解データ生成処理と、生成した学習用画像と正解データとのペアを、学習データとしてメモリに記憶させる記憶制御と、を行う。 The invention according to the first aspect in order to achieve the above object is a learning data creation device including a processor and a memory, in which the processor creates learning data for machine learning, and the processor creates an image of an object. The acquisition process to be acquired, the learning image generation process to move the acquired image of the object to generate the learning image, and the second area information corresponding to the area of the object in the generated learning image are generated. The correct answer data generation process in which the generated second area information is used as the correct answer data for the learning image, and the storage control for storing the pair of the generated learning image and the correct answer data in the memory as the learning data are performed.
 本発明の第1態様によれば、対象物の画像を移動させることで学習用画像を生成する。また、生成した学習用画像における対象物の領域に対応する第2領域情報を生成し、生成した第2領域情報を学習用画像に対する正解データとする。この正解データの生成は、プロセッサによる正解データ生成処理により行うことができるため、正解データの作成に手間と時間を要しない。 According to the first aspect of the present invention, a learning image is generated by moving an image of an object. Further, the second area information corresponding to the area of the object in the generated learning image is generated, and the generated second area information is used as the correct answer data for the learning image. Since the correct answer data can be generated by the correct answer data generation process by the processor, it does not require time and effort to create the correct answer data.
 このようにして生成した学習用画像と正解データとのペアを学習データとすることで、多くの学習データを生成すること(水増しすること)ができる。 By using the pair of the learning image and the correct answer data generated in this way as the learning data, a lot of learning data can be generated (inflated).
 本発明の第2態様に係る学習データ作成装置において、プロセッサの取得処理は、対象物の領域に対応する第1領域情報を取得し、正解データ生成処理は、取得した第1領域情報に基づいて第2領域情報を生成することが好ましい。 In the learning data creation device according to the second aspect of the present invention, the processor acquisition process acquires the first area information corresponding to the area of the object, and the correct answer data generation process is based on the acquired first area information. It is preferable to generate the second region information.
 本発明の第3態様に係る学習データ作成装置において、第1領域情報は、対象物の領域を手動で設定した領域情報、対象物の領域を画像処理により自動で抽出した領域情報、又は対象物の領域を画像処理により自動で抽出し、かつ手動で調整された領域情報であることが好ましい。 In the learning data creation device according to the third aspect of the present invention, the first area information is the area information in which the area of the object is manually set, the area information in which the area of the object is automatically extracted by image processing, or the object. It is preferable that the area is automatically extracted by image processing and the area information is manually adjusted.
 本発明の第4態様に係る学習データ作成装置において、正解データは、対象物の領域に対応する正解画像、対象物の領域を矩形で囲むバウンディングボックス情報、及び対象物の領域のエッジを示すエッジ情報のうちの少なくとも1つを含むことが好ましい。尚、正解画像は、マスク画像を含む。 In the learning data creation device according to the fourth aspect of the present invention, the correct answer data includes the correct image corresponding to the area of the object, the bounding box information surrounding the area of the object with a rectangle, and the edge indicating the edge of the area of the object. It is preferable to include at least one of the information. The correct image includes a mask image.
 本発明の第5態様に係る学習データ作成装置において、学習用画像生成処理は、対象物の画像を平行移動、回転移動、反転、又は拡縮させて学習用画像を生成し、正解データ生成処理は、第1領域情報を対象物の画像に対応して平行移動、回転移動、反転、又は拡縮させて正解データを生成することが好ましい。学習用画像の生成と正解データの生成とは、同期して同時に生成してもよいし、学習用画像及び正解データのうちのいずれか一方を生成してから他方を生成してもよい。 In the learning data creating apparatus according to the fifth aspect of the present invention, the learning image generation process generates a learning image by translating, rotating, reversing, or scaling the image of the object, and the correct answer data generation process is performed. , It is preferable to generate correct answer data by translating, rotating, reversing, or scaling the first region information corresponding to the image of the object. The generation of the learning image and the generation of the correct answer data may be synchronously generated at the same time, or one of the learning image and the correct answer data may be generated and then the other may be generated.
 本発明の第6態様に係る学習データ作成装置において、学習用画像生成処理は、対象物の画像を平行移動、回転移動、反転、又は拡縮させた2以上の画像を合成して学習用画像を生成し、正解データ生成処理は、2以上の画像の各々に対応する第1領域情報を対象物の画像に対応して平行移動、回転移動、反転、又は拡縮させて正解データを生成することが好ましい。これにより、複数の対象物の画像からなる学習用画像とその正解データを生成することができる。 In the learning data creation device according to the sixth aspect of the present invention, the learning image generation process synthesizes two or more images obtained by translating, rotating, reversing, or scaling the image of the object to create a learning image. In the correct answer data generation process, the correct answer data can be generated by translating, rotating, reversing, or scaling the first region information corresponding to each of the two or more images according to the image of the object. preferable. As a result, it is possible to generate a learning image composed of images of a plurality of objects and correct answer data thereof.
 本発明の第7態様に係る学習データ作成装置において、学習用画像生成処理は、複数の対象物の画像を含む学習用画像を生成する際に、複数の対象物の画像の全部又は一部が点又は線で接触する学習用画像を生成することが好ましい。 In the learning data creation device according to the seventh aspect of the present invention, in the learning image generation process, when a learning image including an image of a plurality of objects is generated, all or a part of the images of the plurality of objects is used. It is preferable to generate a learning image that makes contact with points or lines.
 本発明の第8態様に係る学習データ作成装置において、正解データは、複数の対象物の画像の全部又は一部が点又は線で接触する箇所のみを示すエッジ画像を含むことが好ましい。複数の対象物の画像の全部又は一部が点又は線で接触する学習用画像に対する正解データとして、複数の対象物の画像の点又は線で接触する箇所のみを示すエッジ画像を含めることができる。この学習データは、複数の対象物の画像の点又は線で接触する箇所の分離に有用なものとなる。 In the learning data creation device according to the eighth aspect of the present invention, the correct answer data preferably includes an edge image showing only a portion where all or a part of the images of a plurality of objects are in contact with each other by points or lines. As correct answer data for a learning image in which all or a part of images of a plurality of objects are in contact with points or lines, an edge image showing only a portion of the images of a plurality of objects in contact with points or lines can be included. .. This learning data is useful for separating the points of contact with points or lines in the images of a plurality of objects.
 In the learning data creation device according to the ninth aspect of the present invention, it is preferable that at least a part of the object is transparent. An image of an object that is at least partially transparent is more difficult to extract than an image of an entirely opaque object, and less learning data exists for it, so generating learning data for images of at least partially transparent objects is particularly effective.
 In the learning data creation device according to the tenth aspect of the present invention, when the learning image generation process by the processor generates a learning image including images of a plurality of objects, it is preferable to move the images of objects other than the transparent object. This is because, for an image of an object that is at least partially transparent, an image obtained by arbitrarily moving that object's image differs, owing to the positional relationship with the illumination light, from an image obtained by actually placing the transparent object at the same position and photographing it.
 In the learning data creation device according to the eleventh aspect of the present invention, it is preferable that at least a part of the object is transparent and that the learning image generation process moves the image of the object within a threshold value to generate the learning image.
 本発明の第11態様によれば、透明な対象物の画像の移動に制約(閾値)を設け、透明な対象物の画像を閾値以内で移動させて学習用画像を生成する。透明な対象物の画像を閾値以内で移動させて生成される学習用画像は、照明光との位置関係が大幅に変化したものにならず、その結果、その位置で実際に撮影される透明な対象物の画像と一致し、又は略一致したものとなる。 According to the eleventh aspect of the present invention, a constraint (threshold value) is set for the movement of the image of the transparent object, and the image of the transparent object is moved within the threshold value to generate a learning image. The learning image generated by moving the image of the transparent object within the threshold value does not have a significant change in the positional relationship with the illumination light, and as a result, the transparent image actually taken at that position is transparent. It matches or substantially matches the image of the object.
 本発明の第12態様に係る学習データ作成装置において、移動は、平行移動及び回転移動のいずれか一方を含むことが好ましい。 In the learning data creation device according to the twelfth aspect of the present invention, the movement preferably includes either parallel movement or rotational movement.
 The invention according to the thirteenth aspect is learning data composed of a pair of a learning image generated by moving an image of an object, and correct answer data having second region information indicating the region of the object in the learning image.
 本発明の第14態様に係る機械学習装置は、学習モデルと、上記の学習データを使用し、学習モデルを機械学習させる学習制御部と、を備える。 The machine learning device according to the 14th aspect of the present invention includes a learning model and a learning control unit that uses the above learning data to perform machine learning of the learning model.
 本発明の第15態様に係る機械学習装置において、学習モデルは、畳み込みニューラルネットワークで構成されることが好ましい。 In the machine learning device according to the fifteenth aspect of the present invention, the learning model is preferably composed of a convolutional neural network.
 第16態様に係る発明は、プロセッサが、以下の各ステップの処理を行うことにより機械学習用の学習データを作成する学習データ作成方法であって、対象物の画像を取得するステップと、取得した対象物の画像を移動させて学習用画像を生成するステップと、生成した学習用画像における対象物の領域に対応する第2領域情報を生成し、生成した第2領域情報を学習用画像に対する正解データとするステップと、生成した学習用画像と正解データとのペアを、学習データとしてメモリに記憶させるステップと、を含む。 The invention according to the 16th aspect is a learning data creation method in which a processor creates learning data for machine learning by performing the processing of each of the following steps, the step of acquiring an image of an object and the acquisition. The step of moving the image of the object to generate the learning image, the second area information corresponding to the area of the object in the generated learning image is generated, and the generated second area information is the correct answer to the learning image. It includes a step of making data and a step of storing a pair of a generated learning image and correct answer data in a memory as learning data.
 本発明の第17態様に係る学習データ作成方法において、対象物の領域に対応する第1領域情報を取得するステップを含み、正解データを生成するステップは、取得した第1領域情報に基づいて第2領域情報を生成することが好ましい。 In the learning data creation method according to the 17th aspect of the present invention, the step of generating correct answer data includes the step of acquiring the first area information corresponding to the area of the object, and the step of generating the correct answer data is the first based on the acquired first area information. It is preferable to generate two-region information.
 本発明の第18態様に係る学習データ作成方法において、正解データは、対象物の領域に対応する正解画像、対象物の領域を矩形で囲むバウンディングボックス情報、及び対象物の領域のエッジを示すエッジ情報のうちの少なくとも1つを含むことが好ましい。 In the learning data creation method according to the eighteenth aspect of the present invention, the correct answer data includes the correct image corresponding to the area of the object, the bounding box information surrounding the area of the object with a rectangle, and the edge indicating the edge of the area of the object. It is preferable to include at least one of the information.
 本発明の第19態様に係る学習データ作成方法において、学習用画像を生成するステップは、複数の対象物の画像を配置する際に、複数の対象物の画像の全部又は一部を点又は線で接触させることが好ましい。 In the learning data creation method according to the nineteenth aspect of the present invention, in the step of generating a learning image, when arranging images of a plurality of objects, all or a part of the images of the plurality of objects is pointed or lined. It is preferable to make contact with.
 本発明の第20態様に係る学習データ作成方法において、正解データは、複数の対象物の画像の点又は線で接触する箇所のみを示すエッジ画像を含むことが好ましい。 In the learning data creation method according to the twentieth aspect of the present invention, it is preferable that the correct answer data includes an edge image showing only the points or lines of the images of a plurality of objects that come into contact with each other.
 本発明の第21態様に係る学習データ作成方法において、対象物は、少なくとも一部が透明であることが好ましい。 In the learning data creation method according to the 21st aspect of the present invention, it is preferable that at least a part of the object is transparent.
 本発明の第22態様に係る学習データ作成方法において、学習用画像を生成するステップは、複数の対象物の画像を含む学習用画像を生成する際に、透明な対象物の画像以外の対象物の画像を移動させることが好ましい。 In the learning data creation method according to the 22nd aspect of the present invention, the step of generating a learning image is an object other than a transparent object image when generating a learning image including an image of a plurality of objects. It is preferable to move the image of.
 第23態様に係る発明は、対象物の画像を取得する機能と、取得した対象物の画像を移動させて学習用画像を生成する機能と、生成した学習用画像における対象物の領域に対応する第2領域情報を生成し、生成した第2領域情報を学習用画像に対する正解データとする機能と、生成した学習用画像と正解データとのペアを、学習データとしてメモリに記憶させる機能と、をコンピュータにより実現させる学習データ作成プログラムである。 The invention according to the 23rd aspect corresponds to a function of acquiring an image of an object, a function of moving the acquired image of the object to generate a learning image, and a region of the object in the generated learning image. A function of generating second area information and using the generated second area information as correct answer data for a learning image, and a function of storing a pair of the generated learning image and correct answer data in a memory as learning data. It is a learning data creation program realized by a computer.
 本発明の第24態様に係る学習データ作成プログラムにおいて、対象物の領域に対応する第1領域情報を取得する機能を含み、正解データを生成する機能は、取得した第1領域情報に基づいて第2領域情報を生成することが好ましい。 In the learning data creation program according to the 24th aspect of the present invention, the function of acquiring the first area information corresponding to the area of the object and the function of generating the correct answer data is the first based on the acquired first area information. It is preferable to generate two-region information.
 本発明によれば、対象物の領域を認識する学習モデルを機械学習させるための学習データを効率よく作成することができる。 According to the present invention, it is possible to efficiently create learning data for machine learning a learning model that recognizes an object area.
FIG. 1 is a diagram showing a captured image input to the trained learning model and an output result desired to be acquired from the learning model.
FIG. 2 is a diagram showing an example of learning data.
FIG. 3 is a conceptual diagram showing image processing when correct answer data is automatically created.
FIG. 4 is a conceptual diagram showing a method of mass-producing learning data by simulation.
FIG. 5 is a diagram showing a first embodiment in which learning data is created by simulation.
FIG. 6 is a diagram showing a mode in which one photographed image is generated from two photographed images.
FIG. 7 is a diagram showing a second embodiment in which learning data is created by simulation.
FIG. 8 is a block diagram showing an example of the hardware configuration of the learning data creation device according to the present invention.
FIG. 9 is a plan view showing a drug package in which a plurality of drugs are packaged.
FIG. 10 is a block diagram showing a schematic configuration of the photographing apparatus shown in FIG. 8.
FIG. 11 is a plan view showing a schematic configuration of the photographing apparatus.
FIG. 12 is a side view showing a schematic configuration of the photographing apparatus.
FIG. 13 is a block diagram showing an embodiment of the learning data creation device according to the present invention.
FIG. 14 is a diagram showing an example of a captured image acquired by the image acquisition unit and of first region information indicating the region of the drug in the captured image acquired by the first region information acquisition unit.
FIG. 15 is a diagram showing an example of learning data generated from the captured image and the mask image shown in FIG. 14.
FIG. 16 is a diagram showing an example of an edge image showing only the portions where a plurality of drugs are in contact at points or lines.
FIG. 17 is a diagram showing an example of a photographed image including a plurality of transparent drugs and a plurality of opaque drugs.
FIG. 18 is a diagram used to explain the lens effect of a transparent drug.
FIG. 19 is a diagram used to explain the restriction on the movement of a transparent drug.
FIG. 20 is a block diagram showing an embodiment of the machine learning device according to the present invention.
FIG. 21 is a flowchart showing an embodiment of the learning data creation method according to the present invention.
 以下、添付図面に従って本発明に係る学習データ作成装置、方法、プログラム、学習データ及び機械学習装置の好ましい実施形態について説明する。 Hereinafter, preferred embodiments of the learning data creation device, method, program, learning data, and machine learning device according to the present invention will be described with reference to the accompanying drawings.
 [本発明の概要]
 図1は、学習済みの学習モデルに入力される画像と学習モデルから取得したい出力結果とを示す図である。
[Outline of the present invention]
FIG. 1 is a diagram showing an image input to the trained learning model and an output result desired to be acquired from the learning model.
 図1(A)は、対象物(本例では、薬剤)を撮影した画像であり、図1(B)は、学習済みの学習モデル(以下、「学習済みモデル」という)に図1(A)に示した画像を入力した場合に、学習済みモデルが出力して欲しい出力結果である。 FIG. 1 (A) is an image of an object (drug in this example) taken, and FIG. 1 (B) is shown in FIG. 1 (A) in a trained learning model (hereinafter referred to as “learned model”). This is the output result that the trained model wants to output when the image shown in) is input.
 学習済みモデルの出力結果は、図1(A)に示した薬剤の領域(薬剤領域)を推論した推論結果であり、本例では、薬剤領域と背景領域とを領域分類したマスク画像である。尚、推論結果は、マスク画像に限らず、例えば、薬剤領域を矩形の枠で囲むバウンディングボックス、又はバウンディングボックスの対角の2点の座標、又はこれらの組み合わせが考えられる。 The output result of the trained model is an inference result inferring the drug region (drug region) shown in FIG. 1 (A), and in this example, it is a mask image in which the drug region and the background region are classified into regions. The inference result is not limited to the mask image, and for example, a bounding box surrounding the drug region with a rectangular frame, coordinates of two diagonal points of the bounding box, or a combination thereof can be considered.
 学習済みモデルにより、任意の入力画像から所望の出力結果(推論結果)を得るためには、未学習の学習モデルを機械学習させるための学習データを大量に準備する必要がある。 In order to obtain a desired output result (inference result) from an arbitrary input image by the trained model, it is necessary to prepare a large amount of training data for machine learning the unlearned learning model.
 図2は、学習データの一例を示す図である。 FIG. 2 is a diagram showing an example of learning data.
 In FIGS. 2(A) to 2(C), the left side is an image of drugs (a learning image) and the right side is the correct answer image (correct answer data) for that image; each pair of the left and right images constitutes one item of learning data. The correct answer images on the right side of FIG. 2 are mask images that distinguish the region of each drug from the background.
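Purely as an illustration of how such image/correct-answer pairs might be held for training, the following is a minimal sketch assuming PyTorch and Pillow; the directory layout and file-naming convention are hypothetical.

```python
# Minimal sketch (assumptions: PyTorch, Pillow; image and mask files share names in two folders).
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class DrugSegmentationPairs(Dataset):
    """Each sample is a (learning image, correct-answer mask) pair, as in FIG. 2."""
    def __init__(self, image_dir, mask_dir):
        self.image_paths = sorted(Path(image_dir).glob("*.png"))
        self.mask_paths = sorted(Path(mask_dir).glob("*.png"))

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, i):
        image = np.asarray(Image.open(self.image_paths[i]).convert("RGB"), dtype=np.float32) / 255.0
        mask = np.asarray(Image.open(self.mask_paths[i]).convert("L"), dtype=np.int64)
        return torch.from_numpy(image).permute(2, 0, 1), torch.from_numpy(mask)
```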
 Basically, the learning data requires images of the object (drugs) such as those on the left side of FIGS. 2(A) to 2(C); however, some drugs, such as new drugs, exist only in small numbers, so there is a problem that many images cannot be collected.
 学習用画像(例えば、薬剤の画像)に対する正解画像の作成は、薬剤の画像をディスプレイに表示させ、ディスプレイに表示された画像を見ながら薬剤の領域をユーザが画素単位で塗り潰して作成するのが一般的である。 To create a correct answer image for a learning image (for example, an image of a drug), the image of the drug is displayed on the display, and the user fills the area of the drug pixel by pixel while viewing the image displayed on the display. It is common.
 また、正解画像を自動で作成する場合、例えば、テンプレートマッチングにより薬剤の位置、回転角を計算することで求めることができる。 In addition, when the correct answer image is automatically created, it can be obtained by calculating the position and rotation angle of the drug by template matching, for example.
 図3は、正解画像を自動で作成する場合の画像処理を示す概念図である。 FIG. 3 is a conceptual diagram showing image processing when a correct image is automatically created.
 薬剤を撮影した画像(撮影画像)ITPに対して、その薬剤を示す画像であるテンプレート画像Itplを用意する。薬剤の形状が円形でない場合には、探索する回転角毎の複数のテンプレート画像Itplを用意することが好ましい。 For the image (photographed image) ITP obtained by photographing the drug, a template image Itpl which is an image showing the drug is prepared. When the shape of the drug is not circular, it is preferable to prepare a plurality of template images Itpl for each rotation angle to be searched.
 Then, by searching the captured image ITP for the position and the rotation angle of the template image Itpl that give the highest correlation (template matching), correct answer data indicating the region of the drug in the captured image ITP can be created based on the position of the template image Itpl and its rotation angle at the point of highest correlation.
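A minimal sketch of this rotation-aware template matching, assuming OpenCV and NumPy, is given below; the angle step and the matching score are illustrative choices, not values fixed by the embodiment.

```python
# Minimal sketch (assumptions: OpenCV, NumPy; grayscale images; the 10-degree step is illustrative).
import cv2
import numpy as np

def find_drug_by_template(captured, template, angle_step=10):
    """Return (top_left, angle, score) of the best-matching rotated template."""
    best = (None, 0, -1.0)
    h, w = template.shape[:2]
    center = (w / 2.0, h / 2.0)
    for angle in range(0, 360, angle_step):
        rot = cv2.getRotationMatrix2D(center, angle, 1.0)
        rotated = cv2.warpAffine(template, rot, (w, h))      # corners may be clipped for long tablets
        result = cv2.matchTemplate(captured, rotated, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(result)
        if max_val > best[2]:
            best = (max_loc, angle, max_val)
    return best
```

From the detected position and rotation angle, the drug region can be drawn at that pose to serve as the correct answer data for the captured image.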
 また、撮影画像ITPと正解画像(例えば、マスク画像)とを重ね合わせてディスプレイに表示し、マスク画像に誤差がある場合には、ユーザがマスク画像を画素単位で修正するようにしてもよい。 Further, the captured image ITP and the correct image (for example, a mask image) may be superimposed and displayed on the display, and if there is an error in the mask image, the user may correct the mask image on a pixel-by-pixel basis.
 [シミュレーションにより学習データを量産する方法]
 図4は、シミュレーションにより学習データを量産する方法を示す概念図である。
[How to mass-produce learning data by simulation]
FIG. 4 is a conceptual diagram showing a method of mass-producing learning data by simulation.
 まず、対象物の画像(学習用画像)と対象物の領域を示す正解画像とのペアを準備する。 First, prepare a pair of an image of the object (learning image) and a correct image showing the area of the object.
 続いて、学習用画像と正解画像とのペアを、同調させて平行移動(シフト)、回転移動、又は反転させ、あるいは同調して移動させた画像と移動前の画像とを合成(コピー・アンド・ペースト)し、新たな学習用画像と正解画像とのペアからなる学習データを作成する。回転移動とは、一定点を中心にして対象物の画像を回転させ、他の位置に移動させることをいう。本例では、回転移動は、対象物の一定点(例えば、重心)を中心にして回転させる場合をいい、以下、「回転移動」を、単に「回転」という。 Then, the pair of the learning image and the correct image is synchronized and translated (shifted), rotated, or inverted, or the image that is moved in synchronization and the image before the movement are combined (copy and paste). -Paste) to create training data consisting of a new pair of learning image and correct image. Rotational movement refers to rotating an image of an object around a certain point and moving it to another position. In this example, rotational movement refers to the case of rotating an object around a certain point (for example, the center of gravity), and hereinafter, "rotational movement" is simply referred to as "rotation".
 これらの操作を繰り返すことにより、学習データを量産することができる。また、このような方法による学習データの作成は、例えば、新たな学習用画像の作成後に正解画像を作成する場合に比べて処理が簡単である。 By repeating these operations, learning data can be mass-produced. Further, the creation of the learning data by such a method is simpler than, for example, the case where the correct answer image is created after the creation of a new learning image.
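The synchronized shift, rotation, or inversion of an image and its mask described above can be sketched as follows, assuming OpenCV and NumPy; the function name and parameter values are illustrative.

```python
# Minimal sketch (assumptions: OpenCV, NumPy). The same geometric operation is applied to the
# learning image and to its mask so that the pair stays consistent.
import cv2

def shift_rotate_pair(image, mask, dx=0, dy=0, angle=0.0, flip=False):
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    m[0, 2] += dx
    m[1, 2] += dy
    # nearest-neighbour keeps the mask labels intact; borders are filled with the background value (0)
    new_image = cv2.warpAffine(image, m, (w, h), flags=cv2.INTER_LINEAR, borderValue=0)
    new_mask = cv2.warpAffine(mask, m, (w, h), flags=cv2.INTER_NEAREST, borderValue=0)
    if flip:
        new_image, new_mask = cv2.flip(new_image, 1), cv2.flip(new_mask, 1)
    return new_image, new_mask
```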
 <シミュレーションにより学習データを作成する第1実施形態>
 図5は、シミュレーションにより学習データを作成する第1実施形態を示す図である。
<First embodiment for creating learning data by simulation>
FIG. 5 is a diagram showing a first embodiment in which learning data is created by simulation.
 図5(A)は、対象物である薬剤を撮影した撮影画像と、その撮影画像に基づいて手動又は自動で生成したマスク画像とのペアを示す図である。 FIG. 5A is a diagram showing a pair of a photographed image of a drug as an object and a mask image manually or automatically generated based on the photographed image.
 本発明は、撮影画像とマスク画像とのペアから、シミュレーションにより学習データを作成する(学習データを水増しする)。 The present invention creates learning data by simulation from a pair of a photographed image and a mask image (inflates the learning data).
 図5(B)は、図5(A)に示した撮影画像とマスク画像とをそれぞれ反転した撮影画像及びマスク画像のペアを示す図である。 FIG. 5 (B) is a diagram showing a pair of a photographed image and a mask image in which the photographed image and the mask image shown in FIG. 5 (A) are inverted, respectively.
 図5(B)に示す右側の反転(左右反転)されたマスク画像は、図5(B)に示す左側の反転された撮影画像における薬剤の領域を示すマスク画像となる。したがって、反転された撮影画像は、新たな学習用画像とすることができ、反転されたマスク画像は、新たに生成された学習用画像に対する正解データとすることができる。 The inverted (left-right inverted) mask image on the right side shown in FIG. 5 (B) is a mask image showing the region of the drug in the inverted photographed image on the left side shown in FIG. 5 (B). Therefore, the inverted captured image can be used as a new learning image, and the inverted mask image can be used as correct answer data for the newly generated learning image.
 即ち、図5(A)に示した撮影画像とマスク画像とを、同期して反転させることで、図5(B)に示す学習用画像とマスク画像のペアからなる新たな学習データを作成することができる。尚、画像の反転は、左右反転に限らず、上下反転も含む。また、先に左の画像を作っておき、そこから薬剤画像の領域を検出することによって、右の画像を作成するようにしてもよい。 That is, by synchronizing and inverting the captured image and the mask image shown in FIG. 5 (A), new learning data consisting of the pair of the learning image and the mask image shown in FIG. 5 (B) is created. be able to. The image inversion is not limited to horizontal inversion, but also includes vertical inversion. Alternatively, the image on the left may be created first, and the image on the right may be created by detecting the region of the drug image from the image.
 図5(C)は、図5(A)及び図5(B)に示した画像を加算した画像を示す図である。 FIG. 5 (C) is a diagram showing an image obtained by adding the images shown in FIGS. 5 (A) and 5 (B).
 The photographed image on the left side of FIG. 5(C) can be created by combining the photographed image shown in FIG. 5(A) with the inverted photographed image shown in FIG. 5(B). That is, the photographed image shown in FIG. 5(C) can be created by pasting, onto the photographed image shown in FIG. 5(A), an image (drug image) obtained by cutting out the drug region from the inverted photographed image shown in FIG. 5(B). The cutting out of the drug image from the inverted photographed image can be performed by a process of cutting out the drug region from the photographed image shown in FIG. 5(A), based on the inverted mask image of FIG. 5(B).
 また、2以上の薬剤画像を合成する方法は、上記のようにマスク画像を使用する方法に限らない。例えば、薬剤が撮影されていない背景のみの背景画像を使用して、図5(A)及び図5(B)からそれぞれ薬剤画像のみを抽出し、抽出した各薬剤画像を背景画像に合成することで、図5(C)に示した撮影画像(学習用画像)を生成することができる。更に、背景が黒(画素値がゼロ)となるように撮影された撮影画像の場合、各撮影画像を加算することで各薬剤画像を有する学習用画像を生成することができる。 Further, the method of synthesizing two or more drug images is not limited to the method of using a mask image as described above. For example, using a background image of only the background in which the drug has not been photographed, only the drug image is extracted from FIGS. 5 (A) and 5 (B), and each extracted drug image is combined with the background image. Therefore, the captured image (learning image) shown in FIG. 5C can be generated. Further, in the case of a captured image captured so that the background is black (pixel value is zero), a learning image having each drug image can be generated by adding each captured image.
 On the other hand, the mask image on the right side of FIG. 5(C) can be created by adding the mask image shown in FIG. 5(A) and the inverted mask image shown in FIG. 5(B). When the two mask images are added, for instance separation, the pixel value of the drug region in the inverted mask image of FIG. 5(C) is set to, for example, "0.5" and the pixel value of the background to "0" in this example, so that the two drug regions in the generated mask image have different pixel values.
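A minimal sketch of this copy-and-paste composition and of the instance separation by pixel value, assuming NumPy, images of the same size, and a black background, is given below; the value 0.5 follows the example above and the function name is illustrative.

```python
# Minimal sketch (assumption: NumPy; all images share the same size, background pixels are 0).
import numpy as np

def paste_drug(base_image, base_mask, drug_image, drug_mask, instance_value=0.5):
    region = drug_mask > 0                      # pixels belonging to the moved drug
    out_image = base_image.copy()
    out_image[region] = drug_image[region]      # copy-and-paste only the drug pixels
    out_mask = base_mask.astype(np.float32)
    out_mask[region] = instance_value           # e.g. 0.5 so the two drug regions differ
    return out_image, out_mask
```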
 このようにして、図5(A)に示した撮影画像とマスク画像のペアからなる1つの学習データから、図5(B)及び(C)に示した2つの学習データを作成することができる。 In this way, the two training data shown in FIGS. 5 (B) and 5 (C) can be created from one training data consisting of the pair of the captured image and the mask image shown in FIG. 5 (A). ..
 また、上記の第1実施形態では、図5(A)に示した撮影画像及びマスク画像をそれぞれ反転し、図5(B)に示した新たな撮影画像(学習用画像)及びマスク画像のペアからなる学習データを作成するようにしたが、これに限らず、図5(A)に示した撮影画像及びマスク画像をそれぞれ同期して平行移動、回転、又は拡縮させて、新たな学習用画像及びマスク画像のペアからなる学習データを作成してもよい。尚、撮影画像及びマスク画像をそれぞれ同期して平行移動、回転、又は縮小させることで、背景に余白が生じる場合には、背景と同様な画素値で余白を埋めることが好ましい。 Further, in the first embodiment described above, the captured image and the mask image shown in FIG. 5 (A) are inverted, and the new captured image (learning image) and the mask image pair shown in FIG. 5 (B) are inverted. The learning data consisting of the above is not limited to this, but the captured image and the mask image shown in FIG. 5 (A) are synchronizedly moved, rotated, or scaled in parallel to create a new learning image. And training data consisting of a pair of mask images may be created. When a margin is generated in the background by translating, rotating, or reducing the captured image and the mask image, respectively, it is preferable to fill the margin with the same pixel value as the background.
 更に、上記の第1実施形態では、1つの薬剤が撮影された撮影画像及びマスク画像から新たな撮影画像及びマスク画像を作成するが、複数の異なる薬剤が別々に撮影された複数の撮影画像及びマスク画像、又は複数の異なる薬剤が同時に撮影された撮影画像及びマスク画像から、新たな撮影画像及びマスク画像を作成するようにしてもよい。 Further, in the first embodiment described above, a new photographed image and a mask image are created from the photographed image and the mask image in which one drug is photographed, but a plurality of photographed images and a plurality of photographed images in which a plurality of different agents are separately photographed and A new photographed image and a mask image may be created from the mask image or the photographed image and the mask image in which a plurality of different agents are simultaneously photographed.
 図6は、2枚の撮影画像から1枚の学習用画像を生成する態様を示す図である。 FIG. 6 is a diagram showing a mode in which one learning image is generated from two captured images.
 図6(A)に示す例では、2つの対象物(薬剤)がそれぞれ撮影された2枚の撮影画像から、4つの薬剤画像を有する新たな1枚の学習用画像を生成する場合に関して示している。 In the example shown in FIG. 6 (A), a case where a new learning image having four drug images is generated from two captured images in which two objects (drugs) are captured is shown. There is.
 2枚の撮影画像から4つの薬剤画像を切り出し、切り出した薬剤画像をそれぞれ平行移動、回転、又は拡縮して合成することで、4つの薬剤画像を含む新たな学習用画像を生成している。 Four drug images are cut out from the two captured images, and the cut out drug images are translated, rotated, or scaled and combined to generate a new learning image including the four drug images.
 図6(B)に示す例では、図6(A)と同様に2枚の撮影画像から1枚の学習用画像を生成する場合に関して示しているが、2枚の撮影画像内の4つの薬剤画像のうちの3つの薬剤画像を使用して1枚の学習用画像を生成している。このように、新たに生成される学習用画像は、2枚の撮影画像内の全ての薬剤画像を使用しなくてもよい。また、生成される1枚の学習用画像内の複数の薬剤画像には、平行移動、回転、又は拡縮等の操作が行われていない薬剤画像(移動しない薬剤画像)が含まれていてもよい。 In the example shown in FIG. 6 (B), the case where one learning image is generated from the two captured images is shown as in FIG. 6 (A), but the four agents in the two captured images are shown. One learning image is generated using three drug images out of the images. As described above, the newly generated learning image does not have to use all the drug images in the two captured images. In addition, the plurality of drug images in one generated learning image may include drug images that have not been subjected to operations such as translation, rotation, or scaling (drug images that do not move). ..
 尚、上記のようにして新たな学習用画像を生成する場合、新たに生成される学習用画像に対応するマスク画像も生成され、新たに生成される学習用画像及びマスク画像のペアが新たな学習データとなる。 When a new learning image is generated as described above, a mask image corresponding to the newly generated learning image is also generated, and a new pair of the newly generated learning image and the mask image is generated. It becomes learning data.
 <シミュレーションにより学習データを作成する第2実施形態>
 図7は、シミュレーションにより学習データを作成する第2実施形態を示す図である。
<Second embodiment for creating learning data by simulation>
FIG. 7 is a diagram showing a second embodiment in which learning data is created by simulation.
 図7(A)は、薬剤を撮影した撮影画像と、その撮影画像に基づいて手動又は自動で生成したマスク画像とのペアを示す図であり、図5(A)に示したペアと同一である。 FIG. 7A is a diagram showing a pair of a photographed image obtained by photographing the drug and a mask image manually or automatically generated based on the photographed image, and is the same as the pair shown in FIG. 5A. be.
 図7(B)は、それぞれ図7(A)に示した撮影画像及びマスク画像からそれぞれ切り出す薬剤領域を示す図である。 FIG. 7B is a diagram showing drug regions cut out from the photographed image and the mask image shown in FIG. 7A, respectively.
 本例では、薬剤領域を囲む矩形の枠内の領域を、画像を切り出す領域(切出領域)としている。尚、マスク画像により薬剤領域は既知であるため、マスク画像に基づいて薬剤領域を囲む矩形の枠内の画像を切り出すことができる。 In this example, the area within the rectangular frame surrounding the drug area is defined as the area for cutting out the image (cutout area). Since the drug region is known from the mask image, the image in the rectangular frame surrounding the drug region can be cut out based on the mask image.
 FIG. 7(C) shows the images of the cut-out regions cut out from the photographed image and the mask image shown in FIG. 7(A), respectively. The drug image can be cut out from the photographed image by a process (drug image acquisition process) that cuts out the drug region from the photographed image shown in FIG. 7(A) based on the mask image shown in FIG. 7(A). Since the mask image shown in FIG. 7(A) has information indicating the drug region (first region information), the drug region (hereinafter referred to as the "drug mask image") can be cut out from the mask image. The drug image acquisition process may also include a process of reading an already cut-out image from a memory or the like.
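A minimal sketch of cutting out the rectangular region enclosing the drug, assuming NumPy, is given below; it simply takes the bounding rectangle of the non-zero mask pixels.

```python
# Minimal sketch (assumption: NumPy; the mask contains at least one drug pixel).
import numpy as np

def crop_drug_region(captured, mask):
    ys, xs = np.nonzero(mask)                   # the drug region is known from the mask
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    drug_image = captured[top:bottom, left:right].copy()
    drug_mask = mask[top:bottom, left:right].copy()
    return drug_image, drug_mask                # cut-out drug image and drug mask image
```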
 図7(D)は、切り出された薬剤画像及び薬剤マスク画像を任意の位置及び任意の回転角で貼り付けて作成した、新たな撮影画像及びマスク画像を示す図である。 FIG. 7 (D) is a diagram showing a new photographed image and a mask image created by pasting the cut out drug image and drug mask image at an arbitrary position and an arbitrary rotation angle.
 図7(D)に示す撮影画像及びマスク画像は、図7(A)に示した撮影画像及びマスク画像から上記の画像処理により作成した、新たな学習用画像及び正解データのペアからなる学習データとなる。 The captured image and mask image shown in FIG. 7 (D) are training data composed of a new pair of learning image and correct answer data created by the above image processing from the captured image and mask image shown in FIG. 7 (A). It becomes.
 図7(E)は、切り出された薬剤画像及び薬剤マスク画像を任意の位置及び任意に回転角で貼り付けて作成した、新たな撮影画像(学習用画像)及びマスク画像を示す図であり、特に複数の薬剤画像が点又は線で接触するように作成されている。 FIG. 7 (E) is a diagram showing a new photographed image (learning image) and a mask image created by pasting the cut out drug image and drug mask image at an arbitrary position and an arbitrary rotation angle. In particular, a plurality of drug images are created so as to be in contact with each other by dots or lines.
 学習モデルにおける推論結果を向上させるためには、薬剤同士が点又は線で接触している状態の学習データを大量に作成する必要がある。薬剤同士が点又は線で接触している撮影画像から、各薬剤の領域を精度よく推論するのは、各薬剤が接触せずに孤立している場合に比べて難しいからである。 In order to improve the inference result in the learning model, it is necessary to create a large amount of learning data in which the drugs are in contact with each other by points or lines. This is because it is more difficult to accurately infer the region of each drug from the photographed image in which the drugs are in contact with each other by a point or a line, as compared with the case where each drug is isolated without contact.
 図7(E)に示す左側のマスク画像は、各薬剤領域が接しないように画像処理することが好ましい。各薬剤領域が接触する箇所は既知であるため、その接触する箇所を背景色に置換することで、各薬剤領域が接触しないようにできる。 The mask image on the left side shown in FIG. 7 (E) is preferably image-processed so that the drug regions do not come into contact with each other. Since the contact points of the drug regions are known, the contact points can be prevented from contacting each other by replacing the contact points with a background color.
 また、薬剤同士が点又は線で接触する各薬剤が同一薬剤の場合、インスタンス分離のために、薬剤領域の画素値を異ならせることが好ましい。この場合、マスク画像における各薬剤領域は、その画素値の違いで認識できるため、薬剤領域が接触する箇所を背景色に置換しなくてもよい。 Further, when each drug in which the drugs are in contact with each other by a point or a line is the same drug, it is preferable to make the pixel value of the drug region different for instance separation. In this case, since each drug region in the mask image can be recognized by the difference in the pixel value, it is not necessary to replace the portion where the drug region contacts with the background color.
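Finding the contact points or lines between two pasted drug regions, and replacing those contact pixels with the background, can be sketched as follows, assuming OpenCV and NumPy; the dilation-based test is an illustrative way of detecting touching pixels, not a method fixed by the embodiment.

```python
# Minimal sketch (assumptions: OpenCV, NumPy; masks are single-channel with background 0).
import cv2
import numpy as np

def contact_pixels(mask_a, mask_b):
    """Binary image of pixels where the two drug regions touch at a point or a line."""
    kernel = np.ones((3, 3), np.uint8)
    grown_a = cv2.dilate((mask_a > 0).astype(np.uint8), kernel)
    grown_b = cv2.dilate((mask_b > 0).astype(np.uint8), kernel)
    return grown_a & grown_b                    # usable as the contact-only edge image

def separate_touching_regions(mask, contact):
    """Replace the contact pixels with the background value so the regions do not touch in the mask."""
    out = mask.copy()
    out[contact > 0] = 0
    return out
```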
 以上のようにして、薬剤が撮影された撮影画像とその撮影画像内の薬剤の領域を示す第1領域情報(マスク画像)とを元に、多くの学習データを作成することができる。 As described above, a lot of learning data can be created based on the photographed image in which the drug was photographed and the first area information (mask image) indicating the area of the drug in the photographed image.
 [学習データ作成装置の構成]
 図8は、本発明に係る学習データ作成装置のハードウェア構成の一例を示すブロック図である。
[Configuration of learning data creation device]
FIG. 8 is a block diagram showing an example of the hardware configuration of the learning data creation device according to the present invention.
 図8に示す学習データ作成装置1は、例えば、コンピュータにより構成することができ、主として画像取得部22、CPU(Central Processing Unit)24、操作部25、RAM(Random Access Memory)26、ROM(Read Only Memory)27、メモリ28及び表示部29から構成されている。 The learning data creation device 1 shown in FIG. 8 can be configured by, for example, a computer, and is mainly composed of an image acquisition unit 22, a CPU (Central Processing Unit) 24, an operation unit 25, a RAM (Random Access Memory) 26, and a ROM (Read). It is composed of Only Memory) 27, a memory 28, and a display unit 29.
 画像取得部22は、撮影装置10により薬剤が撮影された撮影画像を、撮影装置10から取得する。 The image acquisition unit 22 acquires a photographed image in which the drug is photographed by the photographing device 10 from the photographing device 10.
 撮影装置10により撮影される薬剤は、例えば、服用1回分の薬剤、又は任意の薬剤であり、薬包に入っているものでもよいし、薬包に入っていないものでもよい。 The drug photographed by the imaging device 10 is, for example, a drug for one dose or an arbitrary drug, which may be contained in a drug package or may not be contained in the drug package.
 図9は、複数の薬剤が一包化された薬包を示す平面図である。 FIG. 9 is a plan view showing a drug package in which a plurality of drugs are packaged.
 図9に示す薬包TPは、1回に服用される複数の薬剤が透明な包に収納され、一包ずつパッキングされたものである。薬包TPは、図11及び図12に示すように帯状に連結されており、各薬包TPを切り離し可能にする切取線が入っている。尚、図9に示す薬包TPには、6個の薬剤Tが一包化されている。 The drug package TP shown in FIG. 9 is a package in which a plurality of drugs to be taken at one time are stored in a transparent package and packed one by one. The drug package TPs are connected in a band shape as shown in FIGS. 11 and 12, and have a cut line that enables each drug package TP to be separated. In the drug package TP shown in FIG. 9, six drug Ts are packaged in one package.
 図10は、図8に示した撮影装置の概略構成を示すブロック図である。 FIG. 10 is a block diagram showing a schematic configuration of the photographing apparatus shown in FIG.
 図10に示す撮影装置10は、薬剤を撮影する2台のカメラ12A、12Bと、薬剤を照明する2台の照明装置16A,16Bと、撮影制御部13とから構成されている。 The imaging device 10 shown in FIG. 10 includes two cameras 12A and 12B for photographing the drug, two lighting devices 16A and 16B for illuminating the drug, and a photographing control unit 13.
 図11及び図12は、それぞれ撮影装置の概略構成を示す平面図及び側面図である。 11 and 12 are a plan view and a side view showing a schematic configuration of the photographing apparatus, respectively.
 薬包TPは、水平(x-y平面)に設置された透明なステージ14の上に載置される。 The medicine package TP is placed on a transparent stage 14 installed horizontally (xy plane).
 カメラ12A、12Bは、ステージ14と直交する方向(z方向)に、ステージ14を挟んで互いに対向して配置される。カメラ12Aは、薬包TPの表面に正対し、薬包TPを上方から撮影する。カメラ12Bは、薬包TPの裏面に正対し、薬包TPを下方から撮影する。 The cameras 12A and 12B are arranged so as to face each other with the stage 14 in the direction orthogonal to the stage 14 (z direction). The camera 12A faces the surface of the medicine package TP and photographs the medicine package TP from above. The camera 12B faces the back surface of the medicine package TP and photographs the medicine package TP from below.
 ステージ14を挟んで、カメラ12Aの側には、照明装置16Aが備えられ、カメラ12Bの側には、照明装置16Bが備えられる。 A lighting device 16A is provided on the side of the camera 12A and a lighting device 16B is provided on the side of the camera 12B with the stage 14 in between.
 照明装置16Aは、ステージ14の上方に配置され、ステージ14に載置された薬包TPを上方から照明する。照明装置16Aは、放射状に配置された4つの発光部16A1~16A4を有し、直交する4方向から照明光を照射する。各発光部16A1~16A4の発光は、個別に制御される。 The lighting device 16A is arranged above the stage 14 and illuminates the medicine package TP placed on the stage 14 from above. The illuminating device 16A has four light emitting units 16A1 to 16A4 arranged radially, and irradiates the illuminating light from four orthogonal directions. The light emission of each light emitting unit 16A1 to 16A4 is individually controlled.
 照明装置16Bは、ステージ14の下方に配置され、ステージ14に載置された薬包TPを下方から照明する。照明装置16Bは、照明装置16Aと同様に放射状に配置された4つの発光部16B1~16B4を有し、直交する4方向から照明光を照射する。各発光部16B1~16B4の発光は、個別に制御される。 The lighting device 16B is arranged below the stage 14 and illuminates the medicine package TP placed on the stage 14 from below. The illuminating device 16B has four light emitting units 16B1 to 16B4 arranged radially like the illuminating device 16A, and irradiates the illuminating light from four orthogonal directions. The light emission of each light emitting unit 16B1 to 16B4 is individually controlled.
 Imaging is performed as follows. First, the camera 12A photographs the medicine package TP from above. During imaging, the light emitting units 16A1 to 16A4 of the lighting device 16A are made to emit light one after another to capture four images, and then all of the light emitting units 16A1 to 16A4 are made to emit light simultaneously to capture one more image. Next, the light emitting units 16B1 to 16B4 of the lower lighting device 16B are made to emit light simultaneously, a reflector (not shown) is inserted, the medicine package TP is illuminated from below through the reflector, and the camera 12A photographs the medicine package TP from above.
 The four images captured while the light emitting units 16A1 to 16A4 emit light in turn each have a different illumination direction, so that when the surface of a drug carries an engraved marking (unevenness), the shadows cast by the marking appear differently in each image. These four captured images are used to generate an engraved-mark image that emphasizes the marking on the front side of the drug T.
 The single image captured while the light emitting units 16A1 to 16A4 emit light simultaneously has no uneven brightness; it is used, for example, when cutting out an image of the front side of the drug T (a drug image), and it is also the captured image on which the engraved-mark image is superimposed.
 The image obtained by illuminating the medicine package TP from below through the reflector and photographing it from above with the camera 12A is the captured image used for recognizing the regions of the individual drugs T.
 Next, the camera 12B photographs the medicine package TP from below. During imaging, the light emitting units 16B1 to 16B4 of the lighting device 16B are made to emit light one after another to capture four images, and then all of the light emitting units 16B1 to 16B4 are made to emit light simultaneously to capture one more image.
 These four captured images are used to generate an engraved-mark image that emphasizes the marking on the back side of the drug T, and the single image captured while the light emitting units 16B1 to 16B4 emit light simultaneously has no uneven brightness; it is used, for example, when cutting out a drug image of the back side of the drug T, and it is also the captured image on which the engraved-mark image is superimposed.
 The imaging control unit 13 shown in FIG. 10 controls the cameras 12A and 12B and the lighting devices 16A and 16B so that eleven shots are taken for one medicine package TP (six shots with the camera 12A and five shots with the camera 12B).
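 The eleven-shot sequence described above can be summarized as a simple control table. The sketch below is a minimal illustration in Python; the names `set_lighting`, `set_reflector`, and `capture` are hypothetical driver calls introduced only for illustration and are not part of any actual device API.

```python
# Hedged sketch of the 11-shot sequence for one medicine package.
# "A1".."A4" and "B1".."B4" stand for light emitting units 16A1-16A4 and 16B1-16B4.

SHOTS = [
    ("12A", ["A1"]), ("12A", ["A2"]), ("12A", ["A3"]), ("12A", ["A4"]),  # directional shots for engraving emphasis
    ("12A", ["A1", "A2", "A3", "A4"]),                                   # evenly lit front image
    ("12A", ["B1", "B2", "B3", "B4"]),                                   # backlit via reflector, for region recognition
    ("12B", ["B1"]), ("12B", ["B2"]), ("12B", ["B3"]), ("12B", ["B4"]),  # directional shots, back side
    ("12B", ["B1", "B2", "B3", "B4"]),                                   # evenly lit back image
]

def photograph_package(device):
    """Capture the eleven images of one medicine package and return them in order."""
    images = []
    for index, (camera, units) in enumerate(SHOTS):
        device.set_lighting(units)             # turn on only the listed light emitting units
        device.set_reflector(index == 5)       # reflector inserted only for the backlit shot
        images.append(device.capture(camera))  # one captured image per shot
    return images
```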
 In addition, imaging is performed in a darkroom state, so the only light striking the medicine package TP during imaging is the illumination light from the lighting device 16A or the lighting device 16B. Therefore, among the eleven captured images described above, in the image obtained by illuminating the medicine package TP from below through the reflector and photographing it from above with the camera 12A, the background takes on the color of the light source (white) while the region of each drug T blocks the light and appears black. In the other ten captured images, on the other hand, the background is black and the region of each drug has the color of the drug.
 Note that even in the image obtained by illuminating the medicine package TP from below through the reflector and photographing it from above with the camera 12A, a transparent drug whose whole body is transparent (translucent), or a capsule in which powdered or granular medicine is filled into a partly or wholly transparent capsule (a partly transparent drug), does not appear completely black like an opaque drug, because light passes through the drug region.
 Returning to FIG. 8, the learning data creation device 1 creates learning data for machine learning a learning model that infers drugs from a captured image in which the drugs are photographed (in particular, infers the region of each drug T present in the captured image).
 Accordingly, the image acquisition unit 22 of the learning data creation device 1 preferably acquires, out of the eleven images captured by the imaging device 10, the captured image used for recognizing the regions of the individual drugs T (that is, the image obtained by illuminating the medicine package TP from below through the reflector and photographing the medicine package TP from above with the camera 12A).
 メモリ28は、学習データを記憶する記憶部分であり、例えば、ハードディスク装置、フラッシュメモリ等の不揮発性メモリである。 The memory 28 is a storage portion for storing learning data, and is, for example, a non-volatile memory such as a hard disk device or a flash memory.
 CPU24は、RAM26を作業領域とし、ROM27又はメモリ28に記憶された学習データ作成プログラムを含む各種のプログラムを使用し、プログラムを実行することで本装置の各種の処理を実行する。 The CPU 24 uses the RAM 26 as a work area, uses various programs including a learning data creation program stored in the ROM 27 or the memory 28, and executes various processes of the present apparatus by executing the programs.
 操作部25は、キーボード、ポインティングデバイス(マウス等)を含み、ユーザの操作により各種の情報や指示を入力する部分である。 The operation unit 25 includes a keyboard and a pointing device (mouse, etc.), and is a part for inputting various information and instructions by the user's operation.
 表示部29は、操作部25での操作に必要な画面を表示し、GUI(Graphical User Interface)を実現する部分として機能し、また、撮影画像等を表示することができる。 The display unit 29 displays a screen required for operation on the operation unit 25, functions as a part that realizes a GUI (Graphical User Interface), and can display a captured image or the like.
 [学習データ作成装置の実施形態]
 図13は、本発明に係る学習データ作成装置の実施形態を示すブロック図である。
[Embodiment of learning data creation device]
FIG. 13 is a block diagram showing an embodiment of the learning data creation device according to the present invention.
 FIG. 13 is a functional block diagram showing the functions executed by the hardware configuration of the learning data creation device 1 shown in FIG. 8; the learning data creation device 1 includes a processor 2 and a memory 28.
 プロセッサ2は、図8に示した画像取得部22、CPU24、RAM26、ROM27、及びメモリ28等から構成され、以下に示す各種の処理を行う。 The processor 2 is composed of the image acquisition unit 22, the CPU 24, the RAM 26, the ROM 27, the memory 28, and the like shown in FIG. 8, and performs various processes shown below.
 The processor 2 functions as an acquisition unit 20, a learning image generation unit 30, a correct answer data generation unit 32, and a storage control unit 34.
 取得部20は、画像取得部22及び第1領域情報取得部23を備えている。 The acquisition unit 20 includes an image acquisition unit 22 and a first area information acquisition unit 23.
 画像取得部22は、前述したように撮影装置10から薬剤Tを撮影した撮影画像ITPを取得する(撮影画像の取得処理を行う)。 The image acquisition unit 22 acquires the photographed image ITP obtained by photographing the drug T from the photographing device 10 as described above (performs the acquisition process of the photographed image).
 The first area information acquisition unit 23 acquires information (first area information) indicating the region of the drug in the captured image ITP acquired by the image acquisition unit 22. When the captured image is used as an input image for machine learning of the learning model, this first area information is the correct answer data for the inference result inferred by the learning model. The first area information serving as the correct answer data preferably includes at least one of a correct answer image (for example, a mask image) showing the region of the drug in the captured image, bounding box information enclosing the region of the drug in a rectangle, and edge information indicating the edge of the region of the drug.
 図14は、画像取得部が取得する撮影画像及び第1領域情報取得部が取得する撮影画像内の薬剤の領域を示す第1領域情報の一例を示す図である。 FIG. 14 is a diagram showing an example of the first region information showing the captured image acquired by the image acquisition unit and the region of the drug in the captured image acquired by the first region information acquisition unit.
 図14(A)に示す撮影画像ITPは、リフレクタを介して薬包TPを下方から照明し、カメラ12Aを用いて上方から薬包TP(図9参照)を撮影した画像である。この薬包TPには、6個の薬剤T1~T6が一包化されている。 The photographed image ITP shown in FIG. 14 (A) is an image in which the drug package TP is illuminated from below via a reflector and the drug package TP (see FIG. 9) is photographed from above using the camera 12A. Six drugs T1 to T6 are packaged in this drug package TP.
 図14(A)に示す薬剤T1~T3は、下方からの照明光を遮光する不透明な薬剤であるため、黒く撮影されている。薬剤T4は、透明薬剤であるため、下方からの照明光が透過して白く撮影されている。薬剤T5、T6は、同一種類のカプセル剤であり、下方からの照明光の一部が漏れるため、部分的に僅かに白く撮影されている。 The agents T1 to T3 shown in FIG. 14A are opaque agents that block the illumination light from below, and thus are photographed in black. Since the drug T4 is a transparent drug, the illumination light from below is transmitted and the image is taken in white. The agents T5 and T6 are capsules of the same type, and because part of the illumination light from below leaks, they are partially photographed in white.
 図14(B)は、撮影画像ITP内の各薬剤T1~T6の領域を示す第1領域情報であり、本例ではマスク画像IMである。 FIG. 14B is first region information showing regions of each drug T1 to T6 in the captured image ITP, and is a mask image IM in this example.
 The mask image IM can be created, for example, by displaying the captured image ITP on the display unit 29 and, while viewing the displayed image, having the user fill in the region of each of the drugs T1 to T6 pixel by pixel with a pointing device such as a mouse. For example, a binarized mask image IM can be created by setting the pixel value of each filled drug region T1 to T6 to "1" and the pixel value of the background region to "0".
 Although the capsule-shaped drugs T5 and T6 are of the same type, it is preferable to give the regions of the two drugs T5 and T6 different pixel values so that their instances can be separated. For example, the pixel value of the region of the drug T5 can be set to "1" and the pixel value of the region of the drug T6 to "0.5".
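 As a rough illustration, an instance-aware mask of this kind can be assembled from per-drug binary masks. The sketch below is a minimal example in Python/NumPy under the assumption that each manually painted region is available as a separate boolean array; the variable names are illustrative only.

```python
import numpy as np

def build_mask(drug_masks, instance_values, height, width):
    """Combine per-drug boolean masks into a single mask image.

    drug_masks      : list of HxW boolean arrays, one per drug (e.g. T1..T6)
    instance_values : pixel value per drug; same-type touching drugs such as
                      T5/T6 are given different values (e.g. 1.0 and 0.5) so
                      their instances stay separable, the others simply 1.0.
    """
    mask = np.zeros((height, width), dtype=np.float32)  # background = 0
    for m, v in zip(drug_masks, instance_values):
        mask[m] = v
    return mask

# Example corresponding to FIG. 14(B): T1..T5 -> 1.0, T6 -> 0.5
# mask_im = build_mask(masks_t1_to_t6, [1.0, 1.0, 1.0, 1.0, 1.0, 0.5], 1024, 1024)
```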
 In the example above, the mask image IM serving as the first area information is area information generated by the user manually setting the region of each of the drugs T1 to T6 in the captured image ITP with a pointing device. However, the first area information is not limited to this; it may be generated by automatically extracting the drug regions in the captured image by image processing, or by automatically extracting the drug regions by image processing and then adjusting them manually.
 Returning to FIG. 13, the learning image generation unit 30 receives from the image acquisition unit 22 the captured image ITP in which the drugs are photographed, and generates learning images (IA, IB, IC, ...) by moving the drugs within the input captured image ITP. That is, the learning image generation unit 30 performs learning image generation processing that generates a plurality of learning images (IA, IB, IC, ...) based on the captured image ITP.
 The drugs photographed in the captured image ITP may be moved by having the user specify the position and rotation of each drug image with a pointing device, or by flipping, adding, or otherwise combining the captured image as described with reference to FIG. 5. Alternatively, the drugs may be moved by randomly determining the position and rotation of each drug image using random numbers. In this case, the drug images must be placed so that they do not overlap.
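 When positions are drawn at random, the overlap check mentioned above can be made directly on the drug masks. The sketch below is a minimal illustration in Python/NumPy, assuming the drug patches have already been cut out (and, if desired, rotated or flipped) from the captured image; it is not the embodiment's actual implementation.

```python
import numpy as np

def place_randomly(drug_crops, drug_masks, canvas_shape, rng, max_tries=100):
    """Randomly place pre-cut drug patches on an empty canvas without overlaps.

    drug_crops : list of hxwx3 image patches cut out of the captured image
    drug_masks : matching hxw boolean masks of each drug inside its patch
    Returns the composed learning image and the per-drug masks in canvas coordinates.
    """
    canvas = np.zeros(canvas_shape, dtype=np.float32)        # e.g. (H, W, 3)
    occupied = np.zeros(canvas_shape[:2], dtype=bool)
    placed_masks = []
    for crop, mask in zip(drug_crops, drug_masks):
        h, w = mask.shape
        for _ in range(max_tries):
            y = int(rng.integers(0, canvas_shape[0] - h + 1))
            x = int(rng.integers(0, canvas_shape[1] - w + 1))
            if not (occupied[y:y + h, x:x + w] & mask).any():  # reject overlapping placements
                canvas[y:y + h, x:x + w][mask] = crop[mask]
                occupied[y:y + h, x:x + w] |= mask
                full = np.zeros(canvas_shape[:2], dtype=bool)
                full[y:y + h, x:x + w] = mask
                placed_masks.append(full)
                break
    return canvas, placed_masks
```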
 The correct answer data generation unit 32 receives the mask image IM, which is the first area information, from the first area information acquisition unit 23, and generates from the input mask image IM a plurality of correct answer data (Ia, Ib, Ic, ...) corresponding to the plurality of learning images (IA, IB, IC, ...). That is, the correct answer data generation unit 32 performs correct answer data generation processing that generates, based on the mask image IM, second area information indicating the drug regions in the plurality of learning images (IA, IB, IC, ...) and uses the generated second area information as the correct answer data (Ia, Ib, Ic, ...) for the respective learning images (IA, IB, IC, ...).
 The plurality of learning images (IA, IB, IC, ...) and the plurality of correct answer data (Ia, Ib, Ic, ...) can be generated, as described for the first and second embodiments in which learning data are created by simulation, by using a captured image of the drugs and first area information (for example, a mask image) indicating the drug regions in that captured image, and either flipping, translating, rotating, or scaling the captured image and the mask image in synchronization with each other, or cutting drug images and drug mask images out of the captured image and the mask image and pasting them after translation, rotation, or scaling.
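 The key point is that whatever geometric operation is applied to the captured image must be applied identically to its mask. A minimal sketch in Python/NumPy is given below, using only flips and 90-degree rotations so that no interpolation library is needed; it is an illustration of the idea, not the embodiment's actual implementation.

```python
import numpy as np

def synchronized_augment(image, mask, rng):
    """Apply the same random flip / 90-degree rotation to a captured image and its mask.

    image : HxWx3 captured image (e.g. ITP)
    mask  : HxW mask image holding the correct-answer region values (e.g. IM)
    Returns one (learning image, correct answer mask) pair.
    """
    k = int(rng.integers(0, 4))                 # number of 90-degree rotations
    img = np.rot90(image, k, axes=(0, 1)).copy()
    msk = np.rot90(mask, k, axes=(0, 1)).copy()
    if rng.random() < 0.5:                      # horizontal flip, applied to both
        img, msk = img[:, ::-1].copy(), msk[:, ::-1].copy()
    if rng.random() < 0.5:                      # vertical flip, applied to both
        img, msk = img[::-1].copy(), msk[::-1].copy()
    return img, msk

# Usage: pairs = [synchronized_augment(itp, im, np.random.default_rng(i)) for i in range(10)]
```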
 The storage control unit 34 receives the learning images (IA, IB, IC, ...) generated by the learning image generation unit 30 and the correct answer data (Ia, Ib, Ic, ...) generated by the correct answer data generation unit 32, and stores the corresponding pairs (learning image IA and correct answer data Ia, learning image IB and correct answer data Ib, learning image IC and correct answer data Ic, ...) in the memory 28 as learning data.
 In this way, a large amount of learning data is stored and accumulated in the memory 28. Although not shown in FIG. 8, it is preferable to also store in the memory 28, as learning data, the pair of the captured image ITP and the mask image IM that are input to the learning image generation unit 30 and the correct answer data generation unit 32.
 図15は、図14に示した撮影画像及びマスク画像から生成した学習データの一例を示す図である。 FIG. 15 is a diagram showing an example of learning data generated from the photographed image and the mask image shown in FIG.
 FIG. 15(A) shows learning data consisting of a pair of a learning image IA and correct answer data (mask image) Ia, and FIG. 15(B) shows learning data consisting of a pair of a learning image IB and a mask image Ib.
 In the learning image IA shown in FIG. 15(A), the capsule-shaped drugs T5 and T6 touch along a line, and the drugs T2, T3, and T4 touch one another at points. In the mask image Ia corresponding to this learning image IA, the regions of the drugs T5 and T6, which are the same drug, are given different pixel values; this makes instance separation of the regions of T5 and T6 possible and also makes the boundary between the line-touching drugs T5 and T6 distinguishable.
 In addition, in the mask image Ia, the locations at which the drugs T2, T3, and T4 touch one another are set to the same pixel value as the background, so that the drugs T2, T3, and T4 no longer touch in the mask and the region of each of them is made clear.
 In the learning image IB shown in FIG. 15(B), the capsule-shaped drugs T5 and T6 touch along a line, and the drug T6 and the drug T3 touch at a point. In the mask image Ib corresponding to this learning image IB, the regions of the drugs T5 and T6, which are the same drug, are given different pixel values (for example, the pixel value of the region of the drug T6 is set to "0.5"); this makes instance separation of the regions of T5 and T6 possible and makes the boundary between the line-touching drugs T5 and T6, as well as the boundary between the point-touching drugs T6 and T3, distinguishable.
 The learning data shown in FIG. 15 are only an example; by arranging the drug images of the drugs T1 to T6 with various combinations of translation, rotation, and the like, and arranging the drug mask images showing the regions of the drugs T1 to T6 in the same way, a large amount of learning data can be created.
 この場合、複数の薬剤画像の一部又は全部が点又は線で接触するように配置して学習データを生成することが好ましい。このような学習データにより機械学習された学習済み学習モデルが、点又は線で接触する薬剤を撮影した撮影画像を入力画像とする場合に、各薬剤の領域を正しく推論するためである。 In this case, it is preferable to generate learning data by arranging a part or all of the plurality of drug images so as to contact each other with points or lines. This is because the trained learning model machine-learned by such training data correctly infers the region of each drug when the photographed image obtained by photographing the drug in contact with a point or a line is used as an input image.
 When a transparent drug such as the drug T4 appears in the captured image ITP shown in FIG. 14(A), the illumination light from below passes through it and it is photographed white, but how the illumination light passes through changes with the position and angle of the drug T4. That is, the drug image of the transparent drug T4 has a brightness distribution and the like that differ depending on the position and angle of the transparent drug T4 within the imaging area.
 Therefore, when learning images are generated by moving drug images in a captured image that contains a transparent drug among a plurality of drugs, it is preferable to move only the images of the drugs other than the transparent drug and to leave the image of the transparent drug where it is.
 In this example the correct answer data is generated as a mask image, but it may instead be edge information (an edge image) for each drug image indicating the edge of the region of that drug image. When drugs touch one another at points or along lines, it is preferable to replace the touching locations with the background color so that the edge images of the individual drugs are separated from one another.
 更に、薬剤同士が点又は線で接触する場合には、点又は線で接触する箇所のみを示すエッジ画像を、正解データとして生成してもよい。 Furthermore, when the drugs come into contact with each other by points or lines, an edge image showing only the points or lines of contact may be generated as correct answer data.
 図16は、複数の薬剤の点又は線で接触する箇所のみを示すエッジ画像の一例を示す図である。 FIG. 16 is a diagram showing an example of an edge image showing only a portion of contact with a plurality of drug points or lines.
 The edge image IE shown in FIG. 16 is an image showing only the locations E1 and E2 at which two or more of the drugs T1 to T6 touch at a point or along a line, and it is drawn with solid lines in FIG. 16. The regions drawn with dotted lines in FIG. 16 indicate the regions where the drugs T1 to T6 are present.
 The edge image at the line-contact location E1 is an image of the location where the capsule-shaped drugs T5 and T6 touch along a line, and the edge image at the point-contact location E2 is an image of the location where the three drugs T2 to T4 touch one another at points.
 Since the placement of each drug image in a learning image is known, the locations at which two or more of the drugs touch at points or along lines are also known. Therefore, the correct answer data generation unit 32 shown in FIG. 13 can automatically create, for a learning image generated by the learning image generation unit 30, an edge image (correct answer data) showing only the point- or line-contact locations.
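 One way to derive such a contact-only edge image from the known placement is to dilate each placed drug mask slightly and keep only the pixels where two or more dilated masks meet. The sketch below is a minimal illustration in Python/NumPy under that assumption; it is not taken from the embodiment itself.

```python
import numpy as np
from scipy import ndimage

def contact_edge_image(instance_masks, dilation=1):
    """Return a boolean image that is True only where two or more drugs touch.

    instance_masks : list of HxW boolean masks, one per placed drug
    dilation       : how far (in pixels) each mask is grown before testing overlap
    """
    grown = [ndimage.binary_dilation(m, iterations=dilation) for m in instance_masks]
    count = np.sum(np.stack(grown, axis=0), axis=0)   # how many grown masks cover each pixel
    return count >= 2                                  # contact points / lines only
```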
 The edge image IE shown in FIG. 16 can be used as the correct answer data corresponding to the learning image IA shown in FIG. 15(A). That is, learning data can be formed from the pair of the learning image IA shown in FIG. 15(A) and the edge image IE shown in FIG. 16.
 Such learning data can be used to machine-learn a learning model that takes as its input image a drug image in which drugs touch at points or along lines, and outputs as its inference result an edge image of only the point- or line-contact locations.
 The edge image of only the point- or line-contact locations (the inference result) can in turn be used in a learning model that takes as its input image (a multi-channel input image) both a drug image in which a plurality of drugs touch at points or along lines and the edge image of only those contact locations, and that infers the regions of the individual drugs. According to this learning model, information on the point- or line-contact locations is supplied in addition to the input image, so the region of each drug can be inferred more accurately.
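 Feeding the contact-only edge image alongside the photographed image simply means stacking it as an extra channel. A minimal sketch, assuming the image is HxWx3 and the edge map is HxW:

```python
import numpy as np

def make_multichannel_input(drug_image, contact_edges):
    """Stack a captured drug image and a contact-only edge map into one input array.

    drug_image    : HxWx3 float array (e.g. values in [0, 1])
    contact_edges : HxW boolean array marking point/line contact locations
    Returns an HxWx4 array usable as a multi-channel input image.
    """
    edge_channel = contact_edges.astype(np.float32)[..., None]  # HxWx1
    return np.concatenate([drug_image.astype(np.float32), edge_channel], axis=-1)
```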
 図17は、複数の透明薬剤及び複数の不透明な薬剤を含む撮影画像の一例を示す図である。 FIG. 17 is a diagram showing an example of a photographed image containing a plurality of transparent agents and a plurality of opaque agents.
 The captured image shown in FIG. 17(A) is an image of a medicine package containing a plurality of transparent drugs and a plurality of opaque drugs, illuminated from above by the upper lighting device 16A (light emitting units 16A1 to 16A4) as shown in FIG. 12 and photographed from above with the upper camera 12A.
 The captured image shown in FIG. 17(B) is an image of the same medicine package, illuminated from below by the lower lighting device 16B (light emitting units 16B1 to 16B4) through the reflector and photographed from above with the camera 12A.
 For opaque drugs, the drug images in FIG. 17(B) all carry only silhouette information (they appear black in FIG. 17), which makes this image well suited to acquiring the edge information of the drug images.
 図18は、透明薬剤のレンズ効果を説明するために用いた図であり、図17(B)に示した薬剤画像と同様の方法で撮影された画像である。 FIG. 18 is a diagram used to explain the lens effect of the transparent drug, and is an image taken by the same method as the drug image shown in FIG. 17 (B).
 As shown in FIG. 18, every opaque drug appears as a black image carrying only silhouette information. The edge information of an opaque drug's image is therefore not affected by the relative position between the drug and the lighting device 16B (light emitting units 16B1 to 16B4), and the drug has uniform edge information wherever it is placed. Accordingly, by cutting the drug image of each opaque drug out of the captured image and translating, rotating, or otherwise moving it, a captured image equivalent to one in which the drug was actually photographed at that position can be generated. Moreover, because abundant edge information can be obtained for opaque drugs, the number of pixels required for imaging can be greatly reduced by discarding the surface texture information of the opaque drug images.
 一方、図18において、バウンディングボックスにより囲まれた透明薬剤の場合、不透明な薬剤と同様に移動させることはできない。透過光リッチな撮影環境下で撮影される透明薬剤の画像は、透明薬剤自体のレンズ効果により、照明装置16Bと透明薬剤との相対位置関係によって、エッジ情報が大きく変化するからである。 On the other hand, in FIG. 18, in the case of the transparent drug surrounded by the bounding box, it cannot be moved in the same manner as the opaque drug. This is because the edge information of the image of the transparent agent taken in an imaging environment rich in transmitted light changes greatly depending on the relative positional relationship between the lighting device 16B and the transparent agent due to the lens effect of the transparent agent itself.
 The four transparent drugs shown in FIG. 18 are drugs in which transmitted light accounts for a large proportion of the light passing through the drug (for example, drugs for which the ratio of the signal of the transmitting portion to the signal of the non-transmitting portion is 5:1 or more), and they are capsules of the same type having a capsule shape.
 In FIG. 18, the horizontally oriented transparent drug and the vertically oriented transparent drug have markedly different edge information, and even transparent drugs oriented in the same direction have edge information that differs depending on the position at which they are placed.
 図19は、透明薬剤の移動の制限を説明するために用いた図である。 FIG. 19 is a diagram used to explain the limitation of movement of the transparent drug.
 Because the edge information of the transparent drug images contained in the captured image shown in FIG. 19(B) depends on the position of the illumination and on the position and orientation of the transparent drug, moving a transparent drug image cut out of the captured image without any restriction to generate a new learning image would produce a learning image depicting a situation different from reality.
 Therefore, when generating learning images that contain a transparent drug, the information on the transparent drug before it was cut out is referred to. In the present embodiment, for example, when a cut-out transparent drug image is pasted, prior information on the position and orientation of the transparent drug before cutting is utilized; for example, the image is pasted at the same position and in the same orientation as before cutting.
 In FIG. 19, the transparent drug image cut out of the captured image of FIG. 19(B) may be pasted at the position (a) of FIG. 19(A) when a new learning image is generated, but it must not be pasted at the position (b). Ideally the learning images would cover every variation (position, orientation, and so on), but this is not realistic. Accordingly, when a transparent drug to be pasted is moved, the amount of translation and/or rotation from the position and orientation it had before cutting is limited to within respective threshold values.
 例えば、平行移動量の閾値はnピクセル、回転量の閾値はm度として設定し、これらの閾値を超える透明薬剤の平行移動、及び/又は回転移動を制限する。 For example, the threshold of the amount of parallel movement is set to n pixels, the threshold of the amount of rotation is set to m degrees, and the parallel movement and / or rotational movement of the transparent drug exceeding these thresholds is restricted.
 When a transparent drug is translated and/or rotated, the way its edge information changes depends on the imaging environment (the positions of the illumination and the camera, the angle of view, and so on) and on the shape and size of the transparent drug. It is therefore preferable to set the translation threshold and the rotation threshold by determining, through simulation, the amounts of translation and/or rotation within which the edge information of the transparent drug can be regarded as essentially unchanged.
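 A sketch of how such a constraint might be enforced during placement is shown below (Python); the thresholds correspond to the n pixels and m degrees obtained from the simulation described above, and all names are illustrative.

```python
import math

def is_allowed_move(orig_pos, orig_angle, new_pos, new_angle,
                    max_shift_px, max_rot_deg, transparent):
    """Decide whether moving a cut-out drug image to a new pose is permitted.

    Opaque drugs may be moved freely; transparent drugs are restricted to stay
    within max_shift_px pixels and max_rot_deg degrees of their original pose,
    because their appearance changes with position under transmitted light.
    """
    if not transparent:
        return True
    shift = math.hypot(new_pos[0] - orig_pos[0], new_pos[1] - orig_pos[1])
    rot = abs((new_angle - orig_angle + 180.0) % 360.0 - 180.0)  # wrap to [-180, 180)
    return shift <= max_shift_px and rot <= max_rot_deg
```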
 [機械学習装置]
 図20は、本発明に係る機械学習装置の実施形態を示すブロック図である。
[Machine learning device]
FIG. 20 is a block diagram showing an embodiment of the machine learning device according to the present invention.
 The machine learning device 50 shown in FIG. 20 is composed of a learning model 52 (here a convolutional neural network (CNN), which is one kind of learning model), a loss value calculation unit 54, and a parameter control unit 56.
 この機械学習装置50は、図13に示した学習データ作成装置1により作成され、メモリ28に記憶された学習データを使用し、CNN52を機械学習させる。 This machine learning device 50 is created by the learning data creating device 1 shown in FIG. 13, and uses the learning data stored in the memory 28 to machine-learn the CNN 52.
 The CNN 52 is the part that, when a captured image of drugs is given as an input image, infers the regions of the drugs shown in that input image; it has a multi-layer structure and holds a plurality of weight parameters. The weight parameters include, for example, the filter coefficients of the filters, called kernels, used for the convolution operations in the convolution layers.
 CNN52は、重みパラメータが初期値から最適値に更新されることで、未学習の学習モデルから学習済みの学習モデルに変化しうる。 CNN52 can change from an unlearned learning model to a learned learning model by updating the weight parameter from the initial value to the optimum value.
 The CNN 52 includes an input layer 52A, an intermediate layer 52B having a plurality of sets each composed of a convolution layer and a pooling layer, and an output layer 52C; each layer has a structure in which a plurality of "nodes" are connected by "edges".
 入力層52Aには、学習対象である学習用画像が入力画像として入力される。学習用画像は、メモリ28に記憶されている学習データ(学習用画像と正解データとのペアからなる学習データ)における学習用画像である。 A learning image to be learned is input to the input layer 52A as an input image. The learning image is a learning image in the learning data (learning data consisting of a pair of the learning image and the correct answer data) stored in the memory 28.
 中間層52Bは、畳み込み層とプーリング層とを1セットとする複数セットを有し、入力層52Aから入力した画像から特徴を抽出する部分である。畳み込み層は、前の層で近くにあるノードにフィルタ処理し(フィルタを使用した畳み込み演算を行い)、「特徴マップ」を取得する。プーリング層は、畳み込み層から出力された特徴マップを縮小して新たな特徴マップとする。「畳み込み層」は、画像からのエッジ抽出等の特徴抽出の役割を担い、「プーリング層」は抽出された特徴が、平行移動などによる影響を受けないようにロバスト性を与える役割を担う。 The intermediate layer 52B has a plurality of sets including a convolution layer and a pooling layer as one set, and is a portion for extracting features from an image input from the input layer 52A. The convolution layer filters nearby nodes in the previous layer (performs a convolution operation using the filter) and acquires a "feature map". The pooling layer reduces the feature map output from the convolution layer to a new feature map. The "convolution layer" plays a role of feature extraction such as edge extraction from an image, and the "pooling layer" plays a role of imparting robustness so that the extracted features are not affected by translation or the like.
 尚、中間層52Bには、畳み込み層とプーリング層とを1セットとする場合に限らず、畳み込み層が連続する場合や活性化関数による活性化プロセス、正規化層も含まれ得る。 The intermediate layer 52B is not limited to the case where the convolution layer and the pooling layer are set as one set, but may also include the case where the convolution layers are continuous, the activation process by the activation function, and the normalization layer.
 出力層52Cは、中間層52Bにより抽出された特徴を示す特徴マップを出力する部分である。また、出力層52Cは、学習済みCNN52では、例えば、入力画像に写っている薬剤領域等をピクセル単位、もしくはいくつかのピクセルを一塊にした単位で領域分類(セグメンテーション)した推論結果を出力する。 The output layer 52C is a part that outputs a feature map showing the features extracted by the intermediate layer 52B. Further, in the trained CNN 52, the output layer 52C outputs, for example, an inference result in which the drug region or the like shown in the input image is region-classified (segmented) in pixel units or in units of several pixels as a group.
 学習前のCNN52の各畳み込み層に適用されるフィルタの係数やオフセット値は、任意の初期値がセットされる。 Arbitrary initial values are set for the coefficient and offset value of the filter applied to each convolution layer of CNN52 before learning.
 Of the loss value calculation unit 54 and the parameter control unit 56, which function as a learning control unit, the loss value calculation unit 54 compares the feature map output from the output layer 52C of the CNN 52 with the mask image that is the correct answer data for the input image (the mask image read out of the memory 28 in correspondence with the learning image), and computes the error between them (the loss value, i.e. the value of the loss function). Possible methods for computing the loss value include, for example, softmax cross entropy and sigmoid.
 パラメータ制御部56は、損失値算出部54により算出された損失値を元に、誤差逆伝播法によりCNN52の重みパラメータを調整する。誤差逆伝播法では、誤差を最終レイヤから順に逆伝播させ、各レイヤにおいて確率的勾配降下法を行い、誤差が収束するまでパラメータの更新を繰り返す。 The parameter control unit 56 adjusts the weight parameter of the CNN 52 by the back-propagation method based on the loss value calculated by the loss value calculation unit 54. In the error back-propagation method, the error is back-propagated in order from the final layer, the stochastic gradient descent method is performed in each layer, and the parameter update is repeated until the error converges.
 この重みパラメータの調整処理を繰り返し行い、CNN52の出力と正解データであるマスク画像との差が小さくなるまで繰り返し学習を行う。 This weight parameter adjustment process is repeated, and learning is repeated until the difference between the output of CNN52 and the mask image which is the correct answer data becomes small.
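 As a rough illustration of this learning control, the sketch below shows a minimal segmentation training loop in PyTorch with a deliberately tiny CNN; it is only a schematic stand-in for the CNN 52, the loss value calculation unit 54, and the parameter control unit 56, not the actual model of the embodiment.

```python
import torch
import torch.nn as nn

# Tiny stand-in for CNN 52: 3-channel learning image in, 1-channel mask logits out.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),
)
criterion = nn.BCEWithLogitsLoss()                         # loss value calculation unit 54
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # parameter control unit 56

def train_epoch(loader):
    """loader yields (learning image, correct-answer mask) pairs stored in the memory."""
    for images, masks in loader:            # images: Nx3xHxW float, masks: Nx1xHxW in [0, 1]
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, masks)     # error between inference result and correct answer data
        loss.backward()                     # error backpropagation
        optimizer.step()                    # weight parameter update
```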
 機械学習装置50は、メモリ28に記憶された学習データを使用した機械学習を繰り返すことで、CNN52が学習済みモデルとなる。学習済みのCNN52は、未知の入力画像(薬剤を撮影した撮影画像)を入力すると、撮影画像内の薬剤の領域を示すマスク画像等の推論結果を出力する。 The machine learning device 50 repeats machine learning using the learning data stored in the memory 28, so that the CNN 52 becomes a trained model. When the trained CNN 52 inputs an unknown input image (captured image obtained by photographing the drug), the trained CNN 52 outputs an inference result such as a mask image showing a region of the drug in the captured image.
 尚、CNN52としては、R-CNN(Regions with Convolutional Neural Networks)を適用することができる。R-CNNでは、撮影画像ITP内において、大きさを変えたバウンディングボックスをスライドさせ、薬剤が入るバウンディングボックスの領域を検出する。そして、バウンディングボックスの中の画像部分だけを評価(CNN特徴量を抽出)することで、薬剤のエッジを検出する。また、R-CNNに代えて、Fast R-CNN、Faster R-CNN、Mask R-CNN、SVM(Support vector machine)等を使用することができる。 As CNN52, R-CNN (Regions with Convolutional Neural Networks) can be applied. In R-CNN, the bounding box of different sizes is slid in the captured image ITP to detect the area of the bounding box in which the drug enters. Then, the edge of the drug is detected by evaluating only the image portion in the bounding box (extracting the CNN feature amount). Further, instead of R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, SVM (Support vector machine) and the like can be used.
 このようにして構成される学習済みモデルの推論結果は、例えば、複数の薬剤が一包化された薬包を撮影した撮影画像から、各薬剤の画像を切り出す場合に使用することができる。尚、切り出された各薬剤の画像は、薬包に入っている各薬剤の監査・鑑別を行う場合に使用される。 The inference result of the trained model constructed in this way can be used, for example, when an image of each drug is cut out from a photographed image of a drug package in which a plurality of drugs are packaged. The image of each drug cut out is used when auditing / distinguishing each drug contained in the drug package.
 As described above, the memory 28 stores a large amount of learning data created by simulation from a captured image of drugs and the correct answer data indicating the drug regions in that captured image; the captured image is preferably an image of drugs handled by the user's own pharmacy. This is because, by creating learning data from captured images of the drugs handled by the pharmacy and building a learning model from that learning data, the learning model can be used effectively when the drugs handled by that pharmacy are audited and identified.
 [学習データ作成方法]
 図21は、本発明に係る学習データ作成方法の実施形態を示すフローチャートである。
[Learning data creation method]
FIG. 21 is a flowchart showing an embodiment of the learning data creation method according to the present invention.
 図21に示す各ステップの処理は、例えば、図13に示した学習データ作成装置1のプロセッサ2により行われる。 The processing of each step shown in FIG. 21 is performed by, for example, the processor 2 of the learning data creation device 1 shown in FIG.
 In FIG. 21, the image acquisition unit 22 acquires from the imaging device 10 a captured image ITP in which drugs are photographed (for example, the captured image ITP shown in FIG. 14(A)) (step S10). The captured image ITP shown in FIG. 14(A) is an image obtained by illuminating the medicine package from below through the reflector and photographing the package from above; however, the captured image of the drugs is not limited to one taken in this way. The photographed drugs also need not be contained in a medicine package, and the number of drugs may be one.
 The first area information acquisition unit 23 acquires a mask image IM (for example, the mask image IM shown in FIG. 14(B)) as the first area information indicating the drug regions in the captured image acquired by the image acquisition unit 22 (step S12). The mask image IM is generated manually or automatically from the captured image ITP and stored in the memory 28 or the like.
 続いて、学習用画像生成部30は、ステップS10で取得する撮影画像ITPから薬剤T1~T6を移動させて学習用画像を生成する(ステップS14)。学習用画像の生成は、各薬剤を示す薬剤画像を平行移動、反転、回転、又は拡縮させる画像処理により行うことができる。尚、移動させる薬剤が透明薬剤の場合、移動させず、又は閾値以内に移動を制限する。 Subsequently, the learning image generation unit 30 moves the agents T1 to T6 from the captured image ITP acquired in step S10 to generate a learning image (step S14). The learning image can be generated by image processing in which a drug image showing each drug is translated, inverted, rotated, or scaled. If the drug to be moved is a transparent drug, it is not moved or the movement is restricted within the threshold value.
 The correct answer data generation unit 32 also generates, based on the mask image IM acquired in step S12, the correct answer data (mask image) corresponding to the learning image generated in step S14 (step S16). That is, in step S16, image processing is performed in which the region of each drug in the mask image IM is arranged in the same way as the corresponding drug in the learning image, second area information indicating the regions of the arranged drugs is generated, and the generated second area information is used as the correct answer data (mask image) for the learning image.
 記憶制御部34は、ステップS14で生成した学習用画像とステップS16で生成したマスク画像とのペアを学習データとしてメモリ28に記憶させる(ステップS18)。図15(A)及び図15(B)は、上記のようにして生成され、メモリ28に記憶される学習用画像とマスク画像のペアからなる学習データの一例を示す。 The storage control unit 34 stores the pair of the learning image generated in step S14 and the mask image generated in step S16 in the memory 28 as learning data (step S18). 15 (A) and 15 (B) show an example of learning data composed of a pair of a learning image and a mask image generated as described above and stored in the memory 28.
 Subsequently, the processor 2 determines whether to end the generation of learning data (step S20). For example, generation can be judged to be finished when the user inputs an instruction to end learning data generation, or when a preset number of learning data have been created from one pair of a captured image ITP and a mask image.
 学習データの生成を終了していないと判別されると(「No」の場合)、ステップS14、ステップS16に戻り、ステップS14~ステップS20により次の学習データを作成する。 When it is determined that the generation of the learning data has not been completed (in the case of "No"), the process returns to step S14 and step S16, and the next learning data is created in steps S14 to S20.
 学習データの生成を終了すると判別されると(「Yes」の場合)、ステップS10、ステップS12で取得した撮影画像ITP,マスク画像IMに基づく学習データの作成を終了させる。 When it is determined that the generation of the training data is completed (in the case of "Yes"), the creation of the training data based on the captured image ITP and the mask image IM acquired in steps S10 and S12 is completed.
 尚、ステップS10、ステップS12において、別の撮影画像ITP,マスク画像IMが取得される場合には、その撮影画像ITP,マスク画像IMに基づく複数の学習データの作成が行われることは言うまでもない。 Needless to say, when another captured image ITP and mask image IM are acquired in steps S10 and S12, a plurality of learning data based on the captured image ITP and mask image IM are created.
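 The loop of steps S10 to S20 can be summarized as a short driver routine. The sketch below is a hedged illustration in Python; `synchronized_augment` refers to the earlier sketch, and `save_pair` and the target count are placeholders introduced only for this example.

```python
import numpy as np

def create_learning_data(itp, im, n_pairs, save_pair, seed=0):
    """Steps S10-S20 in miniature: from one captured image ITP and its mask IM,
    generate and store n_pairs of (learning image, correct answer mask)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_pairs):                              # loop ends at the preset count (S20)
        img, msk = synchronized_augment(itp, im, rng)     # S14 and S16: move image and mask together
        save_pair(img, msk)                               # S18: store the pair in the memory as learning data
```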
 [その他]
 本実施形態では、対象物の画像は対象物を撮影した画像であるが、これに限らず、例えば、対象物のCAD(computer-aided design)データにより作成された画像等を含む。また、本実施形態では、対象物として薬剤を例に説明したが、対象物はこれに限らず、例えば、医療用器具を含む工業製品やその部品、農産物、あるいは顕微鏡等で撮影される微生物を含む。
[others]
In the present embodiment the image of the object is a photographed image of the object, but the image is not limited to this and includes, for example, an image created from CAD (computer-aided design) data of the object. Also, although a drug has been used as the example of the object in the present embodiment, the object is not limited to drugs and includes, for example, industrial products such as medical instruments and their parts, agricultural products, and microorganisms photographed with a microscope or the like.
 The hardware structure of the processing units that execute the various kinds of processing in the learning data creation device according to the present invention, such as the CPU 24, is realized by various processors such as the following: a CPU (Central Processing Unit), which is a general-purpose processor that executes software (a program) to function as various processing units; a programmable logic device (PLD) such as an FPGA (Field Programmable Gate Array), which is a processor whose circuit configuration can be changed after manufacture; and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed exclusively for executing specific processing.
 One processing unit may be composed of one of these various processors, or of two or more processors of the same type or of different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). A plurality of processing units may also be implemented by a single processor. As examples of configuring a plurality of processing units with one processor, first, as typified by computers such as clients and servers, one processor may be composed of a combination of one or more CPUs and software, and this processor may function as the plurality of processing units. Second, as typified by a system on chip (SoC), a processor may be used that realizes the functions of an entire system including the plurality of processing units with a single IC (Integrated Circuit) chip. In this way, the various processing units are configured, as their hardware structure, using one or more of the various processors described above.
 これらの各種のプロセッサのハードウェア的な構造は、より具体的には、半導体素子などの回路素子を組み合わせた電気回路(circuitry)である。 More specifically, the hardware structure of these various processors is an electric circuit (circuitry) that combines circuit elements such as semiconductor elements.
 また、本発明は、コンピュータにインストールされることにより、本発明に係る学習データ作成装置として各種の機能を実現させる学習データ作成プログラム、及びこの学習データ作成プログラムが記録された記録媒体を含む。 Further, the present invention includes a learning data creation program that realizes various functions as a learning data creation device according to the present invention by being installed on a computer, and a recording medium on which this learning data creation program is recorded.
 更に、本発明は上述した実施形態に限定されず、本発明の精神を逸脱しない範囲で種々の変形が可能であることは言うまでもない。 Furthermore, it goes without saying that the present invention is not limited to the above-described embodiment, and various modifications can be made without departing from the spirit of the present invention.
1 learning data creation device
2 processor
10 imaging device
12A, 12B camera
13 imaging control unit
14 stage
16A, 16B lighting device
16A1 to 16A4, 16B1 to 16B4 light emitting unit
20 acquisition unit
22 image acquisition unit
23 first area information acquisition unit
24 CPU
25 operation unit
26 RAM
27 ROM
28 memory
29 display unit
30 learning image generation unit
32 correct answer data generation unit
34 storage control unit
50 machine learning device
52 learning model (CNN)
52A input layer
52B intermediate layer
52C output layer
54 loss value calculation unit
56 parameter control unit
IA, IB, IC learning image
IE edge image
IM, Ia, Ib, Ic mask image (correct answer data)
ITP captured image
Itpl template image
S10 to S20 steps
T, T1 to T6 drug
TP drug package

Claims (26)

  1.  プロセッサと、メモリとを備え、前記プロセッサが機械学習用の学習データを作成する学習データ作成装置であって、
     前記プロセッサは、
     対象物の画像を取得する取得処理と、
     前記取得した前記対象物の画像を移動させて学習用画像を生成する学習用画像生成処理と、
     前記生成した前記学習用画像における前記対象物の領域に対応する第2領域情報を生成し、前記生成した前記第2領域情報を前記学習用画像に対する正解データとする正解データ生成処理と、
     前記生成した学習用画像と前記正解データとのペアを、学習データとして前記メモリに記憶させる記憶制御と、
     を行う学習データ作成装置。
    A learning data creation device including a processor and a memory, wherein the processor creates learning data for machine learning.
    The processor
    The acquisition process to acquire the image of the object and
    A learning image generation process for moving the acquired image of the object to generate a learning image, and
    A correct answer data generation process that generates second region information corresponding to the region of the object in the generated learning image and uses the generated second region information as correct answer data for the learning image.
    A storage control for storing a pair of the generated learning image and the correct answer data in the memory as learning data, and
    A learning data creation device in which the processor performs the above processing.
  2.  前記プロセッサの前記取得処理は、前記対象物の領域に対応する第1領域情報を取得し、
     前記正解データ生成処理は、前記取得した前記第1領域情報に基づいて前記第2領域情報を生成する、
     請求項1に記載の学習データ作成装置。
    The acquisition process of the processor acquires the first area information corresponding to the area of the object, and obtains the first area information.
    The correct answer data generation process generates the second area information based on the acquired first area information.
    The learning data creation device according to claim 1.
  3.  前記第1領域情報は、前記対象物の領域を手動で設定した領域情報、前記対象物の領域を画像処理により自動で抽出した領域情報、又は前記対象物の領域を画像処理により自動で抽出し、かつ手動で調整された領域情報である、
     請求項2に記載の学習データ作成装置。
    The first area information is area information in which the region of the object is set manually, area information in which the region of the object is automatically extracted by image processing, or area information in which the region of the object is automatically extracted by image processing and then adjusted manually.
    The learning data creation device according to claim 2.
  4.  前記正解データは、前記対象物の領域に対応する正解画像、前記対象物の領域を矩形で囲むバウンディングボックス情報、及び前記対象物の領域のエッジを示すエッジ情報のうちの少なくとも1つを含む、
     請求項2又は3に記載の学習データ作成装置。
    The correct answer data includes at least one of a correct image corresponding to the region of the object, bounding box information surrounding the region of the object with a rectangle, and edge information indicating an edge of the region of the object.
    The learning data creation device according to claim 2 or 3.
  5.  前記学習用画像生成処理は、前記対象物の画像を平行移動、回転移動、反転、又は拡縮させて前記学習用画像を生成し、
     前記正解データ生成処理は、前記第1領域情報を前記対象物の画像に対応して平行移動、回転移動、反転、又は拡縮させて前記正解データを生成する、
     請求項2から4のいずれか1項に記載の学習データ作成装置。
    The learning image generation process generates the learning image by translating, rotating, reversing, or scaling the image of the object.
    The correct answer data generation process generates the correct answer data by translating, rotating, reversing, or scaling the first region information corresponding to the image of the object.
    The learning data creation device according to any one of claims 2 to 4.
  6.  前記学習用画像生成処理は、前記対象物の画像を平行移動、回転移動、反転、又は拡縮させた2以上の画像を合成して前記学習用画像を生成し、
     前記正解データ生成処理は、前記2以上の画像の各々に対応する前記第1領域情報を前記対象物の画像に対応して平行移動、回転移動、反転、又は拡縮させて前記正解データを生成する、
     請求項2から5のいずれか1項に記載の学習データ作成装置。
    The learning image generation process generates the learning image by synthesizing two or more images obtained by translating, rotating, reversing, or scaling the image of the object.
    The correct answer data generation process generates the correct answer data by translating, rotating, reversing, or scaling the first region information corresponding to each of the two or more images in accordance with the image of the object.
    The learning data creation device according to any one of claims 2 to 5.
  7.  前記学習用画像生成処理は、複数の対象物の画像を含む前記学習用画像を生成する際に、前記複数の対象物の画像の全部又は一部が点又は線で接触する前記学習用画像を生成する、
     請求項1から6のいずれか1項に記載の学習データ作成装置。
    The learning image generation process, when generating a learning image including images of a plurality of objects, generates the learning image such that all or some of the images of the plurality of objects are in contact at points or along lines.
    The learning data creation device according to any one of claims 1 to 6.
  8.  前記正解データは、前記複数の対象物の画像の全部又は一部が点又は線で接触する箇所のみを示すエッジ画像を含む、
     請求項7に記載の学習データ作成装置。
    The correct answer data includes an edge image showing only a part where all or a part of the images of the plurality of objects contact with a point or a line.
    The learning data creation device according to claim 7.
  9.  前記対象物は、少なくとも一部が透明である、
     請求項1から8のいずれか1項に記載の学習データ作成装置。
    The object is at least partially transparent,
    The learning data creation device according to any one of claims 1 to 8.
  10.  前記プロセッサによる前記学習用画像生成処理は、複数の前記対象物の画像を含む前記学習用画像を生成する際に、前記透明な対象物の画像以外の対象物の画像を移動させる、
     請求項9に記載の学習データ作成装置。
    When generating the learning image including images of a plurality of the objects, the learning image generation process by the processor moves the images of objects other than the image of the transparent object.
    The learning data creation device according to claim 9.
  11.  前記対象物は、少なくとも一部が透明であり、
     前記学習用画像生成処理は、前記対象物の画像を閾値以内で移動させて前記学習用画像を生成する、
     請求項1から8のいずれか1項に記載の学習データ作成装置。
    The object is at least partially transparent, and
    The learning image generation process generates the learning image by moving the image of the object within a threshold value.
    The learning data creation device according to any one of claims 1 to 8.
  12.  前記移動は、平行移動及び回転移動のいずれか一方を含む、
     請求項11に記載の学習データ作成装置。
    The movement includes either translation or rotation.
    The learning data creation device according to claim 11.
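    A minimal sketch of the bounded movement described in claims 11 and 12, using only the Python standard library: a translation and rotation are sampled within fixed thresholds before being applied to the (partially transparent) object image. The threshold values and the function name sample_constrained_move are illustrative assumptions.

```python
import random

def sample_constrained_move(max_shift_px: float = 3.0, max_rot_deg: float = 5.0):
    """Sample a translation and rotation bounded by thresholds.

    Small movements are intended for objects with transparent parts, where a
    large move would make the background seen through the object implausible.
    """
    dx = random.uniform(-max_shift_px, max_shift_px)
    dy = random.uniform(-max_shift_px, max_shift_px)
    angle_deg = random.uniform(-max_rot_deg, max_rot_deg)
    return dx, dy, angle_deg

print(sample_constrained_move())
```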
  13.  対象物の画像を移動させて生成した学習用画像と、
     前記学習用画像における前記対象物の領域を示す第2領域情報を有する正解データと、
     のペアからなる、学習データ。
    Learning data consisting of a pair of:
    a learning image generated by moving an image of an object, and
    correct answer data having second area information indicating the area of the object in the learning image.
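    The learning data of this claim can be represented, for example, as a simple in-memory pair of arrays; the sketch below, assuming NumPy, is one hypothetical layout (the names LearningSample, learning_image, and correct_mask are not from the publication).

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class LearningSample:
    """One unit of learning data: a learning image and its correct answer."""
    learning_image: np.ndarray   # image generated by moving the object image
    correct_mask: np.ndarray     # second area information, same spatial size

# A learning data set is then simply a collection of such pairs.
dataset = [
    LearningSample(np.zeros((32, 32), np.uint8), np.zeros((32, 32), np.uint8)),
]
```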
  14.  学習モデルと、
     請求項13に記載の学習データを使用し、前記学習モデルを機械学習させる学習制御部と、
     を備えた機械学習装置。
    A machine learning device comprising:
    a learning model, and
    a learning control unit that trains the learning model by machine learning using the learning data according to claim 13.
  15.  前記学習モデルは、畳み込みニューラルネットワークで構成される、
     請求項14に記載の機械学習装置。
    The learning model is composed of a convolutional neural network.
    The machine learning device according to claim 14.
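    For illustration only, the following sketch assumes PyTorch is available and runs one training step of a deliberately tiny fully convolutional network on random stand-in data, with the second area information used as a per-pixel target. It is not the publication's model or training procedure; the layer sizes, loss, and optimizer are arbitrary choices.

```python
import torch
import torch.nn as nn

# A tiny fully convolutional network standing in for the learning model;
# it outputs one per-pixel logit for the object area.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# One training step on random stand-in data (learning image / correct mask pairs).
images = torch.rand(4, 1, 64, 64)
masks = (torch.rand(4, 1, 64, 64) > 0.5).float()

optimizer.zero_grad()
loss = loss_fn(model(images), masks)
loss.backward()
optimizer.step()
print(float(loss))
```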
  16.  プロセッサが、以下の各ステップの処理を行うことにより機械学習用の学習データを作成する学習データ作成方法であって、
     対象物の画像を取得するステップと、
     前記取得した前記対象物の画像を移動させて学習用画像を生成するステップと、
     前記生成した学習用画像における前記対象物の領域に対応する第2領域情報を生成し、前記生成した前記第2領域情報を前記学習用画像に対する正解データとするステップと、
     前記生成した学習用画像と前記正解データとのペアを、学習データとしてメモリに記憶させるステップと、
     を含む学習データ作成方法。
    A learning data creation method in which a processor creates learning data for machine learning by performing each of the following steps, the method including:
    a step of acquiring an image of an object;
    a step of moving the acquired image of the object to generate a learning image;
    a step of generating second region information corresponding to the region of the object in the generated learning image and using the generated second region information as correct answer data for the learning image; and
    a step of storing the pair of the generated learning image and the correct answer data in a memory as learning data.
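    Tying the four steps together, a hypothetical end-to-end sketch in Python/NumPy: acquire an object image and its first area mask, move them identically, treat the moved mask as the correct answer data, and store each pair in memory. The function name create_learning_data and the particular movements (90-degree rotations and flips) are illustrative; the claim also covers translation and other movements.

```python
import numpy as np

def create_learning_data(object_image: np.ndarray, first_mask: np.ndarray,
                         n_samples: int = 4):
    """Acquire -> move -> derive correct data -> store, for a few samples."""
    memory = []                                  # stand-in for the memory of the claim
    for k in range(n_samples):
        img = np.rot90(object_image, k)          # move the object image
        msk = np.rot90(first_mask, k)            # the mask follows the same move
        if k % 2 == 1:                           # occasionally add a flip as well
            img, msk = np.fliplr(img), np.fliplr(msk)
        memory.append((img, msk))                # (learning image, correct answer data)
    return memory

# Demo on a synthetic object image and its first area mask.
obj_img = np.zeros((8, 8), dtype=np.uint8); obj_img[1:4, 2:6] = 200
obj_msk = (obj_img > 0).astype(np.uint8)
pairs = create_learning_data(obj_img, obj_msk)
print(len(pairs))
```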
  17.  前記対象物の領域に対応する第1領域情報を取得するステップを含み、
     前記正解データを生成するステップは、前記取得した前記第1領域情報に基づいて前記第2領域情報を生成する、
     請求項16に記載の学習データ作成方法。
    The method includes a step of acquiring first area information corresponding to the area of the object, and
    The step of generating the correct answer data generates the second area information based on the acquired first area information.
    The learning data creation method according to claim 16.
  18.  前記正解データは、前記対象物の領域に対応する正解画像、前記対象物の領域を矩形で囲むバウンディングボックス情報、及び前記対象物の領域のエッジを示すエッジ情報のうちの少なくとも1つを含む、
     請求項16又は17に記載の学習データ作成方法。
    The correct answer data includes at least one of a correct image corresponding to the region of the object, bounding box information surrounding the region of the object with a rectangle, and edge information indicating an edge of the region of the object.
    The learning data creation method according to claim 16 or 17.
  19.  前記学習用画像を生成するステップは、複数の前記対象物の画像を配置する際に、前記複数の対象物の画像の全部又は一部を点又は線で接触させる、
     請求項16から18のいずれか1項に記載の学習データ作成方法。
    In the step of generating the learning image, when images of a plurality of the objects are arranged, all or some of the images of the plurality of objects are brought into contact at a point or a line.
    The learning data creation method according to any one of claims 16 to 18.
  20.  前記正解データは、前記複数の対象物の画像の前記点又は線で接触する箇所のみを示すエッジ画像を含む、
     請求項19に記載の学習データ作成方法。
    The correct answer data includes an edge image showing only the portions where the images of the plurality of objects are in contact at the point or line.
    The learning data creation method according to claim 19.
  21.  前記対象物は、少なくとも一部が透明である、
     請求項16から20のいずれか1項に記載の学習データ作成方法。
    The object is at least partially transparent,
    The learning data creation method according to any one of claims 16 to 20.
  22.  前記学習用画像を生成するステップは、複数の対象物の画像を含む学習用画像を生成する際に、前記透明な対象物の画像以外の対象物の画像を移動させる、
     請求項21に記載の学習データ作成方法。
    In the step of generating the learning image, when a learning image including images of a plurality of objects is generated, the images of objects other than the image of the transparent object are moved.
    The learning data creation method according to claim 21.
  23.  対象物の画像を取得する機能と、
     前記取得した前記対象物の画像を移動させて学習用画像を生成する機能と、
     前記生成した学習用画像における前記対象物の領域に対応する第2領域情報を生成し、前記生成した前記第2領域情報を前記学習用画像に対する正解データとする機能と、
     前記生成した学習用画像と前記正解データとのペアを、学習データとしてメモリに記憶させる機能と、
     をコンピュータにより実現させる学習データ作成プログラム。
    A learning data creation program that causes a computer to realize:
    a function of acquiring an image of an object;
    a function of moving the acquired image of the object to generate a learning image;
    a function of generating second region information corresponding to the region of the object in the generated learning image and using the generated second region information as correct answer data for the learning image; and
    a function of storing the pair of the generated learning image and the correct answer data in a memory as learning data.
  24.  前記対象物の領域に対応する第1領域情報を取得する機能を含み、
     前記正解データを生成する機能は、前記取得した前記第1領域情報に基づいて前記第2領域情報を生成する、
     請求項23に記載の学習データ作成プログラム。
    The program includes a function of acquiring first area information corresponding to the area of the object, and
    The function of generating the correct answer data generates the second area information based on the acquired first area information.
    The learning data creation program according to claim 23.
  25.  非一時的かつコンピュータ読取可能な記録媒体であって、請求項23に記載のプログラムが記録された記録媒体。 A non-transitory, computer-readable recording medium on which the program according to claim 23 is recorded.
  26.  非一時的かつコンピュータ読取可能な記録媒体であって、請求項24に記載のプログラムが記録された記録媒体。 A non-transitory, computer-readable recording medium on which the program according to claim 24 is recorded.
PCT/JP2021/008789 2020-03-13 2021-03-05 Learning data creation device, method, program, learning data, and machine learning device WO2021182343A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022507149A JP7531578B2 (en) 2020-03-13 2021-03-05 Learning data creation device, method, program, and recording medium

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2020044140 2020-03-13
JP2020-044140 2020-03-13
JP2020116457 2020-07-06
JP2020-116457 2020-07-06

Publications (1)

Publication Number Publication Date
WO2021182343A1 true WO2021182343A1 (en) 2021-09-16

Family

ID=77671698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/008789 WO2021182343A1 (en) 2020-03-13 2021-03-05 Learning data creation device, method, program, learning data, and machine learning device

Country Status (2)

Country Link
JP (1) JP7531578B2 (en)
WO (1) WO2021182343A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6015112B2 (en) 2012-05-11 2016-10-26 株式会社ニコン Cell evaluation apparatus, cell evaluation method and program
JP6984908B2 (en) * 2017-03-03 2021-12-22 国立大学法人 筑波大学 Target tracking device
JP6441980B2 (en) * 2017-03-29 2018-12-19 三菱電機インフォメーションシステムズ株式会社 Method, computer and program for generating teacher images
JP6977513B2 (en) * 2017-12-01 2021-12-08 コニカミノルタ株式会社 Machine learning methods and equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019046269A (en) * 2017-09-04 2019-03-22 株式会社Soat Machine learning training data generation
JP2019212106A (en) * 2018-06-06 2019-12-12 日本電信電話株式会社 Area extraction model learning device, area extraction model learning method, and program
JP2020014799A (en) * 2018-07-27 2020-01-30 コニカミノルタ株式会社 X-ray image object recognition system

Also Published As

Publication number Publication date
JP7531578B2 (en) 2024-08-09
JPWO2021182343A1 (en) 2021-09-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21768422

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022507149

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21768422

Country of ref document: EP

Kind code of ref document: A1