WO2021085258A1 - Image processing device, image processing device control method, identifier generation method, identification method, identification device, identifier generation device, and identifier - Google Patents

Image processing device, image processing device control method, identifier generation method, identification method, identification device, identifier generation device, and identifier

Info

Publication number
WO2021085258A1
WO2021085258A1 · PCT/JP2020/039496 · JP2020039496W
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning
image
data set
information
Prior art date
Application number
PCT/JP2020/039496
Other languages
French (fr)
Japanese (ja)
Inventor
泰 吉正
彰大 田谷
河村 英孝
Original Assignee
キヤノン株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by キヤノン株式会社
Publication of WO2021085258A1 publication Critical patent/WO2021085258A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • The present invention relates to an image processing device, a method of controlling the image processing device, a method of generating a classifier for identifying identification target information in data, an identification method using a classifier generated by that generation method, an identification device, a classifier generation device, and a classifier.
  • Segmentation is a process that specifies, for each region, the class (classification) to which its pixels belong, and is used for diagnosis with medical images, infrastructure inspection, various particle analyses, and the like.
  • Patent Document 1 describes a technique for distinguishing whether an abnormal shadow of interest (hereinafter, the target abnormal shadow) is benign or malignant by acquiring its region and feature amounts from a medical image.
  • This technique extracts regions of interest from the medical image using a plurality of mutually different position coordinates and performs learning so that differential diagnosis can be carried out with high accuracy even when there are variations arising from the doctor's work.
  • Increasing the data used for learning to give diversity in this way is called “Data Augmentation”, and is a technique often used to improve the accuracy of inference results.
  • The image processing apparatus for solving the above problems is an image processing apparatus that acquires information of a specific region in an image based on inference. It has an information acquisition means for acquiring the information of the specific region, inferred by inputting into a trained model each piece of information of a plurality of regions of interest extracted from the image based on a predetermined inference condition. The plurality of regions of interest include a first region of interest and a second region of interest, and the first region of interest and the second region of interest each have a region that overlaps the other and a region that does not.
  • The control method of the image processing device is a control method of an image processing device that acquires information of a specific region in an image based on inference. It has an information acquisition step of acquiring the information of the specific region, inferred by inputting into a trained model each piece of information of a plurality of regions of interest extracted from the image based on a predetermined inference condition. The plurality of regions of interest include a first region of interest and a second region of interest, and the first region of interest and the second region of interest each have a region that overlaps the other and a region that does not.
  • Another invention is a method for generating a classifier for identifying identification target information in data. The method includes a first learning step of learning using a first learning data set from an initial data set including a plurality of learning data created from the data, and a second learning step of updating the information contained in the classifier by learning using the information contained in the classifier generated in the first learning step and a second learning data set from the initial data set.
  • The method for generating a classifier is characterized in that the amount of the identification target information included in the first learning data set is larger than the amount of the identification target information included in the second learning data set.
  • Yet another generation method is a method for generating a classifier for estimating identification target information in data. For a training data set group having a first training data set containing learning data composed of input data and teacher data for that input data, and a second training data set containing a larger number of the learning data than the first training data set, the method includes a generation step of generating the classifier using the training data set group.
  • The method is characterized in that the amount of the identification target information contained in the input data included in the first training data set is larger than the amount contained in the input data included in the second training data set.
  • Yet another generator is a device that generates a classifier for estimating identification target information in data. For a training data set group having a first training data set containing learning data composed of input data and teacher data for that input data, and a second training data set containing a larger number of the learning data than the first training data set, the device includes a generation means for generating the classifier using the training data set group.
  • The device is characterized in that the amount of the identification target information contained in the input data included in the first training data set is larger than the amount contained in the input data included in the second training data set.
  • With the image processing apparatus, since each piece of information of the plurality of regions of interest in the image is input to the trained model to perform inference, the inference accuracy of the information of the specific region in the image can be improved.
  • The image processing apparatus 1-100 acquires information of a specific region in an image based on inference. Specifically, it has an information acquisition means 1-50 that acquires the information of the specific region (1-520) inferred by inputting into the trained model 1-47 each piece of information of the plurality of regions of interest (1-540 to 1-542) extracted from the image (1-500) based on a predetermined inference condition.
  • the plurality of attention regions include a first attention region (for example, 1-540) and a second attention region (for example, 1-541).
  • a trained model is used to extract a specific region (1-520) in image 1-500 by inference.
  • The trained model is obtained by training with teacher data consisting of images in which the specific region is known.
  • each of the information of the plurality of areas of interest extracted from the image is input to the above-mentioned trained model.
  • the plurality of areas of interest are selected so as to have a region that overlaps with each other and a region that does not overlap with each other.
  • As a result, not only can a plurality of inference results be obtained for a certain region A in the image (an area where the regions of interest overlap one another), but inference results for the region around region A can also be obtained.
  • the image in the present embodiment is, for example, an image including an image of a first material and an image of a second material different from the first material.
  • the information in the specific region includes at least one of the position of the image of the second material in the image and the size of the image of the second material.
  • Preferably, the size of the first region of interest and the size of the second region of interest are the same, because this makes them easy to input to the trained model.
  • the information of the region of interest includes information on at least one of the position and size of the region extracted from the image.
  • the image processing device may further have a reception unit 1-41 that accepts the setting of inference conditions.
  • The reception unit may be one that receives an instruction issued by the user operating the operation unit 1-140, one that receives an automatic instruction from the image processing device, or some other form.
  • the information acquisition means 1-50 may have a model acquisition unit 1-42 for acquiring the trained model 1-47.
  • the model acquisition unit has a generation unit (not shown) that generates a trained model, and the trained model may be acquired from the generation unit or may be acquired from the data server 1-120.
  • The information acquisition means may have an extraction unit 1-43 that extracts a plurality of regions of interest from the image based on the inference conditions received by the reception unit. Further, the information acquisition means may have an information acquisition unit 1-45 that acquires a plurality of inference results by inputting each of the plurality of regions of interest extracted by the extraction unit into the trained model, and acquires the information of the specific region based on the plurality of inference results.
  • the extraction unit may extract a plurality of areas of interest using random numbers, may extract areas of interest regularly from end to end of the image, or may use both methods.
  • The inference conditions include, for example, at least one of the number of inferences performed on average for each pixel of the image, the threshold value for the ratio of the number of times a pixel is inferred to belong to the specific region to the number of times it is inferred, and the size of the region of interest.
  • When the image includes a plurality of specific regions and the areas of those regions have a distribution, the image processing apparatus according to the present embodiment is preferably used. It is particularly suitable when the ratio of the maximum value to the minimum value of the areas of the plurality of specific regions is 50 or more, and especially when the ratio is 100 or more.
  • The image processing apparatus may further have a display control unit that, based on the information of the specific region, causes the display unit to display the specific region in the image in a display mode different from that of the rest of the image. For example, as shown in FIG. 4, the specific regions 1-520 can be displayed in black and the other regions in white. A means other than changing the color may also be used to change the display mode.
  • the control method of the image processing device is the control method of the image processing device that acquires the information of the specific region in the image based on the inference.
  • The plurality of regions of interest include a first region of interest and a second region of interest, and the first region of interest and the second region of interest each have a region that overlaps the other and a region that does not.
  • The image processing apparatus performs inference processing using the trained model.
  • the user sets inference conditions, and the image processing device extracts a plurality of regions of interest from the inference image based on the inference conditions.
  • the image processing device makes inferences using a common trained model for each of the plurality of areas of interest, and calculates the final inference result based on each inference result.
  • the inference result refers to, for example, an object detection result or a segmentation result.
  • the image processing system 1-190 includes an image capturing device 1-110 for capturing an image, a data server 1-120 for storing the captured image, and an image processing device 1-100 for performing image processing. Further, it has a display unit 1-130 for displaying the acquired input image and the image processing result, and an operation unit 1-140 for inputting an instruction from the user.
  • the image processing device 1-100 acquires an input image and performs image processing on the region of interest reflected in the input image.
  • the input image is, for example, an image obtained by subjecting image data acquired by the image capturing apparatus 1-110 to image processing or the like to obtain an image suitable for analysis. Further, the input image in the present embodiment is an inference image.
  • the image processing device 1-100 is, for example, a computer, and performs image processing according to the present embodiment.
  • the image processing device 1-100 has at least a CPU 1-31, a communication IF 1-32, a ROM 1-33, a RAM 1-34, a storage unit 1-35, and a common bus 1-36.
  • the CPU 1-31 integrally controls the operation of each component of the image processing device 1-100.
  • the image processing device 1-100 may also control the operation of the image capturing device 1-110 by controlling the CPU 1-31.
  • the data server 1-120 holds an image captured by the image capturing device 1-110.
  • Communication IF (Interface) 1-32 is realized by, for example, a LAN card. Communication between the external device (for example, data server 1-120) and the image processing device 1-100 is performed by the communication IF1-32.
  • the ROM 1-33 is realized by a non-volatile memory or the like, stores a control program executed by the CPU 1-31, and provides a work area when the program is executed by the CPU 1-31.
  • RAM (Random Access Memory) 1-34 is realized by a volatile memory or the like, and temporarily stores various information.
  • the storage unit 1-35 is realized by, for example, an HDD (Hard Disk Drive) or the like. Then, the storage unit 1-35 stores various application software including an operating system (OS: Operating System), a device driver of a peripheral device, and a program for performing image processing according to the present embodiment described later.
  • the operation unit 1-140 is realized by, for example, a keyboard, a mouse, or the like, and inputs an instruction from the user into the device.
  • the display unit 1-130 is realized by, for example, a display or the like, and displays various information toward the user.
  • the operation unit 1-140 and the display unit 1-130 provide a function as a GUI (Graphical User Interface) under the control of the CPU 1-31.
  • the display unit 1-130 may be a touch panel monitor that accepts operation input, and the operation unit 1-140 may be a stylus pen.
  • Each of the above components is communicably connected to each other by common bus 1-36.
  • the imaging apparatus 1-110 is, for example, a scanning electron microscope (SEM), a transmission electron microscope (TEM: Transmission Electron Microscope), or an optical microscope.
  • the image capturing device 1-110 may also be a device having an image capturing function such as a digital camera or a smartphone.
  • the image capturing device 1-110 transmits the acquired image to the data server 1-120.
  • An imaging control unit (not shown) that controls the imaging apparatus 1-110 may be included in the image processing apparatus 1-100.
  • The main body that executes the program may be one or more CPUs, and the ROM that stores the program may also be one or more memories. Further, another processor such as a GPU (Graphics Processing Unit) may be used instead of the CPU or in combination with the CPU. That is, the functions of the respective parts shown in FIG. 2 are realized by at least one processor (hardware) executing a program stored in at least one memory communicably connected to that processor.
  • The image processing device 1-100 has, as functional configurations, a reception unit 1-41, a model acquisition unit 1-42, an extraction unit 1-43, an inference unit 1-44, an information acquisition unit 1-45, and a display control unit 1-46.
  • the image processing device 1-100 is communicably connected to the data server 1-120 and the display unit 1-130.
  • Reception unit 1-41 receives the inference condition input from the user via operation unit 1-140. That is, the operation unit 1-140 corresponds to an example of a reception means that accepts the setting of the inference condition.
  • the inference condition includes at least one of information on the number of inferences (described later), a threshold value, and a patch size.
  • the model acquisition unit 1-42 acquires the trained model 1-47 constructed in advance and the inference image from the data server 1-120.
  • the extraction unit 1-43 extracts a plurality of regions of interest from the inference image based on the inference conditions received by the reception unit 1-41. That is, it corresponds to an example of an extraction means for extracting a plurality of regions of interest from an image for inference.
  • the area of interest refers to a part cut out from the inference image.
  • the inference unit 1-44 makes inferences for each of the plurality of areas of interest using the trained model 1-47 acquired by the model acquisition unit 1-42. That is, it corresponds to an example of an inference means that makes an inference using a common trained model for each of a plurality of areas of interest.
  • the information acquisition unit 1-45 calculates the final inference result based on the inference result performed by the inference unit 1-44. That is, it corresponds to an example of a calculation means for calculating the final inference result based on a plurality of inference results.
  • the display control unit 1-46 outputs the information regarding the inference result acquired in each process to the display unit 1-130, and causes the display unit 1-130 to display the result of each process.
  • each part of the image processing device 1-100 may be realized as an independent device.
  • the image processing device 1-100 may be a workstation.
  • The functions of each part may be realized as software that operates on a computer, and the software that realizes the functions of each part may run on a server via a network such as a cloud. In the present embodiment, each part is realized by software running on a computer installed in a local environment.
  • FIG. 3 is a diagram showing a processing procedure of processing executed by the image processing apparatus 1-100 of the present embodiment.
  • This embodiment is realized by the CPU 1-31 executing a program that realizes the functions of each part stored in the ROM 1-33.
  • an example in which the image to be processed is a TEM image will be described.
  • the TEM image is acquired as a two-dimensional shading image.
  • carbon black in the coating film of the melamine / alkyd resin paint will be described as an example of the object to be processed included in the image to be processed.
  • the reception unit 1-41 receives the inference condition input by the user in the operation unit 1-140.
  • the inference condition in the present embodiment includes at least one of information regarding the number of inferences, a threshold value, and a patch size.
  • the information regarding the number of inferences is information such as the average number of inferences and the number of extractions of each pixel, which will be described later.
  • step S1-202 the model acquisition unit 1-42 acquires the trained model constructed in advance and the inference image.
  • The inference image is acquired from the data server 1-120. If the patch size is set in step S1-201, a trained model trained with the same patch size is acquired.
  • the patch size is the number of pixels in the vertical and horizontal directions of the cropped image when a part of the target image is cropped.
  • a pair of a TEM image, which is an image to be processed, and a teacher image is prepared.
  • The teacher image is an image obtained by processing the image to be processed with an appropriate image processing method. For example, it is an image binarized into a region to be detected and a region not to be detected, in which the region to be detected is filled and the region not to be detected is left unfilled.
  • the trained model 1-47 is generated by performing machine learning according to a predetermined algorithm using the image to be processed and the teacher image.
  • U-Net is used as a predetermined algorithm.
  • As the learning method for U-Net, a known technique can be used.
  • The algorithm is not limited to U-Net; SVM (Support Vector Machine), DNN (Deep Neural Network), CNN (Convolutional Neural Network), FCN (Fully Convolutional Network), SegNet, and other algorithms used for semantic segmentation that classifies classes in 1-pixel units can also be used, and GAN (Generative Adversarial Networks) may also be used.
  • step S1-203 the extraction unit 1-43 extracts a plurality of regions of interest from the inference image.
  • FIG. 4 shows an example in which the region of interest 1-540, the region of interest 1-541, and the region of interest 1-542 are extracted with respect to the position coordinates 1-530, the position coordinates 1-531, and the position coordinates 1-532.
  • the inference image in this embodiment is composed of a plurality of pixels whose positions can be specified by two-dimensional Cartesian coordinates (x, y).
  • A set of random numbers (x_i, y_i) satisfying 0 ≤ x_i < x_size and 0 ≤ y_i < y_size is generated.
  • The region of interest is set with (x_i, y_i) as its upper-left coordinate.
  • the size of the area of interest should be equal to the patch size.
  • the user sets the average number of inferences for each pixel in the operation unit 1-140.
  • the average number of inferences is the average number of extractions for each pixel when performing extraction. When extracting, it can be obtained by recording the number of times of extraction for each pixel.
  • When (x_i, y_i) is located near the edge of the image and the size of the region of interest would become smaller than the patch size, the periphery of the image may be filled with pixel values of 0, a so-called padding process, so that the size of the region of interest is adjusted to be the same as the patch size.
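  • A minimal Python/NumPy sketch of this random extraction and zero-padding, assuming a grayscale 2D image; the function name and the NumPy conventions are assumptions, not part of the original description:
```python
import numpy as np

def extract_random_patches(image, patch_size, avg_inferences, rng=None):
    """Extract regions of interest at random upper-left coordinates (x_i, y_i),
    zero-padding the bottom/right border so every patch keeps the full patch size."""
    rng = np.random.default_rng() if rng is None else rng
    y_size, x_size = image.shape
    padded = np.pad(image, ((0, patch_size), (0, patch_size)), mode="constant")
    counts = np.zeros((y_size, x_size), dtype=np.int64)  # extractions recorded per pixel
    patches, origins = [], []
    # Keep drawing random upper-left coordinates until the mean number of
    # extractions per pixel reaches the user-set average number of inferences.
    while counts.mean() < avg_inferences:
        x_i = int(rng.integers(0, x_size))
        y_i = int(rng.integers(0, y_size))
        patches.append(padded[y_i:y_i + patch_size, x_i:x_i + patch_size])
        origins.append((x_i, y_i))
        counts[y_i:y_i + patch_size, x_i:x_i + patch_size] += 1
    return patches, origins, counts
```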
  • step S1-204 the inference unit 1-44 makes an inference using the trained model 1-47 for each of the plurality of areas of interest extracted in step S1-203.
  • step S1-205 the information acquisition unit 1-45 calculates and acquires the final inference result based on the inference result in step S1-204.
  • Specifically, the number of times each pixel was inferred and the number of times it was determined to be carbon black are recorded, and a pixel for which (number of times determined to be carbon black) / (number of times inferred) is equal to or greater than the threshold value is finally determined to be carbon black.
  • The threshold value may be set by the user via the operation unit 1-140. If the inference is regression rather than classification, a further threshold is set in addition to the above threshold; results above that further threshold are first classified as carbon black, and the final judgment processing is then performed.
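  • A minimal sketch of this per-pixel aggregation, assuming binary (0/1) patch predictions from the trained model and the same NumPy conventions as the extraction sketch above:
```python
import numpy as np

def aggregate_votes(pred_patches, origins, image_shape, threshold):
    """For every pixel, record how often it was inferred and how often it was
    judged to be carbon black, then apply the ratio threshold."""
    y_size, x_size = image_shape
    inferred = np.zeros(image_shape, dtype=np.int64)
    positive = np.zeros(image_shape, dtype=np.int64)
    for pred, (x_i, y_i) in zip(pred_patches, origins):
        h = min(pred.shape[0], y_size - y_i)   # clip off the zero-padded border
        w = min(pred.shape[1], x_size - x_i)
        inferred[y_i:y_i + h, x_i:x_i + w] += 1
        positive[y_i:y_i + h, x_i:x_i + w] += pred[:h, :w]
    ratio = np.zeros(image_shape, dtype=float)
    np.divide(positive, inferred, out=ratio, where=inferred > 0)
    return ratio >= threshold   # True where the pixel is finally judged carbon black
```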
  • step S1-206 the display control unit 1-46 causes the display unit 1-130 to display the final inference result.
  • The display control unit 1-46 performs control to transmit the final inference result to the display unit 1-130 connected to the image processing device 1-100 and to display it on the display unit 1-130.
  • it is determined for each pixel whether or not it is carbon black, and the pixel determined to be carbon black is displayed with a brightness of 255, and the pixel determined to be not carbon black is displayed with a brightness of 0.
  • The inference accuracy can be evaluated with IoU (Intersection over Union), which is computed from the numbers of TP (True Positive), FP (False Positive), and FN (False Negative) pixels.
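  • A minimal sketch of the IoU computation from these counts for binary masks, assuming equation (1-1) takes the standard form IoU = TP / (TP + FP + FN):
```python
import numpy as np

def iou(pred_mask, true_mask):
    """IoU between a predicted and a ground-truth binary mask (True = carbon black)."""
    tp = np.logical_and(pred_mask, true_mask).sum()
    fp = np.logical_and(pred_mask, ~true_mask).sum()
    fn = np.logical_and(~pred_mask, true_mask).sum()
    union = tp + fp + fn
    return tp / union if union > 0 else 1.0
```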
  • The image processing apparatus 1-100 in the present embodiment can improve the inference accuracy by performing inference using a common trained model for each of a plurality of regions of interest. Further, since the user can set the threshold value, the inference accuracy can be controlled according to the purpose: lower the threshold to reduce missed detections, or raise it to reduce false positives. Inference suited to the purpose can thus be performed while using the same trained model.
  • the reception unit 1-41 receives the inference condition input by the user in the operation unit 1-140.
  • the inference condition in the present embodiment includes at least one of information regarding the number of inferences, a threshold value, and a patch size.
  • the information regarding the number of inferences is information such as the number of times the reference coordinates are set, which will be described later.
  • In step S1-203, the extraction unit 1-43 extracts a plurality of regions of interest from the inference image 1-501. FIG. 6 shows an example in which the regions of interest 1-550 to 1-558 are extracted with the reference coordinates 1-560 at (x_1, y_1) as a reference.
  • A plurality of reference coordinates (x_j, y_j) (j = 1, 2, ..., N) are set, where (x_j, y_j) is a set of random numbers satisfying 0 ≤ x_j < p_x and 0 ≤ y_j < p_y (p_x and p_y being the horizontal and vertical patch sizes).
  • The upper-left coordinates of the other regions of interest are (x_j + p_x × m, y_j + p_y × n), where m is an integer from 1 to x_size/p_x - 1 and n is an integer from 1 to y_size/p_y - 1.
  • the user sets the reference coordinate setting number of times in the operation unit 1-140.
  • The reference-coordinate setting number is the number of times the upper-left reference coordinate (x_j, y_j) is set using random numbers during extraction.
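  • A minimal sketch of this grid-style extraction from a random reference coordinate, assuming a square patch (p_x = p_y = patch_size) and zero-padding at the border:
```python
import numpy as np

def extract_grid_patches(image, patch_size, num_reference_sets, rng=None):
    """Tile the image with patches whose upper-left corners are offset from a
    randomly chosen reference coordinate (x_j, y_j) by multiples of the patch size."""
    rng = np.random.default_rng() if rng is None else rng
    y_size, x_size = image.shape
    p = patch_size
    padded = np.pad(image, ((0, p), (0, p)), mode="constant")
    patches, origins = [], []
    for _ in range(num_reference_sets):
        x_j = int(rng.integers(0, p))          # 0 <= x_j < p_x
        y_j = int(rng.integers(0, p))          # 0 <= y_j < p_y
        for y0 in range(y_j, y_size, p):
            for x0 in range(x_j, x_size, p):
                patches.append(padded[y0:y0 + p, x0:x0 + p])
                origins.append((x0, y0))
    return patches, origins
```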
  • the reception unit 1-41 receives the inference condition input by the user in the operation unit 1-140.
  • the inference condition in the present embodiment includes at least one of information regarding the number of inferences, a threshold value, and a patch size.
  • the information regarding the number of inferences is information such as the number of times the reference coordinates are set, which will be described later.
  • In step S1-203, the extraction unit 1-43 extracts a plurality of regions of interest from the inference image 1-502. FIG. 8 shows an example in which the regions of interest 1-560 to 1-568 are extracted with the reference coordinates 1-660 at (x_1, y_1) as a reference.
  • A plurality of reference coordinates (x_j, y_j) (j = 1, 2, ..., N) are set, where (x_j, y_j) is a set of random numbers satisfying 0 ≤ x_j < p_x and 0 ≤ y_j < p_y.
  • The upper-left coordinates of the other regions of interest are (x_j + p_x × m, y_j + p_y × n), where m is an integer from 1 to x_size/p_x - 1 and n is an integer from 1 to y_size/p_y - 1.
  • the user sets the reference coordinate setting number of times in the operation unit 1-140.
  • The reference-coordinate setting number is the number of times the upper-left reference coordinate (x_j, y_j) is set using random numbers during extraction.
  • mIoU is defined by equation (1-2).
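  • Presumably, mIoU here is the mean of the IoU values over the evaluation images: mIoU = (1/N) Σ_i IoU_i.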
  • the patch size was 128 ⁇ 128 and the threshold was 0.2.
  • the reception unit 1-41 receives the inference condition input by the user in the operation unit 1-140.
  • the inference condition in the present embodiment includes at least one of information regarding the number of inferences, a threshold value, and a patch size.
  • the information regarding the number of inferences is information such as pitch, which will be described later.
  • In step S1-203, the extraction unit 1-43 extracts a plurality of regions of interest from the inference image. FIG. 10 shows an example in which the regions of interest 1-570 to 1-572 are extracted with the reference coordinates 1-580 at (x_1, y_1) as a reference.
  • a plurality of areas of interest are extracted by shifting the areas of interest by the pitch vertically or horizontally.
  • The upper-left coordinates of the region of interest 1-571 and the region of interest 1-572 are (x_1 + pitch_x, y_1) and (x_1 + 2 × pitch_x, y_1), respectively.
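  • A minimal sketch of this pitch-based extraction, again assuming a square patch and zero-padding at the border; neighbouring regions overlap whenever the pitch is smaller than the patch size:
```python
import numpy as np

def extract_pitch_patches(image, patch_size, pitch_x, pitch_y):
    """Slide the region of interest by pitch_x horizontally and pitch_y vertically."""
    y_size, x_size = image.shape
    p = patch_size
    padded = np.pad(image, ((0, p), (0, p)), mode="constant")
    patches, origins = [], []
    for y0 in range(0, y_size, pitch_y):
        for x0 in range(0, x_size, pitch_x):
            patches.append(padded[y0:y0 + p, x0:x0 + p])
            origins.append((x0, y0))
    return patches, origins
```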
  • mIoU was used as in the first and second embodiments.
  • evaluation was performed using mIoU.
  • c 3.
  • the trained model (discriminator) that has been learned by the following first learning step and the second learning step can be used.
  • the first learning step is a step in which learning is performed using the first learning data set among the initial data sets including a plurality of learning data created from the data including the identification target information.
  • the second learning process is performed by learning using the information contained in the trained model generated by learning in the first learning process and the second training data set of the initial data sets.
  • the amount of identification target information included in the first learning data set is larger than the amount of identification target information included in the second learning data set.
  • Alternatively, a trained model (classifier) generated through the following padding step and generation step can be used.
  • In the padding step, for a training data set group having a first training data set containing learning data composed of input data and teacher data for that input data, and a second training data set containing a larger number of learning data than the first training data set, the learning data are inflated so that the number of learning data contained in the first training data set becomes equal to or larger than the number contained in the second training data set.
  • The generation step generates the trained model using the training data set group containing the learning data inflated in the padding step.
  • the amount of identification target information contained in the input data included in the first learning data set is larger than the amount of identification target information contained in the input data included in the second learning data set. Since the contents of the third embodiment will be described later, they will be omitted here.
  • When the first embodiment, the second embodiment, and the third embodiment are combined, the first learning step, the second learning step, the padding step, and the generation step are performed when generating the trained model of the first embodiment.
  • The image processing device and the image processing system in each of the above-described embodiments may be realized as a single device, or may take a form in which devices including a plurality of information acquisition devices are combined so as to be able to communicate with each other and execute the above-described processing; both forms are included in the embodiments of the present invention.
  • the above-mentioned processing may be executed by a common server device or a group of servers.
  • the common server device corresponds to the image processing device according to the embodiment
  • the server group corresponds to the image processing system according to the embodiment.
  • the image processing device and the plurality of devices constituting the image processing system need not be present in the same facility or in the same country as long as they can communicate at a predetermined communication rate.
  • The present invention can take an embodiment as, for example, a system, an apparatus, a method, a program, a recording medium (storage medium), or the like. Specifically, it may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, an imaging device, a Web application, etc.), or to an apparatus consisting of a single device.
  • A recording medium (or storage medium) on which a software program code (computer program) that realizes the functions of the above-described embodiments is recorded is supplied to the system or device.
  • The storage medium is a computer-readable storage medium.
  • The computer (or CPU or GPU) of the system or device reads out and executes the program code stored in the recording medium.
  • The program code itself read from the recording medium realizes the functions of the above-described embodiments, and the recording medium on which the program code is recorded constitutes the present invention.
  • << Second Embodiment >> (Background of the second embodiment)
  • Identification techniques using image processing, voice processing, text processing, and the like are known.
  • the discrimination accuracy is improved by using deep learning, but various efforts are being made to further improve the discrimination accuracy.
  • Japanese Unexamined Patent Publication No. 2019-118670 (Reference 2-1) describes a diagnostic support device that supports diagnosis of a diseased area by using deep learning. This technique makes it possible to perform highly accurate diagnosis by normalizing the color brightness of an image in advance and separating the diseased part and the non-diseased part.
  • Hereinafter, information to be identified in one piece of data is referred to as identification target information.
  • However, it was found that identification is difficult with the methods described in Document 2-1 and Document 2-2. Further, when there is a large difference in the amount of identification target information between data, it has been difficult with conventional methods to construct a classifier that can accurately identify the identification target information regardless of its amount.
  • The object of the second embodiment is to provide a method for generating a classifier that can accurately identify the identification target information even when there are multiple pieces of identification target information in one piece of data or when the identification target information is difficult to distinguish from other information.
  • Another object is to provide an identification method and an identification device that use the classifier generated by this classifier generation method.
  • The method of generating the classifier according to the present embodiment includes a first learning step of learning using a first learning data set from an initial data set that includes a plurality of learning data created from the data. It further includes a second learning step of updating the information contained in the classifier by learning using the information contained in the classifier generated in the first learning step and a second learning data set from the initial data set. At that time, the amount of identification target information included in the first learning data set is larger than the amount included in the second learning data set. In this way, the classifier is trained in two steps, starting with the data set having a large amount of identification target information. As a result, the parameters of image conversion with a large degree of conversion can be learned first and then changed gradually, so that the identification target information can be identified accurately.
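  • A minimal sketch of this two-step generation procedure; the amount_fn helper and the train_fn training routine are assumptions standing in for the actual learning algorithm (for example, U-Net training):
```python
def generate_classifier(initial_dataset, threshold, amount_fn, train_fn):
    """initial_dataset: list of (input_image, teacher_image) pairs.
    amount_fn(teacher_image): amount of identification target information (e.g. pixel count).
    train_fn(dataset, init_weights): returns classifier weights after training."""
    # First learning step: only data with a large amount of identification target information.
    first_set = [pair for pair in initial_dataset if amount_fn(pair[1]) >= threshold]
    weights = train_fn(first_set, init_weights=None)
    # Second learning step: update the information contained in the classifier
    # (its weights and biases) using the second learning data set.
    second_set = initial_dataset          # here, the whole initial data set
    weights = train_fn(second_set, init_weights=weights)
    return weights
```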
  • Data is a representation of information that is formalized for transmission, interpretation, or processing and can be reinterpreted as information. Examples of data include image data, voice data, text data, and the like.
  • the identification target information is information to be identified in the data.
  • When the data is image data, for example, at least one piece of information on the position, area, and distribution of the identification target region in the image data is the identification target information.
  • the classifier generated by the generation method according to the present embodiment can estimate and extract the identification target area in the image data, which is difficult to extract visually by the user.
  • When the data is voice data, at least one of the frequency and intensity of the identification target sound in the voice data is the identification target information.
  • the classifier generated by the generation method according to the present embodiment can estimate and extract the sound to be identified in the sound data including noise, which is difficult for the user to extract.
  • When the sound data is the voice data of a plurality of speakers, the voice data of at least one speaker can be used as the identification target information.
  • When the data is text data, at least one of the identification target characters and character strings in the text data is the identification target information.
  • the classifier generated by the generation method according to the present embodiment can estimate and extract a character string to be identified in text data, which is difficult for the user to extract.
  • the amount of identification target information contained in the training data set is the value obtained by dividing the total amount of identification target information contained in the training data set by the number of training data contained in the training data set (average value).
  • the learning data is a pair of input data and teacher data, and the learning data set includes a plurality of learning data.
  • the amount of identification target information included in the learning data set is, for example, the area of the identification target area in the image.
  • the area of the identification target area in the image can be calculated from the number of pixels.
  • When the data is voice data, it is the length of the identification target information in the data delimited by breaks in the audio.
  • The initial data set may be a collection of data in which the input data and the teacher data are delimited by breaks in the audio, and the data may be sorted in descending order of the difference between the input data signal and the teacher data signal.
  • Likewise, the initial data set may be a collection of data in which the input data and the teacher data are delimited by breaks in the text, and the data may be sorted in descending order of the difference between the input data text and the teacher data text.
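  • A minimal sketch of how the average amount of identification target information per learning datum could be computed for image data; the pixel-counting helper and the labelling colour are assumptions for illustration:
```python
import numpy as np

def target_pixel_count(teacher_image, label_colour=(0, 255, 0)):
    """Area, in pixels, of the identification target region in one teacher image."""
    return int(np.all(teacher_image == np.array(label_colour), axis=-1).sum())

def dataset_info_amount(dataset, amount_fn=target_pixel_count):
    """Total identification target amount divided by the number of learning data."""
    return sum(amount_fn(teacher) for _, teacher in dataset) / len(dataset)
```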
  • FIG. 12 is a diagram showing an example of the device configuration of the learning system (identifier generation system) according to the second embodiment.
  • the learning system 2-190 composed of the learning device (identifier generator) 2-100 and each device connected to the learning device 2-100 will be described in detail.
  • the learning system 2-190 includes a learning device 2-100 for learning, a data acquisition device 2-110 for acquiring data, and a data server 2-120 for storing the acquired data.
  • the learning system 2-190 is a data processing device 2-130 that processes data to create teacher data, a display unit 2-140 that displays the acquired input data and the learning result, and instructions from the user. It has an operation unit 2-150 for inputting.
  • the learning device 2-100 acquires a pair (learning data) of the input data and the teacher data created by processing the input data with the data processing device 2-130.
  • the learning data set including the plurality of learning data created in this way is the initial data set.
  • the training data set is acquired from the initial data set and training is performed.
  • the data acquisition device 2-110 in the present embodiment is a transmission electron microscope (TEM: Transmission Electron Microscope), and the input data is a TEM image.
  • the learning device 2-100 is, for example, a computer, and performs learning according to the present embodiment.
  • the learning device 2-100 has at least a CPU 2-31, a communication IF2-32, a ROM 2-33, a RAM 2-34, a storage unit 2-35, and a common bus 2-36.
  • the CPU 2-31 integrally controls the operation of each component of the learning device 2-100. By controlling the CPU 2-31, the learning device 2-100 may also control the operations of the data acquisition device 2-110 and the data processing device 2-130.
  • the data server 2-120 holds the data acquired by the data acquisition device 2-110.
  • the data processing device 2-130 processes the input data stored in the database so that it can be used for learning.
  • Communication IF (Interface) 2-32 is realized by, for example, a LAN card.
  • the communication IF2-32 controls communication between the external device (for example, the data server 2-120) and the learning device 2-100.
  • the ROM 2-33 is realized by a non-volatile memory or the like, stores a control program executed by the CPU 2-31, and provides a work area when the program is executed by the CPU 2-31.
  • RAM (Random Access Memory) 2-34 is realized by a volatile memory or the like, and temporarily stores various information.
  • the storage unit 2-35 is realized by, for example, an HDD (Hard Disk Drive) or the like, and includes an operating system (OS: Operating System), a device driver of a peripheral device, and a program for performing learning according to the present embodiment described later. Stores various application software.
  • the operation unit 2-150 is realized by, for example, a keyboard or a mouse, and inputs an instruction from the user into the device.
  • the display unit 2-140 is realized by, for example, a display or the like, and displays various information toward the user.
  • the operation unit 2-150 and the display unit 2-140 provide a function as a GUI (Graphical User Interface) under the control of the CPU 2-31.
  • the display unit 2-140 may be a touch panel monitor that accepts operation input, and the operation unit 2-150 may be a stylus pen.
  • Each component of the learning device 2-100 is communicably connected to the others by a common bus 2-36.
  • the data acquisition device 2-110 is, for example, a scanning electron microscope (SEM), a transmission electron microscope (TEM), an optical microscope, a digital camera, a smartphone, or the like.
  • the data acquisition device 2-110 transmits the acquired data to the data server 2-120.
  • a data acquisition control unit (not shown) that controls the data acquisition device 2-110 may be included in the learning device 2-100.
  • FIG. 13 is a diagram showing an example of the functional configuration of the learning system according to the second embodiment.
  • the main body that executes the program may be one or more CPUs, and the ROM that stores the program may also be one or more memories.
  • Another processor such as a GPU (Graphics Processing Unit) may be used instead of the CPU or in combination with the CPU. That is, the functions of the respective parts shown in FIG. 13 are realized by at least one processor (hardware) executing a program stored in at least one memory communicably connected to that processor.
  • The learning device 2-100 has at least a reception unit 2-41, an acquisition unit 2-42, a selection unit 2-43, a learning unit 2-44, a classifier 2-45, a display control unit 2-48, and a display unit 2-140.
  • the learning device 2-100 is communicably connected to the data server 2-120 and the display unit 2-140.
  • Reception unit 2-41 accepts data set selection conditions (described later) via operation unit 2-150.
  • Acquisition unit 2-42 acquires the initial data set from the data server 2-120.
  • the selection unit 2-43 processes the initial data set acquired by the acquisition unit 2-42, and selects the first learning data set and the second learning data set.
  • The learning unit 2-44 sequentially executes learning using the first learning data set and the second learning data set acquired by the selection unit 2-43. That is, it performs at least a first learning using the first learning data set, and a second learning that updates the information contained in the classifier by learning using the information contained in the classifier generated in the first learning and the second learning data set.
  • the information included in the classifier generated in the first learning is stored in the information storage unit in the classifier.
  • each part of the learning device 2-100 may be realized as an independent device.
  • the learning device 2-100 may be a workstation.
  • The functions of each part may be realized as software that operates on a computer, and the software that realizes the functions of each part may run on a server via a network such as a cloud. In the present embodiment, each part is realized by software running on a computer installed in a local environment.
  • FIG. 14 is a flow chart showing an example of a method for generating a classifier according to the second embodiment.
  • This embodiment is realized by the CPU 2-31 executing a program that realizes the functions of each part stored in the ROM 2-33.
  • the image to be processed will be described as a TEM image.
  • the TEM image is acquired as a two-dimensional shading image.
  • carbon black in the coating film of the melamine / alkyd resin paint will be described as identification target information.
  • An initial data set of 1000 pairs (2000 images of size 128 × 128) was used.
  • The data were divided 8:2 between learning and evaluation.
  • the learning data set includes the learning data.
  • the learning data is composed of input data and teacher data for the input data.
  • the teacher data is the image data with the identification target information attached. For example, the identification target area is shown in the image data.
  • the correct image is an image obtained by processing the identification target information in the identification target image by using an appropriate image processing method. For example, an image obtained by binarizing the identification target information and other information, or an image filled with the identification target information.
  • the carbon black in the TEM image will be described using an image filled with a luminance value (0,255,0).
  • the reception unit 2-41 receives the data set selection condition via the operation unit 2-150.
  • the dataset selection criteria are entered by the user.
  • the data set selection condition includes at least a method of dividing the initial data set, information on the data set used for training among the divided data sets, and a learning order.
  • As the method of dividing the data set, a method of dividing by a threshold value of the amount of identification target information is used.
  • Here, the amount of identification target information is defined by the number of pixels filled with the luminance value (0, 255, 0), and the threshold value is set to 5000 pixels.
  • step S2-202 the acquisition unit 2-42 acquires the initial data set from the data server 2-120.
  • step S2-203 the selection unit 2-43 processes the initial data set acquired by the acquisition unit 2-42, and selects the first learning data set and the second learning data set.
  • step S2-203 the carbon black in the melamine / alkyd resin is used as the identification target information.
  • the data sets are sorted in order from the one with the largest amount of identification target information. That is, the data sets are sorted in descending order of the number of pixels filled with the luminance value (0,255,0).
  • the data set is divided according to the threshold value received by the reception unit 2-41.
  • the learning process is determined according to the information of the data set used for learning received by the reception unit 2-41 and the learning order.
  • a data set containing images having an amount of identification target information of 5000 pixels or more is referred to as a first training data set
  • a data set containing images having an amount of identification target information of 0 pixels or more is referred to as a second learning data set.
  • Since the second learning data set includes the first learning data set, it is possible to generate a classifier with higher discrimination accuracy.
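  • A minimal sketch of this sorting and threshold-based division, continuing the pixel-count helper sketched earlier; the 5000-pixel default follows the example in the text, and the function name is an assumption:
```python
def select_training_sets(initial_dataset, amount_fn, threshold=5000):
    """Sort by descending amount of identification target information, then split
    at the threshold; amount_fn is, for example, the pixel-count helper above."""
    ordered = sorted(initial_dataset, key=lambda pair: amount_fn(pair[1]), reverse=True)
    first_set = [pair for pair in ordered if amount_fn(pair[1]) >= threshold]
    second_set = ordered     # threshold 0 pixels: includes the first training data set
    return first_set, second_set
```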
  • step S2-204 the learning unit 2-44 executes learning using the first learning data set selected by the selection unit.
  • learning refers to generating a classifier by performing machine learning according to a predetermined algorithm using a learning data set.
  • U-Net is used as a predetermined algorithm. Since the learning method of U-Net is a well-known technique, detailed description thereof will be omitted in the present embodiment.
  • The algorithm is not limited to U-Net; SVM (Support Vector Machine), DNN (Deep Neural Network), CNN (Convolutional Neural Network), FCN (Fully Convolutional Network), SegNet, and other algorithms used for semantic segmentation that classifies classes in 1-pixel units can also be used, and GAN (Generative Adversarial Networks) may also be used.
  • Padding (inflating) in the present embodiment means generating new data used for learning and increasing the amount of data by performing, for example, at least one of rotation, inversion, luminance conversion, distortion addition, enlargement, and reduction. Inflating data can also be rephrased as data augmentation. Further, when the input data is audio data, new data used for learning can be generated and the amount of data increased by adding to the input data a sound that combines sounds of one or more frequencies.
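  • A minimal sketch of such padding (data augmentation) for one image pair, combining rotation, inversion, and a luminance shift; the transform choices and value ranges are assumptions for illustration:
```python
import numpy as np

def augment_pair(input_image, teacher_image, rng=None):
    """Create one new learning datum; geometric transforms are applied identically
    to the input and teacher images so that the labels stay aligned."""
    rng = np.random.default_rng() if rng is None else rng
    k = int(rng.integers(0, 4))                     # rotation by 0/90/180/270 degrees
    x, t = np.rot90(input_image, k), np.rot90(teacher_image, k)
    if rng.random() < 0.5:                          # horizontal inversion
        x, t = np.fliplr(x), np.fliplr(t)
    shift = int(rng.integers(-20, 21))              # simple luminance conversion
    x = np.clip(x.astype(np.int16) + shift, 0, 255).astype(np.uint8)
    return x, t
```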
  • The initial data set is divided into a learning data set and an evaluation data set in advance.
  • step S2-205 the information generated in step S2-204 is stored in the information storage unit 2-46 of the classifier.
  • step S2-206 learning is performed using the information contained in the classifier saved in step S2-205 and the second learning data set.
  • the information contained in the classifier refers to the structure, weight, bias, and the like of the model.
  • the weight and bias are parameters when calculating the output from the input. For example, in the case of a neural network, when x in the equation (2-1) is input, w is the weight and b is the bias.
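  • Presumably, equation (2-1) has the standard affine form y = wx + b, where the output y is computed from the input x with weight w and bias b.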
  • the model structure is not changed, and training is performed so as to optimize the weights and biases for the second training data set.
  • the display control unit 2-48 displays the learning result on the display unit 2-140.
  • the display control unit 2-48 controls to transmit the learning result to the display unit 2-140 connected to the learning device 2-100 and display the learning result on the display unit 2-140.
  • the progress of learning can be confirmed by displaying the input image, the correct answer image, and the image subjected to the inference processing using the generated discriminator side by side. Further, in order to confirm the progress of learning in more detail, the value of IoU (described later) may be displayed.
  • IoU (Intersection over Union) is computed from the numbers of TP (True Positive), FP (False Positive), and FN (False Negative) pixels.
  • the learning device 2-100 in the present embodiment sequentially learns from the data set having a large amount of identification target information. Therefore, it is possible to first learn the parameters of image conversion having a large degree of conversion and gradually change the parameters, so that the identification target information can be accurately identified.
  • FIG. 15 is a diagram showing an example of the functional configuration of the learning system (identifier generation system) according to the second embodiment.
  • The learning device (classifier generation device) 2-200 includes a reception unit 2-41, an acquisition unit 2-42, a selection unit 2-43, a learning unit 2-44, a classifier 2-45, a display control unit 2-48, a data expansion unit 2-49, and a display unit 2-140.
  • the data expansion unit 2-49 expands the initial data set acquired by the acquisition unit 2-42. That is, the data expansion unit 2-49 can increase the number of images of input data.
  • FIG. 16 is a flow chart showing an example of a method for generating a classifier according to the second embodiment.
  • the reception unit 2-41 receives the data set selection condition via the operation unit 2-150.
  • the dataset selection criteria are entered by the user.
  • The data set selection conditions include at least the number of data expansions per image in the initial data set, the patch size, the method of dividing the data set, information on the data sets used for training among the divided data sets, and the learning order.
  • the patch size is the number of vertical and horizontal pixels of the selected image when a part of the image is selected.
  • the method of classifying the data set shall be based on the threshold value of the amount of identification target information.
  • The amount of identification target information is defined by the number of pixels filled with the luminance value (0, 255, 0). Further, two threshold values are set: 5000 pixels and 1000 pixels.
  • step S2-302 the acquisition unit 2-42 acquires the initial data set from the data server 2-120.
  • In step S2-303, the data expansion unit 2-49 expands the initial data set acquired by the acquisition unit 2-42.
  • Here, 2000 input data are generated by cutting out 100 images of patch size 128 × 128 from each image of the initial data set, which consists of 20 pairs (40 images) of size 1280 × 960.
  • The data were divided 8:2 between learning and evaluation.
  • FIG. 17 is a diagram showing an example of the data expansion processing procedure according to the second embodiment.
  • the process of step S2-303 will be described with reference to FIG.
  • the carbon black in the melamine / alkyd resin is used as the identification target information.
  • The data expansion unit 2-49 expands the data by extracting a plurality of regions of interest from the initial data set.
  • FIG. 17 shows an example in which the area of interest 2-540, the area of interest 2-541, and the area of interest 2-542 are extracted for each of the position coordinates 2-530, the position coordinates 2-531, and the position coordinates 2-532.
  • the input image in this embodiment is composed of a plurality of pixels whose positions can be specified by two-dimensional Cartesian coordinates (x, y). Assuming that the number of pixels in the horizontal direction and the vertical direction of the image is x_size and y_size, respectively, 0 ⁇ x ⁇ x_size and 0 ⁇ y ⁇ y_size hold.
  • The size of the region of interest is set equal to the patch size. Further, when (x_i, y_i) is located near the edge of the image and the size of the region of interest would become smaller than the patch size, the periphery of the image may be filled with pixel values of 0, a so-called padding process, so that the size of the region of interest is adjusted to be the same as the patch size.
  • step S2-304 the selection unit 2-43 processes the initial data set acquired by the acquisition unit 2-42, and selects the first learning data set and the second learning data set.
  • The process of step S2-304 will be described.
  • the data sets are sorted in order from the one with the largest amount of identification target information. That is, the data sets are sorted in descending order of the number of pixels filled with the luminance value (0,255,0).
  • the data set is divided according to the threshold value received by the reception unit 2-41.
  • the learning procedure is determined according to the information on the data sets used for learning and the learning order received by the reception unit 2-41.
  • a data set containing images whose amount of identification target information is 5000 pixels or more is referred to as the first training data set,
  • and a data set containing images whose amount of identification target information is 1000 pixels or more is referred to as the second learning data set.
  • a data set containing images whose amount of identification target information is 0 pixels or more may be used as a third learning data set for further learning.
  • accordingly, the second training data set includes the first training data set,
  • and the third training data set includes the first training data set and the second training data set. This makes it possible to generate a classifier with higher discrimination accuracy.
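  • The nested, threshold-based selection above could be sketched as follows; the green (0,255,0) labelling convention is taken from this embodiment, while the function names are assumptions.

```python
import numpy as np

def target_amount(label):
    """Amount of identification target information: number of pixels with value (0, 255, 0)."""
    return int(np.all(label == (0, 255, 0), axis=-1).sum())

def split_by_thresholds(pairs, thresholds=(5000, 1000, 0)):
    """Sort patch pairs by target amount and build nested data sets per threshold.

    With thresholds (5000, 1000, 0), set 1 is contained in set 2, which is contained
    in set 3, matching the nesting described above."""
    pairs = sorted(pairs, key=lambda p: target_amount(p[1]), reverse=True)
    return [[p for p in pairs if target_amount(p[1]) >= t] for t in thresholds]
```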
  • in step S2-305, the learning unit 2-44 executes learning using the first learning data set selected by the selection unit.
  • in step S2-306, the information generated in step S2-305 is stored in the information storage unit 2-46 of the classifier.
  • in step S2-307, the information contained in the classifier is updated by learning using the information stored in the information storage unit in step S2-306 and the second learning data set.
  • the information contained in the classifier refers to the structure, weights, biases, and the like of the model.
  • further learning may be performed using the third learning data set and the information contained in the discriminator generated by learning with the second learning data set.
  • the number of data sets may be larger; when the number of learning steps is n (n is an integer of 2 or more), the amount of identification target information preferably decreases monotonically as n increases. That is, the slope obtained when plotting the amount of identification target information against the number of learning steps is preferably negative.
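  • The staged learning from data sets with decreasing amounts of identification target information could be sketched as below; `train_one_stage` is a hypothetical training routine that updates the model in place, not an API defined by the patent.

```python
def staged_training(model, data_sets, train_one_stage):
    """Train sequentially, starting from the data set with the largest amount of
    identification target information; each stage updates the weights of the
    classifier produced by the previous stage."""
    for stage, data_set in enumerate(data_sets, start=1):
        # train_one_stage is assumed to update the model in place (structure, weights, bias)
        train_one_stage(model, data_set)
        print(f"stage {stage}: trained on {len(data_set)} samples")
    return model
```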
  • the display control unit 2-48 causes the display unit 2-140 to display the learning result.
  • the learning device 2-100 in the present embodiment can accurately identify the identification target information by sequentially learning from the data set having a large amount of identification target information.
  • the data set is automatically selected, and the learning process is repeated until the evaluation value reaches the target value.
  • the red blood cell part in the image is filled with the brightness value (255,0,0)
  • the white blood cell part is filled with the brightness value (0,255,0)
  • the platelet part is filled with the brightness value (0,0,255).
  • an initial data set consisting of 2000 images of size 128 × 128, forming 1000 pairs, was used.
  • the data were split 8:2 between learning and evaluation.
  • FIG. 18 is a diagram showing an example of input data according to the 2-3 embodiment.
  • FIG. 19 is a diagram showing an example of the functional configuration of the learning system (classifier generation system) according to the 2-3 embodiment.
  • the learning device (classifier generation device) 2-300 has, as its functional configuration, at least a reception unit 2-41, an acquisition unit 2-42, a selection unit 2-43, a learning unit 2-44, a discriminator 2-45, an evaluation unit 2-50, a display control unit 2-48, and a display unit 2-140.
  • the evaluation unit 2-50 makes inferences using the classifier stored in the discriminator 2-45, ends learning when the value of IoUavg is higher than the target value, and repeats the learning process when the value of IoUavg is lower than the target value.
  • FIG. 20 is a flow chart showing an example of the classifier generation method according to the 2-3 embodiment.
  • the reception unit 2-41 receives the data set selection condition via the operation unit 2-150.
  • the dataset selection criteria are entered by the user.
  • the selection condition includes at least the target value of IoU, the upper limit learning time, and the initial value of the class width.
  • the method of classifying the initial data set is to classify the initial data set according to the amount of identification target information.
  • the initial value of the class width is 1000.
  • in step S2-402, the acquisition unit 2-42 acquires the initial data set from the data server 2-120.
  • in step S2-403, the selection unit 2-43 processes the initial data set acquired by the acquisition unit 2-42 and selects the first learning data set and the second learning data set.
  • the selection unit 2-43 divides the initial data set into classes according to the initial value of the class width received by the reception unit 2-41.
  • the data belonging to the class with the largest amount of identification target information are set as the first learning data set,
  • and the data belonging to the class with the largest amount and the class with the second largest amount of identification target information are combined and used as the second training data set.
  • in step S2-404, the learning unit 2-44 executes learning using the first learning data set selected by the selection unit.
  • in step S2-405, the information generated in step S2-404 is stored in the information storage unit 2-46.
  • in step S2-406, the information contained in the classifier is updated by learning using the information stored in the information storage unit in step S2-405 and the second learning data set.
  • the information contained in the classifier refers to the structure, weights, biases, and the like of the model.
  • in step S2-407, the evaluation unit 2-50 makes an inference using the classifier 2-45, ends learning when the value of IoUavg is higher than the target value, and repeats the learning process when the value of IoUavg is lower than the target value.
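  • A minimal sketch of this repeat-until-target loop is given below; the helper functions (`train_one_stage`, `evaluate_iou_avg`) and the cumulative class schedule are assumptions used only for illustration.

```python
def train_until_target(model, classes, train_one_stage, evaluate_iou_avg,
                       target_iou=0.4, max_rounds=10):
    """Repeat staged training until the average IoU on the evaluation set reaches
    the target value (or the round limit is hit).

    `classes` is a list of data sets ordered from the class with the largest amount
    of identification target information to the smallest; each round trains on a
    cumulative union of classes, as in steps S2-403 to S2-407."""
    iou_avg = 0.0
    for _ in range(max_rounds):
        cumulative = []
        for cls in classes:
            cumulative = cumulative + cls          # 1st set, then 1st + 2nd, ...
            train_one_stage(model, cumulative)
        iou_avg = evaluate_iou_avg(model)
        if iou_avg >= target_iou:
            break                                  # target reached, end learning
    return model, iou_avg
```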
  • the display control unit 2-48 causes the display unit 2-140 to display the learning result.
  • mIoU is defined by equation (2-3).
  • the IoUavg values obtained were 0.08 and 0.45, respectively.
  • as described above, it is possible to provide a classifier generation method that can accurately identify the identification target information. Further, according to the present invention, it is possible to provide an identification method and an identification device using the classifier generated by this generation method, which can accurately identify the identification target information.
  • the learning device and the learning system in each of the above-described embodiments may be realized as a single device, or may be a form in which devices including a plurality of information processing devices are combined so as to be able to communicate with each other to execute the above-mentioned processing. Both are included in the embodiments of the present invention.
  • the above-mentioned processing may be executed by a common server device or a group of servers.
  • the common server device corresponds to the learning device according to the embodiment
  • the server group corresponds to the learning system according to the embodiment.
  • the learning device and the plurality of devices constituting the learning system need not be present in the same facility or in the same country as long as they can communicate at a predetermined communication rate.
  • the present invention can take the form of, for example, a system, an apparatus, a method, a program, or a recording medium (storage medium). Specifically, it may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, an imaging device, a Web application, etc.), or to an apparatus composed of a single device.
  • a recording medium (or storage medium) in which a software program code (computer program) that realizes the functions of the above-described embodiment is recorded is supplied to the system or device.
  • the storage medium is a computer-readable storage medium.
  • the computer (or CPU or GPU) of the system or device reads and executes the program code stored in the recording medium.
  • the program code itself read from the recording medium realizes the function of the above-described embodiment, and the recording medium on which the program code is recorded constitutes the present invention.
  • the above document 2-1 describes a diagnostic support device that supports diagnosis of a diseased area by using deep learning. This technique performs highly accurate diagnosis by normalizing the color brightness of an image in advance and separating the diseased part and the non-diseased part.
  • the above-mentioned Document 2-2 discloses a technique for accurately identifying a nodule from a nodule candidate image by connecting a plurality of classifiers and learning while removing a sample that is clearly normal. Connecting a plurality of classifiers in this way is called a cascade type classifier, and is a technique often used to improve the discrimination accuracy.
  • the method of generating the classifier according to the present embodiment is a method of generating a classifier for estimating identification target information in data. Specifically, it has at least a padding step (S3-102) in which the training data of the training data set group are padded (inflated), and a generation step (S3-103) in which a classifier is generated by performing training using the padded training data set group (FIG. 21).
  • the training data set group includes at least the first training data set and the second training data set.
  • the first and second training data sets include training data.
  • the learning data is composed of input data and teacher data for the input data.
  • the second training data set contains a larger number of training data than the first training data set.
  • the amount of identification target information contained in the input data included in the first learning data set is larger than the amount of identification target information contained in the input data included in the second learning data set.
  • the present inventors have found that if the first learning data set and the second learning data set are used for training without going through the padding step, the identification target information cannot be accurately identified. This is considered to be because the amount of identification target information in the input data included in the second learning data set is small. That is, it was found that when learning is performed with input data containing little identification target information, inferences that omit the identification target information tend to be made even when the inference data includes it. Therefore, the training data of the first training data set, whose input data have a large amount of identification target information, are inflated so that the number of training data included in the first training data set becomes equal to or greater than the number of training data included in the second training data set. By doing so, the amount of input data having a large amount of identification target information increases, and the identification target information can be accurately identified.
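  • An illustrative sketch of this padding (inflation) step, under the assumption that `augment` is some rotation/inversion/brightness-conversion routine and that data sets are lists of (input, teacher) pairs, is shown below; it is not the patent's implementation.

```python
import random

def inflate_to_balance(first_set, second_set, augment):
    """Pad (inflate) the first training data set until it has at least as many
    training data as the second training data set."""
    inflated = list(first_set)
    while len(inflated) < len(second_set):
        inp, teacher = random.choice(first_set)
        inflated.append(augment(inp, teacher))   # augment returns a new (input, teacher) pair
    return inflated
```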
  • a learning data set group reception step (S3-101) may also be provided.
  • the data is an expression of information, which is formalized to be suitable for transmission, interpretation or processing, and can be reinterpreted as information.
  • Examples of data include image data, sound data (voice data, etc.), text data, and the like.
  • examples of the input data include input image data, sound input data, and input text data.
  • the identification target information is information to be identified in the data.
  • when the data is image data, at least one piece of information on the position, area, and distribution of the identification target area in the image data is the identification target information.
  • the classifier generated by the generation method according to the present embodiment can estimate and extract the identification target area in the image data, which is difficult to extract visually by the user.
  • the amount of identification target information can be the number of pixels included in the identification target area.
  • when the data is sound data, at least one of the frequency and intensity of the sound to be identified (identification target sound) in the sound data is the identification target information.
  • the classifier generated by the generation method according to the present embodiment can estimate and extract the sound to be identified in the noise-containing sound data, which is difficult for the user to extract.
  • when the sound data is voice data of a plurality of speakers, the voice data of at least one speaker can be used as the identification target information.
  • when the data is text data, information on the identification target characters or character strings in the text data, or their number, is the identification target information.
  • the classifier generated by the generation method according to the present embodiment can estimate and extract a character string to be identified in text data, which is difficult for the user to extract.
  • the learning data in the present embodiment is learning data for generating a discriminator, and is composed of input data and teacher data for the input data.
  • for example, when the input data is image data (input image data), the teacher data is the image data with the identification target information attached, for example image data in which the identification target area is indicated.
  • the amount of identification target information contained in the input data is, for example, the ratio of the identification target area to the image data when the input data is image data. That is, a large amount of identification target information means that, for example, when the input data is image data, the ratio of the identification target region to the image data is large. Further, when the input data is sound data, a large amount of identification target information means that the intensity of the identification target sound in the sound data is large, or the sound data is voice data of a plurality of speakers. In the case of, it means that the number of speakers to be extracted is large.
  • a large amount of identification target information means, for example, a large number of characters or character strings to be identified in the text data.
  • the training data set in the present embodiment includes the above-mentioned training data.
  • the number of training data contained in the second training data set is larger than the number of training data contained in the first training data set.
  • the learning data set group in the present embodiment includes at least a first learning data set and a second learning data set.
  • the training data set group may include three or more training data sets.
  • data padding means generating new input data and increasing the number of input image data by performing, for example, at least one of rotation, inversion, luminance conversion, distortion addition, enlargement, and reduction. Inflating data in this way can also be called data augmentation.
  • when the input data is sound, new sound input data can be generated and the data inflated by adding a sound that combines one or more types of sounds to the input data.
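  • For the sound case, a minimal sketch of inflation by mixing is given below; the mixing gains and function name are assumptions, and waveforms are assumed to be NumPy arrays.

```python
import numpy as np

def inflate_sound(voice, noises, rng=None):
    """Generate a new sound input by adding one or more noise waveforms
    (scaled at random) to the original voice waveform."""
    rng = rng or np.random.default_rng()
    mixed = voice.astype(np.float32).copy()
    for noise in noises:
        gain = rng.uniform(0.1, 0.5)              # mixing ratio range is an assumption
        n = min(len(mixed), len(noise))
        mixed[:n] += gain * noise[:n].astype(np.float32)
    return mixed
```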
  • the classifier generation device is a device for generating a classifier for estimating identification target information in data. Specifically, it has at least an inflating unit 3-22 that inflates the learning data of the learning data set group and a generation unit 3-23 that generates a classifier by performing learning using the inflated learning data set group (FIG. 22).
  • the training data set group includes at least the first training data set and the second training data set.
  • the first and second training data sets include training data.
  • the learning data is composed of input data and teacher data for the input data.
  • the second training data set contains a larger number of training data than the first training data set.
  • the amount of identification target information contained in the input data included in the first learning data set is larger than the amount of identification target information contained in the input data included in the second learning data set.
  • the generation device can be configured such that the acquisition unit 3-21 acquires the learning data set group by operating the operation unit 3-150. Further, the generator according to the present embodiment can be configured to send and receive data to and from the data server 3-120.
  • the classifier according to the present embodiment is generated by the generation method and the generation device according to the present embodiment.
  • the discriminator generated by the generation method and the generation device according to the present embodiment can accurately infer the identification target information included in the input inference data.
  • the information processing apparatus includes the above-mentioned classifier, and has an inference unit that infers the identification target information included in the data for inference using the classifier.
  • the information processing method includes the above-mentioned classifier and has an inference step of inferring the identification target information included in the inference data using the classifier.
  • the area identification system 3-190 has a data input device 3-110 that captures images for learning, a data server 3-120 that stores the captured images, a data processing device 3-130 with which the user identifies regions of the image and colors the identified regions, and a classifier learning device 3-100 that trains the classifier. It further has a display unit 3-140 for displaying the learning result and the frequency distribution, and an operation unit 3-150 with which the user inputs operation instructions to the classifier learning device.
  • the classifier learning device 3-100 acquires a learning input image and a learning correct answer image at the time of learning, learns them, and outputs a learned model.
  • inference can be performed using the classifier generated by the classifier learning device 3-100.
  • an inference input image is acquired, an identification area in the input image is extracted using the generated trained model, the entire area or its boundary is colored with a certain color, and the image is output as an inferred image.
  • the classifier learning device 3-100 has at least a CPU 3-31, a communication IF 3-32, a ROM 3-33, a RAM 3-34, a storage unit 3-35, and a common bus 3-36.
  • the CPU 3-31 integrally controls the operation of each component of the classifier learning device 3-100.
  • the classifier learning device 3-100 may also control the operation of the data input device 3-110.
  • the data server 3-120 holds an image taken by the data input device 3-110.
  • Communication IF (Interface) 3-32 is realized by, for example, a LAN card.
  • the communication IF3-32 controls communication between the external device (for example, the data server 3-120) and the classifier learning device 3-100.
  • the ROM 3-33 is realized by a non-volatile memory or the like, stores a control program executed by the CPU 3-31, and provides a work area when the program is executed by the CPU 3-31.
  • the RAM (Random Access Memory) 3-34 is realized by a volatile memory or the like, and temporarily stores various information.
  • the storage unit 3-35 is realized by, for example, an HDD (Hard Disk Drive). It stores various application software, including an operating system (OS), device drivers of peripheral devices, and the program for identifying areas according to the present embodiment described later.
  • the operation unit 3-150 is realized by, for example, a keyboard, a mouse, or the like, and inputs an instruction from the user into the device.
  • the display unit 3-140 is realized by, for example, a display or the like, and displays various information toward the user.
  • the operation unit 3-150 and the display unit 3-140 provide a function as a GUI (Graphical User Interface) under the control of the CPU 3-31.
  • the display unit 3-140 may be a touch panel monitor that accepts operation input, and the operation unit 3-150 may be a stylus pen.
  • Each of the above components is communicably connected to each other by a common bus 3-36.
  • the data input device 3-110 is, for example, a scanning electron microscope (SEM), a transmission electron microscope (TEM: Transmission Electron Microscope), an optical microscope, a digital camera, a smartphone, or the like.
  • the data input device 3-110 transmits the acquired image to the data server 3-120.
  • An imaging control unit (not shown) that controls the data input device 3-110 may be included in the classifier learning device 3-100.
  • the functional configuration of the area identification system including the classifier learning device 3-100 according to the present embodiment will be described with reference to FIG. 24.
  • the main body that executes the program may be one or more CPUs, and the ROM that stores the program may also be one or more memories.
  • another processor such as GPU (Graphics Processing Unit) may be used instead of the CPU or in combination with the CPU. That is, the functions of the respective parts shown in FIG. 24 are realized by executing a program stored in at least one or more memories in which at least one or more processors (hardware) are communicably connected to the processors.
  • the classifier learning device 3-100 has, as its functional configuration, a reception unit 3-41, an acquisition unit 3-42, a frequency distribution calculation unit 3-44, a data expansion unit 3-45, a learning unit 3-46, a storage unit 3-47, and a display control unit 3-48. It may further have an extraction unit 3-43.
  • the classifier learning device 3-100 is communicably connected to the data server 3-120 and the display unit 3-140.
  • the reception unit 3-41 receives the data expansion conditions input by the user via the operation unit 3-150. That is, the operation unit 3-150 corresponds to an example of a reception means for receiving settings such as the expansion conditions and the patch size (described later).
  • the expansion condition includes at least one of a frequency distribution (described later), a number of bins, a bin width, and an augmentation method (described later).
  • a bin is one of the mutually disjoint intervals (classes) of a frequency distribution (histogram).
  • the acquisition unit 3-42 acquires a plurality of learning data (which can also be called a learning data pair) composed of a learning input image and a learning correct answer image from the data server 3-120.
  • when the extraction unit 3-43 is provided, it extracts a plurality of small area (data block) pairs from each of the learning input image and the learning correct answer image based on the patch size received by the reception unit 3-41.
  • the frequency distribution calculation unit 3-44 calculates the area or the number of pixels of the extraction region for each of the learning correct answer images or, if extracted data block groups exist, for each data block extracted from the learning correct answer images. Further, using the number of bins and the bin width received by the reception unit 3-41, it creates a frequency distribution with the calculated area or number of pixels as the characteristic value.
  • the data expansion unit 3-45 expands the data of the learning input image and the learning correct answer image based on the created frequency distribution and the instruction to execute the augmentation received by the reception unit 3-41.
  • Learning unit 3-46 learns based on the above teacher data and creates a learned model.
  • the storage unit 3-47 stores the trained model.
  • the display control unit 3-48 uses the display unit 3-140 to output information on the frequency distribution and the learning result.
  • the start command of the inference operation input from the user is received via the operation unit 3-150.
  • Acquisition unit 3-42 acquires an inference image from the data server 3-120.
  • the inference unit (not shown) makes inferences using the trained model 3-49. Subsequently, the display control unit 3-48 outputs the inference result using the display unit 3-140.
  • each part of the classifier learning device 3-100 may be realized as an independent device.
  • the classifier learning device 3-100 may be a workstation.
  • the functions of each part may be realized as software that operates on a computer, and the software that realizes the functions of each part may be realized on a server via a network such as a cloud.
  • each part is realized by software running on a computer installed in a local environment.
  • FIG. 25 is a diagram showing a processing procedure of processing executed by the classifier learning device 3-100 of the present embodiment. This embodiment is realized by the CPU 3-31 executing a program that realizes the functions of each part stored in the ROM 3-33.
  • the image to be processed will be described as a TEM image.
  • the TEM image is acquired as a two-dimensional shading image.
  • the identification target included in the image will be described as an example of the processing target object included in the processing target image.
  • in step S3-201, the reception unit 3-41 receives the data expansion conditions input by the user via the operation unit 3-150.
  • the data expansion condition in the present embodiment includes at least one of the number of bins, the width of the bins, and the augmentation method regarding the frequency distribution to be created.
  • in step S3-202, the acquisition unit 3-42 acquires a learning data pair consisting of a learning input image and a learning correct answer image from the data server 3-120.
  • the learning correct answer image can be the same image as the learning input image, except that the entire extraction region or its boundary is colored.
  • in step S3-202b, small area (data block) pairs are extracted from the learning input image and the learning correct answer image according to the patch size.
  • the patch size is the number of pixels in the vertical and horizontal directions of the cropped image when a part of the target image is cropped.
  • Each pair of extracted data blocks is extracted from the same coordinates on the image.
  • in step S3-203, the frequency distribution calculation unit 3-44 calculates the area of the extraction region for each of the learning correct answer images (or, when the extraction unit 3-43 is provided, for each data block group extracted from the learning correct answer images),
  • and creates a frequency distribution using this area value as the characteristic value.
  • in step S3-204, the data expansion unit 3-45 expands the data of the learning input images and the learning correct answer images based on the frequency distribution and the instruction to execute the augmentation received by the reception unit 3-41.
  • specifically, a technique called augmentation, such as inversion, enlargement, reduction, distortion addition, and brightness change, is used to increase the learning input images and learning correct answer images so that the generated pairs are counted in the same frequency distribution.
  • in this way, teacher data is generated in which the frequency of bins containing a large identification target area is higher than the frequency of bins containing a smaller identification target area.
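  • A sketch of histogram-guided inflation in this spirit is given below; the linear target-frequency profile, the assumption that an augmented copy stays in the same bin (true for area-preserving transforms such as flips and rotations), and the function names are all assumptions.

```python
import numpy as np

def inflate_by_histogram(pairs, areas, num_bins, augment, rng=None):
    """Inflate learning pairs bin by bin so that bins holding a larger identification
    target area end up with a higher frequency than bins holding a smaller one."""
    rng = rng or np.random.default_rng()
    counts, edges = np.histogram(areas, bins=num_bins)
    bin_idx = np.clip(np.digitize(areas, edges[:-1]) - 1, 0, num_bins - 1)
    base = int(counts.max())
    out = list(pairs)
    for b in range(num_bins):
        members = [p for p, i in zip(pairs, bin_idx) if i == b]
        if not members:
            continue
        target = base + base * (b + 1) // num_bins   # target grows with the bin's area
        for _ in range(max(target - int(counts[b]), 0)):
            inp, lbl = members[rng.integers(len(members))]
            out.append(augment(inp, lbl))            # augmented copy counted in the same bin
    return out
```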
  • augmentation performs, for example, rotation, inversion, enlargement, and reduction, and each process can be carried out as follows. A blank (white) image having a size 10 times the patch size in both length and width is prepared in advance, and the image to be transformed is placed at its center. Next, an affine transformation is applied to each coordinate according to Eq. (3-1) and Table 1. In the equation, x and y indicate the coordinates before conversion, and x' and y' indicate the coordinates after conversion. In a normal case, the rotation angle θ may be set between 30° and 330°. Further, a and d are the enlargement/reduction ratios in the vertical and horizontal directions, respectively, and are usually set between 0.1 and 10. Finally, the center is cut out at the patch size to obtain the image after augmentation.
  • for distortion addition, an arbitrary value is added to the x-coordinate to translate it, and this value is varied according to the y-coordinate.
  • the maximum of this arbitrary value is usually preferably between 20% and 60% of the patch size in the x direction.
  • gamma correction can be used as an example of changing the brightness.
  • the gamma value at this time is usually 1.2 or more or 1 / 1.2 or less.
  • linear interpolation processing may be performed on the augmentation image. This makes it possible to smooth out a mosaic-like jagged image.
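  • A minimal sketch of the canvas-placement, affine transform, center crop, and gamma correction described above is given below, assuming a grayscale square patch and OpenCV for the warp; Eq. (3-1) and Table 1 are not reproduced here, so the composed rotation/scaling matrix and the parameter values are illustrative assumptions only.

```python
import numpy as np
import cv2  # OpenCV is an assumption; any affine-capable library would do

def augment_patch(patch, angle_deg=45.0, scale_x=1.2, scale_y=0.8, gamma=1.2):
    """Place the patch at the centre of a white canvas 10x the patch size, apply a
    rotation/scaling affine transform, crop the centre back to the patch size, and
    finally apply gamma correction (parameter values are examples)."""
    p = patch.shape[0]                                    # assumes a square grayscale patch
    canvas = np.full((10 * p, 10 * p), 255, dtype=patch.dtype)
    off = (10 * p - p) // 2
    canvas[off:off + p, off:off + p] = patch

    theta = np.deg2rad(angle_deg)
    cx = cy = 10 * p / 2.0
    # affine matrix combining rotation by theta and axis-wise scaling about the canvas centre
    a, b = scale_x * np.cos(theta), -scale_x * np.sin(theta)
    c, d = scale_y * np.sin(theta),  scale_y * np.cos(theta)
    M = np.array([[a, b, cx - a * cx - b * cy],
                  [c, d, cy - c * cx - d * cy]], dtype=np.float32)
    warped = cv2.warpAffine(canvas, M, (10 * p, 10 * p),
                            flags=cv2.INTER_LINEAR, borderValue=255)

    out = warped[off:off + p, off:off + p]                 # centre crop back to patch size
    out = np.clip(255.0 * (out / 255.0) ** gamma, 0, 255)  # gamma correction
    return out.astype(patch.dtype)
```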
  • the learning unit 3-46 generates a trained model 3-49 by performing machine learning according to a predetermined algorithm using the learning teacher data.
  • as a predetermined algorithm, for example, U-Net, SVM (Support Vector Machine), DNN (Deep Neural Network), CNN (Convolutional Neural Network), or the like may be used.
  • FCN (Fully Convolutional Network), SegNet, and the like can also be used as algorithms for semantic segmentation, which classifies classes in units of one pixel.
  • an algorithm in which a so-called generative model such as GAN (Generative Adversarial Networks) is combined with the above algorithm may be used.
  • in step S3-206, the storage unit 3-47 stores the trained model.
  • in step S3-207, the display control unit 3-48 uses the display unit 3-140 to output information related to the frequency distribution and the learning.
  • at the time of inference, the same processing as in step S3-201 is performed (not shown), except that the information received by the reception unit is an inference start command instead of the data expansion conditions.
  • the same processing as in step S3-202 is performed (not shown), except that inference input data is acquired by the acquisition unit instead of the learning data pair.
  • the area is then inferred using the same algorithm as in the learning process, based on the trained model and the inference input data.
  • the same processing as in step S3-207 is performed (not shown), except that the inference result is output instead of the information related to the frequency distribution and the learning.
  • the inference accuracy can be improved by the above processing.
  • the data handled in the third embodiment can be audio data instead of images, and the input device can be a microphone. Further, by adapting the method to voice data, for example by using the amount of difference between the learning input data and the learning correct answer data instead of the area, it can be used for voice processing such as speaker identification and noise cancellation in voice data.
  • for noise cancellation, the same method can identify which voice components in the entire audio are unnecessary, that is, noise.
  • using this classifier, the voice can be made clear by eliminating the noise from the entire audio.
  • the processing content is the same as that of the third embodiment except that the data expansion method is to increase / decrease the volume, frequency, and speed.
  • the classifier learning device and the area identification system in each of the above-described embodiments may be realized as a single device, or as a mode in which devices including a plurality of information processing devices are combined so as to be able to communicate with each other to execute the above-described processing. Also, both are included in the embodiments of the present invention.
  • the above-mentioned processing may be executed by a common server device or a group of servers.
  • the common server device corresponds to the classifier learning device according to the embodiment
  • the server group corresponds to the area identification system according to the embodiment.
  • the classifier learning device and the plurality of devices constituting the area identification system need only be able to communicate at a predetermined communication rate, and do not need to exist in the same facility or in the same country.
  • the present invention can take the form of, for example, a system, an apparatus, a method, a program, or a recording medium (storage medium). Specifically, it may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, an imaging device, a Web application, etc.), or to an apparatus composed of a single device.
  • a recording medium (or storage medium) in which a software program code (computer program) that realizes the functions of the above-described embodiment is recorded is supplied to the system or device.
  • the storage medium is a computer-readable storage medium.
  • the computer (or CPU or GPU) of the system or device reads and executes the program code stored in the recording medium.
  • the program code itself read from the recording medium realizes the function of the above-described embodiment, and the recording medium on which the program code is recorded constitutes the present invention.
  • in Example 1, the classifier learning device of the embodiment of the present invention was used to grasp the amount of magenta pigment in cross-sectional TEM images of a color toner.
  • for toner preparation, a pulverized toner containing a magenta pigment was obtained according to a conventional method. As methods for obtaining the pulverized toner, the methods described in JP-A-2010-140062 and JP-A-2003-233215 can be used.
  • FIG. 26 shows an example of a learning input image cut out from the TEM image of the toner.
  • FIG. 26 shows an example of a colored correct answer image for learning.
  • the patch size was 128 × 128, and 100 small areas (data blocks) at the same positions were cut from each of the learning input images and learning correct answer images, giving a total of 1800 pairs.
  • data expansion such as rotation, inversion, enlargement, reduction, distortion addition, and brightness change was performed under the conditions shown in Table 2, and learning was performed to create a classifier.
  • in Example 2, learning for measuring the amount of magenta pigment in the toner was performed using the same TEM images as in Example 1, except for the data expansion conditions, and a classifier was created. As shown in Table 1, the data expansion conditions were set such that the larger the number of pixels in the target area, the higher the frequency.
  • in Comparative Example 1, a discriminator was created by learning for measuring the amount of magenta pigment in the toner using the same TEM images as in Example 1, except that the data was not expanded.
  • in Example 3, the classifier learning device of the embodiment of the present invention was used to identify automobile regions in order to measure the number of cars in the city from aerial photographs.
  • the images used were four aerial photographs of Potsdam City obtained from https://gdo152.llnl.gov/cowc/ (as of October 2019).
  • data expansion such as rotation, inversion, enlargement, reduction, distortion addition, and brightness change was performed under the conditions shown in Table 3, and learning was performed to create a classifier.
  • in Comparative Example 2, a classifier was created by learning to measure the number of cars in the city from the same aerial photographs as in Example 3, except that the data was not expanded.
  • IoU (Intersection over Union) is calculated as IoU = TP / (TP + FP + FN), where TP, FP, and FN denote True Positive, False Positive, and False Negative, respectively.
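  • As a minimal sketch consistent with this definition (not taken from the patent), IoU and the IoUavg used for evaluation could be computed from boolean masks as follows; the function names and the use of NumPy are assumptions.

```python
import numpy as np

def iou(pred_mask, true_mask):
    """IoU = TP / (TP + FP + FN) for a pair of boolean masks."""
    tp = np.logical_and(pred_mask, true_mask).sum()
    fp = np.logical_and(pred_mask, ~true_mask).sum()
    fn = np.logical_and(~pred_mask, true_mask).sum()
    return tp / (tp + fp + fn) if (tp + fp + fn) > 0 else 1.0

def iou_avg(pred_masks, true_masks):
    """Average IoU over all evaluation images (the IoUavg used in the examples)."""
    return float(np.mean([iou(p, t) for p, t in zip(pred_masks, true_masks)]))
```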
  • the IoU values of the examples according to the embodiments of the present invention are larger than the IoU values of the comparative examples, confirming that the identification accuracy is improved. That is, with the classifiers generated by the generation method according to the above embodiment, the number of input data having many pixels in the magenta pigment region or the automobile region increases, so it was found that the identification target information (the magenta pigment region and the automobile region) can be identified with high accuracy.
  • a classifier with high inference accuracy can be generated.
  • the fourth embodiment of the present invention is a combination of the first embodiment, the second embodiment, and the third embodiment of the present invention.
  • the effect of further improving the identification accuracy can be obtained.
  • An example of the fourth embodiment will be described in detail with reference to the drawings. The description of the configuration, function, and operation similar to those of the first to third embodiments will be omitted, and the differences from the above embodiments will be mainly described.
  • the image to be processed is a TEM image
  • the TEM image is acquired as a two-dimensional shading image.
  • carbon black in the coating film of the melamine / alkyd resin paint will be described as an example of the object to be identified.
  • the initial data set contained 50 images forming 25 pairs, each image having a size of 1280 × 960.
  • of these, 20 pairs (40 images) were used for learning and 5 pairs (10 images) were used for evaluation.
  • 2000 input data were generated by cutting out 100 patches of size 128 × 128 from each of the learning images.
  • the ratio of the maximum area to the minimum area of the identification objects in one cropped image was 30 to 120, and the amount of identification objects in one cropped image ranged from 0 pixels to 16384 pixels.
  • as the evaluation value, IoUavg, obtained by calculating the IoU value for each evaluation image and averaging the results, was used.
  • the learning processing is the same as in the 3-1 embodiment, and the inference processing is the same as in the 1-1 embodiment.
  • the data of the low-magnification images were expanded to the same magnification as the high-magnification images, and then, as in the 2-1 embodiment, the learning was performed in two stages.
  • the inference processing part is the same as that of the first embodiment.
  • Table 6 shows a list of processing contents, identification targets, and evaluation values of each of the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

An image processing device that acquires information about a specific region of an image on the basis of inference. The image processing device has an information acquisition means that acquires the information about the specific region as inferred by inputting information about a plurality of regions of interest that have been extracted from the image on the basis of prescribed inference conditions into a trained model. The plurality of regions of interest include a first region of interest and a second region of interest. The first region of interest and the second region of interest each have a region that overlaps the other and a region that does not overlap the other.

Description

Image processing device, image processing device control method, classifier generation method, identification method, identification device, classifier generation device, and classifier
The present invention relates to an image processing device, a control method of the image processing device, a method of generating a classifier for identifying identification target information in data, an identification method using a classifier generated by the generation method, an identification device, a classifier generation device, and a classifier.
In recent years, many attempts have been made to process images using deep learning to obtain useful information. The main types of processing include image classification, object detection, and segmentation. Segmentation is a process of specifying, for each region, the class (classification) to which its pixels belong, and is used for diagnosis using medical images, infrastructure inspection, various particle analyses, and the like.
Patent Document 1 describes a technique for distinguishing between benign and malignant target abnormal shadows by acquiring the region and feature amounts of a target abnormal shadow from a medical image. This technique extracts regions of interest from a site of interest in the medical image using a plurality of mutually different position coordinates and performs learning, so that differential diagnosis can be performed with high accuracy even if there are variations due to the work of the doctor. Increasing the data used for learning to give it diversity in this way is called data augmentation, and is a technique often used to improve the accuracy of inference results.
JP-A-2019-30584
However, even when the learning data was augmented, the accuracy of the inference results was sometimes still insufficient.
The image processing apparatus according to the present invention for solving the above problems is an image processing apparatus that acquires information of a specific region in an image based on inference, and has an information acquisition means for acquiring the information of the specific region inferred by inputting, into a trained model, each of the pieces of information of a plurality of regions of interest extracted from the image based on a predetermined inference condition, wherein the plurality of regions of interest include a first region of interest and a second region of interest, and the first region of interest and the second region of interest each have a region that overlaps the other and a region that does not overlap the other.
The control method of the image processing apparatus according to the present invention is a control method of an image processing apparatus that acquires information of a specific region in an image based on inference, and has an information acquisition step of acquiring the information of the specific region inferred by inputting, into a trained model, each of the pieces of information of a plurality of regions of interest extracted from the image based on a predetermined inference condition, wherein the plurality of regions of interest include a first region of interest and a second region of interest, and the first region of interest and the second region of interest each have a region that overlaps the other and a region that does not overlap the other.
Another aspect of the present invention is a method of generating a classifier for identifying identification target information in data, the method having a first learning step of learning using a first learning data set out of an initial data set including a plurality of learning data created from the data, and a second learning step of updating the information contained in the classifier by learning using the information contained in the first classifier generated in the first learning step and a second learning data set out of the initial data set, wherein the amount of the identification target information included in the first learning data set is larger than the amount of the identification target information included in the second learning data set.
Yet another generation method according to the present invention is a method of generating a classifier for estimating identification target information in data. For a learning data set group having a first learning data set including learning data composed of input data and teacher data for the input data, and a second learning data set including a larger number of the learning data than the first learning data set, the method has a padding step of padding the learning data so that the number of the learning data included in the first learning data set becomes equal to or greater than the number of the learning data included in the second learning data set, and a generation step of generating the classifier using the learning data set group having the padded learning data, wherein the amount of the identification target information contained in the input data included in the first learning data set is larger than the amount of the identification target information contained in the input data included in the second learning data set.
Yet another generation device according to the present invention is a device for generating a classifier for estimating identification target information in data. For a learning data set group having a first learning data set including learning data composed of input data and teacher data for the input data, and a second learning data set including a larger number of the learning data than the first learning data set, the device has a padding means for padding the learning data so that the number of the learning data included in the first learning data set becomes equal to or greater than the number of the learning data included in the second learning data set, and a generation means for generating the classifier using the learning data set group having the padded learning data, wherein the amount of the identification target information contained in the input data included in the first learning data set is larger than the amount of the identification target information contained in the input data included in the second learning data set.
According to the image processing apparatus of the present invention, since each of the pieces of information of a plurality of regions of interest in an image is input to a trained model for inference, the inference accuracy regarding the information of a specific region in the image can be improved.
FIG. 1 shows an example of the configuration of the image processing system according to the 1-1 embodiment of the present invention.
FIG. 2 shows an example of the configuration of the image processing system according to the 1-1 embodiment of the present invention.
FIG. 3 shows an example of the processing procedure performed by the image processing apparatus 100 according to the 1-1 embodiment of the present invention.
FIG. 4 illustrates an example of the region-of-interest extraction processing according to the 1-1 embodiment of the present invention.
FIG. 5 shows an example of the effect of the image processing system according to the 1-1 embodiment of the present invention.
FIG. 6 illustrates an example of the region-of-interest extraction processing according to the 1-2 embodiment of the present invention.
FIG. 7 shows an example of the effect of the image processing system according to the 1-2 embodiment of the present invention.
FIG. 8 illustrates an example of the region-of-interest extraction processing according to the 1-3 embodiment of the present invention.
FIG. 9 shows an example of the effect of the image processing system according to the 1-3 embodiment of the present invention.
FIG. 10 illustrates an example of the region-of-interest extraction processing according to the 1-4 embodiment of the present invention.
FIG. 11 shows an example of the effect of the image processing system according to the 1-4 embodiment of the present invention.
FIG. 12 shows an example of the device configuration of the learning system according to the 2-1 embodiment.
FIG. 13 shows an example of the functional configuration of the learning system according to the 2-1 embodiment.
FIG. 14 is a flow chart showing an example of the classifier generation method according to the 2-1 embodiment.
FIG. 15 shows an example of the functional configuration of the learning system according to the 2-2 embodiment.
FIG. 16 is a flow chart showing an example of the classifier generation method according to the 2-2 embodiment.
FIG. 17 shows an example of the data expansion processing procedure according to the 2-2 embodiment.
FIG. 18 shows an example of input data according to the 2-3 embodiment.
FIG. 19 shows an example of the functional configuration of the learning system according to the 2-3 embodiment.
FIG. 20 is a flow chart showing an example of the classifier generation method according to the 2-3 embodiment.
FIG. 21 illustrates the flow of the classifier generation method according to the third embodiment of the present invention.
FIG. 22 illustrates the configuration of the classifier generation device according to the third embodiment of the present invention.
FIG. 23 shows an example of the configuration of a generation system including the generation device according to the 3-1 embodiment of the present invention.
FIG. 24 shows an example of the functional configuration of the generation device according to the 3-1 embodiment of the present invention.
FIG. 25 shows an example of the flow of the generation device 100 according to the 3-1 embodiment of the present invention.
FIG. 26 shows an example of the learning data according to Example 1 of the present invention.
FIG. 27 is an enlarged view of an example of the learning data according to Example 3 of the present invention.
Hereinafter, embodiments will be described in detail by way of example with reference to the drawings. However, the components described in these embodiments are merely examples, and the technical scope of the present invention is determined by the claims and is not limited by the following individual embodiments.
<< First Embodiment >>
The image processing apparatus according to the first embodiment of the present invention will be described with reference to FIGS. 2 and 4. The image processing apparatus 1-100 according to the present embodiment acquires information of a specific region in an image based on inference. Specifically, it has an information acquisition means 1-50 that acquires the information of the specific region (1-520) inferred by inputting, into a trained model 1-47, each of the pieces of information of a plurality of regions of interest (1-540 to 1-542) extracted from the image (1-500) based on a predetermined inference condition. The plurality of regions of interest include a first region of interest (for example, 1-540) and a second region of interest (for example, 1-541).
 まず、画像1-500中の特定領域(1-520)を推論によって抽出するために学習済みモデルを用いる。学習済みモデルは、特定領域が既知の画像を教師データとして学習して得られる。そして、前述の学習済みモデルに対して、画像から抽出した複数の注目領域の情報の各々を入力する。この際、図4に示すように、複数の注目領域は、互いに重複する領域と、互いに重複しない領域とを有するように選択する。それによって、画像中のある領域A(注目領域同士が重複する領域)において複数の推論結果を得られるだけでなく、その領域Aの周辺の領域の推論結果をも得られる。これらの複数の推論結果を用いることによって、推論の精度が上がり、画像中の特定領域に関する情報を正しく得ることができると考えられる。 First, a trained model is used to extract a specific region (1-520) in image 1-500 by inference. The trained model is obtained by training an image whose specific region is known as teacher data. Then, each of the information of the plurality of areas of interest extracted from the image is input to the above-mentioned trained model. At this time, as shown in FIG. 4, the plurality of areas of interest are selected so as to have a region that overlaps with each other and a region that does not overlap with each other. As a result, not only a plurality of inference results can be obtained in a certain region A (area in which the regions of interest overlap each other) in the image, but also inference results in the region around the region A can be obtained. By using these plurality of inference results, it is considered that the accuracy of inference is improved and information on a specific region in the image can be obtained correctly.
The image in this embodiment is, for example, an image containing an image of a first material and an image of a second material different from the first material. In this case, the information on the specific region includes at least one of the position of the image of the second material in the image and the size of the image of the second material.
It is preferable that the size of the first region of interest and the size of the second region of interest are the same, because this makes them easier to input to the trained model.
In this embodiment, the information on a region of interest includes at least one of the position and the size of the region extracted from the image.
The image processing apparatus according to this embodiment may further include a reception unit 1-41 that accepts the setting of inference conditions. The reception unit may accept an instruction issued by the user operating the operation unit 1-140, may accept an automatic instruction from the image processing apparatus, or may accept instructions in some other way.
The information acquisition means 1-50 may include a model acquisition unit 1-42 that acquires the trained model 1-47. The model acquisition unit may include a generation unit (not shown) that generates the trained model and acquire the trained model from it, or it may acquire the trained model from the data server 1-120. The information acquisition means may also include an extraction unit 1-43 that extracts a plurality of regions of interest from the image based on the inference conditions accepted by the reception unit. The information acquisition means may further include an information acquisition unit 1-45 that obtains a plurality of inference results by inputting each of the plurality of regions of interest extracted by the extraction unit into the trained model, and acquires the information on the specific region based on the plurality of inference results.
The extraction unit may extract the plurality of regions of interest using random numbers, may extract regions of interest regularly from one end of the image to the other, or may use both methods.
The inference conditions include, for example, at least one of the number of inferences performed on average for each pixel of the image, a threshold on the ratio of the number of times a region of interest is inferred to be the specific region to the number of times it is inferred, and the size of the regions of interest.
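For illustration only, these inference conditions could be bundled into a small configuration object. The sketch below is a hypothetical Python representation; the class and field names are assumptions for illustration and are not part of this disclosure.

```python
from dataclasses import dataclass

@dataclass
class InferenceConditions:
    """Hypothetical container for the inference conditions described above."""
    avg_inferences_per_pixel: int = 30  # average number of inferences per pixel
    vote_threshold: float = 0.1         # (times judged as specific region) / (times inferred)
    patch_size: int = 128               # side length of each region of interest, in pixels

# Example: values corresponding to those used later in Embodiment 1-1
conditions = InferenceConditions(avg_inferences_per_pixel=30,
                                 vote_threshold=0.1,
                                 patch_size=128)
print(conditions)
```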
The image processing apparatus according to this embodiment is preferably used when the image contains a plurality of specific regions and the areas of the plurality of specific regions have a distribution. It is also preferably used when the ratio of the maximum value to the minimum value of the areas of the plurality of specific regions is 50 or more, and particularly when this ratio is 100 or more.
The image processing apparatus according to this embodiment may further include a display control unit that, based on the information on the specific region, causes a display unit to display the specific region in the image in a display mode different from that of the rest of the image. For example, as shown in FIG. 4, the specific regions 1-520 can be displayed in black and the other regions in white. The display modes may be differentiated by means other than changing the color.
As described above, the control method of the image processing apparatus according to the embodiment of the present invention is a control method of an image processing apparatus that acquires information on a specific region in an image based on inference. Specifically, it has an information acquisition step of acquiring information on the specific region inferred by inputting, into a trained model, each piece of information on a plurality of regions of interest extracted from the image based on a predetermined inference condition. The plurality of regions of interest include a first region of interest and a second region of interest, and the first region of interest and the second region of interest each have a region that overlaps the other and a region that does not overlap the other.
Hereinafter, embodiments will be described in detail by way of example with reference to the drawings. However, the components described in these embodiments are merely examples, and the technical scope of the present invention is not limited by the individual embodiments below.
(Embodiment 1-1)
(Overview)
The image processing apparatus according to Embodiment 1-1 of the present invention performs the processing of an inference process using a trained model. In the inference process, the user sets inference conditions, and the image processing apparatus extracts a plurality of regions of interest from the inference image based on those conditions. Next, the image processing apparatus performs inference on each of the plurality of regions of interest using a common trained model, and calculates the final inference result based on the individual inference results. Here, an inference result refers to, for example, an object detection result or a segmentation result. The following description deals with the case where an image of a resin in a transmission electron microscope (TEM) image is to be processed, but the scope of application of this embodiment is not limited by the detection target or the type of image acquisition method. A specific device configuration, functional configuration, and processing flow are described below.
(Device configuration)
An image processing system 1-190 composed of the image processing apparatus according to Embodiment 1-1 of the present invention and the devices connected to the image processing apparatus 1-100 will be described with reference to FIG. 1. The image processing system 1-190 includes an image capturing device 1-110 that captures images, a data server 1-120 that stores the captured images, and an image processing apparatus 1-100 that performs image processing. It further includes a display unit 1-130 that displays the acquired input image and the image processing results, and an operation unit 1-140 for inputting instructions from the user. The image processing apparatus 1-100 acquires an input image and performs image processing on the regions of interest appearing in the input image. The input image is, for example, an image obtained by subjecting image data acquired by the image capturing device 1-110 to image processing or the like so as to make it suitable for analysis. The input image in this embodiment serves as the inference image. Each part is described below. The image processing apparatus 1-100 is, for example, a computer and performs the image processing according to this embodiment. The image processing apparatus 1-100 has at least a CPU 1-31, a communication IF 1-32, a ROM 1-33, a RAM 1-34, a storage unit 1-35, and a common bus 1-36. The CPU 1-31 centrally controls the operation of each component of the image processing apparatus 1-100.
Under the control of the CPU 1-31, the image processing apparatus 1-100 may also control the operation of the image capturing device 1-110. The data server 1-120 holds the images captured by the image capturing device 1-110. The communication IF (Interface) 1-32 is realized by, for example, a LAN card, and handles communication between an external device (for example, the data server 1-120) and the image processing apparatus 1-100. The ROM 1-33 is realized by a non-volatile memory or the like, stores the control programs executed by the CPU 1-31, and provides a work area when the CPU 1-31 executes a program. The RAM (Random Access Memory) 1-34 is realized by a volatile memory or the like and temporarily stores various kinds of information. The storage unit 1-35 is realized by, for example, an HDD (Hard Disk Drive) or the like, and stores various application software including an operating system (OS), device drivers for peripheral devices, and a program for performing the image processing according to this embodiment, which is described later. The operation unit 1-140 is realized by, for example, a keyboard and a mouse, and inputs instructions from the user into the apparatus. The display unit 1-130 is realized by, for example, a display, and presents various kinds of information to the user. The operation unit 1-140 and the display unit 1-130 provide a GUI (Graphical User Interface) function under the control of the CPU 1-31. The display unit 1-130 may be a touch-panel monitor that accepts operation input, and the operation unit 1-140 may be a stylus pen. The above components are communicably connected to each other by the common bus 1-36.
The image capturing device 1-110 is, for example, a scanning electron microscope (SEM), a transmission electron microscope (TEM), or an optical microscope. The image capturing device 1-110 may also be another device with an image capturing function, such as a digital camera or a smartphone. The image capturing device 1-110 transmits the acquired images to the data server 1-120. An imaging control unit (not shown) that controls the image capturing device 1-110 may be included in the image processing apparatus 1-100.
(Functional configuration)
Next, the functional configuration of the image processing system including the image processing apparatus 1-100 according to this embodiment will be described with reference to FIG. 2. The functions of the units shown in FIG. 2 are realized by the CPU 1-31 executing the programs stored in the ROM 1-33. The programs may be executed by one or more CPUs, and the ROM storing the programs may likewise be one or more memories. Another processor such as a GPU (Graphics Processing Unit) may be used instead of or in combination with the CPU. That is, the functions of the units shown in FIG. 2 are realized by at least one processor (hardware) executing programs stored in at least one memory communicably connected to that processor.
The image processing apparatus 1-100 has, as its functional configuration, a reception unit 1-41, a model acquisition unit 1-42, an extraction unit 1-43, an inference unit 1-44, an information acquisition unit 1-45, and a display control unit 1-46. The image processing apparatus 1-100 is communicably connected to the data server 1-120 and the display unit 1-130.
The reception unit 1-41 receives the inference conditions input by the user via the operation unit 1-140. That is, the operation unit 1-140 corresponds to an example of a reception means that accepts the setting of inference conditions.
The inference conditions include at least one of information on the number of inferences (described later), a threshold, and a patch size. The model acquisition unit 1-42 acquires the trained model 1-47 constructed in advance and the inference image from the data server 1-120. The extraction unit 1-43 extracts a plurality of regions of interest from the inference image based on the inference conditions received by the reception unit 1-41; that is, it corresponds to an example of an extraction means that extracts a plurality of regions of interest from an inference image.
Here, a region of interest refers to a portion cut out from the inference image.
The inference unit 1-44 performs inference on each of the plurality of regions of interest using the trained model 1-47 acquired by the model acquisition unit 1-42; that is, it corresponds to an example of an inference means that performs inference on each of a plurality of regions of interest using a common trained model.
The information acquisition unit 1-45 calculates the final inference result based on the inference results obtained by the inference unit 1-44; that is, it corresponds to an example of a calculation means that calculates the final inference result based on a plurality of inference results.
The display control unit 1-46 outputs the information on the inference results obtained in each process to the display unit 1-130 and causes the display unit 1-130 to display the results of each process.
At least some of the units of the image processing apparatus 1-100 may be realized as independent devices. The image processing apparatus 1-100 may be a workstation. The functions of each unit may be realized as software running on a computer, and the software realizing those functions may run on a server over a network such as a cloud. In the embodiment described below, each unit is assumed to be realized by software running on a computer installed in a local environment.
(Processing flow)
Next, the image processing according to Embodiment 1-1 of the present invention will be described. FIG. 3 is a diagram showing the processing procedure executed by the image processing apparatus 1-100 of this embodiment. This embodiment is realized by the CPU 1-31 executing the programs, stored in the ROM 1-33, that implement the functions of the respective units. In this embodiment, an example in which the image to be processed is a TEM image is described. The TEM image is acquired as a two-dimensional grayscale image. Carbon black in a coating film of a melamine-alkyd resin paint is used as an example of the processing target contained in the image to be processed. In this embodiment, ten inference images were used, and the ratio of the maximum area to the minimum area of carbon black within a single image was 30 to 120. In the processing, steps S1-201 to S1-206 are performed for each inference image, but to avoid redundant explanation, the following describes the case where the processing is applied to a single inference image.
In step S1-201, the reception unit 1-41 receives the inference conditions input by the user via the operation unit 1-140. The inference conditions in this embodiment include at least one of information on the number of inferences, a threshold, and a patch size. The information on the number of inferences is, for example, the average number of inferences per pixel or the number of extractions, described later.
In step S1-202, the model acquisition unit 1-42 acquires the trained model constructed in advance and the inference image. The inference image is acquired from the data server 1-120. If a patch size has been set in step S1-201, a trained model trained with the same patch size is acquired. Here, the patch size is the number of pixels in the vertical and horizontal directions of the cropped image when a part of the target image is cut out.
Here, an example of how the trained model 1-47 is constructed will be described. Segmentation is used here as an example of the type of image processing, but the scope of application of this embodiment is not limited by the type of image processing. First, pairs of a TEM image, which is the image to be processed, and a teacher image are prepared; there may be a plurality of pairs. The teacher image is obtained by applying an appropriate image processing method to the image to be processed. For example, it may be an image in which the region to be detected and the other regions are binarized, or an image in which the region to be detected is filled in and the regions not to be detected are left unfilled.
Next, the trained model 1-47 is generated by performing machine learning according to a predetermined algorithm using the images to be processed and the teacher images. In this embodiment, U-Net is used as the predetermined algorithm, and a known technique can be used as the U-Net learning method. As the predetermined algorithm, for example, an SVM (Support Vector Machine), a DNN (Deep Neural Network), a CNN (Convolutional Neural Network), or the like may also be used. As algorithms for semantic segmentation, which classifies the image pixel by pixel, FCN (Fully Convolutional Network), SegNet, and the like can be used in addition to U-Net. Furthermore, an algorithm combining the above with a so-called generative model such as a GAN (Generative Adversarial Networks) may be used. If there are several kinds of processing to be executed, a separate learning model is built for each so that every kind of processing can be executed. Data augmentation may also be performed to increase the amount of data used for learning.
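As an illustration of the training step described above, the following is a minimal sketch of supervised learning for a pixel-wise binary segmentation model in PyTorch. The `model` argument stands in for any U-Net-like network; the function name, hyperparameters, and tensor shapes are assumptions made for illustration and are not part of this disclosure.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_segmentation_model(model, images, masks, epochs=10, lr=1e-3):
    """Train a binary segmentation model (e.g. a U-Net-like network).

    images: float tensor of shape (N, 1, H, W) -- processing-target image patches
    masks:  float tensor of shape (N, 1, H, W) -- binarized teacher images (1 = target region)
    """
    loader = DataLoader(TensorDataset(images, masks), batch_size=4, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()  # pixel-wise binary classification loss
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            # Data augmentation (e.g. random flips of x and y) could be inserted here.
            optimizer.zero_grad()
            loss = criterion(model(x), y)  # model outputs per-pixel logits
            loss.backward()
            optimizer.step()
    return model
```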
The processing of step S1-203 will be described with reference to FIG. 4. Here again, carbon black in a melamine-alkyd resin is the processing target. In step S1-203, the extraction unit 1-43 extracts a plurality of regions of interest from the inference image. FIG. 4 shows an example in which a region of interest 1-540, a region of interest 1-541, and a region of interest 1-542 are extracted for position coordinates 1-530, 1-531, and 1-532, respectively. The inference image in this embodiment is composed of a plurality of pixels whose positions can be specified by two-dimensional orthogonal coordinates (x, y). If the numbers of pixels in the horizontal and vertical directions of the image are x_size and y_size, respectively, then 0 ≤ x ≤ x_size and 0 ≤ y ≤ y_size hold. With the upper left corner of the image as the origin, the x-axis pointing right and the y-axis pointing down, a plurality of mutually different position coordinates (x_i, y_i) (i = 1, 2, ..., N) are set so as to satisfy 0 ≤ x_i ≤ x_size and 0 ≤ y_i ≤ y_size. In this embodiment, pairs of random numbers (x_i, y_i) satisfying these conditions are generated. Next, a region of interest is set with (x_i, y_i) as its upper left coordinate, and its size is made equal to the patch size. In this embodiment, the user sets the average number of inferences per pixel via the operation unit 1-140. The average number of inferences is the average number of times each pixel is extracted during extraction, and it can be obtained by recording, for each pixel, how many times it has been extracted. If (x_i, y_i) is located near an edge of the image and the extracted region would become smaller than the patch size, the region of interest is adjusted to the patch size by, for example, so-called padding, in which the area outside the image is filled with pixel values of 0.
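A hedged sketch of this random extraction of regions of interest, including the zero-padding near the image edges and the per-pixel extraction count used to realize the average number of inferences, might look as follows (NumPy; all names are illustrative and not part of the disclosure).

```python
import numpy as np

def extract_random_patches(image, patch_size, avg_inferences_per_pixel, rng=None):
    """Extract square regions of interest at random upper-left coordinates (Embodiment 1-1 style).

    image: 2-D grayscale array of shape (y_size, x_size).
    Patches extending beyond the image are zero-padded to patch_size x patch_size.
    Extraction stops once the mean per-pixel extraction count reaches avg_inferences_per_pixel.
    """
    if rng is None:
        rng = np.random.default_rng()
    y_size, x_size = image.shape
    counts = np.zeros_like(image, dtype=np.int64)  # how often each pixel has been extracted
    patches, origins = [], []
    while counts.mean() < avg_inferences_per_pixel:
        x0 = int(rng.integers(0, x_size))  # random upper-left corner (x_i, y_i)
        y0 = int(rng.integers(0, y_size))
        patch = np.zeros((patch_size, patch_size), dtype=image.dtype)  # padding with 0
        sub = image[y0:y0 + patch_size, x0:x0 + patch_size]
        patch[:sub.shape[0], :sub.shape[1]] = sub
        counts[y0:y0 + patch_size, x0:x0 + patch_size] += 1
        patches.append(patch)
        origins.append((x0, y0))
    return patches, origins, counts
```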
In step S1-204, the inference unit 1-44 performs inference on each of the plurality of regions of interest extracted in step S1-203 using the trained model 1-47.
In step S1-205, the information acquisition unit 1-45 calculates and acquires the final inference result based on the inference results of step S1-204. In this embodiment, the number of times each pixel is inferred and the number of times it is judged to be carbon black are recorded, and a pixel is finally judged to be carbon black when (number of times judged to be carbon black) / (number of times inferred) is equal to or greater than the threshold. The threshold may be made settable by the user via the operation unit 1-140. If the inference is a regression process rather than a classification, a new threshold is set separately from the above-mentioned threshold, the results are first classified (for example, values at or above this new threshold are treated as carbon black), and the final judgment process is then performed.
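The per-pixel vote described in this step could be implemented roughly as below. The function assumes that each patch prediction is a binary array aligned with its patch origin, which matches the description above; everything else (names, return encoding) is an illustrative assumption.

```python
import numpy as np

def aggregate_votes(pred_patches, origins, image_shape, threshold):
    """Combine per-patch predictions into a final per-pixel decision.

    pred_patches: list of binary arrays (1 = judged to be the target, e.g. carbon black)
    origins:      list of (x0, y0) upper-left coordinates of each patch
    threshold:    minimum ratio (positive votes / number of inferences) for a final positive
    """
    y_size, x_size = image_shape
    inferred = np.zeros(image_shape, dtype=np.int64)  # times each pixel was inferred
    positive = np.zeros(image_shape, dtype=np.int64)  # times it was judged as the target
    for pred, (x0, y0) in zip(pred_patches, origins):
        h = min(pred.shape[0], y_size - y0)
        w = min(pred.shape[1], x_size - x0)
        inferred[y0:y0 + h, x0:x0 + w] += 1
        positive[y0:y0 + h, x0:x0 + w] += pred[:h, :w]
    ratio = np.divide(positive, inferred,
                      out=np.zeros_like(positive, dtype=float), where=inferred > 0)
    # Pixels at or above the threshold are shown with luminance 255, others with 0.
    return (ratio >= threshold).astype(np.uint8) * 255
```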
In step S1-206, the display control unit 1-46 causes the display unit 1-130 to display the final inference result. In this case, the display control unit 1-46 transmits the final inference result to the display unit 1-130 connected to the image processing apparatus 1-100 and controls the display unit 1-130 so as to display it. In this embodiment, whether each pixel is carbon black is judged, and pixels judged to be carbon black are displayed with a luminance of 255 while pixels judged not to be carbon black are displayed with a luminance of 0.
The effect of the image processing apparatus according to Embodiment 1-1 will be described with reference to FIG. 5. In this embodiment, IoU (Intersection over Union) was used as an evaluation index to measure the effect. IoU is defined by equation (1-1).
IoU = TP / (TP + FP + FN)   ... (1-1)
Here, TP (True Positive) is the number of pixels that are carbon black and are judged to be carbon black, FP (False Positive) is the number of pixels that are not carbon black but are judged to be carbon black (the number of false detections), and FN (False Negative) is the number of pixels that are carbon black but are judged not to be carbon black (the number of missed detections).
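Using these definitions of TP, FP, and FN, equation (1-1) can be evaluated for two binary masks as in the following illustrative snippet.

```python
import numpy as np

def iou(pred, truth):
    """IoU of equation (1-1): TP / (TP + FP + FN) for binary masks (True = carbon black)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()   # correctly detected target pixels
    fp = np.logical_and(pred, ~truth).sum()  # false detections
    fn = np.logical_and(~pred, truth).sum()  # missed target pixels
    denom = tp + fp + fn
    return tp / denom if denom else 0.0
```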
Images with x_size = 1280 and y_size = 960 were used. The patch size was 128 × 128 and the threshold was 0.1. As shown in the graph of FIG. 5, the conventional method gave IoU = 0.61, whereas the IoU increased as the average number of inferences increased, reaching IoU = 0.84 at an average of 30 inferences.
As described above, the image processing apparatus 1-100 in this embodiment can improve inference accuracy by performing inference on each of a plurality of regions of interest using a common trained model. In addition, because the user can set the threshold, the inference behavior can be controlled according to the purpose. For example, the threshold can be lowered to reduce missed detections or raised to reduce false detections, so that inference tailored to the purpose is possible while using the same trained model.
(Embodiment 1-2)
(Overview)
Next, an example of Embodiment 1-2 will be described in detail with reference to the drawings. Descriptions of configurations, functions, and operations that are the same as in the above embodiment are omitted, and mainly the differences from the above embodiment are described.
(Processing flow)
In step S1-201, the reception unit 1-41 receives the inference conditions input by the user via the operation unit 1-140. The inference conditions in this embodiment include at least one of information on the number of inferences, a threshold, and a patch size. The information on the number of inferences is, for example, the number of reference-coordinate settings, described later.
In Embodiment 1-1, all the regions of interest were determined using different random numbers; in Embodiment 1-2, only some regions of interest are determined using random numbers, and the remaining regions are determined mechanically from the coordinates of the regions of interest determined using random numbers. The processing of step S1-203 will be described with reference to FIG. 6. Each region of interest in FIG. 6 has a region that partially overlaps an adjacent region of interest. In step S1-203, the extraction unit 1-43 extracts a plurality of regions of interest from the inference image 1-501. FIG. 6 shows an example in which regions of interest 1-550 to 1-558 are extracted with the reference coordinate 1-560 (x_1, y_1) as the base. Let p_x be the number of pixels of a region of interest in the x-axis direction and p_y the number of pixels in the y-axis direction. A plurality of reference coordinates are denoted (x_j, y_j) (j = 1, 2, ..., N), where each (x_j, y_j) is a pair of random numbers satisfying 0 ≤ x_j ≤ p_x and 0 ≤ y_j ≤ p_y. The upper left coordinates of the other regions of interest are set to (x_j + p_x × m, y_j + p_y × n) (where m is an integer from 1 to x_size/p_x − 1 and n is an integer from 1 to y_size/p_y − 1). In this embodiment, the user sets the number of reference-coordinate settings via the operation unit 1-140. The number of reference-coordinate settings is the number of times the upper-left reference coordinate (x_j, y_j) is set using random numbers during extraction.
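One illustrative reading of this scheme is that each random base coordinate generates a full tiling of the image, and different base coordinates give mutually shifted (and therefore overlapping) tilings; a sketch under that assumption is shown below, with hypothetical names.

```python
import numpy as np

def grid_patch_origins(x_size, y_size, patch_size, n_base_coords, rng=None):
    """Upper-left coordinates for Embodiment 1-2 style extraction.

    For each of n_base_coords random base coordinates (x_j, y_j) with
    0 <= x_j < patch_size and 0 <= y_j < patch_size, the image is tiled with
    patches whose corners are (x_j + patch_size*m, y_j + patch_size*n);
    tilings from different base coordinates overlap one another.
    """
    if rng is None:
        rng = np.random.default_rng()
    origins = []
    for _ in range(n_base_coords):
        xj = int(rng.integers(0, patch_size))
        yj = int(rng.integers(0, patch_size))
        for m in range(x_size // patch_size):      # m = 0 covers the base coordinate itself
            for n in range(y_size // patch_size):
                origins.append((xj + patch_size * m, yj + patch_size * n))
    return origins
```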
The effect of the image processing apparatus according to Embodiment 1-2 will be described with reference to FIG. 7. As in Embodiment 1-1, the evaluation was performed using IoU. Images with x_size = 1280 and y_size = 960 were used, the patch size was 128 × 128, and the threshold was 0.2. As shown in the graph of FIG. 7, the conventional method gave IoU = 0.61, whereas the IoU increased as the number of inferences increased, reaching IoU = 0.86 at 30 reference-coordinate settings.
(Embodiment 1-3)
(Overview)
Next, an example of Embodiment 1-3 will be described in detail with reference to the drawings. Descriptions of configurations, functions, and operations that are the same as in the above embodiments are omitted, and mainly the differences from the above embodiments are described. In this embodiment, red blood cells, white blood cells, and platelets in blood in an optical microscope image are used as examples of the processing targets contained in the image to be processed.
(Processing flow)
In step S1-201, the reception unit 1-41 receives the inference conditions input by the user via the operation unit 1-140. The inference conditions in this embodiment include at least one of information on the number of inferences, a threshold, and a patch size. The information on the number of inferences is, for example, the number of reference-coordinate settings, described later.
In Embodiment 1-1, all the regions of interest were determined using different random numbers; in Embodiment 1-3, only some regions of interest are determined using random numbers, and the remaining regions are determined mechanically from the coordinates of the regions of interest determined using random numbers. The processing of step S1-203 will be described with reference to FIG. 8. Each region of interest in FIG. 8 has a region that partially overlaps an adjacent region of interest. In step S1-203, the extraction unit 1-43 extracts a plurality of regions of interest from the inference image 1-502. FIG. 8 shows an example in which regions of interest 1-560 to 1-568 are extracted with the reference coordinate 1-660 (x_1, y_1) as the base. Let p_x be the number of pixels of a region of interest in the x-axis direction and p_y the number of pixels in the y-axis direction. A plurality of reference coordinates are denoted (x_j, y_j) (j = 1, 2, ..., N), where each (x_j, y_j) is a pair of random numbers satisfying 0 ≤ x_j ≤ p_x and 0 ≤ y_j ≤ p_y. The upper left coordinates of the other regions of interest are set to (x_j + p_x × m, y_j + p_y × n) (where m is an integer from 1 to x_size/p_x − 1 and n is an integer from 1 to y_size/p_y − 1). In this embodiment, the user sets the number of reference-coordinate settings via the operation unit 1-140. The number of reference-coordinate settings is the number of times the upper-left reference coordinate (x_j, y_j) is set using random numbers during extraction.
The effect of the image processing apparatus according to Embodiment 1-3 will be described with reference to FIG. 9. In this embodiment, the evaluation was performed using mIoU, which is defined by equation (1-2).
mIoU = (1/c) × Σ_{k=1}^{c} IoU_k   ... (1-2)
Here, c is the number of classification classes; in this example, c = 3. Images with x_size = 1280 and y_size = 960 were used, the patch size was 128 × 128, and the threshold was 0.2. As shown in the graph of FIG. 9, the conventional method gave mIoU = 0.55, whereas the mIoU increased as the number of inferences increased, reaching mIoU = 0.77 at 30 reference-coordinate settings.
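Equation (1-2) can be evaluated as the class-wise average of IoU over the c classes, as in the following illustrative snippet (integer-valued label maps are assumed).

```python
import numpy as np

def mean_iou(pred, truth, num_classes):
    """mIoU of equation (1-2): the average of per-class IoU over num_classes classes."""
    ious = []
    for c in range(num_classes):
        p = (pred == c)
        t = (truth == c)
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union if union else 0.0)
    return float(np.mean(ious))
```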
(Embodiment 1-4)
(Overview)
Next, an example of Embodiment 1-4 will be described in detail with reference to the drawings. Descriptions of configurations, functions, and operations that are the same as in the above embodiments are omitted, and mainly the differences from the above embodiments are described. In this embodiment, people, automobiles, and roads contained in an image captured with a digital camera are used as examples of the processing targets contained in the image to be processed.
(Processing flow)
In step S1-201, the reception unit 1-41 receives the inference conditions input by the user via the operation unit 1-140. The inference conditions in this embodiment include at least one of information on the number of inferences, a threshold, and a patch size. The information on the number of inferences is, for example, the pitch, described later.
In Embodiment 1-1, all the regions of interest were determined using different random numbers; in Embodiment 1-4, all the regions of interest are determined without using random numbers. The processing of step S1-203 will be described with reference to FIG. 10. In step S1-203, the extraction unit 1-43 extracts a plurality of regions of interest from the inference image. FIG. 10 shows an example in which regions of interest 1-570 to 1-572 are extracted with the reference coordinate 1-580 (x_1, y_1) as the base. A plurality of regions of interest are extracted by shifting the region of interest vertically or horizontally by the pitch. Let p_x be the number of pixels of a region of interest in the x-axis direction, p_y the number of pixels in the y-axis direction, pitch_x the pitch in the x-axis direction, and pitch_y the pitch in the y-axis direction, with 0 < pitch_x < p_x and 0 < pitch_y < p_y. In FIG. 10, the upper left coordinates of the region of interest 1-571 and the region of interest 1-572 are (x_1 + pitch_x, y_1) and (x_1 + 2·pitch_x, y_1), respectively. Although only three regions of interest are shown in FIG. 10, other regions of interest whose upper left coordinates are (x_1 + pitch_x × m, y_1 + pitch_y × n) (where m is an integer from 1 to x_size/pitch_x − 1 and n is an integer from 1 to y_size/pitch_y − 1) may also be extracted.
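A minimal sketch of this pitch-based extraction, assuming the base coordinate (x_1, y_1) and the pitch are given, is shown below; the names are illustrative, and patches near the right and bottom edges would still need the padding described in Embodiment 1-1.

```python
def pitched_patch_origins(x_size, y_size, patch_size, pitch_x, pitch_y, x1=0, y1=0):
    """Upper-left coordinates for Embodiment 1-4 style extraction (no random numbers).

    Starting from the base coordinate (x1, y1), the region of interest is shifted
    by pitch_x horizontally and pitch_y vertically (0 < pitch < patch_size), so
    that neighbouring regions of interest overlap each other.
    """
    n_x = max(1, (x_size - x1) // pitch_x)
    n_y = max(1, (y_size - y1) // pitch_y)
    origins = []
    for n in range(n_y):
        for m in range(n_x):
            origins.append((x1 + pitch_x * m, y1 + pitch_y * n))
    return origins

# Example: a 1280 x 960 image, 128 x 128 patches, and a pitch of 80 in both directions
origins = pitched_patch_origins(1280, 960, 128, 80, 80)
```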
The effect of the image processing apparatus according to Embodiment 1-4 will be described with reference to FIG. 11. As in Embodiment 1-3, mIoU (with c = 3) was used for the evaluation. Images with x_size = 1280 and y_size = 960 were used. The patch size was 128 × 128, the threshold was 0.1, the pitch was set to pitch_x = pitch_y, and its value was varied from 16 to 112. As shown in the graph of FIG. 11, the conventional method gave mIoU = 0.50, whereas the mIoU was greater than 0.50 at every pitch and reached its maximum of mIoU = 0.64 at a pitch of 80.
<Combination of the first embodiment with the second and third embodiments>
The first embodiment of the present invention can be combined with at least one of the second embodiment and the third embodiment of the present invention described later.
That is, when the first embodiment and the second embodiment are combined, a trained model (classifier) trained by the following first learning step and second learning step can be used as the trained model described above. The first learning step performs learning using a first learning data set out of an initial data set containing a plurality of pieces of learning data created from data containing identification target information. The second learning step updates the information contained in the trained model generated by the first learning step, by performing learning using that information and a second learning data set out of the initial data set. The amount of identification target information contained in the first learning data set is larger than the amount of identification target information contained in the second learning data set. The details of the second embodiment are described later and are therefore omitted here.
When the first embodiment and the third embodiment are combined, a trained model (classifier) generated by the following padding step and generation step can be used as the trained model described above. The padding step takes a learning data set group that includes a first learning data set containing learning data composed of input data and teacher data for the input data, and a second learning data set containing a larger number of pieces of learning data than the first learning data set, and performs data augmentation so that the number of pieces of learning data in the first learning data set becomes equal to or larger than the number in the second learning data set. The generation step generates the trained model using the learning data set group including the augmented learning data. The amount of identification target information contained in the input data of the first learning data set is larger than the amount of identification target information contained in the input data of the second learning data set. The details of the third embodiment are described later and are therefore omitted here.
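An illustrative sketch of this padding (data augmentation) step, assuming each learning data set is simply a list of input/teacher pairs and that some augmentation function is available, might look as follows; the names are hypothetical.

```python
import random

def balance_by_augmentation(first_set, second_set, augment):
    """Illustrative sketch of the padding step described above.

    first_set / second_set: lists of (input_data, teacher_data) pairs, where the first
    set has more identification-target information per item but fewer items.
    augment: a function returning a perturbed copy of a learning-data pair
             (e.g. a flipped or rotated image pair).
    """
    augmented = list(first_set)
    while len(augmented) < len(second_set):
        augmented.append(augment(random.choice(first_set)))  # inflate the smaller set
    return augmented, list(second_set)
```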
Furthermore, when the first embodiment, the second embodiment, and the third embodiment are combined, the above-mentioned first learning step, second learning step, padding step, and generation step are performed when generating the trained model of the first embodiment.
<Other Embodiments>
The image processing apparatus and image processing system in each of the embodiments described above may be realized as a single device, or the above processing may be executed by combining devices including a plurality of information acquisition devices so that they can communicate with each other; both forms are included in the embodiments of the present invention. The above processing may also be executed by a common server device or a server group. In that case, the common server device corresponds to the image processing apparatus according to the embodiment, and the server group corresponds to the image processing system according to the embodiment. The plurality of devices constituting the image processing apparatus and the image processing system only need to be able to communicate at a predetermined communication rate; they need not be located in the same facility or in the same country.
Although the embodiments have been described in detail above, the present invention can take the form of, for example, a system, a device, a method, a program, or a recording medium (storage medium). Specifically, it may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, an imaging device, a Web application, and the like) or to an apparatus consisting of a single device.
Needless to say, the object of the present invention can also be achieved as follows. That is, a recording medium (or storage medium) on which the program code (computer program) of software realizing the functions of the above-described embodiments is recorded is supplied to a system or an apparatus. Such a storage medium is, of course, a computer-readable storage medium. A computer (or a CPU or GPU) of the system or apparatus then reads and executes the program code stored in the recording medium. In this case, the program code read from the recording medium itself realizes the functions of the above-described embodiments, and the recording medium on which the program code is recorded constitutes the present invention.
Although the preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims. For example, preprocessing and postprocessing may be added as appropriate.
Forms obtained by appropriately combining the above-described embodiments are also included in the embodiments of the present invention.
<< Second Embodiment >>
(Background of the second embodiment)
In recent years, many attempts have been made to process various kinds of data using deep learning and obtain useful information; image processing, speech processing, and text processing are known examples. Although identification accuracy has improved through the use of deep learning, various efforts are being made to improve it further.
Japanese Patent Application Laid-Open No. 2019-118670 (Reference 2-1) describes a diagnosis support apparatus that supports the diagnosis of a diseased area using deep learning. This technique makes highly accurate diagnosis possible by normalizing the color and luminance of an image in advance to separate diseased parts from non-diseased parts.
In addition, Sakamoto, M., Nakano, H., Zhao, K. and Sekiyama, T.: Multi-stage neural networks with single-sided classifiers for false positive reduction and its evaluation using Lung X-ray CT Images, Image Analysis and Processing - ICIAP 2017, pp. 370-379 (2017) (Reference 2-2) describes a technique for accurately identifying nodules in nodule candidate images by learning with a cascade-type classifier, in which a plurality of classifiers are connected, while removing samples that are clearly normal.
(Problem to be solved by the second embodiment)
As a result of the present inventors' study, it was found that identification is difficult with the methods described in Reference 2-1 and Reference 2-2 when a plurality of pieces of information to be identified (hereinafter referred to as "identification target information") exist in one piece of data, or when the identification target information is hard to distinguish from other information. In addition, when the amount of identification target information differs greatly from one piece of data to another, it has been difficult with conventional methods to construct a classifier that can accurately identify the identification target information regardless of how much of it there is.
Therefore, an object of the second embodiment is to provide a classifier generation method that can accurately identify identification target information even when a plurality of pieces of identification target information exist in one piece of data or when the identification target information is hard to distinguish from other information. Another object of the present invention is to provide an identification method and an identification apparatus using a classifier generated by this classifier generation method.
(Outline of the second embodiment)
The classifier generation method according to this embodiment has a first learning step of performing learning using a first learning data set out of an initial data set containing a plurality of pieces of learning data created from data. It further has a second learning step of updating the information contained in the classifier generated by the first learning step, by performing learning using that information and a second learning data set out of the initial data set. Here, the amount of identification target information contained in the first learning data set is larger than the amount of identification target information contained in the second learning data set. The classifier is thus trained in two stages, learning first from the data set with the larger amount of identification target information. This makes it possible to first learn parameters for image conversion with a large degree of conversion and then change those parameters gradually, so that the identification target information can be identified with high accuracy.
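The order of the two learning steps can be pictured with the following illustrative sketch, in which `train_fn` stands for any ordinary supervised training routine; the function names and interface are assumptions made only to show the sequence of the steps.

```python
def train_in_two_steps(model, first_dataset, second_dataset, train_fn):
    """Illustrative sketch of the two learning steps described above.

    first_dataset:  learning data whose identification-target information is larger
    second_dataset: learning data whose identification-target information is smaller
    train_fn(model, dataset) -> model : any ordinary supervised training routine
    """
    # First learning step: learn from the data set with more identification-target information.
    model = train_fn(model, first_dataset)
    # Second learning step: update the information held by the classifier using the
    # data set with less identification-target information.
    model = train_fn(model, second_dataset)
    return model
```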
[Data]
Data is a representation of information that is formalized so as to be suitable for transmission, interpretation, or processing, and that can be interpreted as information again. Examples of data include image data, audio data, and text data.
[Identification target information]
Identification target information is the information to be identified in the data. When the data is image data, the identification target information is, for example, at least one of the position, area, and distribution of the identification target region in the image data. The classifier generated by the generation method according to this embodiment can estimate and extract identification target regions in image data that would be difficult for a user to extract visually.
When the data is audio data, the identification target information is, for example, at least one of the frequency and the intensity of the identification target sound in the audio data. The classifier generated by the generation method according to this embodiment can estimate and extract the identification target sound from noisy sound data from which it would be difficult for a user to extract it. When the sound data is speech data of a plurality of speakers, the speech data of at least one speaker can be used as the identification target information.
 データがテキストデータである場合、例えば、テキストデータ中の識別対象文字の文字、及び文字列の少なくともいずれか1つの情報が識別対象情報である。本実施形態に係る生成方法で生成された識別器は、ユーザにとって抽出困難な、テキストデータ中の識別対象の文字列を推定し、抽出することができる。 When the data is text data, for example, at least one of the characters of the identification target character and the character string in the text data is the identification target information. The classifier generated by the generation method according to the present embodiment can estimate and extract a character string to be identified in text data, which is difficult for the user to extract.
 [Amount of identification target information]
 The amount of identification target information contained in a training data set is the total amount of identification target information contained in the training data set divided by the number of pieces of training data contained in the training data set (that is, the average value). Here, a piece of training data is a pair of input data and teacher data, and a training data set contains a plurality of pieces of training data. When the data is image data, the amount of identification target information contained in a training data set is, for example, the area of the identification target region in each image, which can be calculated from the number of pixels belonging to that region.
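 For image data, this average can be computed directly from the teacher images. The following is a minimal sketch (not taken from the patent itself); it assumes NumPy, RGB teacher images, and that the identification target region is painted with a single known color, as in the embodiments described below. All function names are illustrative.

```python
import numpy as np

def target_pixel_count(teacher_img, target_color=(0, 255, 0)):
    """Count pixels in an RGB teacher image painted with the target color."""
    mask = np.all(teacher_img == np.asarray(target_color), axis=-1)
    return int(mask.sum())

def dataset_target_amount(teacher_images, target_color=(0, 255, 0)):
    """Average amount of identification target information over a data set."""
    counts = [target_pixel_count(img, target_color) for img in teacher_images]
    return sum(counts) / len(counts)
```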
 When the data is audio data, the amount of identification target information is, for example, the length of the identification target information in data segments delimited at breaks in the audio. When selecting data sets, an initial data set may be formed as a collection of such segments, with the input data and the teacher data each delimited at breaks in the audio, and the segments may then be sorted, for example, in descending order of the difference between the input data signal and the teacher data signal.
 When the data is text data, the amount of identification target information is, for example, the number of characters in the character string to be identified in the text data. When selecting data sets, an initial data set may be formed as a collection of segments, with the input data and the teacher data each delimited at breaks in the sentences, and the segments may then be sorted, for example, in descending order of the difference between the input text and the teacher text.
 (Embodiment 2-1)
 A case in which the amount of identification target information is the amount of resin in a transmission electron microscope (TEM) image is described below; however, the scope of application of this embodiment is not limited to this type of data acquisition method. The device configuration and functional configuration of the learning system and the processing procedure of the learning device are described concretely below.
 (Device configuration of the learning system)
 FIG. 12 is a diagram showing an example of the device configuration of the learning system (classifier generation system) according to Embodiment 2-1. The learning system 2-190, which is composed of a learning device (classifier generation device) 2-100 and the devices connected to the learning device 2-100, is described in detail below. The learning system 2-190 has a learning device 2-100 that performs learning, a data acquisition device 2-110 that acquires data, and a data server 2-120 that stores the acquired data. The learning system 2-190 further has a data processing device 2-130 that processes the data to create teacher data, a display unit 2-140 that displays the acquired input data and the learning results, and an operation unit 2-150 for inputting instructions from the user.
 The learning device 2-100 acquires pairs of input data and teacher data (training data), the teacher data being created by processing the input data with the data processing device 2-130. A training data set containing a plurality of pieces of training data created in this way is the initial data set. Training data sets are obtained from the initial data set and learning is performed. The data acquisition device 2-110 in the present embodiment is a transmission electron microscope (TEM), and the input data are TEM images. The processing performed by the data processing device 2-130 is described later. Each part constituting the learning system is described below.
 The learning device 2-100 is, for example, a computer, and performs the learning according to the present embodiment. The learning device 2-100 has at least a CPU 2-31, a communication IF 2-32, a ROM 2-33, a RAM 2-34, a storage unit 2-35, and a common bus 2-36. The CPU 2-31 controls the operation of each component of the learning device 2-100 in an integrated manner. Under the control of the CPU 2-31, the learning device 2-100 may also control the operation of the data acquisition device 2-110 and the data processing device 2-130.
 The data server 2-120 holds the data acquired by the data acquisition device 2-110. The data processing device 2-130 processes the input data held in the database so that they can be used for learning. The communication IF (Interface) 2-32 is realized by, for example, a LAN card, and handles communication between an external device (for example, the data server 2-120) and the learning device 2-100. The ROM 2-33 is realized by a non-volatile memory or the like, stores the control programs executed by the CPU 2-31, and provides a work area when the CPU 2-31 executes a program. The RAM (Random Access Memory) 2-34 is realized by a volatile memory or the like and temporarily stores various kinds of information.
 The storage unit 2-35 is realized by, for example, an HDD (Hard Disk Drive) and stores the operating system (OS), device drivers for peripheral devices, and various application software including a program for performing the learning according to the present embodiment described later.
 The operation unit 2-150 is realized by, for example, a keyboard or a mouse, and inputs instructions from the user into the device. The display unit 2-140 is realized by, for example, a display, and presents various kinds of information to the user. The operation unit 2-150 and the display unit 2-140 provide a GUI (Graphical User Interface) under the control of the CPU 2-31. The display unit 2-140 may be a touch panel monitor that accepts operation input, and the operation unit 2-150 may be a stylus pen. The components of the learning device 2-100 are communicably connected to one another via the common bus 2-36.
 The data acquisition device 2-110 is, for example, a scanning electron microscope (SEM), a transmission electron microscope (TEM), an optical microscope, a digital camera, or a smartphone. The data acquisition device 2-110 transmits the acquired data to the data server 2-120. A data acquisition control unit (not shown) that controls the data acquisition device 2-110 may be included in the learning device 2-100.
 (Functional configuration of the learning system)
 FIG. 13 is a diagram showing an example of the functional configuration of the learning system according to Embodiment 2-1. The functional configuration of the learning system is described below with reference to FIG. 13. The functions of the units shown in FIG. 13 are realized by the CPU 2-31 executing a program stored in the ROM 2-33. The program may be executed by one or more CPUs, and the program may be stored in one or more memories. Another processor such as a GPU (Graphics Processing Unit) may be used instead of, or together with, the CPU. That is, the functions of the units shown in FIG. 13 are realized by at least one processor (hardware) executing a program stored in at least one memory communicably connected to that processor.
 The learning device 2-100 has at least a reception unit 2-41, an acquisition unit 2-42, a selection unit 2-43, a learning unit 2-44, a classifier 2-45, a display control unit 2-48, and a display unit 2-140. The learning device 2-100 is communicably connected to the data server 2-120 and the display unit 2-140.
 The reception unit 2-41 receives data set selection conditions (described later) via the operation unit 2-150.
 The acquisition unit 2-42 acquires the initial data set from the data server 2-120.
 The selection unit 2-43 processes the initial data set acquired by the acquisition unit 2-42 and selects a first training data set and a second training data set.
 The learning unit 2-44 sequentially performs learning using the first training data set and the second training data set obtained by the selection unit 2-43. That is, it performs a first learning step of training with at least the first training data set, and a second learning step of updating the information contained in the classifier by training with the information contained in the classifier generated in the first learning step and with the second training data set. The information contained in the classifier generated in the first learning step is stored in an information storage unit within the classifier.
 At least some of the units of the learning device 2-100 may be realized as independent devices. The learning device 2-100 may be a workstation. The functions of the units may be realized as software running on a computer, and the software realizing the functions of the units may run on a server via a network such as a cloud. In the present embodiment described below, it is assumed that each unit is realized by software running on a computer installed in a local environment.
 (Processing procedure of the learning device)
 FIG. 14 is a flow chart showing an example of the classifier generation method according to Embodiment 2-1. The processing procedure of the learning device is described below. The present embodiment is realized by the CPU 2-31 executing programs, stored in the ROM 2-33, that realize the functions of the units. In the present embodiment, the image to be processed is described as a TEM image, which is acquired as a two-dimensional gray-scale image. As an example, carbon black in a coating film of a melamine-alkyd resin paint is described as the identification target information. In the present embodiment, an initial data set of 2000 images of size 128 × 128, forming 1000 pairs, was used, split 8:2 into a training portion and an evaluation portion.
 First, an example of how to construct the training data sets is described. Here, segmentation is used as an example of the type of identification, but the scope of application of the present embodiment is not limited to this type of processing. A training data set contains pieces of training data. A piece of training data is composed of input data and teacher data corresponding to the input data. When the input data is image data (input image data), the teacher data is the image data with the identification target information attached, for example image data in which the identification target region is indicated.
 First, a plurality of TEM images, which are the identification target images, are prepared. Next, a correct-answer image is created for each image. Here, a correct-answer image is obtained by processing the identification target information in the identification target image with an appropriate image processing method, for example an image in which the identification target information and the other information are binarized, or an image in which the identification target information is painted over. In the present embodiment, an image in which the carbon black in the TEM image is painted with the luminance value (0, 255, 0) is used in the description.
 In step S2-201, the reception unit 2-41 receives the data set selection conditions via the operation unit 2-150. The data set selection conditions are input by the user. In the present embodiment, the data set selection conditions include at least a method of partitioning the initial data set, information on which of the partitioned data sets are used for learning, and the learning order. Here, the method of partitioning the data set is partitioning by a threshold on the amount of identification target information. The amount of identification target information is defined as the number of pixels painted with the luminance value (0, 255, 0). The threshold value is set to 5000 pixels.
 In step S2-202, the acquisition unit 2-42 acquires the initial data set from the data server 2-120.
 In step S2-203, the selection unit 2-43 processes the initial data set acquired by the acquisition unit 2-42 and selects the first training data set and the second training data set.
 The processing of step S2-203 is as follows. Here too, the carbon black in the melamine-alkyd resin is the identification target information. First, the data are sorted in descending order of the amount of identification target information, that is, in descending order of the number of pixels painted with the luminance value (0, 255, 0).
 Next, the data are partitioned according to the threshold received by the reception unit 2-41, and the learning process is determined according to the information on the data sets to be used for learning and the learning order received by the reception unit 2-41. Here, the data set containing the images whose amount of identification target information is 5000 pixels or more is used as the first training data set, and the data set containing the images whose amount of identification target information is 0 pixels or more is used as the second training data set. Because the second training data set thus includes the first training data set, a classifier with higher identification accuracy can be generated.
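 As an illustrative sketch of the selection in step S2-203 (not part of the original text; NumPy is assumed and all function and variable names are hypothetical), the training pairs can be sorted by target pixel count and partitioned by the threshold as follows:

```python
import numpy as np

def select_datasets(pairs, threshold=5000, target_color=(0, 255, 0)):
    """pairs: list of (input_image, teacher_image) arrays.
    Returns (first_set, second_set) partitioned by target pixel count."""
    def count(teacher):
        return int(np.all(teacher == np.asarray(target_color), axis=-1).sum())

    # Sort in descending order of the amount of identification target information.
    ordered = sorted(pairs, key=lambda p: count(p[1]), reverse=True)

    # First set: images with at least `threshold` target pixels.
    first_set = [p for p in ordered if count(p[1]) >= threshold]
    # Second set: images with 0 or more target pixels, i.e. the whole sorted set,
    # so the second set includes the first set.
    second_set = ordered
    return first_set, second_set
```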
 In step S2-204, the learning unit 2-44 performs learning using the first training data set selected by the selection unit. Here, learning means generating a classifier by performing machine learning according to a predetermined algorithm using a training data set. In the present embodiment, U-Net is used as the predetermined algorithm. Since the training method of U-Net is a well-known technique, a detailed description is omitted in the present embodiment.
 As the predetermined algorithm, for example, SVM (Support Vector Machine), DNN (Deep Neural Network), or CNN (Convolutional Neural Network) may also be used. As algorithms used for semantic segmentation, which classifies the image pixel by pixel, FCN (Fully Convolutional Network), SegNet, and the like can also be used in addition to U-Net.
 Furthermore, an algorithm that combines the above algorithms with a so-called generative model such as GAN (Generative Adversarial Networks) may be used. When there are several kinds of processing to be performed, a separate learning model is constructed so that each kind of processing can be performed. In addition, data augmentation may be performed to increase the amount of data used for learning.
 Data augmentation in the present embodiment means generating new data for learning and increasing the amount of data by performing, for example, at least one of rotation, flipping, luminance conversion, distortion, enlargement, and reduction. When the input data is audio data, new data for learning can be generated and the amount of data increased by adding to the input data a sound that combines tones of one or more frequencies.
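 A minimal sketch of such image augmentation (illustrative only, not the patent's implementation; NumPy is assumed, and the amount of brightness jitter is an arbitrary choice):

```python
import numpy as np

def augment(image, teacher, rng=np.random.default_rng()):
    """Return one augmented (image, teacher) pair using rotation, flipping,
    and luminance conversion; geometric operations are applied to both images."""
    k = rng.integers(0, 4)              # rotate by 0/90/180/270 degrees
    image, teacher = np.rot90(image, k), np.rot90(teacher, k)
    if rng.random() < 0.5:              # horizontal flip
        image, teacher = image[:, ::-1], teacher[:, ::-1]
    gain = rng.uniform(0.8, 1.2)        # luminance conversion on the input only
    image = np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return image, teacher
```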
 To suppress overfitting, it is preferable to split the initial data set in advance into a training data set and an evaluation data set.
 In step S2-205, the information generated in step S2-204 is stored in the information storage unit 2-46 of the classifier.
 In step S2-206, learning is performed using the information contained in the classifier stored in step S2-205 and the second training data set. Here, the information contained in the classifier refers to the structure, weights, biases, and the like of the model. The weights and biases are the parameters used when computing the output from the input; for example, in the case of a neural network, when x in equation (2-1) is the input, w is the weight and b is the bias. In this example, the model structure is not changed, and training is performed so as to optimize the weights and biases for the second training data set.
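 The two-stage training of steps S2-204 to S2-206 could be sketched as follows (illustrative only, not the patent's implementation; PyTorch is assumed, the small convolutional model stands in for the U-Net actually used, and first_dataset and second_dataset are assumed to be existing torch Dataset objects built from the selected training data sets):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def build_model():
    # Stand-in for a U-Net style segmentation network (2 output classes).
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 2, 1),
    )

def train(model, dataset, epochs=10, lr=1e-3):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:                 # x: images, y: integer per-pixel class labels
            optim.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optim.step()
    return model

model = build_model()
model = train(model, first_dataset)                      # step S2-204
torch.save(model.state_dict(), "stage1.pt")              # step S2-205: store weights/biases
model.load_state_dict(torch.load("stage1.pt"))           # step S2-206: reuse the stored
model = train(model, second_dataset)                     # information and update it
```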
 y = wx + b    (2-1)
 After the learning is completed, the display control unit 2-48 causes the display unit 2-140 to display the learning result. In this case, the display control unit 2-48 transmits the learning result to the display unit 2-140 connected to the learning device 2-100 and controls the display unit 2-140 to display it. In the present embodiment, the progress of learning can be checked by displaying the input image, the correct-answer image, and an image obtained by inference with the generated classifier side by side. To check the progress of learning in more detail, the value of IoU (described later) may also be displayed.
 The effect of the learning device according to Embodiment 2-1 is described next. In the present embodiment, IoU (Intersection over Union) was used as the evaluation index for measuring the effect. IoU is defined by equation (2-2).
 IoU = TP / (TP + FP + FN)    (2-2)
 Here, TP (True Positive) is the number of pixels that are carbon black and are determined to be carbon black, and FP (False Positive) is the number of pixels that are not carbon black but are determined to be carbon black (the number of false detections). FN (False Negative) is the number of pixels that are carbon black but are determined not to be carbon black (the number of missed detections). Here, IoUavg, obtained by computing the IoU value for each of the 400 evaluation images and averaging them, was used. With the conventional method, IoUavg = 0.11, whereas with the present method, IoUavg = 0.36.
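 An illustrative computation of IoU and IoUavg from binary masks (not part of the original text; NumPy is assumed):

```python
import numpy as np

def iou(pred, truth):
    """pred, truth: boolean masks of the identification target region."""
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    denom = tp + fp + fn
    return tp / denom if denom > 0 else 1.0   # empty masks count as a perfect match

def iou_avg(preds, truths):
    """Average IoU over the evaluation images, i.e. IoUavg."""
    return float(np.mean([iou(p, t) for p, t in zip(preds, truths)]))
```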
 As described above, the learning device 2-100 in the present embodiment learns sequentially, starting with the data set that has the larger amount of identification target information. Therefore, parameters of an image conversion with a large degree of conversion are learned first and can then be changed gradually, so that the identification target information can be identified accurately.
 (Embodiment 2-2)
 (Overview)
 An example of Embodiment 2-2 is described in detail with reference to the drawings. Descriptions of configurations, functions, and operations that are the same as in the above embodiment are omitted, and mainly the differences from the above embodiment are described.
 (Functional configuration of the learning system)
 FIG. 15 is a diagram showing an example of the functional configuration of the learning system (classifier generation system) according to Embodiment 2-2. The learning device (classifier generation device) 2-200 has at least a reception unit 2-41, an acquisition unit 2-42, a selection unit 2-43, a learning unit 2-44, a classifier 2-45, a display control unit 2-48, a data expansion unit 2-49, and a display unit 2-140.
 The data expansion unit 2-49 expands the initial data set acquired by the acquisition unit 2-42. That is, the data expansion unit 2-49 can increase the number of input data images.
 (Processing procedure of the learning device)
 FIG. 16 is a flow chart showing an example of the classifier generation method according to Embodiment 2-2. In step S2-301, the reception unit 2-41 receives the data set selection conditions via the operation unit 2-150. The data set selection conditions are input by the user. In the present embodiment, the data set selection conditions include at least the number of expansions per image in the initial data set, the patch size, the method of partitioning the data set, information on which of the partitioned data sets are used for learning, and the learning order. The patch size is the number of vertical and horizontal pixels of the selected image when a part of an image is selected. Here, the method of partitioning the data set is partitioning by thresholds on the amount of identification target information. The amount of identification target information is defined as the number of pixels painted with the luminance value (0, 255, 0). Two threshold values, 5000 pixels and 1000 pixels, are used.
 In step S2-302, the acquisition unit 2-42 acquires the initial data set from the data server 2-120.
 In step S2-303, the data expansion unit 2-49 expands the initial data set acquired by the acquisition unit 2-42. In the present embodiment, 2000 input data are generated by cutting out 100 patches of patch size 128 × 128 from each of the 20 pairs (40 images) of size 1280 × 960 in the initial data set. The data were split 8:2 into a training portion and an evaluation portion.
 FIG. 17 is a diagram showing an example of the data expansion processing procedure according to Embodiment 2-2. The processing of step S2-303 is described with reference to FIG. 17. Here too, the carbon black in the melamine-alkyd resin is the identification target information. In step S2-303, the data expansion unit 2-49 expands the data by extracting a plurality of regions of interest from the initial data set.
 FIG. 17 shows an example in which a region of interest 2-540, a region of interest 2-541, and a region of interest 2-542 are extracted for position coordinates 2-530, 2-531, and 2-532, respectively. The input image in the present embodiment is composed of a plurality of pixels whose positions are specified by two-dimensional orthogonal coordinates (x, y). If the numbers of pixels in the horizontal and vertical directions of the image are x_size and y_size, respectively, then 0 ≤ x ≤ x_size and 0 ≤ y ≤ y_size hold. Taking the upper left of the image as the origin, with the x axis to the right and the y axis downward, a plurality of mutually different position coordinates (x_i, y_i) (i = 1, 2, ..., N) are set so as to satisfy 0 ≤ x_i ≤ x_size and 0 ≤ y_i ≤ y_size. In the present embodiment, pairs of random numbers (x_i, y_i) satisfying these conditions are generated. Next, a region of interest is set with (x_i, y_i) as its upper-left coordinate, and the size of the region of interest is made equal to the patch size. When (x_i, y_i) lies near an edge of the image and the region of interest would become smaller than the patch size, the size of the region of interest is adjusted to match the patch size by, for example, so-called padding processing that fills the area outside the image with pixel value 0.
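 A minimal sketch of this random patch extraction with zero padding (illustrative only; NumPy is assumed and gray-scale 2-D images are used for simplicity):

```python
import numpy as np

def extract_patches(image, teacher, patch=128, n=100, rng=np.random.default_rng()):
    """Cut n patches of size patch x patch at random upper-left coordinates,
    zero-padding when a patch extends past the image border."""
    h, w = image.shape[:2]
    pairs = []
    for _ in range(n):
        x, y = int(rng.integers(0, w + 1)), int(rng.integers(0, h + 1))
        img_patch = np.zeros((patch, patch), dtype=image.dtype)
        tch_patch = np.zeros((patch, patch), dtype=teacher.dtype)
        ys, xs = min(patch, h - y), min(patch, w - x)
        img_patch[:ys, :xs] = image[y:y + ys, x:x + xs]
        tch_patch[:ys, :xs] = teacher[y:y + ys, x:x + xs]
        pairs.append((img_patch, tch_patch))
    return pairs
```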
 In step S2-304, the selection unit 2-43 processes the initial data set acquired by the acquisition unit 2-42 and expanded in step S2-303, and selects the first training data set and the second training data set.
 The processing of step S2-304 is as follows. First, the data are sorted in descending order of the amount of identification target information, that is, in descending order of the number of pixels painted with the luminance value (0, 255, 0). Next, the data are partitioned according to the thresholds received by the reception unit 2-41, and the learning process is determined according to the information on the data sets to be used for learning and the learning order received by the reception unit 2-41. Here, the data set containing the images whose amount of identification target information is 5000 pixels or more is used as the first training data set, and the data set containing the images whose amount of identification target information is 1000 pixels or more is used as the second training data set. A data set containing the images whose amount of identification target information is 0 pixels or more may further be used for learning as a third training data set. In this way, the second training data set includes the first training data set, and the third training data set includes the first and second training data sets. This makes it possible to generate a classifier with higher identification accuracy.
 In step S2-305, the learning unit 2-44 performs learning using the first training data set selected by the selection unit.
 In step S2-306, the information generated in step S2-305 is stored in the information storage unit 2-46 of the classifier.
 In step S2-307, the information contained in the classifier is updated by performing learning using the information contained in the classifier stored in the information storage unit in step S2-306 and the second training data set. Here, the information contained in the classifier refers to the structure, weights, biases, and the like of the model. Further learning may then be performed using the information contained in the classifier generated by the learning with the second training data set and the third training data set. The number of data sets may be larger, but when the number of learning steps is n (n is an integer of 2 or more), the amount of identification target information preferably decreases monotonically as n increases; that is, the slope obtained by plotting the amount of identification target information against the learning step number is preferably negative. After the learning is completed, the display control unit 2-48 causes the display unit 2-140 to display the learning result.
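 The n-stage variant with monotonically decreasing thresholds could be sketched as follows (illustrative only; train_fn and count_fn are placeholders for a training routine and a target-amount measurement such as those sketched earlier, and the threshold list is just an example that matches this embodiment):

```python
def staged_training(model, pairs, train_fn, count_fn, thresholds=(5000, 1000, 0)):
    """Train in stages with monotonically decreasing thresholds on the amount of
    identification target information; later stages include the earlier data."""
    assert list(thresholds) == sorted(thresholds, reverse=True)
    for thr in thresholds:
        stage_set = [p for p in pairs if count_fn(p[1]) >= thr]
        model = train_fn(model, stage_set)   # weights and biases carry over between stages
    return model
```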
 The effect of the learning device according to Embodiment 2-2 is described next. In the present embodiment, 400 of the 2000 images generated by expanding the initial data set were used as evaluation images, and IoUavg was used as in Embodiment 2-1. With the conventional method, IoUavg = 0.13, whereas with the present method, IoUavg = 0.56.
 As described above, the learning device 2-200 in the present embodiment can identify the identification target information accurately by learning sequentially, starting with the data set that has the larger amount of identification target information.
 In addition, by expanding the data, the method can also be applied when a large number of input data cannot be prepared.
 (Embodiment 2-3)
 (Overview)
 Next, an example of Embodiment 2-3 is described in detail with reference to the drawings. Descriptions of configurations, functions, and operations that are the same as in the above embodiments are omitted, and mainly the differences from the above embodiments are described. In the present embodiment, red blood cells, white blood cells, and platelets in blood in an optical microscope image are described as an example of the objects to be processed contained in the image to be processed.
 In the present embodiment, the data sets are selected automatically, and the learning process is repeated until the evaluation value reaches a target value. Images in which the red blood cell portions are painted with the luminance value (255, 0, 0), the white blood cell portions with the luminance value (0, 255, 0), and the platelet portions with the luminance value (0, 0, 255) were used as the correct-answer data. An initial data set of 2000 images of size 128 × 128, forming 1000 pairs, was used, split 8:2 into a training portion and an evaluation portion. FIG. 18 is a diagram showing an example of the input data according to Embodiment 2-3.
 (Functional configuration of the learning system)
 FIG. 19 is a diagram showing an example of the functional configuration of the learning system (classifier generation system) according to Embodiment 2-3. The learning device (classifier generation device) 2-300 has, as its functional configuration, at least a reception unit 2-41, an acquisition unit 2-42, a selection unit 2-43, a learning unit 2-44, a classifier 2-45, an evaluation unit 2-50, a display control unit 2-48, and a display unit 2-140.
 The evaluation unit 2-50 performs inference using the classifier stored in the classifier 2-45; when the IoUavg value is higher than the target value, learning ends, and when the IoUavg value is lower than the target value, the learning process is repeated.
 (Processing procedure of the learning device)
 FIG. 20 is a flow chart showing an example of the classifier generation method according to Embodiment 2-3. In step S2-401, the reception unit 2-41 receives the data set selection conditions via the operation unit 2-150. The data set selection conditions are input by the user. In the present embodiment, the selection conditions include at least the target value of IoU, the upper limit of the learning time, and the initial value of the class width. Here, the method of partitioning the initial data set is to divide the initial data set into classes (bins) according to the amount of identification target information. The initial value of the class width is 1000.
 In step S2-402, the acquisition unit 2-42 acquires the initial data set from the data server 2-120.
 In step S2-403, the selection unit 2-43 processes the initial data set acquired by the acquisition unit 2-42 and selects the first training data set and the second training data set.
 The processing of step S2-403 is described with reference to FIG. 19. The selection unit 2-43 divides the initial data set into classes according to the initial value of the class width received by the reception unit 2-41. Here, the data belonging to the class with the largest amount of identification target information is used as the first training data set, and the data belonging to the class with the largest amount of identification target information combined with the data belonging to the class with the second largest amount is used as the second training data set.
 In step S2-404, the learning unit 2-44 performs learning using the first training data set selected by the selection unit.
 In step S2-405, the information generated in step S2-404 is stored in the information storage unit 2-46.
 In step S2-406, the information contained in the classifier is updated by performing learning using the information contained in the classifier stored in the information storage unit in step S2-405 and the second training data set. Here, the information contained in the classifier refers to the structure, weights, biases, and the like of the model.
 In step S2-407, the evaluation unit 2-50 performs inference using the classifier 2-45; when the IoUavg value is higher than the target value, learning ends, and when the IoUavg value is lower than the target value, the learning process is repeated. After the learning is completed, the display control unit 2-48 causes the display unit 2-140 to display the learning result.
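 The stop condition of step S2-407 could be expressed as a simple loop (illustrative only; train_fn and evaluate_fn are placeholders for one pass of steps S2-403 to S2-406 and for the IoUavg evaluation, and the target value and time limit shown are arbitrary examples):

```python
import time

def learn_until_target(train_fn, evaluate_fn, target_iou=0.4, time_limit_s=3600):
    """Repeat the learning process until IoUavg exceeds the target value or the
    upper-limit learning time is reached (step S2-407)."""
    start = time.monotonic()
    model, score = None, 0.0
    while score < target_iou and time.monotonic() - start < time_limit_s:
        model = train_fn()            # one pass of steps S2-403 to S2-406
        score = evaluate_fn(model)    # IoUavg on the evaluation images
    return model, score
```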
 The effect of the image processing device according to Embodiment 2-3 is described next. In the present embodiment, evaluation was performed using mIoU, which is defined by equation (2-3).
 mIoU = (1/c) Σ_{i=1}^{c} IoU_i    (2-3)
 Here, c is the number of classification classes; in this example, c = 3. With the conventional method, IoUavg = 0.08, whereas with the present method, IoUavg = 0.45.
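 An illustrative per-class computation of mIoU (not part of the original text; NumPy is assumed and label images are assumed to hold integer class indices 0 to c-1):

```python
import numpy as np

def mean_iou(pred, truth, num_classes=3):
    """mIoU over num_classes classes for integer label maps pred and truth."""
    ious = []
    for cls in range(num_classes):
        p, t = (pred == cls), (truth == cls)
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union if union > 0 else 1.0)
    return float(np.mean(ious))
```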
 As described above, according to the second embodiment of the present invention, it is possible to provide a classifier generation method that can accurately identify identification target information even when a plurality of pieces of identification target information exist in a single piece of data, or when the identification target information is difficult to distinguish from other information. According to the present invention, it is also possible to provide an identification method and an identification device that use a classifier generated by such a classifier generation method.
 <Other Embodiments>
 The learning device and the learning system in each of the above embodiments may be realized as a single device, or may be realized as a combination of a plurality of information processing devices communicably connected to one another that execute the above-described processing; both forms are included in the embodiments of the present invention. The above-described processing may also be executed by a common server device or a server group. In that case, the common server device corresponds to the learning device according to the embodiment, and the server group corresponds to the learning system according to the embodiment. The plurality of devices constituting the learning device and the learning system only need to be able to communicate at a predetermined communication rate, and do not need to be located in the same facility or in the same country.
 Although the embodiments have been described in detail above, the present invention can also take the form of, for example, a system, a device, a method, a program, or a recording medium (storage medium). Specifically, it may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, an imaging device, a Web application, and the like), or to an apparatus composed of a single device.
 It goes without saying that the object of the present invention can also be achieved as follows. That is, a recording medium (or storage medium) on which the program code (computer program) of software that realizes the functions of the above embodiments is recorded is supplied to a system or an apparatus. Needless to say, such a storage medium is a computer-readable storage medium. The computer (or CPU or GPU) of the system or apparatus then reads out and executes the program code stored in the recording medium. In this case, the program code read out from the recording medium itself realizes the functions of the above embodiments, and the recording medium on which the program code is recorded constitutes the present invention.
 Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims. For example, preprocessing and postprocessing may be added as appropriate.
 Embodiments obtained by appropriately combining the above embodiments are also included in the embodiments of the present invention.
 <<Third Embodiment>>
 (Background of the third embodiment)
 In recent years, many attempts have been made to process various kinds of data using deep learning and to obtain useful information, for example in image processing, audio processing, and text processing. Deep learning improves identification accuracy compared with conventional methods, and various efforts are being made to improve the identification accuracy further.
 Document 2-1 above describes a diagnosis support device that uses deep learning to support the diagnosis of a disease region. This technique performs highly accurate diagnosis by normalizing the color and luminance of an image in advance to separate diseased parts from non-diseased parts.
 Document 2-2 above discloses a technique for accurately identifying nodules from nodule candidate images by connecting a plurality of classifiers and learning while removing samples that are clearly normal. A set of classifiers connected in this way is called a cascade classifier and is a technique often used to improve identification accuracy.
 (Problem to be solved by the third embodiment)
 However, when a single piece of data contains a plurality of regions to be identified (hereinafter, identification target regions), or when it is difficult to distinguish the identification target regions from the other regions, the following problem arises: it is difficult to perform separation processing in advance as in Document 2-1 above, or to learn while removing samples that are clearly normal as in Document 2-2 above.
 (Method of generating a classifier)
 The classifier generation method according to the present embodiment is a method of generating a classifier for estimating identification target information in data. Specifically, it has at least a padding step (S3-102) of padding the training data in a training data set group, and a generation step (S3-103) of generating a classifier by performing learning using the padded training data set group (FIG. 21).
 Here, the training data set group includes at least a first training data set and a second training data set. The first and second training data sets contain training data. A piece of training data is composed of input data and teacher data corresponding to the input data. The second training data set contains a larger number of pieces of training data than the first training data set.
 In addition, the amount of identification target information in the input data contained in the first training data set is larger than the amount of identification target information in the input data contained in the second training data set.
 The present inventors found that when the first training data set and the second training data set are used for learning without the padding step, the identification target information cannot be identified accurately. This is considered to be because the amount of identification target information in the input data contained in the second training data set is small. That is, they found that when learning is performed with input data containing a small amount of identification target information, the inference tends to produce a result containing no identification target information even when the inference data does contain identification target information. Therefore, the training data of the first training data set, whose input data contain a large amount of identification target information, are padded so that the number of pieces of training data contained in the first training data set becomes equal to or larger than the number contained in the second training data set. In this way, the amount of input data containing a large amount of identification target information increases, and the identification target information can be identified accurately.
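 A minimal sketch of this padding step (illustrative only; NumPy is assumed, and augment stands for any of the transformations named later, such as rotation or flipping):

```python
import numpy as np

def pad_first_dataset(first_set, second_set, augment, rng=np.random.default_rng()):
    """Pad the first training data set with augmented copies until it contains at
    least as many (input, teacher) pairs as the second training data set."""
    padded = list(first_set)
    while len(padded) < len(second_set):
        base = padded[int(rng.integers(0, len(first_set)))]
        padded.append(augment(*base))   # new pair derived from a first-set pair
    return padded
```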
 A reception step (S3-101) of receiving the training data set group may also be provided.
 The terms used in the above description are explained below.
 (Data)
 In the present embodiment, data is a representation of information that is formalized so as to be suitable for transmission, interpretation, or processing, and that can be interpreted again as information. Examples of data include image data, sound data (such as voice data), and text data. When the data is image data, sound data, or text data, the input data is input image data, sound input data, or input text data, respectively.
 (Identification target information)
 In the present embodiment, identification target information is the information to be identified in the data. When the data is image data, the identification target information is, for example, at least one of the position, area, and distribution of an identification target region in the image data. A classifier generated by the generation method according to the present embodiment can estimate and extract an identification target region in image data that is difficult for a user to extract visually. When the data is image data, the amount of identification target information can be the number of pixels contained in the identification target region.
 When the data is sound data, the identification target information is, for example, at least one of the frequency and the intensity of the sound to be identified (identification target sound) in the sound data. A classifier generated by the generation method according to the present embodiment can estimate and extract the identification target sound from noisy sound data that is difficult for a user to extract.
 When the sound data contains the voices of a plurality of speakers, the voice data of at least one speaker can be used as the identification target information.
 When the data is text data, the identification target information is, for example, information on the characters or character strings to be identified in the text data, or on their number. A classifier generated by the generation method according to the present embodiment can estimate and extract an identification target character string in text data that is difficult for a user to extract.
 (学習データ)
 本実施形態における学習データは、識別器を生成するための学習用データであり、入力データと、入力データに対する教師データで構成される。入力データが画像データ(入力画像データ)である場合、教師データは、画像データに識別対象情報を付帯させたものとなる。例えば、画像データにおいて識別対象領域が示されたものである。
(Learning data)
The learning data in the present embodiment is learning data for generating a discriminator, and is composed of input data and teacher data for the input data. When the input data is image data (input image data), the teacher data is the image data with the identification target information attached. For example, the identification target area is shown in the image data.
 (Amount of identification target information)
 In the present embodiment, the amount of identification target information contained in input data is, for example, the ratio of the identification target region to the image data when the input data is image data. That is, a large amount of identification target information means, for example, that the identification target region occupies a large proportion of the image data. When the input data is sound data, a large amount of identification target information means that the intensity of the identification target sound in the sound data is high, or, when the sound data is voice data of a plurality of speakers, that the number of speakers to be extracted is large.
 When the input data is text data, a large amount of identification target information means, for example, that the number of characters or character strings to be identified in the text data is large.
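 As a non-limiting illustration, the amount of identification target information of image data could be computed as the ratio of identification-target pixels to all pixels in the teacher image. The following Python sketch assumes the teacher image is a binary mask in which target pixels are non-zero; that encoding is an assumption and is not specified in the present embodiment.

```python
import numpy as np

def target_info_amount(label_image: np.ndarray) -> float:
    """Ratio of identification-target pixels to all pixels in a teacher image.

    Assumes the teacher (correct-answer) image is a mask in which target
    pixels are non-zero; this encoding is an assumption of the sketch.
    """
    total = label_image.size
    target = int(np.count_nonzero(label_image))
    return target / total
```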
 (Learning data set)
 The learning data set in the present embodiment includes the above-described learning data. The number of pieces of learning data contained in the second learning data set is larger than the number of pieces of learning data contained in the first learning data set.
 The learning data set group in the present embodiment includes at least a first learning data set and a second learning data set, and may include three or more learning data sets.
 (Data padding)
 In the present embodiment, data padding means generating new input data and increasing the number of pieces of input image data by performing, for example, at least one of rotation, inversion, luminance conversion, distortion addition, enlargement, and reduction. Data padding can also be called data augmentation. When the input data is sound, new sound input data can be generated for padding by adding to the input data a sound obtained by combining sounds of one or more frequencies.
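 As a hedged illustration of such padding for image data, the following Python sketch produces one new input/teacher pair by rotation, inversion, and a simple luminance conversion. The specific operations chosen and the use of numpy are assumptions of the sketch, not requirements of the embodiment.

```python
import numpy as np

def augment_pair(image: np.ndarray, label: np.ndarray, rng: np.random.Generator):
    """Produce one padded (augmented) input/teacher pair.

    The same geometric transform is applied to the input image and to the
    teacher image so that the identification target region stays aligned;
    luminance conversion is applied to the input image only.
    """
    k = int(rng.integers(0, 4))            # rotation by 0/90/180/270 degrees
    image, label = np.rot90(image, k), np.rot90(label, k)
    if rng.random() < 0.5:                 # horizontal inversion
        image, label = np.fliplr(image), np.fliplr(label)
    gamma = rng.choice([1.2, 1 / 1.2])     # simple luminance conversion
    image = np.clip((image / 255.0) ** gamma * 255.0, 0, 255).astype(image.dtype)
    return image, label
```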
 (Classifier generation device)
 The classifier generation device according to the present embodiment is a device for generating a classifier for estimating identification target information in data. Specifically, it has at least a padding unit 3-22 that pads the learning data of the learning data set group, and a generation unit 3-23 that generates a classifier by performing learning using the padded learning data set group (Fig. 22).
 Here, the learning data set group includes at least a first learning data set and a second learning data set. The first and second learning data sets contain learning data, and each piece of learning data is composed of input data and teacher data corresponding to that input data. The second learning data set contains a larger number of pieces of learning data than the first learning data set.
 Further, the amount of identification target information contained in the input data of the first learning data set is larger than the amount of identification target information contained in the input data of the second learning data set; one possible way of forming such data sets is sketched below.
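 The following Python sketch is one assumed way of forming a first and a second learning data set from an initial collection of input/teacher pairs so that the above relation holds; the 30% cut-off and the use of the target pixel count as the amount of identification target information are illustrative assumptions.

```python
from typing import List, Tuple
import numpy as np

Pair = Tuple[np.ndarray, np.ndarray]  # (input image, teacher image)

def split_data_sets(pairs: List[Pair], ratio: float = 0.3) -> Tuple[List[Pair], List[Pair]]:
    """Form a first and a second learning data set from an initial data set.

    Pairs are ordered by the amount of identification target information
    (here: number of target pixels in the teacher image, descending).  The
    first set takes the top fraction, so it is smaller but richer in target
    information; the second set is the whole initial data set, so it is
    larger and may include the first set.
    """
    ordered = sorted(pairs, key=lambda p: int(np.count_nonzero(p[1])), reverse=True)
    first = ordered[: max(1, int(len(ordered) * ratio))]
    second = list(ordered)
    return first, second
```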
 The generation device according to the present embodiment can be configured such that the acquisition unit 3-21 acquires the learning data set group in response to an operation of the operation unit 3-150, and can be configured to transmit and receive data to and from the data server 3-120.
 Since the components and terminology of the classifier generation device according to the present embodiment are the same as those of the classifier generation method, their description is omitted.
 (Classifier)
 The classifier according to the present embodiment is generated by the generation method or the generation device according to the present embodiment described above, and can accurately infer the identification target information contained in input inference data.
 (Information processing device, information processing method)
 The information processing device according to the present embodiment includes the above classifier and has an inference unit that infers, using the classifier, the identification target information contained in inference data. Similarly, the information processing method according to the present embodiment uses the above classifier and has an inference step of inferring the identification target information contained in inference data.
 Hereinafter, the classifier generation method and generation device according to the embodiments of the present invention will be described in detail with specific examples.
 (Embodiment 3-1)
 (Overview)
 The classifier generation method or generation device according to the present embodiment will be described using an example in which the data is image data.
 First, learning data is prepared that consists of a learning input image (input data or input image data) and a learning correct-answer image (teacher data) in which the identification target region (identification target information) is colored with a fixed color, and learning is performed using this data to create a trained model.
 In the following description, a case where the image region of a resin in a transmission electron microscope (TEM) image is the target of image processing will be described, but the scope of application of the present embodiment is not limited to this detection target or to this type of image acquisition method. A specific device configuration, functional configuration, and processing flow are described below.
 (Device configuration)
 Based on FIG. 23, a region identification system (classifier generation system) 3-190 composed of the classifier learning device (classifier generation device) 3-100 according to Embodiment 3-1 of the present invention and the devices connected to the classifier learning device 3-100 will be described in detail.
 The region identification system 3-190 has a data input device 3-110 that captures images for learning, and a data server 3-120 that stores the captured images. It also has a data processing device 3-130 for coloring image regions identified by the user, and a classifier learning device 3-100 that trains the classifier. It further has a display unit 3-140 that displays learning results and frequency distributions, and an operation unit 3-150 with which the user inputs operation instructions to the classifier learning device. At the time of learning, the classifier learning device 3-100 acquires learning input images and learning correct-answer images, performs learning, and outputs a trained model.
 Inference can then be performed using the classifier generated by the classifier learning device 3-100. At the time of inference, an inference input image is acquired, the generated trained model is used to extract the identification region in the input image, and the entire region or its boundary is colored with a fixed color and output as an inferred image.
 Each part will be described below. The classifier learning device 3-100 has at least a CPU 3-31, a communication IF 3-32, a ROM 3-33, a RAM 3-34, a storage unit 3-35, and a common bus 3-36. The CPU 3-31 integrally controls the operation of each component of the classifier learning device 3-100.
 Under the control of the CPU 3-31, the classifier learning device 3-100 may also control the operation of the data input device 3-110. The data server 3-120 holds images captured by the data input device 3-110. The communication IF (Interface) 3-32 is realized by, for example, a LAN card, and handles communication between an external device (for example, the data server 3-120) and the classifier learning device 3-100. The ROM 3-33 is realized by a non-volatile memory or the like, stores the control program executed by the CPU 3-31, and provides a work area when the CPU 3-31 executes the program. The RAM (Random Access Memory) 3-34 is realized by a volatile memory or the like and temporarily stores various information. The storage unit 3-35 is realized by, for example, an HDD (Hard Disk Drive) or the like, and stores various application software including an operating system (OS), device drivers for peripheral devices, and a program for performing the region identification according to the present embodiment described later. The operation unit 3-150 is realized by, for example, a keyboard and a mouse, and inputs instructions from the user into the device. The display unit 3-140 is realized by, for example, a display, and presents various information to the user. The operation unit 3-150 and the display unit 3-140 provide a GUI (Graphical User Interface) function under the control of the CPU 3-31. The display unit 3-140 may be a touch panel monitor that accepts operation input, and the operation unit 3-150 may be a stylus pen. The above components are communicably connected to one another by the common bus 3-36.
 The data input device 3-110 is, for example, a scanning electron microscope (SEM), a transmission electron microscope (TEM), an optical microscope, a digital camera, or a smartphone. The data input device 3-110 transmits acquired images to the data server 3-120. An imaging control unit (not shown) that controls the data input device 3-110 may be included in the classifier learning device 3-100.
 (Functional configuration)
 Next, the functional configuration of the region identification system including the classifier learning device 3-100 according to the present embodiment will be described with reference to FIG. 24. The functions of the units shown in FIG. 24 are realized by the CPU 3-31 executing the program stored in the ROM 3-33. The program may be executed by one or more CPUs, and the ROM storing the program may likewise be one or more memories. Another processor such as a GPU (Graphics Processing Unit) may be used instead of, or together with, the CPU. That is, the functions of the units shown in FIG. 24 are realized by at least one processor (hardware) executing a program stored in at least one memory communicably connected to that processor.
 As its functional configuration, the classifier learning device 3-100 has a reception unit 3-41, an acquisition unit 3-42, a frequency distribution calculation unit 3-44, a data expansion unit 3-45, a learning unit 3-46, a storage unit 3-47, and a display control unit 3-48, and may further have an extraction unit 3-43. The classifier learning device 3-100 is communicably connected to the data server 3-120 and the display unit 3-140.
 The reception unit 3-41 receives data expansion conditions input by the user via the operation unit 3-150. In other words, the operation unit 3-150 corresponds to an example of a reception means that receives the setting of the expansion conditions and the patch size (described later). The expansion conditions include at least one of the number of bins of the frequency distribution (described later), the bin width, and the augmentation method (described later). A bin is one of the mutually disjoint intervals or classes of a frequency distribution (histogram).
 The acquisition unit 3-42 acquires, from the data server 3-120, a plurality of pieces of learning data (which can also be called learning data pairs), each composed of a learning input image and a learning correct-answer image.
 When the extraction unit 3-43 is provided, it extracts a plurality of small-region (data block) pairs from each of the learning input image and the learning correct-answer image based on the patch size received by the reception unit 3-41.
 The frequency distribution calculation unit 3-44 calculates the area or the number of pixels of the extraction region for each of the plurality of learning correct-answer images or, when extracted data block groups are available, for each data block extracted from the learning correct-answer images. It then creates a frequency distribution whose characteristic value is the calculated area or pixel count, using the number of bins and the bin width received by the reception unit 3-41; a sketch of this computation follows.
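 The following Python sketch shows one assumed way of building such a frequency distribution; the binary-mask encoding of the teacher patches and the bin edges starting at zero are assumptions of the sketch.

```python
import numpy as np

def build_frequency_distribution(label_blocks, n_bins, bin_width):
    """Histogram of target-region pixel counts over teacher data blocks.

    label_blocks: list of teacher (correct-answer) patches as arrays in which
    target pixels are non-zero (this encoding is an assumption of the sketch).
    n_bins and bin_width correspond to the expansion conditions received from
    the user; counts beyond the last bin edge are not tallied.
    """
    counts = np.array([np.count_nonzero(b) for b in label_blocks])
    edges = np.arange(n_bins + 1) * bin_width
    freq, _ = np.histogram(counts, bins=edges)
    return counts, edges, freq
```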
 The data expansion unit 3-45 expands the data of the learning input images and the learning correct-answer images based on the created frequency distribution and the instruction to execute augmentation received by the reception unit 3-41.
 The learning unit 3-46 performs learning based on the above teacher data and creates a trained model. The storage unit 3-47 stores the trained model. The display control unit 3-48 then uses the display unit 3-140 to output information on the frequency distribution and the learning result.
 On the other hand, for inference from inference input data, a start command for the inference operation input by the user via the operation unit 3-150 is received.
 The acquisition unit 3-42 acquires an inference image from the data server 3-120.
 An inference unit (not shown) performs inference based on the trained model 3-49. The display control unit 3-48 then outputs the inference result using the display unit 3-140.
 At least some of the units of the classifier learning device 3-100 may be realized as independent devices. The classifier learning device 3-100 may be a workstation. The function of each unit may be realized as software running on a computer, and the software realizing the function of each unit may run on a server via a network such as the cloud. In the present embodiment described below, each unit is assumed to be realized by software running on a computer installed in a local environment.
 (Processing flow)
 Next, a classifier generation method according to Embodiment 3-1 of the present invention will be described. FIG. 25 shows the processing procedure executed by the classifier learning device 3-100 of the present embodiment. The present embodiment is realized by the CPU 3-31 executing the programs, stored in the ROM 3-33, that realize the functions of the respective units. In the present embodiment, the processing target image is described as a TEM image, which is acquired as a two-dimensional grayscale image. The identification target contained in the image is described as an example of a processing target object contained in the processing target image.
 First, the learning-related processing of steps S3-201 to S3-207 will be described.
 In step S3-201, the reception unit 3-41 receives the data expansion conditions input by the user via the operation unit 3-150. The data expansion conditions in the present embodiment include at least one of the number of bins of the frequency distribution to be created, the bin width, and the augmentation method.
 In step S3-202, the acquisition unit 3-42 acquires a learning data pair consisting of a learning input image and a learning correct-answer image from the data server 3-120. The learning input image and the learning correct-answer image used here can be exactly the same image pair, except that the whole of the extraction region or its boundary is colored in the correct-answer image.
 When the extraction unit 3-43 is provided, in step S3-202b, small-region (data block) pairs are extracted from the learning input image and the learning correct-answer image according to the patch size. Here, the patch size is the number of vertical and horizontal pixels of the cropped image when a part is cropped from the target image. Each extracted data block pair is extracted from the same coordinates on the two images, as in the sketch below.
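 A minimal Python sketch of step S3-202b follows; drawing crop positions at random is an assumption, since the embodiment does not specify how positions are chosen.

```python
import numpy as np

def extract_block_pairs(input_img, label_img, patch, n_blocks, rng):
    """Crop paired data blocks at identical coordinates.

    input_img and label_img are assumed to be arrays of the same height and
    width; patch is the patch size (e.g. 128).
    """
    h, w = input_img.shape[:2]
    pairs = []
    for _ in range(n_blocks):
        y = int(rng.integers(0, h - patch + 1))
        x = int(rng.integers(0, w - patch + 1))
        pairs.append((input_img[y:y + patch, x:x + patch],
                      label_img[y:y + patch, x:x + patch]))
    return pairs
```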
 In step S3-203, the frequency distribution calculation unit 3-44 calculates the area of the extraction region for each of the plurality of learning correct-answer images or, when the extraction unit 3-43 is provided, for each data block extracted from the learning correct-answer images, and creates a frequency distribution using this area value as the characteristic value.
 In step S3-204, the data expansion unit 3-45 expands the data of the learning input images and the learning correct-answer images based on the frequency distribution and the instruction to execute augmentation received by the reception unit 3-41. Specifically, in addition to image rotation, methods called augmentation, such as inversion, enlargement, reduction, distortion addition, and luminance change, are used to increase the learning input images and learning correct-answer images so that the generated images are contained in the same frequency distribution. By this method, teacher data is generated in which the frequency of bins containing a larger identification target region is higher than the frequency of bins containing a smaller identification target region.
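 One assumed way of deciding how many augmented copies to create per data block so that bins with a larger identification target region end up with higher frequencies is sketched below in Python; the monotone target profile used here is only an illustration, and the concrete augmentation operations are described next.

```python
import numpy as np

def augmentation_multipliers(counts, edges, freq):
    """Per-block number of extra augmented copies (sketch for step S3-204).

    counts, edges, freq come from the frequency-distribution step.  The
    target profile chosen here (bin frequency growing with the bin index)
    is only one way to satisfy the condition that bins with more
    identification target region become more frequent than bins with less.
    """
    n_bins = len(freq)
    peak = max(int(freq.max()), 1)
    target = np.ceil(peak * (np.arange(1, n_bins + 1) / n_bins)).astype(int)
    bin_of = np.clip(np.digitize(counts, edges) - 1, 0, n_bins - 1)
    multipliers = []
    for b in bin_of:
        members = max(int((bin_of == b).sum()), 1)
        shortfall = max(int(target[b]) - int(freq[b]), 0)
        multipliers.append(shortfall // members)
    return multipliers
```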
 Augmentation performs, for example, rotation, inversion, enlargement, and reduction, and each operation can be carried out as follows. A blank (white) image ten times the patch size in height and width is prepared in advance, and the image to be rotated is placed at its center. An affine transformation is then applied to each coordinate according to Equation (3-1) and Table 1, where x and y denote the coordinates before transformation and x' and y' denote the coordinates after transformation. The rotation angle θ is usually set between 30° and 330°, and a and d, the enlargement/reduction ratios in the vertical and horizontal directions, are usually set between 0.1 and 10. Finally, the center is cropped at the patch size to obtain the augmented image.
 [Equation (3-1): affine transformation of the coordinates (x, y) to (x', y'), shown as an image in the original]
 [Table 1, shown as an image in the original]
 As an example of distortion, an arbitrary value is added to the x coordinate to translate it, and this value is varied according to the y coordinate. The maximum of this value is usually best set between 20% and 60% of the patch size in the X direction.
 As an example of luminance change, gamma correction can be used; the gamma value in this case is usually preferably 1.2 or more, or 1/1.2 or less.
 Furthermore, linear interpolation may be applied to the augmented image, which smooths mosaic-like, jagged-looking images; a sketch combining these operations follows.
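 The following Python sketch follows the canvas-and-crop procedure described above for a square grayscale patch, combining the affine transformation with gamma correction and linear interpolation; the use of scipy.ndimage.affine_transform and the particular parameter values are assumptions of the sketch, not requirements of the embodiment.

```python
import numpy as np
from scipy import ndimage

def affine_augment(patch_img, theta_deg=45.0, a=1.2, d=0.8, gamma=1.2):
    """One augmented copy following the canvas-and-crop procedure in the text.

    The patch is placed at the centre of a white canvas ten times the patch
    size, rotated by theta and scaled by (a, d) about the canvas centre,
    then the centre is cropped back to the patch size.  Gamma correction is
    applied as the luminance change; order=1 gives linear interpolation.
    """
    p = patch_img.shape[0]
    canvas = np.full((10 * p, 10 * p), 255, dtype=np.float64)
    off = (10 * p - p) // 2
    canvas[off:off + p, off:off + p] = patch_img

    t = np.deg2rad(theta_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    scale = np.diag([a, d])
    # affine_transform maps output coordinates to input coordinates,
    # so the inverse of (scale @ rot) is used, anchored at the canvas centre.
    m = np.linalg.inv(scale @ rot)
    centre = np.array(canvas.shape) / 2.0
    offset = centre - m @ centre
    warped = ndimage.affine_transform(canvas, m, offset=offset, order=1, cval=255)

    out = warped[off:off + p, off:off + p]                  # crop the centre
    out = np.clip((out / 255.0) ** gamma * 255.0, 0, 255)   # gamma correction
    return out.astype(patch_img.dtype)
```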
 In step S3-205, the learning unit 3-46 generates a trained model 3-49 by performing machine learning according to a predetermined algorithm using the learning teacher data. In the present embodiment, U-Net is preferably used as the predetermined algorithm; since the U-Net learning method is a well-known technique, its detailed description is omitted here. As the predetermined algorithm, for example, an SVM (Support Vector Machine), a DNN (Deep Neural Network), a CNN (Convolutional Neural Network), or the like may also be used. As algorithms used for semantic segmentation, which classifies each pixel into a class, FCN (Fully Convolutional Network), SegNet, and the like can also be used in addition to U-Net. Furthermore, an algorithm combining the above with a so-called generative model such as GAN (Generative Adversarial Networks) may be used.
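 As a hedged sketch of step S3-205, the following Python/PyTorch training loop assumes a segmentation network (such as a U-Net) constructed elsewhere and binary teacher masks; the loss, optimizer, batch size, and epoch count are illustrative assumptions, not values given in the embodiment.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

def train_segmentation(model: nn.Module, inputs, labels, epochs=50, lr=1e-3):
    """Minimal training sketch.

    model is assumed to be a segmentation network built elsewhere; inputs and
    labels are float tensors of shape (N, 1, H, W) with labels in {0, 1}.
    """
    loader = DataLoader(TensorDataset(inputs, labels), batch_size=8, shuffle=True)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
    return model  # the trained model corresponding to 3-49
```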
 In step S3-206, the storage unit 3-47 stores the trained model.
 In step S3-207, the display control unit 3-48 uses the display unit 3-140 to output the frequency distribution, information on the learning, and the like.
 On the other hand, since the inference processing is largely the same as the learning processing, it is described only briefly below. First, processing (not shown) similar to step S3-201 is performed, except that the information received by the reception unit is an inference start command instead of the data expansion conditions.
 Next, processing (not shown) similar to step S3-202 is performed, except that inference input data is acquired by the acquisition unit instead of a learning data pair.
 Further, in a step (not shown), the region is inferred from the inference input data using the result of the above learning, with the same algorithm as in the learning processing.
 Finally, processing (not shown) similar to step S3-207 is performed, except that the inference result is output instead of the frequency distribution and the information on the learning.
 The above processing can improve the inference accuracy.
 <Other Embodiments>
 The data handled in Embodiment 3-1 can be audio data instead of images, and the input device can be a microphone. By adapting the method to audio data, for example by using the amount of difference between the learning input data and the learning correct-answer data instead of the area, it can also be used for audio processing such as speaker identification and noise cancellation in audio data.
 For example, in one example of speaker recognition, the whole audio is separated into frequency bands, and teacher correct-answer data is created by rewriting, to a constant volume, the sound of the frequency components characteristic of the speaker to be identified. Learning based on this makes it possible to identify which audio components in the whole audio are the voice of the speaker to be identified. Using this classifier, clarification that extracts only a specific speaker's voice from the whole audio becomes possible, as in the sketch below.
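 The following Python sketch is one assumed way of building such teacher correct-answer data with an STFT; the frequency band characteristic of the target speaker and the constant level chosen are assumptions, and scipy.signal is used only for illustration.

```python
import numpy as np
from scipy import signal

def make_speaker_teacher(audio, fs, band_hz, nperseg=1024):
    """Teacher data sketch for the speaker-recognition example.

    The signal is split into frequency bands with an STFT, and the magnitude
    of the bands characteristic of the target speaker (band_hz, a (low, high)
    tuple assumed to be known) is rewritten to a constant level; everything
    else is left unchanged.
    """
    f, t, Z = signal.stft(audio, fs=fs, nperseg=nperseg)
    mask = (f >= band_hz[0]) & (f <= band_hz[1])
    const_level = np.abs(Z).max()                 # "constant volume" level
    Z[mask, :] = const_level * np.exp(1j * np.angle(Z[mask, :]))
    _, teacher = signal.istft(Z, fs=fs, nperseg=nperseg)
    return teacher
```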
 On the other hand, in an example of noise cancellation, the same method can identify which audio components in the whole audio are unnecessary sound, that is, noise. Using this classifier, clarification that removes noise from the whole audio becomes possible.
 The processing is the same as in Embodiment 3-1, except that the data expansion method is an increase or decrease of the volume, frequency, or speed.
 The classifier learning device and the region identification system in each of the above embodiments may be realized as a single device, or may be realized by combining devices including a plurality of information processing devices so that they can communicate with one another and execute the above processing; both forms are included in the embodiments of the present invention. The above processing may also be executed by a common server device or server group. In that case, the common server device corresponds to the classifier learning device according to the embodiment, and the server group corresponds to the region identification system according to the embodiment. The plurality of devices constituting the classifier learning device and the region identification system only need to be able to communicate at a predetermined communication rate, and need not be located in the same facility or in the same country.
 Although the embodiments have been described in detail above, the present invention can take the form of, for example, a system, a device, a method, a program, or a recording medium (storage medium). Specifically, it may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, an imaging device, a Web application, and the like), or to an apparatus consisting of a single device.
 Needless to say, the object of the present invention is also achieved as follows. That is, a recording medium (or storage medium) on which the program code (computer program) of software realizing the functions of the above embodiments is recorded is supplied to a system or an apparatus. The storage medium is, of course, a computer-readable storage medium. The computer (or CPU or GPU) of the system or apparatus then reads and executes the program code stored in the recording medium. In this case, the program code read from the recording medium itself realizes the functions of the above embodiments, and the recording medium on which the program code is recorded constitutes the present invention.
 Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims. For example, preprocessing and postprocessing may be added as appropriate.
 Embodiments obtained by appropriately combining the above embodiments are also included in the embodiments of the present invention.
 Hereinafter, the third embodiment of the present invention will be described in more detail with reference to Examples and Comparative Examples. The present invention is not limited to the following Examples.
 (Example 1)
 In this Example, the classifier learning device of the embodiment of the present invention was used to grasp the amount of magenta pigment in cross-sectional TEM images of a color toner.
 (Toner preparation)
 A pulverized toner containing a magenta pigment was obtained according to a conventional method. As methods for obtaining a pulverized toner, those described in JP 2010-140062 A and JP 2003-233215 A can be used.
 (TEM observation of toner)
 Cross-sectional observation of the toner with a transmission electron microscope (TEM) can be performed as follows.
 Using an osmium plasma coater (filgen, OPC80T), an Os film (5 nm) and a naphthalene film (20 nm) were applied to the toner as protective films, and the toner was embedded in a photocurable resin D800 (JEOL). Then, toner cross sections with a thickness of 60 nm (or 70 nm) were prepared at a cutting speed of 1 mm/s with an ultrasonic ultramicrotome (Leica, UC7).
 The prepared toner cross sections were observed using a TEM (JEOL, JEM2800). FIG. 26 shows an example of a learning input image cropped from a TEM image of the toner.
 (Learning correct-answer images)
 Learning correct-answer images were created from the 18 TEM images. The image processing software Photoshop 5.5 from Adobe Systems was used for this, and the magenta pigment portions were colored black (luminance 0/256). FIG. 26 shows an example of a colored learning correct-answer image.
 With a patch size of 128 × 128, 100 small regions (data blocks) were cropped at identical positions from each learning input image and each learning correct-answer image, giving a total of 1800 pairs.
 Next, for the 1800 learning correct-answer images, a frequency distribution was created using the number of pixels of the colored region as the characteristic value, with the number and width of the bins set to the values shown in Table 1.
 In this Example, data expansion such as rotation, inversion, enlargement, reduction, distortion addition, and luminance change was performed under the conditions shown in Table 2, and learning was performed to create a classifier.
 [Table 2, shown as an image in the original]
 (Example 2)
 This Example was the same as Example 1 except for the data expansion conditions, and a classifier was created by learning with the same TEM images for measuring the amount of magenta pigment in the toner. As shown in Table 1, the data expansion conditions were such that the larger the number of pixels in the target region, the higher the frequency.
 (Comparative Example 1)
 In this Comparative Example, a classifier was created by learning with the same TEM images for measuring the amount of magenta pigment in the toner, in the same manner as in Example 1 except that no data expansion was performed.
 (Example 3)
 In this Example, the classifier learning device of the embodiment of the present invention was used to identify automobile regions in aerial photographs in order to count the number of cars in a city. The images used were four aerial photographs of the city of Potsdam obtained from https://gdo152.llnl.gov/cowc/ (as of October 2019).
 As shown in FIG. 27, learning correct-answer images were created by coloring in the same manner as in Example 1.
 In this Example, data expansion such as rotation, inversion, enlargement, reduction, distortion addition, and luminance change was performed under the conditions shown in Table 3, and learning was performed to create a classifier.
 [Table 3, shown as an image in the original]
 (Comparative Example 2)
 In this Comparative Example, a classifier was created by learning to count the number of cars in the city from the same aerial photographs, in the same manner as in Example 3 except that no data expansion was performed.
 (Evaluation criteria)
 The effect (identification accuracy) of the classifiers generated by the classifier learning devices according to the Examples will now be described. In this embodiment, IoU (Intersection over Union), defined by Equation (3-2), was used as the evaluation index for measuring the effect.
 IoU = TP / (TP + FP + FN)   (3-2)
 Here, TP (True Positive) is the number of pixels that are magenta pigment and were determined to be magenta pigment. FP (False Positive) is the number of pixels that are not magenta pigment but were determined to be magenta pigment (false detections), and FN (False Negative) is the number of pixels that are magenta pigment but were determined not to be magenta pigment (missed detections). The results of Examples 1 and 2 and Comparative Example 1 are compared in Table 4, and the results of Example 3 and Comparative Example 2 are compared in Table 5.
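 A minimal Python sketch of Equation (3-2) for binary masks follows; the boolean-mask encoding and the convention of returning 1.0 when the denominator is zero are assumptions of the sketch.

```python
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """IoU of Equation (3-2) computed from binary masks.

    True marks the identification target (e.g. magenta pigment) in both masks.
    """
    tp = np.logical_and(pred_mask, true_mask).sum()
    fp = np.logical_and(pred_mask, ~true_mask).sum()
    fn = np.logical_and(~pred_mask, true_mask).sum()
    denom = tp + fp + fn
    return float(tp) / denom if denom else 1.0
```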
 [Table 4, shown as an image in the original]
 [Table 5, shown as an image in the original]
 In all cases, the IoU values of the examples according to the embodiments of the present invention were larger than the IoU values of the comparative examples, confirming that the identification accuracy was improved. That is, the classifiers generated by the generation method according to the above Examples can identify the identification target information (the magenta pigment regions and the automobile regions) with high accuracy, because the number of input data having many pixels in those regions is increased.
 As described above, the classifier generation method according to the third embodiment of the present invention can generate a classifier with high inference accuracy.
 <<Fourth Embodiment>>
 (Overview)
 The fourth embodiment of the present invention is a combination of the first, second, and third embodiments of the present invention described above, and combining them provides a further improvement in identification accuracy. An example of the fourth embodiment is described in detail with reference to the drawings. Descriptions of configurations, functions, and operations that are the same as those of the first to third embodiments are omitted, and mainly the differences from those embodiments are described.
 In the present embodiment, an example in which the processing target image is a TEM image is described. The TEM image is acquired as a two-dimensional grayscale image. Carbon black in a coating film of a melamine-alkyd resin paint is described as an example of the identification target. In the present embodiment, of an initial data set including 25 images (50 pairs) of size 1280 × 960, 20 images (40 pairs) were used for learning and 5 images (10 pairs) were used for evaluation. At the time of learning, 2000 pieces of input data were generated by cropping 100 images with a patch size of 128 × 128 from each learning image. The ratio of the maximum area to the minimum area of the identification target in one cropped image was 30 to 120, and the amount of the identification target in one cropped image was 0 to 16384 pixels.
 For the evaluation, as in Embodiment 2-1, IoUavg was used, which was obtained by calculating the IoU value for each evaluation image and averaging the values.
 (Embodiment 4-1)
 In the processing flow, the learning processing part was the same as in Embodiment 2-2, and the inference processing part was the same as in Embodiment 1-1.
 With the conventional method, IoUavg = 0.0, whereas with an average inference count of 30, IoUavg = 0.85.
 (Embodiment 4-2)
 In the processing flow, the learning processing part was the same as in Embodiment 3-1, and the inference processing part was the same as in Embodiment 1-1.
 With the conventional method, IoUavg = 0.0, whereas with an average inference count of 30, IoUavg = 0.81.
 (Embodiment 4-3)
 In the learning processing part of the processing flow, as described in Example 1 of the third embodiment, images of low frequency were data-expanded up to the same frequency as images of high frequency, and then, as described in Embodiment 2-1, learning was performed in two stages. The inference processing part was the same as in Embodiment 1-1.
 With the conventional method, IoUavg = 0.0, whereas with an average inference count of 30, IoUavg = 0.87.
 Table 6 shows a list of the processing contents, identification targets, and evaluation values of the above embodiments.
 [Table 6, shown as an image in the original]
 The present invention is not limited to the above embodiments, and various changes and modifications are possible without departing from the spirit and scope of the present invention. Therefore, the following claims are attached in order to make the scope of the present invention public.
 This application claims priority based on Japanese Patent Application No. 2019-199099 filed on October 31, 2019, and Japanese Patent Applications No. 2019-217334 and No. 2019-217335 filed on November 29, 2019, the entire contents of which are incorporated herein by reference.

Claims (46)

  1.  An image processing device that acquires information on a specific region in an image based on inference, comprising
     an information acquisition means for acquiring the information on the specific region inferred by inputting, into a trained model, each of pieces of information on a plurality of regions of interest extracted from the image based on a predetermined inference condition,
     wherein the plurality of regions of interest include a first region of interest and a second region of interest, and the first region of interest and the second region of interest each have a region overlapping each other and a region not overlapping each other.
  2.  The image processing device according to claim 1, wherein the size of the first region of interest and the size of the second region of interest are the same as each other.
  3.  The image processing device according to claim 1 or 2, further comprising a reception unit that accepts setting of the inference condition.
  4.  The image processing device according to any one of claims 1 to 3, wherein the information acquisition means has an extraction unit that extracts the plurality of regions of interest from the image based on the inference condition received by the reception unit.
  5.  The image processing device according to claim 4, wherein the extraction unit extracts the plurality of regions of interest using random numbers.
  6.  The image processing device according to claim 4 or 5, wherein the information acquisition means has an information acquisition unit that acquires a plurality of inference results by inputting each of the plurality of regions of interest extracted by the extraction unit into the trained model, and acquires the information on the specific region based on the plurality of inference results.
  7.  The image processing device according to any one of claims 1 to 6, wherein the inference condition includes at least one of the average number of inferences performed per pixel of the image, a threshold of the ratio of the number of times a region of interest is inferred to be the specific region to the number of times the region of interest is inferred, and the size of the regions of interest.
  8.  The image processing device according to any one of claims 1 to 7, wherein the image includes a plurality of the specific regions, and the areas of the plurality of specific regions have a distribution.
  9.  The image processing device according to any one of claims 1 to 8, wherein the image includes a plurality of the specific regions, and the ratio of the maximum value of the areas of the plurality of specific regions to the minimum value of the areas of the plurality of specific regions is 50 or more.
  10.  The image processing device according to any one of claims 1 to 9, wherein the image includes a plurality of the specific regions, and the ratio of the maximum value to the minimum value of the areas of the plurality of specific regions is 100 or more.
  11.  The image processing device according to any one of claims 1 to 10, further comprising a display control unit that causes a display unit to display the information on the specific region, based on the information on the specific region, such that the display mode of the specific region in the image differs from the display mode of regions other than the specific region.
  12.  The image processing device according to any one of claims 1 to 11, wherein the trained model is obtained by learning, as teacher data, images for which the information on the specific region is known.
  13.  The image processing device according to any one of claims 1 to 12, wherein the image is an image captured by any one of a scanning electron microscope, a transmission electron microscope, and an optical microscope.
  14.  The image processing device according to any one of claims 1 to 13, wherein the image is an image including an image of a first material and an image of a second material different from the first material.
  15.  The image processing device according to claim 14, wherein the information on the specific region includes at least one of the position of the image of the second material in the image and the size of the image of the second material.
  16.  The image processing device according to any one of claims 1 to 15, wherein the information on a region of interest includes at least one of the position and the size of the region extracted from the image.
  17.  A method of controlling an image processing device that acquires information on a specific region in an image based on inference, comprising an information acquisition step of acquiring the information on the specific region inferred by inputting, into a trained model, each of pieces of information on a plurality of regions of interest extracted from the image based on a predetermined inference condition,
     wherein the plurality of regions of interest include a first region of interest and a second region of interest, and the first region of interest and the second region of interest each have a region overlapping each other and a region not overlapping each other.
  18.  A method of generating a classifier for identifying identification target information in data, comprising:
     a first learning step of performing learning using a first learning data set of an initial data set containing a plurality of pieces of learning data created from the data; and
     a second learning step of updating the information contained in the classifier generated by the learning in the first learning step, by performing learning using a second learning data set of the initial data set,
     wherein the amount of the identification target information contained in the first learning data set is larger than the amount of the identification target information contained in the second learning data set.
  19.  The method of generating a classifier according to claim 18, wherein the second learning data set includes the first learning data set.
  20.  The method of generating a classifier according to claim 18 or 19, wherein the amount of the identification target information contained in each of the first learning data set and the second learning data set is a value obtained by dividing the total amount of identification target information contained in that data set by the number of pieces of learning data contained in that data set.
  21.  The method of generating a classifier according to any one of claims 18 to 20, wherein the number of learning steps is n (n is an integer of 2 or more), and the amount of the identification target information decreases monotonically as n increases.
  22.  前記データが、画像データであり、前記識別対象情報が識別対象領域である請求項18乃至21のいずれか1項に記載の識別器の生成方法。 The method for generating a classifier according to any one of claims 18 to 21, wherein the data is image data and the identification target information is an identification target area.
  23.  前記識別対象情報の量が、前記画像データ中の識別対象領域の面積である請求項22に記載の識別器の生成方法。 The method for generating a classifier according to claim 22, wherein the amount of the identification target information is the area of the identification target area in the image data.
  24.  前記初期データセットが、前記画像データの一部が選択されて生成した画像を含む請求項18乃至23のいずれか1項に記載の識別器の生成方法。 The method for generating a classifier according to any one of claims 18 to 23, wherein the initial data set includes an image generated by selecting a part of the image data.
  25.  前記データが、音声データである請求項18乃至21のいずれか1項に記載の識別器の生成方法。 The method for generating a classifier according to any one of claims 18 to 21, wherein the data is voice data.
  26.  前記データが、テキストデータである請求項18乃至21のいずれか1項に記載の識別器の生成方法。 The method for generating a classifier according to any one of claims 18 to 21, wherein the data is text data.
  27.  前記初期データセットから、前記第1の学習用データセットと前記第2の学習用データセットを自動で決定する請求項18乃至26のいずれか1項に記載の識別器の生成方法。 The method for generating a classifier according to any one of claims 18 to 26, wherein the first learning data set and the second learning data set are automatically determined from the initial data set.
  28.  An identification method of identifying the identification target information in the data by using a classifier generated by the method of generating a classifier according to any one of claims 18 to 27.
  29.  An identification device comprising a classifier generated by the method of generating a classifier according to any one of claims 18 to 27.
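The staged learning recited in claims 18 to 21 can be pictured with a short, self-contained Python sketch. Everything below (the synthetic images, the per-pixel logistic model, and the names of the helper functions) is an illustrative assumption of this edit, not the implementation disclosed in the specification; it only shows one classifier being trained first on the learning data set with the larger claim-20 amount of identification target information and then updated on the set with the smaller amount.

```python
# Minimal sketch (assumptions only) of the staged training of claims 18-21:
# an initial data set is split into learning data sets ordered by the
# per-data-set amount of identification target information (claim 20:
# total target amount / number of learning data), and one classifier is
# trained stage by stage, each later stage updating the parameters
# produced by the earlier one.
import numpy as np

rng = np.random.default_rng(0)


def make_sample(target_fraction, size=16):
    """Synthetic image and mask whose target region covers roughly
    `target_fraction` of the pixels (toy stand-in for the learning data)."""
    mask = (rng.random((size, size)) < target_fraction).astype(np.float64)
    image = 0.6 * mask + 0.2 * rng.random((size, size))  # targets are brighter
    return image, mask


def amount_of_target_info(dataset):
    """Claim 20 metric: total target pixels divided by the number of samples."""
    return sum(mask.sum() for _, mask in dataset) / len(dataset)


def train_stage(weights, dataset, lr=0.5, epochs=20):
    """One learning step: update (w, b) of a per-pixel logistic classifier."""
    w, b = weights
    for _ in range(epochs):
        for image, mask in dataset:
            z = w * image + b
            p = 1.0 / (1.0 + np.exp(-z))   # predicted target probability
            grad = p - mask                # gradient of mean BCE w.r.t. z
            w -= lr * np.mean(grad * image)
            b -= lr * np.mean(grad)
    return w, b


# Learning data sets whose target amount decreases from stage to stage.
dataset_1 = [make_sample(0.6) for _ in range(8)]   # first set: more target info
dataset_2 = [make_sample(0.2) for _ in range(8)]   # second set: less target info
assert amount_of_target_info(dataset_1) > amount_of_target_info(dataset_2)

weights = (0.0, 0.0)
for stage, dataset in enumerate([dataset_1, dataset_2], start=1):
    weights = train_stage(weights, dataset)        # later stages refine earlier ones
    print(f"stage {stage}: amount={amount_of_target_info(dataset):.1f}, w={weights[0]:.2f}")
```

The same loop extends to n learning steps by appending further learning data sets whose amounts of identification target information decrease monotonically, as recited in claim 21.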
  30.  A method of generating a classifier for estimating identification target information in data, the method comprising:
     an augmentation step of, for a learning data set group having a first learning data set that includes pieces of learning data each composed of input data and teacher data for the input data, and a second learning data set that includes a larger number of the pieces of learning data than the first learning data set, augmenting the learning data such that the number of the pieces of learning data included in the first learning data set becomes equal to or greater than the number of the pieces of learning data included in the second learning data set; and
     a generation step of generating the classifier by using the learning data set group having the augmented learning data,
     wherein the amount of the identification target information in the input data included in the first learning data set is larger than the amount of the identification target information in the input data included in the second learning data set.
  31.  The generation method according to claim 30, wherein the data is image data, the input data is input image data, and the identification target information is an identification target region.
  32.  The generation method according to claim 31, wherein the identification target information is information on at least one of the position, area, and distribution of the identification target region in the image data.
  33.  The generation method according to claim 32, wherein the amount of the identification target information is the number of pixels included in the identification target region.
  34.  The generation method according to claim 32 or 33, wherein the augmentation step includes a step of generating new input data by performing at least one of rotation, flipping, luminance conversion, distortion, enlargement, and reduction on the input data.
  35.  The generation method according to claim 30, wherein the data is sound data and the input data is sound input data.
  36.  The generation method according to claim 35, wherein the identification target information is a sound to be identified.
  37.  The generation method according to claim 36, wherein the amount of the identification target information is the intensity of a specific sound included in the sound to be identified.
  38.  The generation method according to claim 36 or 37, wherein the augmentation step includes a step of generating new input data by adding, to the input data, a sound obtained by combining sounds of one or a plurality of frequencies.
  39.  The generation method according to claim 30, wherein the data is text data and the input data is input text data.
  40.  The generation method according to claim 39, wherein the identification target information is a character or a character string to be identified.
  41.  The generation method according to claim 39 or 40, wherein the amount of the identification target information is the number of the characters or character strings to be identified.
  42.  The generation method according to any one of claims 30 to 41, wherein the learning data set group has three or more learning data sets, and the augmentation step is performed such that the learning data set whose learning data have the largest amount of the identification target information in the learning data set group comes to include the largest number of pieces of learning data.
  43.  The generation method according to any one of claims 30 to 42, wherein the generation step is performed by using a U-Net.
  44.  A classifier generated by the generation method according to any one of claims 30 to 43.
  45.  An information processing device comprising an inference means for inferring, with respect to inference data input to the classifier according to claim 44, the identification target information included in the inference data.
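As a rough illustration of the augmentation step of claims 30 to 34, the sketch below inflates a small but target-rich first learning data set by rotation, flipping, and luminance conversion until it is at least as large as the second learning data set, using the pixel count of claim 33 as the amount of identification target information. The toy data, the particular transform choices, and the stubbed generation step are assumptions made here for illustration only; the specification's generation step (claim 43) would train a U-Net on the augmented learning data set group, which is omitted to keep the example dependency-free.

```python
# Minimal sketch (assumptions, not the published implementation) of the
# augmentation step of claims 30-34: the first learning data set has fewer
# samples but more identification target information per input image, so
# new learning data are generated until it is at least as large as the
# second learning data set.
import numpy as np

rng = np.random.default_rng(1)


def target_pixels(mask):
    """Claim 33: amount of identification target information = pixel count."""
    return int(mask.sum())


def augment_once(image, mask):
    """Claim 34: rotation, flipping, or luminance conversion of the input data.
    Geometric transforms are applied to the image and teacher mask together."""
    op = rng.integers(3)
    if op == 0:                                   # rotation by 90 degrees
        return np.rot90(image), np.rot90(mask)
    if op == 1:                                   # horizontal flip
        return np.flip(image, axis=1), np.flip(mask, axis=1)
    gain = rng.uniform(0.8, 1.2)                  # luminance conversion (image only)
    return np.clip(image * gain, 0.0, 1.0), mask


def augment_until(dataset, minimum_size):
    """Augmentation step of claim 30: inflate `dataset` until it has at
    least `minimum_size` pairs of (input data, teacher data)."""
    augmented = list(dataset)
    while len(augmented) < minimum_size:
        image, mask = augmented[rng.integers(len(dataset))]
        augmented.append(augment_once(image, mask))
    return augmented


def toy_pair(fraction):
    """Synthetic (input image, teacher mask) pair; stand-in for real data."""
    mask = (rng.random((32, 32)) < fraction).astype(np.float64)
    return 0.5 * mask + 0.3 * rng.random((32, 32)), mask


# Toy learning data set group: set 1 is small but target-rich, set 2 is larger.
set_1 = [toy_pair(0.5) for _ in range(5)]
set_2 = [toy_pair(0.1) for _ in range(20)]
assert np.mean([target_pixels(m) for _, m in set_1]) > np.mean(
    [target_pixels(m) for _, m in set_2])

set_1 = augment_until(set_1, minimum_size=len(set_2))
print(len(set_1), len(set_2))   # set 1 is now no smaller than set 2
# The generation step (claim 43) would follow here, e.g. fitting a U-Net
# on the augmented learning data set group.
```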
  46.  A generation device for generating a classifier for estimating identification target information in data, the device comprising:
     an augmentation means for, for a learning data set group having a first learning data set that includes pieces of learning data each composed of input data and teacher data for the input data, and a second learning data set that includes a larger number of the pieces of learning data than the first learning data set, augmenting the learning data such that the number of the pieces of learning data included in the first learning data set becomes equal to or greater than the number of the pieces of learning data included in the second learning data set; and
     a generation means for generating the classifier by using the learning data set group having the augmented learning data,
     wherein the amount of the identification target information in the input data included in the first learning data set is larger than the amount of the identification target information in the input data included in the second learning data set.
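Claims 35 to 42 carry the same quantities over to sound and text data. The following sketch is again an illustration built on assumptions of this edit rather than the disclosed implementation; it shows one plausible reading of each quantity: the number of target strings for text (claim 41), the spectral intensity of a specific frequency for sound (claim 37), and the claim-42 rule that, among three or more learning data sets, the set with the most identification target information ends up with the most learning data after augmentation.

```python
# Minimal sketch (illustrative assumptions only) of the amount-of-
# identification-target-information measures for sound and text data and
# of the claim-42 ordering rule.
import numpy as np


def text_amount(text, target="defect"):
    """Claim 41 (one reading): number of occurrences of the target string."""
    return text.count(target)


def sound_amount(waveform, sample_rate, target_hz):
    """Claim 37 (one reading): spectral magnitude at the target frequency."""
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / sample_rate)
    return float(spectrum[np.argmin(np.abs(freqs - target_hz))])


def sizes_after_augmentation(amounts, base_size, step=5):
    """Claim 42 rule: order the data sets by amount of target information and
    make the richest one the largest after augmentation."""
    order = np.argsort(amounts)                     # ascending amount
    sizes = np.empty(len(amounts), dtype=int)
    sizes[order] = base_size + step * np.arange(len(amounts))
    return sizes.tolist()


if __name__ == "__main__":
    print(text_amount("defect near weld; second defect near edge"))   # -> 2
    t = np.linspace(0.0, 1.0, 8000, endpoint=False)
    wave = 0.7 * np.sin(2 * np.pi * 440 * t) + 0.1 * np.sin(2 * np.pi * 1000 * t)
    print(round(sound_amount(wave, sample_rate=8000, target_hz=440.0), 1))
    print(sizes_after_augmentation([30.0, 5.0, 12.0], base_size=10))   # richest set -> 20
```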
PCT/JP2020/039496 2019-10-31 2020-10-21 Image processing device, image processing device control method, identifier generation method, identification method, identification device, identifier generation device, and identifier WO2021085258A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP2019199099 2019-10-31
JP2019-199099 2019-10-31
JP2019-217334 2019-11-29
JP2019217335 2019-11-29
JP2019217334 2019-11-29
JP2019-217335 2019-11-29

Publications (1)

Publication Number Publication Date
WO2021085258A1

Family

ID=75715063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/039496 WO2021085258A1 (en) 2019-10-31 2020-10-21 Image processing device, image processing device control method, identifier generation method, identification method, identification device, identifier generation device, and identifier

Country Status (2)

Country Link
JP (1) JP2021093142A (en)
WO (1) WO2021085258A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024116309A1 (en) * 2022-11-30 2024-06-06 日本電気株式会社 Image generation device, learning device, image generation method, and program

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013114596A (en) * 2011-11-30 2013-06-10 Kddi Corp Image recognition device and method
JP2015103144A (en) * 2013-11-27 2015-06-04 富士ゼロックス株式会社 Image processing device and program


Also Published As

Publication number Publication date
JP2021093142A (en) 2021-06-17

Similar Documents

Publication Publication Date Title
Baur et al. Generating highly realistic images of skin lesions with GANs
CN110543837B (en) Visible light airport airplane detection method based on potential target point
CN109791693B (en) Digital pathology system and related workflow for providing visualized whole-slice image analysis
CN111524106B (en) Skull fracture detection and model training method, device, equipment and storage medium
JP6710135B2 (en) Cell image automatic analysis method and system
CN105144239B (en) Image processing apparatus, image processing method
JP5174040B2 (en) Computer-implemented method for distinguishing between image components and background and system for distinguishing between image components and background
CN112598643B (en) Depth fake image detection and model training method, device, equipment and medium
JP6235921B2 (en) Endoscopic image diagnosis support system
JP2016534709A (en) Method and system for classifying and identifying individual cells in a microscopic image
JPWO2007029467A1 (en) Image processing method and image processing apparatus
Zheng et al. Improvement of grayscale image 2D maximum entropy threshold segmentation method
CN115775226B (en) Medical image classification method based on transducer
JP2020160543A (en) Information processing system and information processing method
WO2021085258A1 (en) Image processing device, image processing device control method, identifier generation method, identification method, identification device, identifier generation device, and identifier
JP2021170284A (en) Information processing device and program
Burget et al. Trainable segmentation based on local-level and segment-level feature extraction
JP2018206260A (en) Image processing system, evaluation model construction method, image processing method, and program
CN104268845A (en) Self-adaptive double local reinforcement method of extreme-value temperature difference short wave infrared image
JPH1091782A (en) Method for extracting specific site for gradation picture
Zhang et al. Simultaneous lung field detection and segmentation for pediatric chest radiographs
Gugulothu et al. A novel deep learning approach for the detection and classification of lung nodules from ct images
JP6546385B2 (en) IMAGE PROCESSING APPARATUS, CONTROL METHOD THEREOF, AND PROGRAM
JP6425468B2 (en) Teacher data creation support method, image classification method, teacher data creation support device and image classification device
Khalid et al. DeepMuCS: a framework for co-culture microscopic image analysis: from generation to segmentation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20883053

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20883053

Country of ref document: EP

Kind code of ref document: A1