WO2020129235A1 - Image Recognition Device and Method - Google Patents
Image Recognition Device and Method
- Publication number
- WO2020129235A1 (PCT/JP2018/047224)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- learning model
- image recognition
- learning
- feature
- feature extraction
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/69—Microscopic objects, e.g. biological cells or cellular parts
- G06V20/695—Preprocessing, e.g. image segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/69—Microscopic objects, e.g. biological cells or cellular parts
- G06V20/698—Matching; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/42—Document-oriented image-based pattern recognition based on the type of document
- G06V30/422—Technical drawings; Geographical maps
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
- G06T2207/30148—Semiconductor; IC; Wafer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/06—Recognition of objects for industrial automation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Definitions
- the present invention relates to an image recognition device and method in, for example, a semiconductor pattern inspection.
- Pattern recognition using machine learning such as deep learning can extract various types of patterns from various images with high accuracy, and it can be expected to be effective even in applications where contour lines are extracted from semiconductor patterns.
- the contour line extracted from the semiconductor pattern is used for shape evaluation and the like by comparison with a design drawing of the semiconductor pattern.
- the learning model is a parameter such as a coefficient of a network structure of deep learning, and a learning sample composed of a set of an image and teacher data (inference result which is a target of learning) is set in advance according to the learning model. It is calculated using a learning operation. Due to the nature of machine learning, in order to extract a good contour line from an image, an image having image features of the image to be inferred, that is, an image similar to the inference target, is included in the learning samples used in the learning operation. Must be included. In order for contour line extraction to exhibit higher performance, it is desirable that images that are not similar to the image to be inferred are not included in the learning sample. This is because a learning model specialized in contour extraction from an image to be inferred is obtained by the learning calculation.
- the optimum learning sample refers to a learning model capable of extracting the best contour line from the image given at the time of operation.
- Patent Document 1 discloses a method of selecting an optimal learning model from a plurality of learning models on condition that the prediction error is the smallest.
- the prediction error is the error between the predicted value and the correct value when inferring using a learning model.
- Patent Document 2 discloses a method of selecting an optimum learning model from a plurality of learning models by a selection method using an index called certainty factor.
- the certainty factor is an index calculated from the intermediate processing result until the inference result is obtained using the learning model, and is a measure of the certainty of the inference result (expected value of being the correct answer).
- JP 2001-236337 A; Japanese Patent Laid-Open No. 2001-339265
- Patent Document 1 and Patent Document 2 described above are useful when applied to an image recognition apparatus and method in semiconductor pattern inspection.
- the method of Patent Document 1 has a first problem that a correct value is required for selecting a learning model.
- the correct value for contour line extraction is the inference result of the contour line accurately extracted at every point in the image.
- Accurately extracted contour lines can be obtained, for example, by manually assigning correct values for contour line extraction to each pixel in the image, but preparing such data for every image to be inferred before the start of operation takes considerable work time and man-hours.
- The method of Patent Document 2 has a second problem: since the scale of the certainty factor differs depending on the type of learning model (the mathematical model of machine learning, the network structure of deep learning, and so on), it cannot be applied when a plurality of types of learning models are candidates for selection.
- An object of the present invention is to provide an image recognition device and method that, in contour line extraction using machine learning, can select the optimal learning model for the image at the time of inference without requiring a correct value or a certainty factor.
- To achieve this, an image recognition device according to the present invention includes: a feature extraction learning model group that stores a plurality of feature extraction learning models; a recall learning model group that stores recall learning models each paired with a feature extraction learning model; a feature amount extraction unit that extracts a feature amount from input data by referring to a feature extraction learning model; an inter-data recall unit that outputs a recall result involving dimension compression of the feature amount by referring to a recall learning model; and a learning model selection unit that selects a feature extraction learning model from the feature extraction learning model group on condition that the difference between the feature amount and the recall result is minimized.
- Another image recognition device according to the present invention includes: a feature extraction learning model group that stores a plurality of feature extraction learning models; a feature amount extraction unit that extracts a feature amount from input data by referring to a feature extraction learning model; and a learning model selection unit that calculates, from the scores obtained when the feature amount extraction unit extracts the feature amount, a common scale that allows comparison among multiple types of learning models, and selects a feature extraction learning model from the feature extraction learning model group using that common scale.
- An image recognition method according to the present invention provides a plurality of feature extraction learning models and a plurality of recall learning models each paired with a feature extraction learning model, extracts a feature amount from input data by referring to a feature extraction learning model, obtains a recall result involving dimension compression of the feature amount by referring to a recall learning model, and selects a feature extraction learning model from the feature extraction learning model group on condition that the difference between the feature amount and the recall result is minimized.
- Another image recognition method according to the present invention provides a plurality of feature extraction learning models, extracts a feature amount from input data by referring to a feature extraction learning model, calculates from the scores obtained when the feature amount is extracted a common scale that allows comparison among multiple types of learning models, and selects a feature extraction learning model from the plurality of feature extraction learning models using that common scale.
- According to the present invention, a feature amount is extracted from the image to be inferred, a recall result of the feature amount is acquired, and a feature extraction learning model can be selected on the condition that the difference between the feature amount and the recall result is minimized.
- FIG. 1 is a diagram showing an example of a functional configuration of an image recognition apparatus according to a first embodiment of the present invention.
- A diagram explaining the input/output of the feature amount extraction unit 1, and a diagram showing an example of one typical piece of input data 30 and one feature amount 40 calculated from it.
- A diagram explaining the input/output of the inter-data recall unit 3, and a diagram showing an example of one typical feature amount 40 and one recall result 50.
- A diagram showing the data storage method in the database that stores the feature extraction learning model group M2 and the recall learning model group M4.
- A diagram explaining the input/output of the feature amount extraction unit 1.
- FIG. 10a is a diagram showing a specific configuration example of a feature extraction learning model m2a (left) and the corresponding recall learning model m4a (right) stored in the database DB of FIG. 8.
- FIG. 10b is a diagram showing a specific configuration example of a feature extraction learning model m2b (left) and the corresponding recall learning model m4b (right) stored in the database DB of FIG. 8.
- FIG. 11 is a diagram showing an example of the feature amounts 40a and 40b output by the feature amount extraction unit 1 using the feature extraction learning models m2a and m2b.
- FIG. 15 is a diagram showing an example of the functional configuration of the image recognition device according to a third embodiment of the invention.
- FIG. 21 is a diagram showing an example of the functional configuration of an image recognition device 7A according to a first modified example of the fourth embodiment.
- FIG. 22 is a diagram showing an example of the functional configuration of an image recognition device 7A according to a second modified example of the fourth embodiment.
- FIG. 1 shows an example of the functional configuration of the image recognition apparatus according to the first embodiment of the present invention realized by using a computer device.
- The computer device 7 is composed of a feature amount extraction unit 1, an inter-data recall unit 3, and a learning model selection unit 5, which are processes realized by an arithmetic function such as a CPU, and a feature extraction learning model group M2 and a recall learning model group M4 realized by a database.
- the computer device 7 incorporates an input sample 10 which is a sample during operation of an image which is a target of contour extraction in a semiconductor pattern inspection.
- the feature extraction learning model group M2 stores two or more feature extraction learning models m2 in the database.
- the learning model group for recall M4 stores two or more learning models for recall m4 in the database.
- The feature extraction learning model group M2 and the recall learning model group M4 share the symbols assigned to the feature extraction and recall learning models m2 and m4, and the feature extraction learning model m2 and recall learning model m4 that carry the same symbol are a pair learned from the same learning sample.
- the feature amount extraction unit 1 extracts a contour line (hereinafter, the contour line extracted by the feature amount extraction unit 1 will be referred to as a feature amount) from the image in the input sample 10 by referring to the learning model m2 for feature extraction.
- the feature amount is extracted from the image in the input sample 10 for each of the feature extraction learning models m2 in the feature extraction learning model group M2.
- the inter-data recall unit 3 has a function of recalling the feature amount from the feature amount by referring to the learning model m4 for recall, and recalls the feature amount from each of the feature amounts output by the feature amount extraction unit 1.
- the feature amount evoked by the inter-data recall unit 3 will be referred to as a recall result.
- The learning model selection unit 5 selects the learning model m2 for which the difference between the feature amount output by the feature amount extraction unit 1 and the recall result output by the inter-data recall unit 3 is minimal, and outputs the symbol assigned to that learning model m2.
- Each of the functions in FIG. 1 described above can be realized by signal processing on a computer.
- the input sample 10 is a small number of samples of images from which feature quantities are extracted during operation. A small number of samples are acquired, for example, by randomly selecting images taken during operation.
- the input sample 10 is collected from a limited type of manufacturing process and the like, and a small number of samples are composed of one or a small number of types of images.
- FIG. 2 is a diagram for explaining the input/output of the feature quantity extraction unit 1.
- the function of the feature amount extraction unit 1 alone will be described with reference to FIG.
- The feature amount extraction unit 1 focuses on one feature extraction learning model m2 in the feature extraction learning model group M2 and, referring to it, uses semantic segmentation to obtain one feature amount 40 from one piece of input data 30 in the input sample 10 and outputs it to the inter-data recall unit 3.
- FIG. 3 shows an example of one typical input data 30 and one feature amount 40 obtained by using semantic segmentation for one input data 30.
- The input data 30 is an image from which contour lines are to be extracted, as shown in the example on the left of FIG. 3, and consists of, for example, 256 × 256 pixels of bit data.
- semantic segmentation is a method of machine learning that determines the category of each pixel in an image.
- the learning model m2 for feature extraction is a parameter such as a weighting factor or a threshold referred to in the semantic segmentation.
- One feature amount 40 obtained by using semantic segmentation in the feature amount extraction unit 1 is shown in the example on the right side of FIG. 3: the constituent elements (pixels) of the input data 30 are classified into the contour line 41, the closed region 42 (the area surrounded by the contour line 41), and the background 43 (a contour line extraction result by category).
- the relationship between the input (one input data 30) and the output (one feature quantity 40) of the feature quantity extraction unit 1 has been described with reference to FIG. 3 by way of example.
- This extraction uses the learning model m2 for feature extraction.
- the learning model m2 for feature extraction will be described next.
- the learning model m2 for feature extraction is calculated by a predetermined learning operation from a learning sample composed of one or more of a set of input data 30 and teacher data.
- The teacher data is an image of the same format as the feature amount 40 illustrated in FIG. 3, with the category of each pixel in the image appropriately assigned. The learning calculation optimizes the learning model so that the difference between the feature amount that the feature amount extraction unit 1 outputs from the input data 30 included in the learning sample and the teacher data in the learning sample is minimized.
- When the feature amount extraction unit 1 refers to the feature extraction learning model m2 and is given input data 30 similar to the learning sample, it becomes possible to output a feature amount 40 in which the category of each pixel of the input data 30 is accurately determined.
- When the feature amount extraction unit 1 refers to the learning model m2 and is given input data 30 that deviates from the learning sample, the input falls outside the optimization range, and the feature amount 40 will therefore include erroneously determined pixels. Such misjudgment is particularly likely to occur for input data 30 whose appearance differs from that of the learning sample.
- The feature amount extraction unit 1 extracts a feature amount 40 for each combination of the input data 30 (one or more) in the input sample 10 and the feature extraction learning models m2 (two or more) included in the feature extraction learning model group M2.
- FIG. 4 is a diagram for explaining input/output of the data recall unit 3.
- The inter-data recall unit 3, when referring to one recall learning model m4 in the recall learning model group M4, obtains one recall result 50 from one feature amount 40 by using dimension compression and outputs it to the learning model selection unit 5.
- FIG. 5 shows an example of one typical feature amount 40 and one recall result 50.
- The recall result 50 on the right side of FIG. 5 has the same constituent categories as the feature amount 40 shown on the left side of FIG. 5 (which is composed of the categories of the contour line 41, the closed region 42, and the background 43): it is composed of a contour line 51, a closed region 52, and a background 53.
- The inter-data recall unit 3 outputs a recall result 50 for each combination of a feature amount 40 output from the feature amount extraction unit 1 and a recall learning model m4 included in the recall learning model group M4.
- the difference between the feature amount 40 and the recall result 50 is not necessarily clear, but the recall result 50 is information obtained by dimensionally compressing the feature amount 40.
- Dimensional compression in the data recall unit 3 will be described with reference to FIG.
- Dimension compression refers to an operation in which, when the feature amount 40 and the recall result 50 are regarded as high-dimensional data (data with as many dimensions as there are pixels) composed of constituent elements (pixels), the feature amount 40 is first mapped (compressed) to dimension-compressed data 70 of lower dimension than the feature amount 40 and then mapped (reconstructed) again into the dimension of the recall result 50.
- This dimension compression has the property that, if the feature amount 40 lies within a predetermined range of the high-dimensional space corresponding to the dimension-compressed data 70, almost no information is lost in the process of compressing the feature amount 40 into the dimension-compressed data 70, and the difference between the recall result 50 and the feature amount 40 becomes small. Conversely, when the feature amount 40 deviates from the predetermined range of the high-dimensional space, information is lost in the process of compressing the feature amount 40 into the dimension-compressed data 70, and the difference between the recall result 50 and the feature amount 40 becomes large.
- This dimensional compression can be realized by applying a general algorithm such as principal component analysis or deep learning auto encoder.
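- The following Python sketch is an illustration of this idea rather than part of the patent text: it assumes principal component analysis as the dimension compression, and all names, data shapes, and values are assumptions chosen for the example. It shows that a feature amount resembling the learning sample is reconstructed with a small difference, while a deviating one yields a large difference.

```python
# Illustrative sketch only: dimension compression (here PCA) and reconstruction,
# analogous to mapping a feature amount 40 to dimension-compressed data 70 and
# back to a recall result 50. Shapes and names are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# "Learning sample" feature amounts: points lying near an 8-dimensional subspace
# of a 64-dimensional space (standing in for per-pixel feature data).
basis = rng.normal(size=(8, 64))
train = rng.normal(size=(200, 8)) @ basis

# Fit the compression mapping: center the data and keep the top 8 components.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:8]                         # rows span the compressed (70) space

def recall(x):
    """Compress x to the low-dimensional space and reconstruct it (recall result)."""
    z = (x - mean) @ components.T           # dimension-compressed data 70
    return z @ components + mean            # reconstructed recall result 50

def difference(x):
    """Euclidean distance between a feature amount and its recall result."""
    return float(np.linalg.norm(x - recall(x)))

in_dist = rng.normal(size=8) @ basis        # similar to the learning sample
out_dist = rng.normal(size=64) * 3.0        # deviates from the learning sample
print(difference(in_dist), "<", difference(out_dist))   # small vs. large difference
```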
- FIG. 7 is a diagram for explaining another form of dimension compression in the data recall unit 3.
- the dimensional compression may include intermediate data 71 and 72 that map data between the feature amount 40 and the dimensional compressed data 70 or between the dimensional compressed data 70 and the recall result 50. In this case as well, the properties described above do not change.
- the learning model m4 for recall is a parameter such as a weighting factor or a threshold referred to in dimension reduction.
- the learning model m4 for recollection is obtained from a learning sample composed of one or more feature quantities 40 such that the difference between the feature quantity 40 in the learning sample and its recall result 50 becomes small.
- FIG. 8 is a diagram showing a data storage method in the database DB that stores the feature extraction learning model group M2 and the recall learning model group M4.
- In the feature extraction learning model group M2 and the recall learning model group M4, the two or more stored feature extraction learning models m2 and recall learning models m4 are managed in the database DB by allocating the same symbol 20 (shown as a and b in FIG. 8) to each pair.
- the symbol 20 may be any symbol such as a serial number.
- the learning model m2 for feature extraction and the learning model m4 for recall to which the same symbol is assigned are a pair calculated from the same learning sample.
- In the flow of FIG. 9, the combination of processing step S1 and processing step S6 means that the processing between them is repeatedly executed for each learning model. Similarly, the combination of processing steps S2 and S4 means that the processing between them is repeatedly executed for each feature amount.
- the feature amount extraction unit 1 outputs the feature amount 40 for each of the feature extraction learning models m2 in the feature extraction learning model group M2 (processing step S1 to processing step S6).
- the difference between the feature amount 40 and the recall result 50 is obtained (processing step S3).
- the statistic of the difference over the plurality of feature amounts 40 is calculated from the difference of the process step S3 obtained from each of the feature amounts 40 (process step S5).
- After the above iterative processing has been executed, the process proceeds to processing step S7.
- In processing step S7, the minimum value of the statistics of the difference calculated in processing step S5 is obtained over the plurality of feature extraction learning models m2.
- In processing step S8, the symbol 20 (see FIG. 8) of the feature extraction learning model m2 for which the statistic takes that minimum value is selected. From the symbol 20 selected in processing step S8, the feature extraction learning model m2 and the recall learning model m4 can be uniquely identified by referring to the database DB.
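- As an illustration of the flow of processing steps S1 to S8 described above, the following Python sketch selects the symbol whose statistic of the difference is minimal. The interfaces (callables returning arrays, a dictionary keyed by the symbol 20) are assumptions introduced only for this example and are not taken from the patent.

```python
# Illustrative sketch of the selection flow S1-S8: for each pair of feature
# extraction / recall learning models sharing a symbol, compute the statistic
# of the difference between feature amounts and recall results, then select
# the symbol with the minimum statistic. Interfaces are assumed.
import numpy as np

def select_learning_model(input_sample, model_pairs):
    """model_pairs maps a symbol (e.g. 'a', 'b') to a pair of callables
    (extract_features, recall) that both return arrays of the same shape."""
    statistics = {}
    for symbol, (extract_features, recall) in model_pairs.items():   # S1..S6
        diffs = []
        for input_data in input_sample:                              # S2..S4
            feature = extract_features(input_data)                   # feature amount 40
            recalled = recall(feature)                               # recall result 50
            diffs.append(np.linalg.norm(feature - recalled))         # S3: difference
        statistics[symbol] = float(np.mean(diffs))                   # S5: statistic
    best = min(statistics, key=statistics.get)                       # S7: minimum value
    return best, statistics                                          # S8: selected symbol

# Dummy usage (assumptions only): model 'a' recalls its features well, 'b' does not.
sample = [np.random.rand(16, 16) for _ in range(3)]
pairs = {"a": (lambda x: x, lambda f: f),
         "b": (lambda x: x, lambda f: f * 0.5)}
print(select_learning_model(sample, pairs))   # expected to select symbol 'a'
```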
- FIG. 10a shows a specific configuration example of a learning model m2a for feature extraction (left) and a corresponding learning model for recall m4a (right) stored in the database DB of FIG.
- FIG. 10b shows a specific configuration example of the learning model m2b (left) for feature extraction and the corresponding learning model m4b (right) stored in the database DB of FIG.
- As shown in FIG. 10a, the feature extraction learning model m2a stored in the database DB of FIG. 8 is learned using, as learning samples, the input data 30a and teacher data 60a together with input data 30 similar to the input data 30a and its teacher data. Likewise, as shown in FIG. 10b, the feature extraction learning model m2b is learned using, as learning samples, the input data 30b and teacher data 60b together with input data 30 similar to the input data 30b and its teacher data.
- In the teacher data 60a of FIG. 10a, the left and right closed regions 62a are separated at the central portion 64a, while in the teacher data 60b of FIG. 10b, the left and right closed regions 62b are connected at the central portion 64b.
- The recall learning model m4a is learned in advance from the teacher data 60a and the teacher data of images similar to the input data 30a.
- Similarly, the recall learning model m4b is learned from the teacher data 60b and the teacher data of images similar to the input data 30b.
- FIG. 11 is a diagram showing an example of the feature quantities 40a and 40b output by the feature quantity extraction unit 1 using the learning models m2a and m2b for feature extraction.
- For the feature amount 40a output using the feature extraction learning model m2a, the learning sample contains the input data 30a of FIG. 10a and input data 30 similar to it, which are similar to the input sample 10; therefore the categories of the contour line 41a, the closed region 42a, and the background 43a are accurately discriminated everywhere, including the central portion 44a.
- In contrast, for the feature amount 40b output using the feature extraction learning model m2b, the learning sample consists of the input data 30b (see FIG. 10b) and input data 30 similar to it, which are not similar to the input sample 10; therefore the categories of the contour line 41b, the closed region 42b, and the background 43b in the feature amount 40b include misjudgments. These misjudgments are concentrated at the central portion 44b, where the difference in appearance between the input data 30a and the input data 30b is large.
- FIG. 12 shows recall results 50a and 50b output from the feature quantities 40a and 40b by the inter-data recall unit 3 referring to the learning models m4a and m4b for recall.
- For the recall result 50a on the left side of FIG. 12, the learning sample used when learning the recall learning model m4a includes the teacher data 60a, which is similar to the feature amount 40a; therefore, over the entire image including the central portion 54a, there is almost no difference between the feature amount 40a and the recall result 50a.
- For the recall result 50b on the right side of FIG. 12, the learning sample used when learning the recall learning model m4b does not include a feature amount 40 that, like the feature amount 40b, contains misjudgments in the central portion 44b; therefore, a large difference appears between the feature amount 40b and the recall result 50b at the central portion 54b.
- The difference in processing step S3 of FIG. 9 is derived by regarding the feature amount 40 and the recall result 50 as high-dimensional vectors and calculating the distance between those vectors.
- For example, for each pixel of the feature amount 40 and of the recall result 50, an element vector whose first, second, and third elements correspond in order to the contour line 41 or 51, the closed region 42 or 52, and the background 43 or 53 can be formed, these element vectors can be concatenated over all pixels into feature vectors (3N dimensions if the number of pixels is N), and the Euclidean distance between the two feature vectors can be calculated.
- The distance is not limited to the Euclidean distance; any scale may be used as long as the distance between the two feature vectors can be measured.
- In processing step S5, the statistic of the differences of processing step S3 obtained for each piece of input data 30 in the input sample 10 is calculated.
- The statistic of the difference can be calculated, for example, as the arithmetic mean of the distances of the multiple feature vectors.
- Any statistic, such as a harmonic mean or a median in addition to the arithmetic mean, can be applied as long as a representative value can be obtained from the plurality of feature vectors.
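- The following Python sketch illustrates, under an assumed encoding, the difference of processing step S3 and the statistic of processing step S5: the per-pixel categories are one-hot encoded, concatenated into 3N-dimensional vectors, compared by the Euclidean distance, and averaged over the sample. The category numbering and example data are assumptions.

```python
# Illustrative sketch: per-pixel categories (contour line, closed region,
# background) are one-hot encoded into 3N-dimensional vectors, the Euclidean
# distance between the feature amount 40 and the recall result 50 is computed,
# and the statistic is the arithmetic mean over the input sample.
import numpy as np

CATEGORIES = 3  # 0: contour line, 1: closed region, 2: background (assumed coding)

def to_feature_vector(label_map):
    """One-hot encode an HxW category map (values 0..2) into a 3N-dim vector."""
    flat = np.asarray(label_map).ravel()
    return np.eye(CATEGORIES)[flat].ravel()

def pixel_difference(feature_map, recall_map):
    """Euclidean distance between a feature amount and its recall result."""
    return float(np.linalg.norm(to_feature_vector(feature_map)
                                - to_feature_vector(recall_map)))

def difference_statistic(pairs):
    """Arithmetic mean of the distances over all (feature, recall) pairs."""
    return float(np.mean([pixel_difference(f, r) for f, r in pairs]))

# Toy 4x4 category maps (illustrative only): one misjudged pixel gives sqrt(2).
f = np.zeros((4, 4), dtype=int); f[1:3, 1:3] = 1; f[0, :] = 2
r = f.copy(); r[2, 2] = 2
print(pixel_difference(f, r))
print(difference_statistic([(f, r), (f, f)]))
```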
- Accordingly, the statistic of the difference obtained by referring to the recall learning model m4a becomes small, while the statistic of the difference obtained by referring to the recall learning model m4b becomes large.
- In processing step S7 of FIG. 9, the minimum value of the statistics of the difference of processing step S5 is calculated.
- In processing step S8, the symbol 20 assigned to the feature extraction learning model m2 for which the difference statistic of processing step S5 takes the minimum value is output.
- Alternatively, the learning model selection unit 5 may output information that uniquely determines the feature extraction learning model m2, such as the actual file or the file name of the feature extraction learning model m2 specified by the symbol 20.
- FIG. 13 is a diagram showing an example of a screen display of the learning model selection unit 5.
- the learning model selection unit 5 may use a screen display such as the screen 80 in FIG. 13 so that an operator who performs the execution control or the like of the first embodiment can visually confirm the selection result.
- the selection result 81 shows the symbols 20 in the database of FIG. 8 selected by the learning model selection unit 5 (example a in the figure).
- So that the operator can grasp the details of the learning model selection, the numerical value of the difference for the selected learning model (the statistic of the difference of processing step S5) may be displayed as in 82, or the range of symbols 20 that were candidates for the learning model selection may be displayed as in 83.
- By the method described above, the difference between the feature amount 40 output by the feature amount extraction unit 1 and the recall result 50 output by the inter-data recall unit 3 is obtained, and the symbol 20 is selected under the condition that this difference is minimized.
- The first embodiment is configured on the premise that an appropriate learning model is available; in the second embodiment, an image recognition device that takes into account a learning model that is no longer appropriate is proposed.
- FIG. 14 shows a functional configuration example of the image recognition device 7 according to the second embodiment of the present invention.
- The image recognition device 7 of FIG. 14 differs from the configuration of FIG. 1 in that a learning model suitability determination unit 106 is added and the learning model selection unit 5 of FIG. 1 is configured as a learning model reselection unit 107.
- m2 and m4 are the learning model for feature extraction and the learning model for recall selected in the first embodiment.
- the symbol assigned to this learning model is x.
- the input sample 10 is a small number of samples of the input data 30 extracted at a predetermined timing during long-term operation of contour extraction.
- the term “long-term operation” refers to the timing at which contour extraction is continued for a predetermined period or more after the learning model is selected by the method of the first embodiment.
- the feature amount extraction unit 1 extracts the feature amount 40 from the input data 30 in the input sample 10 with reference to the learning model m2 for feature extraction.
- the inter-data recall unit 103 outputs the recall result 50 from the feature amount 40 output by the feature amount extraction unit 1 with reference to the learning model m4 for recall.
- The learning model suitability determination unit 106 added in the second embodiment calculates the statistic of the difference between the feature amount 40 and the recall result 50 output by the feature amount extraction unit 1 and the inter-data recall unit 3, following a procedure similar to processing step S5 of FIG. 9. When the statistic of the difference becomes larger than a predetermined threshold value set in advance, it determines that the learning model of symbol x does not conform to the input data 30 of the long-term operation from which the input sample 10 was sampled. The result of this determination is output by being displayed on the screen 80 output by the learning model reselection unit 107 (corresponding to the learning model selection unit 5 of FIG. 1). Alternatively, it may be output to a file or notified to an external computer via a network.
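- A minimal sketch of this suitability determination, with an assumed threshold value and assumed model interfaces, could look as follows; when the model is judged unsuitable, reselection can reuse a selection routine such as the one sketched for the first embodiment.

```python
# Illustrative sketch of the suitability determination of the second embodiment:
# recompute the statistic of the difference on a sample taken during long-term
# operation and compare it with a preset threshold. The threshold value and
# interfaces are assumptions for this example.
import numpy as np

def is_model_suitable(input_sample, extract_features, recall, threshold=1.0):
    """Return (suitable, statistic) for the currently selected model pair."""
    diffs = [np.linalg.norm(extract_features(x) - recall(extract_features(x)))
             for x in input_sample]
    statistic = float(np.mean(diffs))
    return statistic <= threshold, statistic

# Usage (assumed names): if not suitable, reselect from the model group.
# suitable, stat = is_model_suitable(new_sample, extract_x, recall_x, threshold=0.8)
# if not suitable:
#     best_symbol, _ = select_learning_model(new_sample, model_pairs)
```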
- a learning model reselection unit 107 may be further provided after the learning model suitability determination unit 106.
- When the learning model suitability determination unit 106 determines that the learning model is not suitable, the learning model reselection unit 107 takes the input sample 10 as input (replacing the old input sample 10 with the new one) and selects a feature extraction learning model m2 following the procedure of the first embodiment.
- By the method described above, it can be detected that the properties of the input data 30 have changed in the course of long-term operation and that the contour extraction learning model selected by the method of the first embodiment has become non-conforming. Furthermore, it is possible to reselect the learning model that is optimal for the input sample 10 used for contour extraction.
- The configuration of the second embodiment shown in FIG. 14 installs the learning model suitability determination unit 106 between the inter-data recall unit 3 and the learning model selection unit 5 of the first embodiment shown in FIG. 1. It can therefore be regarded as a configuration in which, at the beginning of operation of the image recognition device 7, processing does not pass through the learning model suitability determination unit 106, but thereafter the learning model suitability determination unit 106 functions based on operational experience and the learning model selection unit 5 performs reselection.
- The third embodiment describes how the teacher data necessary for designing and preparing the image recognition device 7 can be obtained easily and how the learning model can be learned. The learning model obtained as the learning result of the third embodiment is then reflected in the first and second embodiments.
- FIG. 15 shows an example of the functional configuration of the image recognition apparatus according to the third embodiment of the present invention.
- the image recognition apparatus 7 of FIG. 15 differs from the configuration of FIG. 1 in that a teacher data creation support unit 208 and a learning model learning unit 209 are added.
- FIG. 15 does not show the learning model selection unit 5 of FIG. 1 or the learning model suitability determination unit 106 of FIG. 14, but this does not mean those functions are absent; in actual operation, the configuration is as in the first and second embodiments.
- M2 and m4 are the learning model for feature extraction and the learning model for recall selected in the first embodiment.
- the input sample 10 is a set of arbitrary input data 30, and may be the input sample 10 described in the first and second embodiments, for example.
- the feature amount extraction unit 1 extracts the feature amount 40 from the input data 30 in the input sample 10 with reference to the learning model m2 for feature extraction.
- The inter-data recall unit 3 outputs the recall result 50 from the feature amount 40 output by the feature amount extraction unit 1, with reference to the recall learning model m4.
- The teacher data creation support unit 208 added in the third embodiment obtains the difference between the feature amount 40 and the recall result 50 output by the feature amount extraction unit 1 and the inter-data recall unit 3, following the procedure of processing step S3 of FIG. 9, and provides a user interface for teacher data creation in which the locations to be input are narrowed down to places where the difference is large.
- a screen 90 in FIG. 16 is an example of a user interface of the teacher data creation support unit 208, and includes an input screen 91, an input selection 92, and an input pen 93.
- the operator can perform the work of assigning the categories of the outline 61, the closed region 62, and the background 63 by using the input data 30 as a sketch.
- the assignment of labels on the input screen 91 is performed by the operator selecting the categories of the contour line 61, the closed region 62, and the background 63 from the radio buttons of the input selection 92, and operating the input pen 93.
- The user interface of the teacher data creation support unit 208 preferably has functions for displaying the input data as a sketch, drawing the categories of the feature amount on it, and further allowing the categories of the feature amount to be input.
- the teacher data creation support unit 208 discriminates a place with a small difference and a place with a large difference in the processing step S3.
- Whether a place has a small or large difference is determined by dividing the input data 30 on the input screen 91 into small regions: a region is judged to have a large difference if the density of the difference of processing step S3 within it is high, and a small difference if that density is low.
- For places where the difference of processing step S3 is small, labels are displayed in the same manner as in the feature amount 40; that is, the contour line 41, the closed region 42, and the background 43 of the feature amount 40 are assigned in order to the contour line 61, the closed region 62, and the background 63 on the input screen 91. The operator is then urged to make inputs on the input screen 91 only in the areas narrowed down to where the difference of processing step S3 is large.
- A place where there is a difference in processing step S3 is, for example, a place where there is a large difference between the feature amount 40b extracted from the input data 30a and the recall result 50b extracted from that feature amount 40b.
- The teacher data creation support unit 208 may also be configured to use a plurality of pairs of the feature extraction learning model m2 and the recall learning model m4, and the accuracy of the categories on the screen 91 (the contour line 61, the closed region 62, and the background 63) may be improved by generating them from the plurality of feature amounts 40 and recall results 50.
- a category in the screen 91 may be generated by obtaining a location where there is a difference in the processing step S3 from a statistic such as a mode of difference between the plurality of feature amounts 40 and the recall result 50.
- The operator may also switch, by operating a button (not shown) on the screen 90, which of the plurality of feature amounts 40 and recall results 50 is used for generating the categories on the screen 91.
- the learning sample creation support unit 208 obtains an input location using a plurality of feature amounts and recall results, and/or switches the input location.
- The learning model learning unit 209 added in the third embodiment learns the feature extraction learning model m2 using learning samples in which the input data 30 of the input sample 10 is paired with the teacher data set from the input results of the screen 90.
- an arbitrary learning sample may be added in addition to the learning sample so that the inference result of the feature amount 40 when the learning model is referred to is excellent.
- In addition to the feature extraction learning model m2, a new recall learning model m4 may also be learned, a symbol 20 allocated to the pair, and the pair added to the database DB of FIG. 8.
- In other words, the learning model learning unit further learns the recall learning model, and the feature extraction learning model learned by the learning model learning unit is added to the feature extraction learning model group.
- By the method described above, a feature extraction learning model m2 optimal for the population from which the input sample 10 was sampled can be learned using teacher data limited to the locations at which the teacher data creation support unit 208 prompts the operator to input. By narrowing down the places where the operator inputs, the man-hours for creating teacher data can be reduced compared with assigning teacher data to all pixels of the input data 30 in the input sample 10.
- Example 4 describes how to easily obtain an optimal learning model.
- FIG. 17 shows an example of the functional configuration of the image recognition device 7A according to the fourth embodiment of the present invention.
- The configuration of FIG. 17 is obtained by removing the inter-data recall unit 3 from the configuration of FIG. 1; however, since the feature extraction learning model group M2, the feature amount extraction unit 1, and the learning model selection unit 5 partially differ in the data they handle, their internal configuration, and their processing content, they are represented in FIG. 17 as the feature extraction learning model group M2A, the feature amount extraction unit 1A, and the learning model selection unit 5A, respectively.
- The feature extraction learning model group M2A is a set of feature extraction learning models m2A, chosen from among the feature extraction learning models m2, of a type that can output a score for each category when the feature amount 40 is extracted.
- the feature amount extraction unit 1A refers to each of the feature extraction learning models m2A in the feature extraction learning model group M2A and outputs the feature amount 40 and the score from each input data 30 in the input sample 10.
- The learning model selection unit 5A calculates from the scores a common scale with which the reliability of the category discrimination results can be compared among the multiple types of feature extraction learning models m2A, and selects the optimal feature extraction learning model m2A on condition that the statistic of the common scale takes the maximum value.
- FIG. 18 is a diagram showing a signal processing flow of the learning model selection unit 5A of FIG.
- the combination of processing step S301 and processing step S306 means that the processing between them is repeatedly executed for each learning model.
- the combination of the processing step S302 and the processing step S304 means that the processing between them is repeatedly executed for each input data 30.
- the statistical value of the common scale is calculated from the average value or the median of the common scale of each pixel in each input data 30 in the processing step S305.
- After the above iterative processing has been executed for all the learning models and all the input data 30, the processing of step S307 is started.
- In processing step S307, the maximum value of the statistics of the common scale calculated in processing step S305 is obtained.
- In processing step S308, the symbol 20 of the feature extraction learning model m2A for which the common scale takes that maximum value is selected.
- FIG. 19 shows an example of the common scale in the processing step S303 of FIG.
- Graphs 311 and 312 show the scores for each category obtained from the learning models m2A for extracting the different types of feature amounts.
- the type means that a mathematical model for machine learning, a network structure for deep learning, and the like in the learning model m2A for feature amount extraction are different.
- the categories in the graph 311 and the graph 312 refer to the labels allocated to the contour line 41, the closed region 42, and the background 43 that form the feature amount 40. Looking at the two scores in the graph 311 and the graph 312, the value in the graph 312 is larger than that in the graph 311, but the magnitudes cannot be compared because the types are different and the scales are different.
- In the feature extraction learning model m2A, each element is classified into the category having the largest score.
- The larger the difference between the maximum score and the other scores, the more reliable the category discrimination can be considered.
- Although the score of graph 312 is highest for category 3, its difference from the scores of categories 1 and 2 is small. The determination of category 3 from graph 312 can therefore be regarded as having low reliability, in the sense that the determination result could change if the scores varied due to a slight disturbance.
- In contrast, the score of graph 311 has a large difference between category 3, which has the largest value, and the other categories 1 and 2. The determination of category 3 from graph 311 can therefore be regarded as having high reliability, in the sense that the determination result does not change even if there is some disturbance.
- the variation of the score is used as a common measure.
- the variation is a statistic indicating the degree of variation such as the standard deviation and entropy of the score, and the larger the value, the more the score is different between categories as shown in the graph 311.
- the degree of protrusion of the score may be used as a common measure.
- the protrusion degree is an index indicating how much the maximum value of the score is significantly larger than other scores, and for example, the difference between the maximum value of the score and the average value of the score in the graph 311, or It can be calculated by the difference between the maximum value of the score and the second largest value of the score.
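- The following Python sketch illustrates, for assumed per-category score lists, how the variation (the standard deviation, or an entropy-based value inverted here so that larger means more peaked) and the protrusion degree could be computed as common scales; the concrete formulas and example values are assumptions.

```python
# Illustrative sketch of common scales computed from per-category scores for
# one pixel: larger values suggest a more reliable category determination.
import numpy as np

def variation_std(scores):
    """Standard deviation of the per-category scores."""
    return float(np.std(scores))

def variation_entropy(scores):
    """Entropy-based variation: 0 for flat scores, larger when one score dominates."""
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))
    return float(np.log(len(p)) - entropy)

def protrusion(scores):
    """Difference between the largest and the second largest score."""
    top2 = np.sort(scores)[-2:]
    return float(top2[1] - top2[0])

graph_311 = [0.1, 0.2, 0.9]   # clearly peaked at category 3 -> high reliability
graph_312 = [1.0, 1.1, 1.2]   # larger but nearly flat scores -> low reliability
for name, s in (("311", graph_311), ("312", graph_312)):
    print(name, variation_std(s), variation_entropy(s), protrusion(s))
```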
- In FIG. 20, the certainty factor 1 in graph 321 is the maximum value of the scores in graph 311. Setting the maximum score as the certainty factor in this way is common in category discrimination algorithms using machine learning.
- Similarly, the certainty factor 2 in graph 322 is the maximum value of the scores in graph 312.
- The correct answer rate in graph 321 and graph 322 is an index indicating the expected probability, over the population, that a category determination result is correct when certainty factor 1 or certainty factor 2 takes a given value.
- The learning sample used when the feature extraction learning model was learned can serve as the population; however, the population is not limited to this, and any set of input data 30 and its teacher data can be used.
- This correct answer rate can be used as a common scale.
- Suppose the certainty factors calculated from graph 311 and graph 312 are k1 and k2, and the corresponding correct answer rates read from graph 321 and graph 322 are y1 and y2, with y1 higher than y2. Since the category discrimination result obtained from graph 311 then has the higher expected accuracy, its reliability is considered higher. Therefore, in processing step S303, certainty factors such as certainty factor 1 and certainty factor 2 can be converted into correct answer rates and used as a common index.
- In processing step S303 of FIG. 18, when the magnitudes of the variation or the protrusion degree differ significantly among the multiple types of feature extraction learning models m2A, they may be converted into correct answer rates by the same procedure as described for FIG. 20 and then used as a common scale. Alternatively, in order to suppress the differences in magnitude among the multiple types of feature extraction learning models m2A, a statistic such as the average of the variation or of the protrusion degree over the population may be obtained, and the values may be normalized by dividing by this statistic.
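- As an illustration of converting certainty factors into correct answer rates over a population, the following Python sketch bins the certainty values of a labeled population and estimates the fraction of correct determinations per bin; the binning scheme and the toy population are assumptions for this example.

```python
# Illustrative sketch: estimate a correct answer rate per certainty bin from a
# labeled population, so that different model types can be compared on a
# common index. Binning and data are assumptions.
import numpy as np

def accuracy_by_certainty(certainties, correct_flags, bins=10):
    """Per certainty bin, the fraction of correct category determinations."""
    certainties = np.asarray(certainties, dtype=float)
    correct_flags = np.asarray(correct_flags, dtype=float)
    edges = np.linspace(certainties.min(), certainties.max(), bins + 1)
    rates = np.full(bins, np.nan)
    for i in range(bins):
        mask = (certainties >= edges[i]) & (certainties < edges[i + 1])
        if i == bins - 1:                       # include the right edge in the last bin
            mask |= certainties == edges[i + 1]
        if mask.any():
            rates[i] = correct_flags[mask].mean()
    return edges, rates

def to_common_scale(certainty, edges, rates):
    """Map one certainty value to the correct answer rate of its bin."""
    i = int(np.clip(np.searchsorted(edges, certainty) - 1, 0, len(rates) - 1))
    return float(rates[i])

# Toy population (assumed): higher certainty is made more often correct.
rng = np.random.default_rng(1)
cert = rng.random(1000)
correct = rng.random(1000) < cert
edges, rates = accuracy_by_certainty(cert, correct)
print(to_common_scale(0.9, edges, rates))
```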
- By the method described above, although the feature extraction learning models m2A are limited to types capable of outputting a score when the feature amount 40 is extracted, the one optimal for the input sample 10 can be selected from the plurality of feature extraction learning models m2A. Further, unlike Patent Document 2, the feature extraction learning model m2A can be selected even if the certainty factors of the feature extraction learning models m2A in the feature extraction learning model group M2A are different indices.
- FIG. 21 shows a functional configuration example of the image recognition device 7A according to the first modification of the fourth embodiment.
- The upper part of the image recognition device 7A of FIG. 21 adopts the configuration of FIG. 17, and the lower half combines parts of the configuration of the second embodiment (FIG. 14).
- a learning model suitability determination unit 306 that determines the suitability of the learning model m2 for feature extraction selected by the learning model selection unit 5A for the input sample 10 by using the statistics of the common scale may be provided.
- the learning model suitability determination unit 306 determines that the reliability is low and does not match if the statistic of the common scale obtained by the same procedure as the processing step S305 is smaller than a predetermined threshold.
- Further, a learning model reselection unit 307, which selects an appropriate feature extraction learning model m2 for the input sample 10 from the feature extraction learning model group M2 (and which includes the functions of the feature amount extraction unit 301 and the learning model selection unit 306), may be provided.
- FIG. 22 shows a functional configuration example of the image recognition device 7A according to the second modification of the fourth embodiment.
- The upper part of the image recognition device 7A of FIG. 22 adopts the configuration of FIG. 17, and the lower half combines parts of the configuration of the third embodiment (FIG. 15).
- A teacher data creation support unit 308 provided with a user interface in which the input locations are narrowed down to the parts where the common scale of processing step S305 is small (the parts where the reliability of the category discrimination of the feature amount 40 is low), and a learning model learning unit 309 that learns the feature extraction learning model m2 using the teacher data created by the teacher data creation support unit 308, may be provided.
- With this configuration, a feature extraction learning model m2 optimal for the population from which the input sample 10 was sampled can be learned using teacher data in which the locations input by the operator are narrowed down. Further, the learning model learning unit 309 may add the learned feature extraction learning model m2 to the feature extraction learning model group M2 so that the learning model reselection unit 307 can select it.
- the categories forming the feature amount 40 are not limited to the outline 41, the closed region 42, and the background 43.
- a category such as a corner point of the contour line may be added.
- categories may be omitted from the contour line 41, the closed region 42, and the background 43.
- the constituent elements of the category of the teacher data such as the recall result 50 or 60a also change.
- the feature amount 40 may be any feature amount that can be extracted from the input data 30 (that is, an image) other than the contour line described above.
- a design drawing of the input data 30 or a defect in the input data 30 may be used as the feature amount 40.
- the categories forming the teacher data such as the recall results 50 and 60a also change.
- The arbitrary feature amount is not limited to a per-pixel category, as long as the recall result 50 can be acquired from it.
- the arbitrary feature amount may be the brightness of each pixel.
- Instead of the method of extracting the feature amount 40 using the machine learning described above, the feature amount extraction unit 1 may perform image processing whose appropriate parameters differ depending on the input sample 10.
- In this case, the feature extraction learning model m2 corresponds to those parameters.
- The image processing may be, for example, a discriminator that obtains the lightness gradient and the lightness for each pixel in the input data 30, compares them with predetermined threshold values held in the parameters, and classifies each pixel in the input data 30 into the contour line 41 or the background 43.
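- Such a discriminator can be sketched as follows (the threshold values, the way the two threshold tests are combined, and the helper names are illustrative assumptions rather than values from the disclosure).

```python
import numpy as np

def classify_pixels(image, grad_threshold, lightness_threshold):
    """image: 2-D array of lightness values.
    Pixels whose lightness gradient and lightness both exceed their thresholds
    are labeled as contour line, the rest as background (one plausible
    combination of the two threshold tests)."""
    gy, gx = np.gradient(image.astype(float))
    grad = np.hypot(gx, gy)
    contour = (grad > grad_threshold) & (image > lightness_threshold)
    # 41: contour line, 43: background (reference numerals reused as labels)
    return np.where(contour, 41, 43)

# The pair of thresholds plays the role of the feature extraction
# "learning model" m2 for this non-machine-learning extractor.
labels = classify_pixels(np.random.rand(64, 64), grad_threshold=0.2, lightness_threshold=0.5)
```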
- The feature amount extraction unit 1 may also mix machine learning and the image processing. In this case, the feature amount extraction unit 1 may switch between machine learning and the image processing according to the feature extraction learning model m2 in the feature extraction learning model group M2.
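- One way such switching could be organized is sketched below (the entry structure and the predict() call are hypothetical assumptions; classify_pixels refers to the sketch above): each entry of the feature extraction learning model group carries a tag, and the feature amount extraction unit 1 dispatches accordingly.

```python
def extract_feature(input_data, model_entry):
    """model_entry: one entry of the feature extraction learning model group M2,
    holding either a learned network or image-processing parameters."""
    if model_entry["kind"] == "ml":
        # machine-learning extractor (hypothetical predict() interface)
        return model_entry["network"].predict(input_data)
    # non-ML entry: only threshold parameters are stored
    return classify_pixels(input_data,
                           model_entry["grad_threshold"],
                           model_entry["lightness_threshold"])
```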
- In the first to third embodiments, the input data 30 may be any data from which the inter-data recall unit 3 can output a recall result accompanied by dimension compression.
- the categories forming the teacher data such as the recall results 50 and 60a also change.
- For example, the input data 30 may be speech audio and the feature amount 40 may be alphabetic characters.
- In addition to the selection of a learning model for contour line extraction, the learning model selection of the present invention can be applied to any system using arbitrary machine learning that handles a feature amount that can be recalled with dimension compression.
- 1 feature amount extraction unit
- 2 feature extraction learning model group
- 3 inter-data recall unit
- 4 recall learning model group
- 5 learning model selection unit
- 10 input sample
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
Description
As a result, when a feature amount 40 similar to the learning samples is given to the inter-data recall unit 3, little (or almost no) information is lost even when it is compressed into the low-dimensional data 70, so the difference between the recall result 50 and the feature amount 40 is small. On the other hand, when a feature amount 40 that deviates from the learning samples is given to the inter-data recall unit 3, much information is lost in the process of compression into the low-dimensional data 70, so the difference between the recall result 50 and the feature amount 40 is large.
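This behavior is what the learning model selection exploits. As a minimal sketch (a PCA-style linear projection stands in for the recall learning model, and all names and values are illustrative assumptions): the feature amount is compressed to low-dimensional data, reconstructed as the recall result, and the model whose reconstruction differs least from its own feature amount is selected.

```python
import numpy as np

def recall_difference(feature, components, mean):
    """Compress the feature amount onto the rows of 'components' (dimension
    compression to low-dimensional data), reconstruct it (the recall result),
    and return the size of the difference from the original feature amount."""
    x = feature.ravel() - mean
    low_dim = components @ x                  # low-dimensional data
    recalled = components.T @ low_dim + mean  # recall result
    return float(np.linalg.norm(feature.ravel() - recalled))

# Two hypothetical recall learning models, each a (components, mean) pair that
# would in practice be learned (e.g. by PCA or an autoencoder) from the feature
# amounts of its own learning samples.
rng = np.random.default_rng(0)
feature = rng.random(16)
models = {
    "pair_a": (rng.random((4, 16)), float(feature.mean())),
    "pair_b": (rng.random((4, 16)), 0.0),
}
# Select the feature extraction model whose paired recall model reproduces the
# feature amount with the smallest difference.
best = min(models, key=lambda k: recall_difference(feature, *models[k]))
```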
Claims (26)
- An image recognition device comprising: a feature extraction learning model group that stores a plurality of feature extraction learning models; a recall learning model group that stores recall learning models each paired with one of the feature extraction learning models; a feature amount extraction unit that extracts a feature amount from input data with reference to the feature extraction learning model; an inter-data recall unit that outputs a recall result accompanied by dimension compression of the feature amount with reference to the recall learning model; and a learning model selection unit that selects the feature extraction learning model from the feature extraction learning model group on the condition that the difference between the feature amount and the recall result is minimized.
- The image recognition device according to claim 1, further comprising a learning model suitability determination unit that determines, from the difference between the feature amount and the recall result, whether the selected feature extraction learning model is suitable for a population obtained by sampling the samples of the input data.
- The image recognition device according to claim 2, wherein, when the learning model suitability determination unit determines that the model is not suitable, the feature extraction learning model is reselected using the samples of the input data.
- The image recognition device according to claim 1, further comprising: a teacher data creation support unit that provides a supervision user interface in which the input locations are narrowed down to portions of the samples of the input data where the difference between the feature amount and the recall result is large; and a learning model learning unit that trains the feature extraction learning model using the teacher data created by the teacher data creation support unit.
- The image recognition device according to claim 4, wherein the user interface of the teacher data creation support unit has a function of drawing the categories of the feature amount with the input data as a background sketch and of further inputting the categories of the feature amount.
- The image recognition device according to claim 4, wherein the teacher data creation support unit performs at least one of obtaining the input locations using a plurality of the feature amounts and the recall results, and switching the input locations.
- The image recognition device according to claim 4, wherein the learning model learning unit further trains the recall learning model, and the feature amount learning model trained by the learning model learning unit is added to the feature extraction learning model group and the recall learning model trained by the learning model learning unit is added to a feature extraction learning model group.
- The image recognition device according to any one of claims 1 to 7, wherein the feature amount is a category of an element in the input data.
- The image recognition device according to any one of claims 1 to 8, wherein the input data is an image and the feature amount is a contour line or a design drawing.
- The image recognition device according to any one of claims 1 to 9, wherein the dimension compression is performed using principal component analysis or an autoencoder.
- The image recognition device according to any one of claims 1 to 10, wherein the feature amount extraction unit includes one or more feature amount extraction units that use a technique other than machine learning.
- The image recognition device according to any one of claims 1 to 11, wherein the learning model selection unit displays on a screen one or more of the selection result of the feature extraction learning model, the difference, and the range of selection of the feature extraction learning model.
- An image recognition device comprising: a feature extraction learning model group that stores a plurality of feature extraction learning models; a feature amount extraction unit that extracts a feature amount from input data with reference to the feature extraction learning model; and a learning model selection unit that calculates, from the score obtained when the feature amount extraction unit extracts the feature amount, a common scale that allows comparison among a plurality of types of learning models, and selects the feature extraction learning model from the feature extraction learning model group using the common scale.
- The image recognition device according to claim 13, further comprising a learning model suitability determination unit that determines, from the common scale, whether the selected feature extraction learning model is suitable.
- The image recognition device according to claim 14, further comprising a learning model reselection unit that reselects the feature extraction learning model using the samples of the input data when the learning model suitability determination unit determines that the model is not suitable.
- The image recognition device according to claim 13, further comprising: a teacher data creation support unit that provides a supervision user interface in which the input locations are narrowed down to portions of the samples of the input data where the common scale is small; and a learning model learning unit that trains the feature extraction learning model using the teacher data created by the teacher data creation support unit.
- The image recognition device according to claim 16, wherein the user interface of the teacher data creation support unit has a function of drawing the categories of the feature amount with the input data as a background sketch and of inputting the categories of the feature amount.
- The image recognition device according to claim 16, wherein the feature amount learning model trained by the learning model learning unit is added to a feature extraction learning model group.
- The image recognition device according to any one of claims 13 to 18, wherein the feature amount is a category of an element in the input data.
- The image recognition device according to any one of claims 13 to 19, wherein the input data is an image and the feature amount is a contour line or a design drawing.
- The image recognition device according to any one of claims 13 to 20, wherein the common scale is a statistic representing the degree of variation of the score or a statistic representing the degree of protrusion of the score.
- The image recognition device according to any one of claims 13 to 21, wherein the common scale is a correct answer rate converted from the score.
- The image recognition device according to any one of claims 13 to 22, wherein the feature amount extraction unit includes one or more feature amount extraction units that use a technique other than machine learning.
- The image recognition device according to any one of claims 13 to 23, wherein the learning model selection unit displays on a screen one or more of the selection result of the feature extraction learning model, a difference, and the range of selection of the feature extraction learning model.
- An image recognition method comprising: providing a plurality of feature extraction learning models and a plurality of recall learning models each paired with one of the feature extraction learning models; extracting a feature amount from input data with reference to the feature extraction learning model; obtaining a recall result accompanied by dimension compression of the feature amount with reference to the recall learning model; and selecting the feature extraction learning model from a feature extraction learning model group on the condition that the difference between the feature amount and the recall result is minimized.
- An image recognition method comprising: providing a plurality of feature extraction learning models; extracting a feature amount from input data with reference to the feature extraction learning model; calculating, from the score obtained when the feature amount is extracted, a common scale that allows comparison among a plurality of types of learning models; and selecting the feature extraction learning model from the plurality of feature extraction learning models using the common scale.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2018/047224 WO2020129235A1 (ja) | 2018-12-21 | 2018-12-21 | 画像認識装置及び方法 |
US17/286,604 US12014530B2 (en) | 2018-12-21 | 2018-12-21 | Image recognition device and method |
KR1020217015935A KR102654003B1 (ko) | 2018-12-21 | 2018-12-21 | 화상 인식 장치 및 방법 |
TW108139465A TWI731459B (zh) | 2018-12-21 | 2019-10-31 | 圖像辨識裝置及方法 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2018/047224 WO2020129235A1 (ja) | 2018-12-21 | 2018-12-21 | 画像認識装置及び方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020129235A1 true WO2020129235A1 (ja) | 2020-06-25 |
Family
ID=71102706
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/047224 WO2020129235A1 (ja) | 2018-12-21 | 2018-12-21 | 画像認識装置及び方法 |
Country Status (4)
Country | Link |
---|---|
US (1) | US12014530B2 (ja) |
KR (1) | KR102654003B1 (ja) |
TW (1) | TWI731459B (ja) |
WO (1) | WO2020129235A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020119048A (ja) * | 2019-01-18 | 2020-08-06 | 富士通株式会社 | Dnn選択プログラム、dnn選択方法および情報処理装置 |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020156738A1 (en) * | 2019-01-29 | 2020-08-06 | Asml Netherlands B.V. | Methods and apparatus for controlling a lithographic process |
TWI732370B (zh) * | 2019-12-04 | 2021-07-01 | 財團法人工業技術研究院 | 神經網路模型的訓練裝置和訓練方法 |
CN116342923A (zh) * | 2022-12-16 | 2023-06-27 | 环旭电子股份有限公司 | 影像识别深度学习模型的训练方法 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012068965A (ja) * | 2010-09-24 | 2012-04-05 | Denso Corp | 画像認識装置 |
JP2015001888A (ja) * | 2013-06-17 | 2015-01-05 | 富士ゼロックス株式会社 | 情報処理プログラム及び情報処理装置 |
JP2017215828A (ja) * | 2016-06-01 | 2017-12-07 | 富士通株式会社 | 学習モデル差分提供プログラム、学習モデル差分提供方法、および学習モデル差分提供システム |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3743247B2 (ja) | 2000-02-22 | 2006-02-08 | 富士電機システムズ株式会社 | ニューラルネットワークによる予測装置 |
JP4478290B2 (ja) | 2000-05-29 | 2010-06-09 | マスプロ電工株式会社 | 波形補正回路 |
EP3306534B1 (en) * | 2015-06-03 | 2021-07-21 | Mitsubishi Electric Corporation | Inference device and inference method |
JP6639123B2 (ja) * | 2015-07-06 | 2020-02-05 | キヤノン株式会社 | 画像処理装置、画像処理方法、及びプログラム |
US10217236B2 (en) * | 2016-04-08 | 2019-02-26 | Orbital Insight, Inc. | Remote determination of containers in geographical region |
JP6824125B2 (ja) * | 2017-07-28 | 2021-02-03 | 株式会社日立製作所 | 医用撮像装置及び画像処理方法 |
- 2018
  - 2018-12-21 US US17/286,604 patent/US12014530B2/en active Active
  - 2018-12-21 KR KR1020217015935 patent/KR102654003B1/ko active IP Right Grant
  - 2018-12-21 WO PCT/JP2018/047224 patent/WO2020129235A1/ja active Application Filing
- 2019
  - 2019-10-31 TW TW108139465A patent/TWI731459B/zh active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012068965A (ja) * | 2010-09-24 | 2012-04-05 | Denso Corp | 画像認識装置 |
JP2015001888A (ja) * | 2013-06-17 | 2015-01-05 | 富士ゼロックス株式会社 | 情報処理プログラム及び情報処理装置 |
JP2017215828A (ja) * | 2016-06-01 | 2017-12-07 | 富士通株式会社 | 学習モデル差分提供プログラム、学習モデル差分提供方法、および学習モデル差分提供システム |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020119048A (ja) * | 2019-01-18 | 2020-08-06 | 富士通株式会社 | Dnn選択プログラム、dnn選択方法および情報処理装置 |
JP7151501B2 (ja) | 2019-01-18 | 2022-10-12 | 富士通株式会社 | Dnn選択プログラム、dnn選択方法および情報処理装置 |
Also Published As
Publication number | Publication date |
---|---|
US12014530B2 (en) | 2024-06-18 |
KR102654003B1 (ko) | 2024-04-04 |
US20210374403A1 (en) | 2021-12-02 |
TWI731459B (zh) | 2021-06-21 |
KR20210082222A (ko) | 2021-07-02 |
TW202029013A (zh) | 2020-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020129235A1 (ja) | 画像認識装置及び方法 | |
US6021220A (en) | System and method for pattern recognition | |
CN111402979B (zh) | 病情描述与诊断一致性检测方法及装置 | |
CA3066029A1 (en) | Image feature acquisition | |
JPH0636038A (ja) | 監視統計パターン認識を用いる特徴分類 | |
US20050036712A1 (en) | Image retrieving apparatus and image retrieving program | |
WO2019026134A1 (ja) | 情報処理装置および情報処理方法 | |
CN108334805A (zh) | 检测文档阅读顺序的方法和装置 | |
CN116453438B (zh) | 一种显示屏参数检测方法、装置、设备及存储介质 | |
JP2017102906A (ja) | 情報処理装置、情報処理方法及びプログラム | |
CN108470194A (zh) | 一种特征筛选方法及装置 | |
CN111414930A (zh) | 深度学习模型训练方法及装置、电子设备及存储介质 | |
US20230325413A1 (en) | Error Factor Estimation Device and Error Factor Estimation Method | |
CN114330090A (zh) | 一种缺陷检测方法、装置、计算机设备和存储介质 | |
CN111767273B (zh) | 基于改进som算法的数据智能检测方法及装置 | |
Liang et al. | Performance evaluation of document structure extraction algorithms | |
JP2004192555A (ja) | 情報管理方法、情報管理装置及び情報管理プログラム | |
CN116467466A (zh) | 基于知识图谱的编码推荐方法、装置、设备及介质 | |
Wang et al. | A study on software metric selection for software fault prediction | |
CN114022698A (zh) | 一种基于二叉树结构的多标签行为识别方法及装置 | |
JP6701479B2 (ja) | 校正支援装置、および校正支援プログラム | |
JP2017142601A (ja) | 品質予測装置、品質予測方法、プログラム及びコンピュータ読み取り可能な記録媒体 | |
CN113240021B (zh) | 一种筛选目标样本的方法、装置、设备及存储介质 | |
CN118628499B (zh) | 一种基于网络架构搜索的航空发动机叶片缺陷检测方法 | |
CN112153370B (zh) | 基于群敏感对比回归的视频动作质量评价方法及系统 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18943918; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 20217015935; Country of ref document: KR; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 18943918; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: JP |