WO2023190644A1 - Performance indexing device, performance indexing method, and program

Performance indexing device, performance indexing method, and program

Info

Publication number
WO2023190644A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
likelihood
detection
image
value
Prior art date
Application number
PCT/JP2023/012736
Other languages
English (en)
Japanese (ja)
Inventor
洋一 小倉
晋也 松山
健志 緑川
直大 岩橋
肇 片山
Original Assignee
Nuvoton Technology Corporation Japan (ヌヴォトンテクノロジージャパン株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nuvoton Technology Corporation Japan
Publication of WO2023190644A1

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 20/00 - Machine learning
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/08 - Learning methods
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 - Image analysis
        • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 10/00 - Arrangements for image or video recognition or understanding
            • G06V 10/20 - Image preprocessing
              • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
            • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
              • G06V 10/82 - Arrangements using neural networks

Definitions

  • The present invention relates to a performance indexing device, performance indexing method, and program for accurately analyzing the performance of a model that detects an object in an image, weaknesses in the versatility and robustness of a model learning dictionary, and policies for reinforcing it.
  • In recent years, AI (artificial intelligence) based on models of the neurons in the human brain has advanced, and a wide variety of models have been developed to detect objects from images.
  • Training data requires augmented data of sufficient quality; if that quality is not met, the augmented data becomes noise and may reduce the quality and efficiency of learning. Therefore, a method for improving the quality of augmented learning data has been proposed, comprising a means for determining editing parameters for each of multiple sets of learning data obtained by editing original data representing the judgment target, a means for generating from the original data, based on those parameters, multiple sets of learning data each representing the judgment target, and a means for training a model using each of the multiple sets of learning data (see Patent Document 1).
  • The present invention has been made in view of the above problems, and its purpose is to provide a performance indexing device, performance indexing method, and program for accurately analyzing the performance of a model that detects objects in images, weaknesses in the versatility and robustness of a model learning dictionary, and policies for reinforcing it.
  • A performance indexing device according to one aspect of the present invention is a performance indexing device for an object detection model, and comprises: an image processing means that acquires an image and processes it appropriately; a model preprocessing means that processes the image acquired by the image processing means into a plurality of images according to various processing parameters; an object detection model, including a model learning dictionary, that infers object positions and likelihoods from the plurality of images processed by the model preprocessing means; a model post-processing means that, based on the inference results of the object detection model, corrects position information including a first detection frame and first likelihood information for each detected object in the plurality of images into position information including a second detection frame and second likelihood information having appropriate values; and a robustness verification means that verifies the robustness of the object detection model based on the position information including the second detection frame and the second likelihood information, which are the output results of the model post-processing means, and the various processing parameters.
  • A performance indexing method according to one aspect of the present invention includes: an image processing step of acquiring an image and processing it appropriately; a model preprocessing step of processing the image acquired in the image processing step into a plurality of images according to various processing parameters; inference by an object detection model, including a model learning dictionary, of object positions and likelihoods for the plurality of images processed in the model preprocessing step; a model post-processing step of correcting, based on the inference results of the object detection model, position information including a first detection frame and first likelihood information for each detected object in the plurality of images into position information including a second detection frame and second likelihood information having appropriate values; and a robustness verification step of verifying the robustness of the object detection model based on the position information including the second detection frame and the second likelihood information, which are the output results of the model post-processing step, and the various processing parameters.
  • a program according to one aspect of the present invention is a program for causing a computer to execute the performance indexing method described above.
  • According to the present invention, a performance indexing device is provided for accurately analyzing the performance of a model that detects an object in an image, weaknesses in the versatility and robustness of a model learning dictionary, and policies for reinforcing it.
  • FIG. 1 is a diagram showing a performance indexing device for an object detection model according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing the configuration of an artificial neuron model.
  • FIG. 2 is a diagram illustrating the configuration of a YOLO model according to an embodiment.
  • FIG. 3 is a diagram illustrating the operating principle of the YOLO model according to an embodiment.
  • A diagram showing the calculation concept of the IOU value in object detection.
  • FIG. 6 is a diagram showing a flowchart of the individual identification means of the model post-processing means according to the embodiment of the present invention.
  • FIG. 6 is a diagram showing the operation of the individual identification means of the model post-processing means according to the embodiment of the present invention.
  • FIG. 6 is a diagram showing a flowchart of the individual identification means of the model post-processing means according to the embodiment of the present invention.
  • FIG. 6 is a diagram showing the operation of the individual identification means of the model post-processing means according to the embodiment of the present invention.
  • FIG. 1 is a diagram illustrating problems of a conventional object detection model performance indexing device.
  • FIG. 2 is a second diagram illustrating problems with a conventional object detection model performance indexing device.
  • FIG. 6 is a diagram illustrating the operation of the position shifting function of the model preprocessing means according to an embodiment of the invention.
  • FIG. 6 is a diagram illustrating the operation of the resizing function of the model preprocessing means according to the embodiment of the present invention.
  • FIG. 1 is a diagram illustrating problems of a conventional object detection model performance indexing device.
  • FIG. 2 is a second diagram illustrating problems with a conventional object detection model performance indexing device.
  • FIG. 6 is a diagram illustrating the operation of the position shifting function of the model preprocessing means according to an embodiment of the invention.
  • FIG. 6 is a diagram showing the operation of the probability statistical calculation means of the robustness verification means according to the embodiment of the present invention.
  • FIG. 6 is a diagram showing the operation of the probability statistical calculation means of the robustness verification means according to the embodiment of the present invention.
  • FIG. 6 is a diagram showing the operation of the probability statistical calculation means of the robustness verification means according to the embodiment of the present invention.
  • FIG. 6 is a diagram showing the operation of the tone conversion function of the model preprocessing means according to the embodiment of the present invention.
  • FIG. 6 is a diagram illustrating the operation of the aspect ratio changing function of the model preprocessing means according to the embodiment of the present invention.
  • FIG. 6 is a diagram illustrating the operation of the rotation function of the model preprocessing means according to an embodiment of the invention.
  • FIG. 1 is a diagram showing a performance indexing device for an object detection model according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing a conventional object detection model performance indexing device.
  • FIG. 2 is a diagram illustrating a summary of the object detection model performance indexing device of the present invention.
  • One of the indicators of the detection reliability of a target object is the confidence score shown in (Equation 1) below (see, for example, Non-Patent Document 1).

    Confidence Score = Pr(Class_i|Object) × Pr(Object) × IOU^truth_pred (Equation 1)
  • the confidence score is sometimes commonly referred to as likelihood.
  • Pr(Class_i|Object) indicates the class probability that the Object (target object) belongs to class i, and the sum of all class probabilities is "1".
  • Pr(Object) indicates the probability that an Object is included in a BoundingBox (hereinafter referred to as BBox).
  • IOU^truth_pred is an index indicating how much the two frame areas, the ground truth BBox, which is the correct frame information, and the BBox predicted (inferred) by a model such as YOLO, overlap, and is called the IOU (Intersection over Union) value.
  • IOU = Area of Intersection / Area of Union (Equation 2)
  • Area of Union is the area of the union of the two frame areas to be compared.
  • Area of Intersection is the area of the common portion of the two frame regions to be compared.
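  • As a concrete illustration of (Equation 2), the following is a minimal Python sketch of the IOU calculation for two axis-aligned frames; the (x1, y1, x2, y2) corner representation is an assumption, since the text describes frames by center coordinates, width, and height.

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2) corner coordinates (assumed representation;
    # frames given as center/width/height would be converted first).
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Area of Intersection: overlap of the two frame regions.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Area of Union = sum of both frame areas minus the intersection.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```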
  • The mAP (mean average precision) and AP (average precision) in object detection are calculated by the following method.
  • For each class to be identified, Precision and Recall values are calculated while the threshold on the probability that an Object is included in the BBox is swept from the minimum "0" to the maximum "1", and the results are plotted as a two-dimensional graph. The area under this graph is calculated as the AP, and the average of the APs calculated over all identification classes is calculated as the mAP.
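  • A hedged sketch of this calculation: AP as the area under the precision-recall curve obtained by sweeping the confidence threshold, and mAP as the mean over classes. The rectangular-rule summation is one common choice, not necessarily the interpolation used by any particular benchmark.

```python
import numpy as np

def average_precision(recalls, precisions):
    # recalls/precisions: curve points for one class, obtained by sweeping
    # the confidence threshold from 1 down to 0.
    order = np.argsort(recalls)
    r = np.concatenate([[0.0], np.asarray(recalls)[order]])
    p = np.concatenate([[1.0], np.asarray(precisions)[order]])
    # Area under the precision-recall curve (rectangular rule).
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

def mean_average_precision(ap_per_class):
    # mAP is the average of the per-class AP values.
    return float(np.mean(list(ap_per_class.values())))
```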
  • FIG. 17 is a block diagram showing a conventional performance indexing device for analyzing the robustness and reinforcement policy of a model learning dictionary for a model that detects the position of an object in an image and identifies its class.
  • The image processing means 100, which acquires and appropriately processes images, comprises a lens (for example, standard zoom, wide-angle zoom, or fisheye), an image sensor, which is a device that receives the light arriving from an object through the lens and converts the brightness of the light into electrical information, and an image processing processor equipped with correction functions and a local tone mapping function. It performs image processing that makes the object to be detected easier to see or find while absorbing time-series fluctuation conditions such as the illuminance of the shooting environment.
  • The image generated by the image processing means 100 is input to the image output control means 110 and sent to the display and data storage means 120, such as a monitor, an external memory such as a PC (personal computer), or a cloud server.
  • The model preprocessing means 200 may be configured with an electronic circuit, or may be realized by an image processing processor 290 comprising an affine transformation function 291 and a projective transformation function 292 (libraries) together with a CPU or an arithmetic processor.
  • The image processed by the model preprocessing means 200 is input to the object detection model 300, which, by inference (prediction), detects where the target object is and identifies which class, such as person or vehicle, it corresponds to (class identification).
  • Position information 301 including zero or more first detection frames, including undetected and falsely detected objects, and first likelihood information 302 are output.
  • The position information 301 including the first detection frame is, for example, information including the center coordinates, horizontal width, and vertical height of the detection frame, and the first likelihood information 302 is, for example, the likelihood indicating detection accuracy and class identification information.
  • the object detection model 300 includes, for example, a model learning dictionary 320 and a deep neural network (DNN) model 310 using a convolutional neural network (CNN).
  • The DNN model 310 may use, for example, YOLO (see, for example, Non-Patent Document 1) or SSD, which are models with high superiority in detection processing speed; Faster R-CNN, EfficientDet, or the like may also be used. When mainly performing class identification without detecting the position of the object, for example, MobileNet may be used.
  • The model learning dictionary 320 is a collection of weighting coefficient data for the DNN model 310; in the case of the DNN model 310, it is initially trained or retrained by the deep learning means 640.
  • The position information 301 including zero or more first detection frames, including undetected and falsely detected objects, and the first likelihood information 302 are input to the model post-processing means 400, which corrects them into the position information 401 including the second detection frame and the second likelihood information 402 that are most appropriate for each detected object, by sorting the position information 301 including the first detection frames based on mutual IOU values, determining the maximum of the first likelihood information 302, and so on. The results are sent to the display and data storage means 120, such as a monitor, an external memory such as a PC (personal computer), or a cloud server.
  • The position information 401 including the second detection frame is, for example, information including the center coordinates, horizontal width, and vertical height of the detection frame, and the second likelihood information 402 is, for example, the likelihood indicating detection accuracy and class identification information.
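  • The correction performed by the model post-processing means 400 is essentially non-maximum suppression: candidates are sorted by likelihood, and candidates that overlap an already-selected frame beyond an IOU threshold are discarded. A minimal sketch, assuming the iou() helper above and an assumed candidate format:

```python
def individual_identification(candidates, iou_threshold=0.5):
    # candidates: list of (box, likelihood, class_id) tuples produced by
    # the object detection model (format assumed for illustration).
    selected = []
    for cand in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Keep the highest-likelihood frame per object; drop frames that
        # overlap an already-kept frame beyond the IOU threshold.
        if all(iou(cand[0], kept[0]) < iou_threshold for kept in selected):
            selected.append(cand)
    return selected  # second detection frames with second likelihood info
```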
  • The series of means that generates the position information 401 including the second detection frame and the second likelihood information 402, namely the image processing means 100, the model preprocessing means 200, the object detection model 300, and the model post-processing means 400, constitutes the first performance indexing device 30 for analyzing the robustness of, and reinforcement policy for, a model learning dictionary for a model that detects the position of an object in an image and identifies its class.
  • Learning material data considered appropriate for the purpose of use is extracted from the learning material database storage means 610, in which material data for deep learning, such as large-scale open-source datasets, is stored.
  • In some cases, image data output from the image processing means 100 via the image output control means 110 and stored in the data storage means 120 is utilized as learning material data for the images required by the purpose of use.
  • The annotation means 620 adds class identification information and a ground truth BBox, which is a correct answer frame, to the learning material data extracted from the learning material database storage means 610 to create supervised data.
  • the supervised data generated by the annotation means 620 is augmented by the augmentation means 630 as a learning image 631 in order to enhance versatility and robustness.
  • the learning image 631 is input to the deep learning means 640, the weighting coefficient of the DNN model 310 is calculated, and the calculated weighting coefficient is converted into, for example, ONNX format to create the model learning dictionary 320.
  • By reflecting the model learning dictionary 320 in the object detection model 300, it becomes possible to detect the position of an object in an image and identify its class.
  • Validation material data for verifying detection accuracy, detection performance, versatility, and robustness required for the purpose of use is extracted from the aforementioned learning material database storage means 610.
  • The validation material data is, for example, taken from a large-scale open-source dataset, or is image data output from the image processing means 100 via the image output control means 110 and stored in the data storage means 120, and is used to verify the detection accuracy, detection performance, versatility, and robustness required for the purpose of use.
  • The annotation means 620 adds class identification information and a ground truth BBox, which is a correct answer frame, to the validation material data extracted from the learning material database storage means 610 to create validation data 623.
  • The validation data 623 is input to the first mAP calculation means 660, which is capable of inference (prediction) equivalent to that of the object detection model 300. From the inference (prediction) results, the IOU value 653 is calculated by comparing the ground truth BBox, which is the correct answer frame, with the predicted BBox; Precision 654, which indicates the proportion of all prediction results over all validation data 623 in which the IOU value 653 was correctly predicted at or above an arbitrary threshold, and Recall 655, which indicates the proportion of correct-answer BBoxes whose IOU value 653 is at or above an arbitrary threshold, are calculated; and the AP (Average Precision) value 651 for each class and the mAP (mean Average Precision) value 652 averaged over all classes are calculated as indices of the object detection accuracy and performance described above (see, for example, Non-Patent Document 2).
  • The first mAP calculation means 660 is equipped with, for example, an open-source inference environment called darknet and an arithmetic processor (including a personal computer or a supercomputer), and has the same inference (prediction) performance as the object detection model 300. Furthermore, it is provided with means for calculating the IOU value 653, Precision 654, Recall 655, AP value 651, and mAP value 652 described above.
  • The series of means that generates the IOU value 653, Precision 654, Recall 655, AP value 651, and mAP value 652, namely the learning material database storage means 610, the annotation means 620, and the first mAP calculation means 660, constitutes the second performance indexing device 40 for analyzing the robustness of, and reinforcement policy for, a model learning dictionary for a model that performs position detection and class identification of an object in an image.
  • Versatility and robustness items, and the various fluctuation conditions, for a model that detects objects in images acquired by a camera or the like include: the background (scenery); camera lens specifications; image size; the height and elevation/depression angle at which the camera is mounted; the detection target area and field of view; the dewarp processing method when using a fisheye lens; special illuminance conditions such as changes due to sunlight and lighting, crushed shadows, blown highlights, and backlight; weather conditions such as sun, clouds, rain, snow, and fog; the position (left, right, top, bottom, and depth) of the target detection object in the image; its size, brightness level, shape, and characteristics including color information; aspect ratio; rotation angle; the number of target detection objects and their mutual overlap; the type, size, and position of matter adhering to the lens; whether or not the lens has an IR cut filter; the moving speed of the object to be detected; and the moving speed of the camera itself.
  • In a performance indexing device, performance indexing method (hereinafter sometimes simply referred to as a method), or program, when the position and size of the detection target fluctuate over time, variations in specific patterns may occur in the inferred (predicted) position information, including the detection frame, and in the likelihood information, even though the same object is being detected, owing to the configuration conditions of the DNN model and issues caused by its algorithm. This phenomenon is thought to be particularly noticeable when the input image size of the DNN model is reduced because of performance limitations of arithmetic processors such as DSPs (digital signal processors), which arise when cameras for detecting objects are made smaller, lower in power consumption, and lower in cost.
  • In YOLO, which is said to have a high processing speed because it detects the position of an object and identifies its class simultaneously, there may be locations where the likelihood decreases in a characteristic grid pattern depending on the location of the detected object. In the case of YOLO, this occurs because, in order to detect the object position and identify the class at the same time, the image is divided into grid cells of arbitrary size and class probabilities are calculated, as shown in FIG. 3B; this is considered a latent issue.
  • When the second performance indexing device 40, method, and program are used to index performance for analyzing the robustness of, and reinforcement policy for, a model learning dictionary for a model that detects the position and class of an object in an image, it is possible to understand the overall, average detection accuracy and detection performance on the validation data selected for verification, but it is not possible to understand in detail the versatility and robustness against various fluctuation conditions. Even when the first performance indexing device 30, method, and program are used in combination with the first mAP calculation means 660 in FIG. 17, the fluctuation conditions that require reinforcement are not fully understood. Therefore, when a model learning dictionary is trained by deep learning or the like, improvements in versatility and robustness against various fluctuation conditions may not be sufficient.
  • The present invention has been made in view of the above problems, and its purpose is to provide a performance indexing device, method, and program for accurately analyzing the performance of a model that detects objects in images, weaknesses in the versatility and robustness of a model learning dictionary, and policies for reinforcing it. A further purpose is to provide a performance indexing device, method, and program that ensure detection accuracy and performance even when performance limitations are placed on the installed arithmetic processors, such as DSPs (digital signal processors), in order to reduce the size, power consumption, and cost of cameras for detecting objects.
  • A performance indexing device according to a first aspect of the present invention is a device that performs performance indexing on an object detection model, and is characterized by comprising: an image processing means for acquiring an image and processing it appropriately; a model preprocessing means for processing the image acquired by the image processing means into a plurality of images according to various processing parameters; an object detection model, including a model learning dictionary, for inferring object positions and likelihoods (degrees of certainty) for the plurality of images processed by the model preprocessing means; a model post-processing means for correcting, based on the inference results of the object detection model, position information including a first detection frame and first likelihood information for each detected object in the plurality of images into position information including a second detection frame and second likelihood information having appropriate values; and a robustness verification means for verifying the robustness of the object detection model based on the position information including the second detection frame and the second likelihood information, which are the output results of the model post-processing means, and the various processing parameters.
  • A performance indexing device according to a second aspect of the present invention is the performance indexing device according to the first aspect, wherein the model preprocessing means, when processing the plurality of images input to the object detection model, generates enlarged or reduced images using L (arbitrary integer) types of arbitrary magnifications as the various processing parameters.
  • A performance indexing device according to a third aspect of the present invention is the performance indexing device according to the first or second aspect, wherein the model preprocessing means, when processing the plurality of images input to the object detection model, generates a total of N × M × L position-shifted images using, as the various processing parameters, position shifts in S (arbitrary decimal) pixel steps applied N (arbitrary integer) times in the horizontal direction and M (arbitrary integer) times in the vertical direction.
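  • The following sketch illustrates how such a grid of position-shifted inputs could be generated with OpenCV affine translations; the crop handling and the parameter values are assumptions for illustration.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def position_shifted_images(base_image, n, m, step_px):
    # Generate n (horizontal) x m (vertical) copies of the base image,
    # shifted in step_px pixel steps, as in the third aspect. Repeating
    # this for each of the L resize magnifications yields N x M x L images.
    h, w = base_image.shape[:2]
    shifted = []
    for j in range(m):
        for i in range(n):
            # 2x3 affine matrix for a pure translation of (i, j) steps.
            mat = np.float32([[1, 0, i * step_px], [0, 1, j * step_px]])
            shifted.append(cv2.warpAffine(base_image, mat, (w, h)))
    return shifted
```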
  • A performance indexing device according to a fourth aspect of the present invention is the performance indexing device according to any one of the first to third aspects, wherein the model preprocessing means, when processing the plurality of images input to the object detection model, further generates images whose brightness levels are changed to arbitrary values, using P (arbitrary integer) types of contrast correction curves or gradation conversion curves as the various processing parameters.
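  • As a sketch of this gradation conversion, gamma curves are one plausible family of conversion curves; the P gamma values below are illustrative assumptions.

```python
import numpy as np

def tone_converted_images(base_image, gammas=(0.5, 0.75, 1.0, 1.5, 2.0)):
    # Apply P gradation conversion curves (here gamma curves, an assumed
    # choice) to an 8-bit image, producing P brightness-modified images.
    out = []
    for g in gammas:
        lut = ((np.arange(256) / 255.0) ** g * 255.0).astype(np.uint8)
        out.append(lut[base_image])  # per-pixel lookup through the curve
    return out
```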
  • A performance indexing device according to a fifth aspect of the present invention is the performance indexing device according to any one of the first to fourth aspects, wherein the model preprocessing means, when processing the plurality of images input to the object detection model, further generates images with changed aspect ratios, using Q (arbitrary integer) types of aspect ratios as the various processing parameters.
  • A performance indexing device according to a sixth aspect of the present invention is the performance indexing device according to any one of the first to fifth aspects, wherein the model preprocessing means, when processing the plurality of images input to the object detection model, further generates images with changed rotation angles, using R (arbitrary integer) types of angles as the various processing parameters.
  • A performance indexing device according to a seventh aspect of the present invention is the performance indexing device according to any one of the first to sixth aspects, wherein the model preprocessing means, when processing the plurality of images input to the object detection model, generates images in which the average luminance level of the valid image is pasted into blank areas where no valid image exists as a result of the processing.
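  • A minimal sketch of this padding, assuming a luminance-only image and an assumed canvas size and placement: the model-input canvas is pre-filled with the valid image's mean luminance so that blank margins carry no artificial edge content.

```python
import numpy as np

def pad_with_average_luminance(valid_image, canvas_h, canvas_w, top=0, left=0):
    # Fill the canvas with the average brightness level of the valid image,
    # then paste the valid image at the given offset.
    mean_level = int(valid_image.mean())
    canvas = np.full((canvas_h, canvas_w), mean_level, dtype=valid_image.dtype)
    h, w = valid_image.shape[:2]
    canvas[top:top + h, left:left + w] = valid_image
    return canvas
```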
  • A performance indexing device according to an eighth aspect of the present invention is the performance indexing device according to any one of the first to seventh aspects, characterized in that the model post-processing means has an individual identification means that, for each of one or more detected objects in the output results of the object detection model present in one image, corrects position information including zero or more first detection frames, including undetected and falsely detected objects, and first likelihood information into position information including the second detection frame with the maximum likelihood and second likelihood information for each detected object, using an arbitrary threshold T (arbitrary decimal) on the first likelihood information and an arbitrary threshold U (arbitrary decimal) on the IOU (Intersection over Union) value, which is an index indicating how much the regions of position information including the first detection frames overlap.
  • A performance indexing device according to a ninth aspect of the present invention is the performance indexing device according to any one of the first to eighth aspects, characterized in that the model post-processing means has a function that, when position information including a correct detection frame and class identification information exist, corrects the position information including the correct detection frame according to the contents of the various processing parameters, and has an individual identification means that, for each of one or more detected objects in the output results of the object detection model present in one image, corrects position information including zero or more first detection frames, including undetected and falsely detected objects, and first likelihood information into position information including the second detection frame with the maximum likelihood and second likelihood information for each detected object, using an arbitrary threshold T (arbitrary decimal) on the first likelihood information and an arbitrary threshold U (arbitrary decimal) on the IOU (Intersection over Union) value, which is an index indicating how much the region of the position information including the correct detection frame and the region of the position information including the first detection frame overlap.
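  • A sketch of this correct-frame matching under assumed data shapes: detections are first filtered with the likelihood threshold T, then matched against the (parameter-corrected) correct frames with the IOU threshold U, keeping the maximum-likelihood match per correct-answer object. The iou() helper from earlier is assumed.

```python
def match_to_ground_truth(detections, gt_boxes, t=0.25, u=0.5):
    # detections: list of (box, likelihood, class_id) tuples; gt_boxes:
    # list of (box, class_id) already corrected for the processing
    # parameters. Both layouts are assumptions for illustration.
    results = []
    for gt_box, gt_cls in gt_boxes:
        hits = [d for d in detections
                if d[1] >= t and iou(d[0], gt_box) >= u]
        # Max-likelihood second detection frame, or None if undetected.
        results.append(max(hits, key=lambda d: d[1]) if hits else None)
    return results
```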
  • A performance indexing device according to a tenth aspect of the present invention is the performance indexing device according to the eighth or ninth aspect, wherein the model post-processing means individually links, for each detected object, the various processing parameters used in image processing and the output results of the individual identification means, and outputs them to the robustness verification means.
  • A performance indexing device according to an eleventh aspect of the present invention is the performance indexing device according to any one of the second to tenth aspects citing the second aspect, or the third to tenth aspects citing the third aspect, characterized in that the robustness verification means comprises a probability statistical calculation means that, based on the likelihood in the position information including the second detection frame and the second likelihood information, which are the output results of the model post-processing means, calculates, for each detected object and for each of the various processing parameters, any or all of: a likelihood distribution indicating the variation accompanying the position shift; the average likelihood, which is the average value over the valid region of the likelihood; a histogram of the likelihood; the standard deviation of the likelihood, which is the standard deviation over the valid region of the likelihood; the maximum likelihood, which is the maximum value over the valid region of the likelihood; the minimum likelihood, which is the minimum value over the valid region of the likelihood; and the IOU value corresponding to the likelihood.
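  • A sketch of such a probability statistical calculation over a position-shift grid of likelihoods, with NaN marking positions excluded from the valid region; numpy reductions over the valid entries stand in for the "valid region" wording.

```python
import numpy as np

def likelihood_statistics(likelihood_grid, bins=20):
    # likelihood_grid: N x M array of likelihoods per position shift for
    # one detected object; NaN marks positions outside the valid region.
    valid = likelihood_grid[~np.isnan(likelihood_grid)]
    hist, _ = np.histogram(valid, bins=bins, range=(0.0, 1.0))
    return {
        "average_likelihood": float(valid.mean()),
        "likelihood_std": float(valid.std()),
        "maximum_likelihood": float(valid.max()),
        "minimum_likelihood": float(valid.min()),
        "histogram": hist,                 # histogram of the likelihood
        "distribution": likelihood_grid,   # variation with position shift
    }
```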
  • A performance indexing device according to a twelfth aspect of the present invention is the performance indexing device according to any one of the second to eleventh aspects citing the second aspect, or the third to eleventh aspects citing the third aspect, characterized in that the robustness verification means comprises a probability statistical calculation means that, based on the IOU value between the position information including the second detection frame, which is the output result of the model post-processing means, and the position information including the correct detection frame, and on the class identification correct answer rate calculated from the class identification information in the second likelihood information and the correct class identification information, calculates, for each detected object and for each of the various processing parameters, any or all of: an IOU distribution and a class identification correct answer rate distribution showing the variation due to the position shift; the average IOU value and the average class identification correct answer rate, which are the average values over the valid regions; a histogram of the IOU value and a histogram of the class identification correct answer rate; the standard deviation of the IOU value and the standard deviation of the class identification correct answer rate over the valid regions; the maximum IOU value and the maximum class identification correct answer rate, which are the maximum values over the valid regions; and the minimum IOU value and the minimum class identification correct answer rate, which are the minimum values over the valid regions.
  • A performance indexing device according to a thirteenth aspect of the present invention is the performance indexing device according to the eleventh aspect, or the twelfth aspect citing the eleventh aspect, characterized in that the robustness verification means further comprises a learning reinforcement necessary item extraction means that performs, for each of the various processing parameters, any or all of: extraction of positions or regions where the likelihood distribution for each detected object is at or below an arbitrary threshold; extraction of detected objects whose average likelihood is at or below an arbitrary threshold; extraction of detected objects whose standard deviation of likelihood is at or above an arbitrary threshold; extraction of detected objects whose maximum likelihood is at or below an arbitrary threshold; extraction of detected objects whose minimum likelihood is at or below an arbitrary threshold; and extraction of detected objects whose IOU value is at or below an arbitrary threshold.
  • A performance indexing device according to a fourteenth aspect of the present invention is the performance indexing device according to the twelfth aspect, or the thirteenth aspect citing the twelfth aspect, characterized in that the robustness verification means further comprises a learning reinforcement necessary item extraction means that performs, for each of the various processing parameters, any or all of: extraction of positions or regions where the IOU distribution for each detected object is at or below an arbitrary threshold; extraction of positions or regions where the class identification correct answer rate distribution is at or below an arbitrary threshold; extraction of detected objects whose average IOU value is at or below an arbitrary threshold; extraction of detected objects whose average class identification correct answer rate is at or below an arbitrary threshold; and extraction of detected objects whose standard deviation of the IOU value is at or above an arbitrary threshold.
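  • A sketch of such an extraction over per-object statistics like those computed above; the record layout and the threshold values are assumptions.

```python
def extract_reinforcement_items(per_object_stats,
                                avg_thr=0.5, std_thr=0.15,
                                max_thr=0.6, min_thr=0.3, iou_thr=0.5):
    # per_object_stats: dict of object_id -> statistics dict (assumed
    # layout, e.g. the output of likelihood_statistics plus an IOU field).
    weak = {}
    for obj_id, s in per_object_stats.items():
        reasons = []
        if s["average_likelihood"] <= avg_thr:
            reasons.append("low average likelihood")
        if s["likelihood_std"] >= std_thr:
            reasons.append("high likelihood variation")
        if s["maximum_likelihood"] <= max_thr:
            reasons.append("low maximum likelihood")
        if s["minimum_likelihood"] <= min_thr:
            reasons.append("low minimum likelihood")
        if s.get("average_iou", 1.0) <= iou_thr:
            reasons.append("low IOU")
        if reasons:
            weak[obj_id] = reasons  # objects needing reinforcement data
    return weak
```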
  • A performance indexing device according to a fifteenth aspect of the present invention is the performance indexing device according to the fourteenth aspect, wherein the probability statistical calculation means and the learning reinforcement necessary item extraction means of the robustness verification means have a function of excluding, from the targets of the probability statistical calculations based on the likelihood, the IOU value, and the class identification correct answer rate, images in which pixels related to the target detection object are missing at an arbitrary rate.
  • A performance indexing device according to a sixteenth aspect of the present invention is the performance indexing device according to any one of the thirteenth to fifteenth aspects, characterized in that a learning image is prepared based on the analysis results of the output of the probability statistical calculation means and the results of the learning reinforcement necessary item extraction means, and the model learning dictionary is retrained by a built-in or external dictionary learning means.
  • A performance indexing device according to a seventeenth aspect of the present invention is the performance indexing device according to any one of the first to sixteenth aspects, characterized in that the object detection model is a neural network including a model learning dictionary created by deep learning.
  • A performance indexing method according to one aspect of the present invention is a method for creating a performance index for an object detection model, and includes: an image processing step of acquiring an image and processing it appropriately; a model preprocessing step of processing the image acquired in the image processing step into a plurality of images according to various processing parameters; inference by an object detection model, including a model learning dictionary, of object positions and likelihoods (degrees of certainty) for the plurality of images processed in the model preprocessing step; a model post-processing step of correcting, based on the inference results of the object detection model, position information including a first detection frame and first likelihood information for each detected object in the plurality of images into position information including a second detection frame and second likelihood information having appropriate values; and a robustness verification step of verifying the robustness of the object detection model based on that information and the various processing parameters. The method executes each of the means described above as steps.
  • A performance indexing program according to one aspect of the present invention is a program for causing a computer to perform performance indexing on an object detection model, and includes: an image processing step of acquiring an image and processing it appropriately; a model preprocessing step of processing the image acquired in the image processing step into a plurality of images according to various processing parameters; inference by an object detection model, including a model learning dictionary, of object positions and likelihoods (degrees of certainty) for the plurality of images processed in the model preprocessing step; a model post-processing step of correcting, based on the inference results of the object detection model, position information including a first detection frame and first likelihood information for each detected object in the plurality of images into position information including a second detection frame and second likelihood information having appropriate values; and a robustness verification step of verifying the robustness of the object detection model based on the position information including the second detection frame and the second likelihood information, which are the output results of the model post-processing step, and the various processing parameters. It is a program for causing a computer to execute these steps.
  • With this configuration, the image from the image processing means is used as a base image, and the model preprocessing means generates a total of N × M position-shifted images using, as the various processing parameters, position shifts in S (arbitrary decimal) pixel steps applied N (arbitrary integer) times in the horizontal direction and M (arbitrary integer) times in the vertical direction. For each of the plurality of images, the model post-processing means applies corrections so that individual identification is possible, and the robustness verification means calculates the likelihood distribution with respect to the position of each detected object. The robustness verification means can further calculate any or all of: the likelihood distribution indicating the variation due to the position shift for each detected object; the average likelihood, which is the average value over the valid region of the likelihood; a histogram of the likelihood; the standard deviation of the likelihood, which is the standard deviation over the valid region of the likelihood; the maximum likelihood, which is the maximum value over the valid region of the likelihood; and the minimum likelihood, which is the minimum value over the valid region of the likelihood. In addition, when position information including a correct detection frame and correct class identification information exist, the robustness verification means can further calculate any or all of: the IOU distribution and the class identification correct answer rate distribution; the average IOU value and the average class identification correct answer rate; a histogram of the IOU values and a histogram of the class identification correct answer rate; the standard deviation of the IOU value and the standard deviation of the class identification correct answer rate; the maximum IOU value and the maximum class identification correct answer rate; and the minimum IOU value and the minimum class identification correct answer rate. This makes it possible to extract features in which the position information including the detection frame and the class identification information fluctuate owing to fluctuations in the position of the detected object on the screen.
  • Furthermore, since the probability statistical calculation means and the learning reinforcement necessary item extraction means of the robustness verification means perform the probability statistical calculations based on the likelihood, the IOU value, and the class identification correct answer rate, it becomes possible to verify the performance and features of the object detection model and the versatility and robustness of the model learning dictionary accurately, even in cases where the valid range of the object to be detected is missing depending on the position of the object after processing with the various processing parameters of the model preprocessing means. Therefore, it is possible to improve the DNN model with respect to the detected object size and to enhance the versatility and robustness of the model learning dictionary.
  • When the model preprocessing means further generates enlarged or reduced images using L (arbitrary integer) types of arbitrary magnifications as the various processing parameters and then generates the position-shifted images described above, the robustness verification means equipped with the probability statistical calculation means can check, for each of the L sizes, the likelihood distribution with respect to the position of each detected object, the average likelihood over the valid region of the likelihood, the histogram of the likelihood, the standard deviation of the likelihood, the maximum likelihood, the minimum likelihood, and the IOU value.
  • When P (arbitrary integer) types of contrast correction curves or gradation conversion curves are used to generate images whose brightness levels are changed to arbitrary values, the robustness verification means equipped with the probability statistical calculation means can check, for each of the P curves, the likelihood distribution with respect to the position of each detected object, the average likelihood over the valid region of the likelihood, the histogram of the likelihood, the standard deviation of the likelihood, the maximum likelihood, the minimum likelihood, and the IOU value. It can also check the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU value with respect to the position of each detected object, and of the class identification correct answer rate, for each of the P curves. Therefore, it is possible to improve the DNN model and strengthen the versatility and robustness of the model learning dictionary with respect to the brightness levels of the detected object and background, which change depending on the weather conditions, the shooting time, and the illuminance conditions of the shooting environment.
  • When Q (arbitrary integer) types of aspect ratios are used, the robustness verification means equipped with the probability statistical calculation means can check, for each of the Q aspect ratios, the likelihood distribution with respect to the position of each detected object, the average likelihood over the valid region of the likelihood, the histogram of the likelihood, the standard deviation of the likelihood, the maximum likelihood, the minimum likelihood, and the IOU value, as well as the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU value and of the class identification correct answer rate. Therefore, it is possible to improve the DNN model for various aspect ratios of the detected object and to enhance the versatility and robustness of the model learning dictionary.
  • When the model preprocessing means uses R (arbitrary integer) types of angles as the various processing parameters to generate images with changed rotation angles, the robustness verification means equipped with the probability statistical calculation means can check, for each of the R angles, the likelihood distribution with respect to the position of each detected object, the average likelihood over the valid region of the likelihood, the histogram of the likelihood, the standard deviation of the likelihood, the maximum likelihood, the minimum likelihood, and the IOU value, as well as the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU value and of the class identification correct answer rate. Therefore, it is possible to improve the DNN model for various inclinations of the detected object and to enhance the versatility and robustness of the model learning dictionary.
  • Since the model post-processing means has a series of means for individually linking each output result and the various processing parameters for each detected object and outputting them to the robustness verification means, it becomes possible to extract features whose likelihood changes owing to fluctuations in the position of the detected object on the screen. Therefore, it is possible to more accurately extract problems related to accuracy and performance during inference that are latent in the neural network itself, including the DNN model in the object detection model.
  • By preparing training images and retraining the model learning dictionary with the built-in or external dictionary learning means, it becomes possible to accurately understand weaknesses and reinforcement policies in versatility and robustness against various fluctuation conditions caused by the model learning dictionary created by deep learning or the like, for the position of an object in an arbitrary range of the screen (left, right, top, bottom, and depth) and for the various processing parameters other than position shift, such as object size, contrast, gradation, aspect ratio, and rotation, separately from issues latent in the neural network itself, including the DNN model. Effective learning image data and supervised data can therefore be applied to deep learning and the like, and it is thus possible to enhance the versatility and robustness of the model learning dictionary.
  • When a plurality of images to be input to the object detection model are processed by the model preprocessing means, the average brightness level of the valid image is pasted into blank areas where no valid image exists as a result of the processing.
  • The model post-processing means further has an individual identification means that corrects, for each detected object present in the image, position information including zero or more first detection frames, including undetected and falsely detected objects, and first likelihood information into position information including the second detection frame with the maximum likelihood and second likelihood information for each detected object, using an arbitrary threshold U (arbitrary decimal) on the IOU (Intersection over Union) value. This makes it possible to correct abnormal data into appropriate position information including the detection frame and likelihood information for each detected object, so the likelihood distribution with respect to the position of each detected object and the average likelihood over the valid region of the likelihood can be obtained appropriately.
  • When position information including a correct detection frame and class identification information exist, the model post-processing means has a function of correcting the position information including the correct detection frame according to the contents of the various processing parameters, and has an individual identification means that corrects, for each detected object present in the image, position information including zero or more first detection frames, including undetected and falsely detected objects, and first likelihood information into position information including the second detection frame with the maximum likelihood and second likelihood information for each detected object, using an arbitrary threshold T (arbitrary decimal) on the first likelihood information and an arbitrary threshold U (arbitrary decimal) on the IOU (Intersection over Union) value, which is an index indicating how much the region of the position information including the correct detection frame and the region of the position information including the first detection frame overlap. This makes it possible to check the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU value with respect to the position of each detected object, and of the class identification correct answer rate, against the correct data. Therefore, it is possible to more accurately improve the DNN model and strengthen the versatility and robustness of the model learning dictionary.
  • The robustness verification means further performs, for each of the various processing parameters, extraction of positions or regions where the likelihood distribution for each detected object is at or below an arbitrary threshold, extraction of detected objects whose average likelihood is at or below an arbitrary threshold, extraction of detected objects whose standard deviation of likelihood is at or above an arbitrary threshold, extraction of detected objects whose maximum likelihood is at or below an arbitrary threshold, and extraction of detected objects whose minimum likelihood is at or below an arbitrary threshold.
  • The robustness verification means further extracts, for each of the various processing parameters, positions or regions where the IOU distribution for each detected object is at or below an arbitrary threshold, and positions or regions where the class identification correct answer rate distribution for each detected object is at or below an arbitrary threshold.
  • FIG. 1 is a block diagram showing a performance indexing device 10 for detecting objects in images according to Embodiment 1 of the present invention.
  • each means, each function, and each process described in Embodiment 1 of the present invention described later may be replaced with a step, and each device may be replaced with a method.
  • each means and each device described in Embodiment 1 of the present invention may be realized as a program executed by a computer.
  • The image processing means 100, which acquires and appropriately processes images, has as its main components a lens 101, an image sensor 102, which is a device that receives the light arriving from an object through the lens and converts the brightness of the light into electrical information, and an image processing processor 103 equipped with a black level adjustment function, an HDR (high dynamic range) composition function, a gain adjustment function, an exposure adjustment function, a defective pixel correction function, a shading correction function, a white balance function, a color correction function, a gamma correction function, a local tone mapping function, and the like. Functions other than those described above may also be provided.
  • the lens 101 may be, for example, a standard zoom lens, a wide-angle zoom lens, a fisheye lens, or the like, depending on the purpose of object detection.
  • Time-series fluctuation conditions such as illuminance are detected and controlled by the various functions installed in the image processing processor 103, and image processing is applied so that the object to be detected is easier to see or find while fluctuations are suppressed.
  • The image generated by the image processing means 100 is input to the image output control means 110 and sent to the display and data storage means 120, such as a monitor device, an external memory such as a PC (personal computer), or a cloud server.
  • the image output control means 110 may have a function of transmitting image data according to horizontal and vertical synchronization signals of the display and data storage means 120, for example.
  • The image output control means 110 may also have a function of referring to the position information 401 including the second detection frame and the second likelihood information 402, which are the output results of the model post-processing means 400, and superimposing on the output image a frame depiction marking the detected object and its likelihood information. Further, the position information 401 including the second detection frame and the second likelihood information 402 may be transmitted directly to the display and data storage means 120 using a serial communication function, a parallel communication function, or a UART that converts between the two.
  • The image data generated by the image processing means 100 is input to the model preprocessing means 200 and processed into a model input image 210 suitable for input to the object detection model 300.
  • When the object detection model 300 performs object detection on image data having only luminance levels, the image for object detection generated by the image processing means 100 may be converted into luminance data having only luminance levels. When the object detection model 300 performs object detection on color image data including color information, the image generated by the image processing means 100 may be color image data having pixels such as RGB. In the following, the case where the object detection model 300 performs object detection on image data having only luminance levels, and the image for object detection generated by the image processing means 100 is converted into luminance data having only luminance levels, will be explained.
  • The model preprocessing means 200 may be configured with electronic circuits such as adders, subtracters, multipliers, dividers, and comparators, or with an image processing processor 290 comprising a CPU or an arithmetic processor equipped with functions (libraries) such as an affine transformation function 291 and a projective transformation function 292 for converting, for example, a fisheye-lens image into an image equivalent to the human visual field. Note that the image processing processor 290 may be replaced by the image processing processor 103 included in the image processing means 100.
  • Using the above-mentioned affine transformation function 291, projective transformation function 292, image processing processor 290, or electronic circuits, the model preprocessing means 200 may include some or all of: a function of cutting out a specific area; a position shift function 220 for shifting the image (or the cut-out area) to an arbitrary position in the horizontal and vertical directions; a resizing function 230 for enlarging or reducing the image to an arbitrary magnification; a rotation function 240 for rotating the image to an arbitrary angle; an aspect ratio change function 250 for arbitrarily changing the ratio between the horizontal and vertical directions; a gradation conversion function 260 for changing the brightness level along an arbitrary curve; a dewarp function 270 for performing distortion correction, cylindrical conversion, and the like; and a margin padding function 280 for padding areas where no valid pixels exist with an arbitrary brightness level.
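  • As a rough illustration only (not the patent's own implementation), several of the functions above could be realized with affine transformations, for example via OpenCV; the function names below are ours, and a single-channel luminance image is assumed, matching the luminance-only case described here.

```python
import cv2
import numpy as np

def position_shift(img, dx, dy, pad_level=0):
    """Position shift function 220: shift by (dx, dy) pixels; the vacated
    area is padded with an arbitrary brightness level (margin padding 280)."""
    h, w = img.shape[:2]
    m = np.float32([[1, 0, dx], [0, 1, dy]])  # translation matrix
    return cv2.warpAffine(img, m, (w, h), borderValue=pad_level)

def resize_about_center(img, scale, pad_level=0):
    """Resizing function 230: enlarge/reduce about the image center while
    keeping the canvas size, so the result remains a valid model input."""
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), 0.0, scale)  # angle 0, scale only
    return cv2.warpAffine(img, m, (w, h), borderValue=pad_level)

def rotate(img, angle_deg, pad_level=0):
    """Rotation function 240: rotate about the image center by an arbitrary angle."""
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(img, m, (w, h), borderValue=pad_level)
```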
  • The model preprocessing means 200 takes the image data generated by the image processing means 100 as a reference image, processes it with various processing parameters 510 according to the purpose of creating a performance index into a plurality of model input images 210, and outputs them to the object detection model 300; the usage and operation are explained in the description of the robustness verification means 500 below.
  • Here, the case where the object detection model 300 performs object detection on image data having only luminance levels, and the model input image 210 generated by the model preprocessing means 200 is converted into data having only luminance levels, will be described.
  • The image processed by the model preprocessing means 200 is input to the object detection model 300, and inference (prediction) detects where the target object is located and identifies which class it corresponds to, such as person or vehicle (class identification).
  • As a result, position information 301 including zero or more first detection frames (including cases of undetected and falsely detected objects) and first likelihood information 302 are output.
  • The position information 301 including the first detection frame is, for example, information including the center coordinates, horizontal width, and vertical height of the detection frame, and the first likelihood information 302 is, for example, the likelihood indicating detection accuracy and class identification information.
  • The object detection model 300 comprises, for example, a deep neural network (DNN) model 310, such as a convolutional neural network (CNN) modeled on human brain neurons, and a model learning dictionary 320.
  • As the DNN model 310, for example, YOLO (see, for example, Non-Patent Document 1), SSD, and other models with a high advantage in detection processing speed are used; Faster R-CNN, EfficientDet, or the like may also be used. When mainly performing class identification without detecting the position of the object, for example, MobileNet may be used.
  • FIG. 2 shows a schematic configuration of an artificial neuron model 330 and a neural network 340, which are the basic configuration of the CNN described above.
  • The artificial neuron model 330 receives the output signals X0, X1, ..., Xn of one or more preceding neurons, multiplies each by a weighting coefficient, and generates an output to the next neuron by passing the sum of the multiplication results through an activation function 350.
  • b is a bias (offset).
  • A collection of many of these artificial neuron models is the neural network 340.
  • The neural network 340 is composed of an input layer, an intermediate layer, and an output layer, and the output of each artificial neuron model 330 is input to each artificial neuron model 330 of the next stage.
  • The artificial neuron model 330 may be realized by hardware such as an electronic circuit, by an arithmetic processor, or by a program.
  • The weighting coefficients of each artificial neuron model 330 are calculated as dictionary data using deep learning.
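  • A minimal sketch of the artificial neuron model 330 described above follows; the variable names are illustrative, not taken from the patent.

```python
import numpy as np

def relu(x):
    """ReLU 352: 0 for inputs <= 0, identity for inputs > 0."""
    return np.maximum(0.0, x)

def neuron_output(x, w, b, activation=relu):
    """Artificial neuron model 330: weighted sum of the input signals
    X0..Xn plus bias b, passed through an activation function 350."""
    return activation(np.dot(w, x) + b)

# Example with three inputs; the weights play the role of the learned
# weighting coefficients (dictionary data) of the model learning dictionary.
x = np.array([0.5, -1.2, 0.3])
w = np.array([0.8, 0.1, -0.4])
print(neuron_output(x, w, b=0.2))
```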
  • The dictionary data, that is, the model learning dictionary 320 shown in FIG. 1, is something that is initially learned or re-learned.
  • The activation function 350 needs to be a non-linear transformation, since a composition of linear transformations is itself merely another linear transformation.
  • Examples of the activation function 350 include a step function that simply outputs "0" or "1", a sigmoid function 351, and ramp functions; because the sigmoid function reduces calculation speed, a ramp function such as ReLU (Rectified Linear Unit) 352 is often used.
  • ReLU 352 is a function whose output value is always 0 when the input value is 0 or less, and whose output value equals the input value when the input value is greater than 0.
  • There is also Leaky ReLU (Leaky Rectified Linear Unit) 353, a function that multiplies the input value by α (for example, a basic value of 0.01) when the input value is below 0, and outputs the input value unchanged when the input value is above 0.
  • Other activation functions 350 include the softmax function, which is used when identifying the class of a detected object and converts a plurality of output values so that their total becomes 1.0 (100%); a suitable function is used depending on the purpose of use.
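  • For reference, minimal NumPy sketches of the activation functions mentioned above (our own illustrative code):

```python
import numpy as np

def step(x):
    """Step function: outputs only "0" or "1"."""
    return (x > 0).astype(float)

def sigmoid(x):
    """Sigmoid function 351."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """ReLU 352: 0 for x <= 0, x otherwise."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """LeakyReLU 353: alpha * x below 0, x above 0."""
    return np.where(x > 0, x, alpha * x)

def softmax(v):
    """Softmax: converts outputs so that their total becomes 1.0 (100%)."""
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()
```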
  • FIGS. 3A and 3B show examples of the configuration of a YOLO model 360, which is one of the DNN models 310.
  • The YOLO model 360 shown in FIG. 3A may have, for example, an input image size of Xi pixels horizontally and Yi pixels vertically.
  • The basic configuration may consist of Convolution layers 370 to 387, which compress and extract region-based feature amounts by convolving regions of surrounding pixels with filters; Pooling layers 390 to 395, which function to absorb positional deviations of the filtered shapes in the input image; a fully connected layer; and an output layer. Upsampling layers 364 and 365, which perform upsampling using deconvolution, may also be used.
  • The model input image size; the pixel sizes of the convolution layers, pooling layers, detection layers, upsampling layers, and the like; the number and combination of the various layers; and the number and arrangement of the detection layers may be increased, decreased, or changed depending on the intended use.
  • The Convolution layers 370 to 387 correspond to models of simple cells that respond to specific or various shapes, and serve to recognize objects with complex shapes. The Pooling layers 390 to 395 correspond to models of complex cells that function to absorb spatial deviations in shape; even when the position of an object of a given shape shifts, they work so that it can still be regarded as the same shape.
  • The upsampling layers 364 and 365 pass the results of each CNN layer as feature maps through the skip connections shown at 366 and 367 in FIG. 3A, enabling detailed region identification of the class classification on the original image in, for example, the third detection layer 363. Note that the skip connections 367 and 366 connect networks having the same configuration as the convolution layers 373 and 374 after the convolution layers 385 and 381, respectively.
  • A method for calculating the confidence score 317 (corresponding to likelihood), which represents the detection accuracy and confidence of the YOLO model 360 in one embodiment, will be explained with reference to FIG. 3B, using one person as the detection object.
  • First, the image area of the model input image 311 is divided into grid cells of arbitrary size (a 7x7 example is shown in FIG. 3B). Then, a step 312 of estimating a plurality of Bounding Boxes (BBoxes) and their Confidence (reliability) 313 (Pr(Object)・IOU) and a calculation process 314 of the conditional class probability (Pr(Classi|Object)) 315 are processed in parallel, and the two are multiplied when calculating the confidence score 317 in the final detection step 316. Processing speed can therefore be improved, because detection of the object position and class identification are performed simultaneously.
  • The position information detection frame 318 indicated by the dotted line in the final detection step 316 is the detection frame displayed as the detection result for the person.
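  • A toy numeric sketch of how the confidence score 317 combines the two parallel estimates; all values are invented for illustration.

```python
import numpy as np

# Pr(Class_i|Object) 315 for illustrative classes [person, vehicle] in one grid cell
pr_class_given_object = np.array([0.9, 0.1])

# Confidence 313 of one Bounding Box: Pr(Object) * IOU (toy value)
box_confidence = 0.85

# Final detection step 316: confidence score 317 per class
confidence_scores = pr_class_given_object * box_confidence
best = int(np.argmax(confidence_scores))
print(best, confidence_scores[best])  # -> 0 0.765 (person)
```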
  • In the model post-processing means 400, the position information 301 including the first detection frames is sorted based on mutual IOU values, the maximum likelihood is determined, and so on, and the output is corrected into position information 401 including the second detection frame considered most appropriate for each detected object and second likelihood information 402.
  • Position information 401 including the second detection frame and second likelihood information 402 are input to the image output control means 110 and the robustness verification means 500.
  • The position information 401 including the second detection frame is, for example, information including the center coordinates, horizontal width, and vertical height of the detection frame, and the second likelihood information 402 is, for example, the likelihood indicating detection accuracy and class identification information.
  • The IOU value will be explained with reference to FIG. 4.
  • The denominator of the formula representing the IOU value 420 in FIG. 4(a) is the Area of Union 422 in the above-mentioned (Formula 1), which is the area of the union of the two frame regions being compared, and the numerator is the Area of Intersection 423 in (Formula 1), which is the area of the common portion of the two frame regions being compared. The maximum value is "1.0", indicating that the two frame regions completely overlap.
  • In the example of FIG. 4(b), when the Predicted BBox 426 calculated as a result of inference (prediction) deviates from the groundtruth BBox 425, which is the correct answer frame for the person 424, the IOU value 427 of the two drops to about 0.65.
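  • A straightforward sketch of the IOU calculation in (Formula 1), assuming boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """IOU = Area of Intersection / Area of Union for boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih                                # Area of Intersection
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)  # Area of Union
    return inter / union if union > 0 else 0.0

# Perfect overlap gives the maximum value 1.0
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
print(iou((0, 0, 10, 10), (3, 3, 13, 13)))  # partial overlap, about 0.32
```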
  • The model post-processing means 400 may be characterized by having an individual identification means 410 that corrects the output into position information 401 including the maximum-likelihood second detection frame and second likelihood information 402 for each detected object, using an arbitrary threshold value U (an arbitrary decimal number) for the IOU (Intersection over Union) value, which is an index representing the degree of overlap between frames.
  • To the individual identification means 410, position information 301 including zero or more first detection frames (including cases of undetected and falsely detected objects) and first likelihood information 302 are input for each detected object. In the example of FIG. 5B, it is assumed that position information 441, 442, 443, and 444 including four first detection frames output from the object detection model 300, and four pieces of first likelihood information with likelihoods 445, 446, 447, and 448, are input.
  • In the likelihood comparison step S432, the likelihood in the first likelihood information 302 is compared with the threshold value "T". If the likelihood is less than the threshold "T", it is judged false, and in the deletion step S433 the corresponding position information 301 including the first detection frame and first likelihood information 302 are deleted from the calculation targets; if the likelihood is equal to or greater than the threshold "T", it is judged true, and in the mutual IOU value calculation step S434 the IOU values of all mutual combinations of the position information 301 including the first detection frames remaining as calculation targets are calculated.
  • In the comparison step S435, all mutual IOU values are compared with the threshold value "U". If a mutual IOU value is less than the threshold "U" and judged false, the detection results are determined to be independent, and in output step S437 the position information 401 including the second detection frame and the second likelihood information 402 are output; if the mutual IOU value is equal to or greater than the threshold "U", it is judged true, the detections are regarded as duplicate detections of the same object, and the process proceeds to the next maximum likelihood determination step S436.
  • In the maximum likelihood determination step S436, everything other than the detection with the maximum likelihood is judged false, and in the deletion step S433 the corresponding position information 301 including the first detection frame and first likelihood information 302 are deleted from the calculation targets; the detection with the maximum likelihood is judged true and may be output in output step S437 as position information 401 including the second detection frame and second likelihood information 402. In FIG. 5B, the first likelihood information including likelihood 447 (0.75) and the position information 443 including the first detection frame are deleted from the calculation targets, and the first likelihood information including the likelihood 448 (0.92), determined to be the maximum likelihood, and the position information 444 including the first detection frame are output in output step S437 as position information 452 including the second detection frame and second likelihood information including the likelihood 454 (0.92).
  • If the mutual IOU value threshold "U" is set low, then when there are multiple detected objects, the detection results of objects close to each other are merged more than expected and detection omissions become more likely; conversely, if it is set high, duplicate detection results may remain even though the same object is detected. It is therefore desirable to set it appropriately according to the performance of the object detection model 300.
  • The individual identification means 410 may perform individual identification using a combination of steps other than those in the flowchart shown in FIG. 5A.
  • For example, the class identification information in the first likelihood information 302 may be used to limit the objects for which mutual IOU values are calculated in the mutual IOU value calculation step S434 to those of the same class, or processing may be added to determine the maximum likelihood within the same class when determining the maximum likelihood.
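  • A minimal sketch of the individual identification flow (steps S432 to S437), reusing the iou() helper defined above; the per-class variations just mentioned are omitted, and the function name and thresholds are illustrative.

```python
def individual_identification(boxes, likelihoods, t=0.25, u=0.5):
    """Drop detections with likelihood < T (S432/S433); among detections whose
    mutual IOU >= U (S434/S435), keep only the maximum-likelihood one (S436),
    and output the survivors as second detection frames (S437)."""
    cands = [(b, p) for b, p in zip(boxes, likelihoods) if p >= t]
    cands.sort(key=lambda bp: bp[1], reverse=True)  # maximum likelihood first
    kept = []
    for box, p in cands:
        if all(iou(box, kb) < u for kb, _ in kept):  # independent detection?
            kept.append((box, p))
    return kept

# Two overlapping frames for one person plus one independent frame
boxes = [(10, 10, 50, 90), (12, 11, 52, 92), (120, 30, 160, 110)]
print(individual_identification(boxes, [0.75, 0.92, 0.85]))
```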
  • By using the model post-processing means 400 and the individual identification means 410 as shown in FIGS. 5A and 5B, it is possible to eliminate abnormal data and correct the output into appropriate position information 401 including a second detection frame and second likelihood information 402 for each detected object.
  • That is, the model post-processing means 400 may be characterized by having an individual identification means 410 that corrects position information 301 including zero or more first detection frames (including cases of undetected and falsely detected objects) and first likelihood information 302 into position information 401 including the maximum-likelihood second detection frame and second likelihood information 402 for each detected object, using an arbitrary threshold value U (an arbitrary decimal number) for the IOU value, which is an index showing the degree of overlap.
  • The annotation means 620 may create supervised data by adding class identification information and a groundtruth BBox, which is a correct answer frame, to the image stored in the display and data storage means 120; it outputs the position information 621 including the correct detection frame and the correct class identification information 622.
  • When correct data exist, position information 301 including zero or more first detection frames (including cases of undetected and falsely detected objects) and first likelihood information 302 are input to the individual identification means 410 for each detected object, together with position information 621 including the correct detection frame and correct class identification information 622 for each detected object.
  • In the example of FIG. 6B, it is assumed that position information 471, 472, 473, and 474 including four first detection frames output from the object detection model 300, and four pieces of first likelihood information with likelihoods 475, 476, 477, and 478, are input.
  • In the likelihood comparison step S432, the likelihood in the first likelihood information 302 is compared with the threshold value "T". If the likelihood is less than the threshold "T", it is judged false, and in the deletion step S433 the corresponding position information 301 including the first detection frame and first likelihood information 302 are deleted from the calculation targets; if the likelihood is equal to or greater than the threshold "T", it is judged true, and in the correct-frame IOU value calculation step S461 the IOU value of each remaining piece of position information 301 including a first detection frame is calculated against each piece of position information 621 including a correct detection frame.
  • In FIG. 6B, the position information 472 including the first detection frame, whose likelihood 476 is less than the threshold, and its first likelihood information are deleted from the calculation targets. Three calculation candidates remain, and the IOU values of the position information 471, 473, and 474 including the first detection frames are calculated against each of the position information 480 and 481 including the correct detection frames.
  • In the comparison step S462, all IOU values are compared with the threshold value "U". If the IOU value with respect to the position information 621 including the correct detection frame is less than the threshold "U" and judged false, the corresponding position information 301 including the first detection frame and first likelihood information 302 are deleted from the calculation targets in deletion step S433; if the IOU value is equal to or greater than the threshold "U", it is judged true, regarded as a detection candidate with a small difference from the correct answer frame, and passed to the next class identification determination step S463. In FIG. 6B, no candidate is judged false, and the three calculation candidates become the targets of the class identification determination step S463.
  • In the class identification determination step S463, the correct class identification information 622 is compared with the class identification information in the first likelihood information 302. If they are identified as different classes, it is judged false, and in deletion step S433 the corresponding position information 301 including the first detection frame and first likelihood information 302 are deleted from the calculation targets; if they are identified as the same class, it is judged true, and the process proceeds to the next maximum likelihood determination step S436. In FIG. 6B, assuming all candidates are identified as "person" as a result of class identification, the three calculation candidates proceed directly to the determination in the maximum likelihood determination step S436.
  • In the maximum likelihood determination step S436, everything other than the detection with the maximum likelihood is judged false, and in the deletion step S433 the corresponding position information 301 including the first detection frame and first likelihood information 302 are deleted from the calculation targets; the detection with the maximum likelihood is judged true and may be output in output step S464 as position information 401 including the second detection frame and second likelihood information 402, together with the calculated IOU value.
  • In FIG. 6B, the maximum likelihood is determined from the two likelihoods 477 (0.75) and 478 (0.92). The first likelihood information including the likelihood 477 (0.75) and the position information 473 including the first detection frame are deleted from the calculation targets, while the likelihood 478 (0.92), determined to be the maximum likelihood, and the position information 474 including the first detection frame are output in output step S464 as position information 491 including the second detection frame and second likelihood information including the likelihood 493 (0.92), together with the IOU value 495 (0.85). Likewise, the position information 471 including the first detection frame is output in output step S464 as position information 490 including a second detection frame and second likelihood information including the likelihood 492 (0.85), together with the IOU value 494 (0.73).
  • If the threshold value "U" for the IOU value with the correct answer frame is set lower than in the individual identification means 410 described with reference to FIGS. 5A and 5B, so that more calculation candidates are left, the detection results can be compared directly with the position information 621 including the correct frame, which has the advantage that detection omissions are less likely to occur and the accuracy of the detection results improves. Furthermore, by arbitrarily changing the threshold value "U" during processing, it also becomes possible to understand and verify the accuracy of the detection frames in the position information 301 including the first detection frames calculated by the object detection model 300.
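  • A minimal sketch of this correct-frame matching flow (steps S432, S461 to S463, S436, S464), again reusing the iou() helper from above; names and thresholds are illustrative.

```python
def match_to_ground_truth(boxes, likelihoods, classes, gt_boxes, gt_classes,
                          t=0.25, u=0.5):
    """For each correct frame, keep the maximum-likelihood detection of the
    same class whose IOU with the correct frame is >= U."""
    results = []
    for gt_box, gt_cls in zip(gt_boxes, gt_classes):
        best = None
        for box, p, cls in zip(boxes, likelihoods, classes):
            if p < t or cls != gt_cls:      # S432 likelihood test, S463 class test
                continue
            v = iou(box, gt_box)            # S461: IOU with the correct frame
            if v >= u and (best is None or p > best[1]):  # S462 + S436
                best = (box, p, v)
        if best is not None:
            results.append(best)            # S464: 2nd frame, likelihood, IOU value
    return results

dets = [(10, 10, 50, 90), (12, 11, 52, 92)]
print(match_to_ground_truth(dets, [0.75, 0.92], ["person", "person"],
                            [(11, 10, 51, 91)], ["person"]))
```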
  • By using the model post-processing means 400 and the individual identification means 410 as shown in FIGS. 6A and 6B, it is possible to eliminate abnormal data and correct the output into appropriate position information 401 including a second detection frame and second likelihood information 402 for each detected object.
  • The series of means that generates position information 401 including the second detection frame and second likelihood information 402 using the image processing means 100, model pre-processing means 200, object detection model 300, and model post-processing means 400 corresponds to the conventional first performance indexing device 30 shown in FIG. 17, which analyzes the robustness and reinforcement policy of a model learning dictionary for a model that detects the position of an object in an image and identifies its class.
  • Next, the case where the YOLO model 360 is applied, as a typical one-stage DNN model said to have high processing speed because it detects the position of an object and identifies its class simultaneously, will be explained using FIGS. 7A and 7B. As shown in FIG. 7A, even though the same person is detected, the respective likelihoods calculated may vary greatly with the position of the person in the image, for example 0.92, 0.39, and 0.89.
  • In FIG. 7B, the distance between the camera and the person is 1 m in image 204, 2 m in image 205, and 3 m in image 206; as the size and position of the person in the image change accordingly, the position information 211, 212, and 213 including the second detection frame and the likelihoods 217, 218, and 219 in the second likelihood information are calculated. Considering the inherent performance of the YOLO model, it is known that detection accuracy and performance deteriorate as the person size becomes smaller, that is, as the distance to the person increases. Nevertheless, while the likelihood 217 for the image 204 at 1 m is 0.92 and the likelihood 219 in the second likelihood information for the image 206 at 3 m is 0.71, an irregular result may be obtained in which the likelihood 218 in the second likelihood information for the image 205 at 2 m drops significantly to 0.45.
  • As shown in FIG. 8, when processing the plurality of model input images 210, the model preprocessing means 200 may include a position shift function 220 that, according to various processing parameters 510, shifts the image N (arbitrary integer) times in the horizontal direction and M (arbitrary integer) times in the vertical direction in S (arbitrary decimal) pixel steps, generating a total of N×M position-shifted model input images 221 to 224. It may further include a function of cutting out an arbitrary area. Note that the position shift function 220 may be realized by executing the affine transformation function 291 or the projective transformation function 292 on the image processing processor 290.
  • As shown in FIG. 9, when processing the plurality of model input images 210 to be input to the object detection model, the model preprocessing means 200 may further include a resizing function 230 that generates enlarged or reduced images using L (arbitrary integer) types of arbitrary magnification as various processing parameters 510, combined with the position shift function 220 that shifts N (arbitrary integer) times in the horizontal direction and M (arbitrary integer) times in the vertical direction, generating a total of N×M×L resized and position-shifted model input images 210 (see the sketch below). It may further include a function of cutting out an arbitrary area. Note that the position shift function 220 and the resizing function 230 may be realized by executing the affine transformation function 291 and the projective transformation function 292 on the image processing processor 290.
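  • A minimal sketch of such an N×M×L sweep, reusing position_shift() and resize_about_center() from the earlier sketch; the parameter layout is illustrative.

```python
def generate_model_inputs(reference_img, scales, n, m, s):
    """Generate N*M*L model input images 210: for each of the L magnifications,
    shift N times horizontally and M times vertically in S-pixel steps."""
    variants = []
    for scale in scales:                              # L resize magnifications
        resized = resize_about_center(reference_img, scale)
        for i in range(n):
            for j in range(m):
                dx = (i - n // 2) * s                 # horizontal shift
                dy = (j - m // 2) * s                 # vertical shift
                params = {"scale": scale, "dx": dx, "dy": dy}  # processing parameters 510
                variants.append((params, position_shift(resized, dx, dy)))
    return variants
```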
  • The plurality of model input images 210 processed by the position shift function 220 and resizing function 230 of the model pre-processing means 200 as shown in FIGS. 8 and 9 are processed by the object detection model 300 and the model post-processing means 400 shown in FIG. 1, position information 401 including the second detection frame and second likelihood information 402 are calculated for each of the plurality of model input images 210, and the results are input to the robustness verification means 500, which verifies versatility and robustness for each of the various processing parameters 510.
  • Items and variable conditions to be verified by the robustness verification means 500 include, for example: the background (scenery); the detection target area and field of view, including image size, determined by the camera lens specifications and the height and elevation/depression angle at which the camera is mounted; the dewarping processing method when using a fisheye lens; illuminance changes due to sunlight and lighting; special conditions such as blackout, overexposure, and backlighting; weather conditions such as clear weather, cloudy weather, rain, snow, and fog; the position (left, right, top, bottom, and depth) of the target detection object within the image; its size, brightness level, shape and characteristics including color information, aspect ratio, and rotation angle; the number of target detection objects and their mutual overlap; the type, size, and attached position of attachments; whether or not the lens has an IR cut; the moving speed of the target detection object; and the moving speed of the camera itself. Items and conditions other than those described above may be added.
  • The various processing parameters 510 are set, selected, or determined based on these conditions and items.
  • The various processing parameters 510 are input to the model pre-processing means 200 and the model post-processing means 400.
  • The various processing parameters 510 input to the model preprocessing means 200 may be a combination of parameters related to the position shift function 220, for verifying the influence of fluctuations in object position; parameters related to the resizing function 230, for verifying versatility and robustness with respect to the detection target area (including image size) and the object size in the field of view determined by conditions such as the camera lens specifications and the height and elevation/depression angle at which the camera is mounted; and other parameters described below.
  • The model post-processing means 400 may output to the robustness verification means 500 a detection result 403 (including position information 401 including the second detection frame, second likelihood information 402, and the like) in which the various processing parameters 510 used by the model pre-processing means 200 in processing the plurality of images are individually linked to the output results of the individual identification means 410 for each detected object.
  • The robustness verification means 500 may be characterized by comprising a probability statistical calculation means 520 that calculates, for each of the various processing parameters 510 and based on the likelihoods in the position information 401 including the second detection frame and the second likelihood information 402 output from the model post-processing means 400, any or all of: a likelihood distribution 540 indicating the variation due to the position shift of each detected object; an average likelihood 501, which is the average value of the valid region of the likelihood; a histogram 550 of the likelihood; a standard deviation of likelihood 502, which is the standard deviation of the valid region of the likelihood; a maximum likelihood 503, which is the maximum value of the valid region of the likelihood; a minimum likelihood 504, which is the minimum value of the valid region of the likelihood; and an IOU value 505.
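  • A minimal sketch of such probability statistical calculations over the position-shift grid; the NaN convention for shifts with no detection is our assumption, not the patent's.

```python
import numpy as np

def likelihood_statistics(likelihood_map):
    """Statistics over the valid region of an N x M likelihood grid
    (one likelihood per position shift; NaN marks shifts with no detection)."""
    valid = likelihood_map[~np.isnan(likelihood_map)]
    hist, _ = np.histogram(valid, bins=20, range=(0.0, 1.0))
    return {
        "average_likelihood": float(valid.mean()),   # average likelihood 501
        "std_likelihood": float(valid.std()),        # standard deviation 502
        "max_likelihood": float(valid.max()),        # maximum likelihood 503
        "min_likelihood": float(valid.min()),        # minimum likelihood 504
        "histogram": hist / hist.sum(),              # histogram 550, total 1.0
    }

grid = np.random.default_rng(0).uniform(0.3, 0.95, size=(16, 16))  # toy data
print(likelihood_statistics(grid))
```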
  • When position information 621 including the correct detection frame and correct class identification information 622 exist for each detected object, the robustness verification means 500 may be characterized by comprising a probability statistical calculation means 520 that calculates, for each of the various processing parameters 510, based on the IOU value between the position information 401 including the second detection frame output from the model post-processing means 400 and the position information 621 including the correct detection frame, and on the class identification accuracy rate calculated from the class identification information in the second likelihood information 402 and the correct class identification information 622, any or all of: an IOU distribution and a class identification accuracy rate distribution showing the variation due to the position shift of each detected object; an average IOU value and an average class identification accuracy rate, which are the average values of the respective valid regions; a histogram of the IOU value and a histogram of the class identification accuracy rate; a standard deviation of the IOU value and a standard deviation of the class identification accuracy rate, which are the standard deviations of the respective valid regions; a maximum IOU value and a maximum class identification accuracy rate, which are the maximum values of the respective valid regions; and a minimum IOU value and a minimum class identification accuracy rate, which are the minimum values of the respective valid regions.
  • The robustness verification means 500 may further be characterized by having learning reinforcement necessary item extraction means 530 that performs, for each of the various processing parameters 510, any or all of: extraction of positions or regions where the likelihood distribution 540 for each detected object is equal to or less than an arbitrary threshold; extraction of detected objects whose average likelihood 501 is equal to or less than an arbitrary threshold; extraction of detected objects whose standard deviation of likelihood 502 is equal to or greater than an arbitrary threshold; extraction of detected objects whose maximum likelihood 503 is equal to or less than an arbitrary threshold; extraction of detected objects whose minimum likelihood 504 is equal to or less than an arbitrary threshold; and extraction of detected objects whose IOU value 505 is equal to or less than an arbitrary threshold.
  • The robustness verification means 500 may further be characterized by having learning reinforcement necessary item extraction means 530 that performs, for each of the various processing parameters 510, any or all of: extraction of positions or regions where the IOU distribution for each detected object is equal to or less than an arbitrary threshold; extraction of positions or regions where the class identification accuracy rate distribution is equal to or less than an arbitrary threshold; extraction of detected objects whose average IOU value is equal to or less than an arbitrary threshold; extraction of detected objects whose average class identification accuracy rate is equal to or less than an arbitrary threshold; extraction of detected objects whose standard deviation of the IOU value or of the class identification accuracy rate is equal to or greater than an arbitrary threshold; extraction of detected objects whose maximum IOU value or maximum class identification accuracy rate is equal to or less than an arbitrary threshold; and extraction of detected objects whose minimum IOU value or minimum class identification accuracy rate is equal to or less than an arbitrary threshold.
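  • As a hedged sketch of how such extraction against arbitrary thresholds might look (building on likelihood_statistics() above; all threshold values and flag names are invented for illustration):

```python
import numpy as np

def extract_weak_spots(likelihood_map, stats, region_th=0.3, avg_th=0.7,
                       std_th=0.15, max_th=0.8, min_th=0.3):
    """Learning reinforcement necessary item extraction (530) sketch:
    flag weak positions/objects against arbitrary thresholds."""
    return {
        "weak_positions": np.argwhere(likelihood_map <= region_th),  # low-likelihood region
        "low_average": stats["average_likelihood"] <= avg_th,
        "unstable": stats["std_likelihood"] >= std_th,
        "low_maximum": stats["max_likelihood"] <= max_th,
        "low_minimum": stats["min_likelihood"] <= min_th,
    }
```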
  • When the probability statistical calculation means 520 and the learning reinforcement necessary item extraction means 530 of the robustness verification means 500 perform probability statistical calculations based on the likelihood, IOU value, and class identification accuracy rate, they may further be characterized by having a function of excluding from the calculation targets images in which pixels related to the target detection object are missing at an arbitrary rate or more.
  • It becomes possible to specify the reinforcement targets of the model learning dictionary 320 based on the detected objects and judgment conditions extracted by the learning reinforcement necessary item extraction means 530. Furthermore, it is also possible to extract problems with the object detection model 300 itself. In addition, by inputting this extracted information 531 into the dictionary learning means 600 described later in Embodiment 2 and reflecting it in the selection of learning materials, the augmentation method, and the learning parameters, the versatility and robustness of the model learning dictionary 320 can be strengthened.
  • The above configuration, in which the robustness verification means 500 receives the detection result 403 linking the various processing parameters 510, the position information 401 including the second detection frame, and the second likelihood information 402 individually for each detected object, is an embodiment of the performance indexing device 10 in object detection of the present invention for analyzing the versatility and robustness of the object detection model 300.
  • The performance indexing device for object detection of the present invention may further include the dictionary learning means 600 of Embodiment 2, described later, for generating the model learning dictionary 320, and a second mAP calculation means 650.
  • FIGS. 10 and 11 show the results of analyzing the irregular variation phenomenon in the likelihood and other detection results of the object detection model 300 with respect to the position and size of the detected object.
  • The analysis results shown in FIGS. 10 and 11 were obtained by setting the number of horizontal pixels Xi of the plurality of model input images shown in FIGS. 7A, 7B, 8, and 9 to 128 and the number of vertical pixels Yi to 128. The detection target is one person.
  • The resizing function 230 of the model preprocessing means 200 is used to generate three sizes (L = 3): a standard-size image 232, a 30%-reduced image 231, and a 30%-enlarged image 233.
  • The analysis results shown in FIGS. 10 and 11 were obtained by inputting the images to the YOLO model 360 (object detection model) shown in FIGS. 3A and 3B, which has 128 input pixels in the horizontal direction and 128 in the vertical direction, and calculating, from the position information 401 including the second detection frame and the second likelihood information 402 for one person: the likelihood distribution 540 indicating the dispersion; the average likelihood 501, which is the average value of the valid region of the likelihood; the histogram 550 of the likelihood; the standard deviation of likelihood 502, which is the standard deviation of the valid region of the likelihood; the maximum likelihood 503, which is the maximum value of the valid region of the likelihood; and the minimum likelihood 504, which is the minimum value of the valid region of the likelihood.
  • The likelihood distribution 540, average likelihood 501, likelihood histogram 550, standard deviation of likelihood 502, maximum likelihood 503, and minimum likelihood 504 may be expressed as percentages (%) with the maximum likelihood value "1" corresponding to 100%. In the following, the likelihood is expressed in percent; it is also possible to process the value directly as a decimal number without converting it to a percentage.
  • Similarly, the class identification distribution and statistical results for the class identification information in the second likelihood information may be calculated.
  • The various processing parameters 511 to 513 shown in FIG. 10 are linked to the position information 401 including the second detection frame and the second likelihood information 402, and may be used when the probability statistical calculation means 520 calculates analysis results for each of the various processing parameters.
  • The likelihood distributions 541, 542, and 543 shown in FIG. 10 display the level of likelihood (%) against fluctuations in the position (in pixels) of the person on the screen, in shades according to the gray scale bar 521 from white (corresponding to 0% likelihood) to black (corresponding to 100% likelihood); this corresponds to a mapping of the likelihood calculated for each position shift. In the likelihood distributions 541, 542, and 543, the stronger the black level, the higher the likelihood; conversely, the stronger the white level, the lower the likelihood.
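  • A sketch of how such a grayscale likelihood map could be rendered (our illustrative code with toy data; matplotlib's reversed gray colormap reproduces the white-to-black convention of the gray scale bar 521):

```python
import matplotlib.pyplot as plt
import numpy as np

# toy N x M grid of likelihoods per position shift, standing in for 541-543
rng = np.random.default_rng(1)
likelihood_map = rng.uniform(0.2, 1.0, size=(64, 64))

# "gray_r": 0.0 -> white (0% likelihood), 1.0 -> black (100% likelihood)
plt.imshow(likelihood_map, cmap="gray_r", vmin=0.0, vmax=1.0)
plt.colorbar(label="likelihood")
plt.xlabel("horizontal shift (pixels)")
plt.ylabel("vertical shift (pixels)")
plt.title("likelihood distribution (sketch)")
plt.show()
```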
  • S, N, and M, which are the various processing parameters 510 of the position shift function 220, may be changed depending on the use and purpose. S, the pixel step setting, may be set to different values in the horizontal and vertical directions; setting S to a small value has the advantage of allowing detailed verification but the disadvantage of increasing calculation processing time.
  • The processing parameters for position shifting N times in the horizontal direction and M times in the vertical direction are preferably set to appropriate values that allow verification of positional fluctuations, depending on the structure of the object detection model 300.
  • The likelihood histogram 551 shown in FIG. 11 is obtained by normalizing the frequency of the likelihood (%) calculated by the probability statistical calculation means 520 for the likelihood distribution 541 shown in FIG. 10. The statistical result 561 displays the average likelihood (%), standard deviation of the likelihood (%), maximum likelihood (%), and minimum likelihood (%) for the likelihood distribution 541. The likelihood 571 of the conventional method corresponds to the likelihood calculated by the conventional first performance indexing device 30 described above; it displays the pinpoint likelihood calculated for the model input image 231 serving as the reference image for position shifting. Similarly, the likelihood histograms 552 and 553, statistical results 562 and 563, and likelihoods 572 and 573 of the conventional method shown in FIG. 11 correspond to the likelihood distributions 542 and 543 shown in FIG. 10, respectively.
  • The average likelihood (%) in the statistical results 561, 562, and 563 is an index for verifying the average detection accuracy and detection performance against fluctuations in position within the screen; the higher the value, the higher the performance of the object detection model 300 including the model learning dictionary 320 can be considered. The standard deviation (%) of the likelihood is an index of the dispersion of the likelihood against positional fluctuations; the smaller it is, the higher the stability of the object detection model 300 including the model learning dictionary 320 can be considered.
  • If the standard deviation (%) of the likelihood is large, it is possible either that there is a problem with the object detection model 300 itself, or that the learning of the model learning dictionary 320 for the detected object position on the screen is insufficient. By checking the likelihood distributions 541, 542, and 543 explained in FIG. 10, it is possible to verify which factor is dominant. Furthermore, by verifying the maximum likelihood (%) and minimum likelihood (%), it is also possible to determine whether the dispersion of the likelihood is close to a normal distribution. The higher the maximum likelihood (%) and minimum likelihood (%), the higher the performance of the object detection model 300 including the model learning dictionary 320 can be considered; conversely, if they become extremely low, either there is a problem with the object detection model 300 itself, or the learning of the model learning dictionary 320 for the detected object position on the screen is insufficient.
  • This example shows the case where the detection target is one person; if there are multiple detection targets or objects of classes other than person, the likelihood distribution and its statistical results, the IOU distribution and its statistical results, and the class identification distribution and its statistical results may be calculated for each detection target.
  • Using FIGS. 10 and 11, which show the verification results calculated by the performance indexing device 10 in object detection according to Embodiment 1 of the present invention for model input images resized into three sizes, each containing one person as the detected object, an example of the verification method used when performing verification, issue analysis, and factor analysis is shown below.
  • The verification described in this example assumes that the YOLO model 360 is implemented as an electronic circuit in order to miniaturize, save power in, and reduce the cost of object detection cameras; due to limitations in circuit area or power consumption, memory capacity, or the performance of arithmetic processors such as the installed DSP (digital signal processor), the input image size of the YOLO model 360 may have to be made smaller than the originally recommended size. The results here assume such a case and do not necessarily occur with the various recommended variations of the YOLO model 360.
  • In the likelihood distributions 541, 542, and 543, it can be confirmed that there are regions, in a particular grid-like pattern, where the gray or white level is strong, that is, where the likelihood is low. Therefore, as explained in FIG. 7A, even when the same object is detected, a phenomenon is considered to occur in which the likelihood, one of the detection results, varies greatly when the position of the detected object in the image fluctuates.
  • The specific grid pattern seen in the likelihood distributions 541 and 542 has a period of about 8 pixels square, while that seen in the likelihood distribution 543 has a period of about 16 pixels square. As described with reference to FIG. 3B, the YOLO model 360 divides the image area into grid cells of arbitrary size in order to detect the position of the object and identify the class (classification) at the same time, calculating the conditional class probability Pr(Classi|Object) 315 per cell; this grid-cell structure is considered to be a factor behind the grid-like pattern.
  • The likelihood (%) 572 of the conventional method for the standard size is 49.27%, much lower than the likelihood (%) 571 of the conventional method for the 30%-reduced size. Simply checking this result may lead to the erroneous conclusion that the learning of the model learning dictionary 320 for a person of the standard size is insufficient, and unnecessary additional learning may be performed. Conversely, the likelihood (%) 571 of the conventional method for the 30%-reduced size is 70.12%, which appears to be a passing score, so additional learning might not be performed at all, leaving the enhancement of the versatility and robustness of the model learning dictionary 320 insufficient.
  • The likelihood histograms 551, 552, and 553 indicate at what level the likelihoods of the likelihood distributions 541, 542, and 543 in FIG. 10 exist. Performance can be considered better when the occurrence frequency is concentrated at the right end, where the likelihood is high, and more stable when there is less variation. Checking the likelihood histograms 551, 552, and 553 shows that, unlike the likelihoods (%) 571, 572, and 573 of the conventional method, the likelihood (%) distributions shift higher as the person size increases.
  • From the statistical results 561, 562, and 563, which are the results of statistical analysis of the likelihood distributions 541, 542, and 543 in FIG. 10 and the likelihood histograms 551, 552, and 553 in FIG. 11, it can be seen that the average likelihood (%) increases toward the originally ideal tendency as the person size increases, in the order 60.85% → 71.82% → 89.98%. It was thus confirmed that the likelihoods (%) 571, 572, and 573 of the conventional method suffer from the problem that the detection results are blurred by fluctuations in the position of the person in the image relative to the specific grid pattern.
  • Suppose, for example, that the development goal of the model learning dictionary 320 is an average likelihood (%) of 70% or more. Setting the average likelihood (%) threshold to 70%, the 30%-reduced size appeared to pass by chance in the conventional method's likelihood (%) result; however, its average likelihood of 60.85% falls more than 9% short of the threshold, so it becomes possible to find out that reinforcement through additional learning is necessary for the 30%-reduced person.
  • The maximum likelihood (%) and minimum likelihood (%) can also be used as material for various judgments. For example, if the minimum likelihood threshold is set to 30%, then for the standard-size person and the 30%-reduced person, whose minimum likelihoods are 30% or less, there is a risk that the object may become undetectable if it stops at the corresponding position, so such potential issues and problems can be extracted in advance.
  • Next, a case is shown in which a model input image 526 of 128 horizontal pixels by 128 vertical pixels, in which a person is located far away (at the top of the screen), is used as the reference image, the position shift of the model preprocessing means 200 is applied, and the likelihood distribution 544 is calculated by the robustness verification means 500 including the probability statistical calculation means 520 and the learning reinforcement necessary item extraction means 530.
  • The likelihood distribution 544 displays the level of likelihood (%) against fluctuations in the position (in pixels) of the person on the screen, in shades according to the gray scale bar 521, from white (corresponding to 0% likelihood) to black (corresponding to 100% likelihood).
  • It can be seen that on the upper side of the likelihood distribution 544, that is, in the area 527 surrounded by the dotted line, the white level is stronger and the likelihood is lower than in other areas. The area 527 surrounded by the dotted line can be considered to correspond to the case where the person exists in the area 528 surrounded by the dotted line, which extends to the lower right of the center of the person in the model input image 526.
  • The above-mentioned specific grid pattern can also be observed here, but since the area 527 surrounded by the dotted line has a particularly low likelihood, the learning reinforcement necessary item extraction means 530 can extract it as a region where low likelihood is concentrated. It can thus be confirmed that the object detection ability is low when the person in the model input image is located in the area 528 surrounded by the dotted line, and it can be recognized that the model learning dictionary 320 needs to be strengthened there. This leads to efficient reinforcement of the model learning dictionary 320 by the dictionary learning means 600 of Embodiment 2, which will be described later.
  • The verification method in this example covers the case where the detection target is one person; if there are multiple detection targets or objects of classes other than person, the reinforcement targets of the model learning dictionary 320 may be specified for each detection target, based on the detected objects and judgment conditions extracted by the learning reinforcement necessary item extraction means 530 from the likelihood distribution and its statistical results, the IOU distribution and its statistical results, and the class identification distribution and its statistical results.
  • Problems with the object detection model 300 itself may also be extracted, and the versatility and robustness of the model learning dictionary 320 may be enhanced by the dictionary learning means 600, described later, with reference to the extracted information 531.
  • The object detection model 300 may also be applied to another one-stage DNN model such as SSD.
  • The present invention may also be applied to a two-stage DNN model, which processes object position detection and class identification in two stages, such as Faster R-CNN.
  • It may also be applied to object detection models and machine learning models that do not use neural networks.
  • From the performance indexing device 10 in object detection of the present invention, which uses the image processing means 100, model pre-processing means 200, object detection model 300, model post-processing means 400, and robustness verification means 500 described so far in Embodiment 1, the following usefulness and effects can be expected.
  • By checking the likelihood distribution 540 for the position of each detected object over the plurality of model input images 210 processed by the position shift function 220 of the model preprocessing means 200, it becomes possible to extract features whose likelihood fluctuates with the position of the detected object on the screen due to potential problems in the neural network itself, including the DNN model in the object detection model. This makes it possible to accurately identify issues related to accuracy and performance during inference. As a result, methods for solving these problems can be formulated effectively, and the detection accuracy and detection performance of the object detection model can be improved.
  • Since the robustness verification means 500 further calculates the likelihood distribution 540 indicating the dispersion due to the position shift of each detected object, the average likelihood 501 (the average value of the valid region of the likelihood), the histogram 550 of the likelihood, the standard deviation of likelihood 502, the maximum likelihood 503, and the minimum likelihood 504, it becomes possible to extract features whose likelihood fluctuates with the position of the detected object on the screen and to extract potential issues. Furthermore, since methods for solving problems can be formulated more effectively, the detection accuracy and detection performance of the object detection model 300 can be further improved. In addition, when combined with various processing parameters 510 other than the position shift, it becomes possible to accurately grasp weaknesses in, and reinforcement policies for, the versatility and robustness against various fluctuation conditions caused by the model learning dictionary 320 created by deep learning or the like, as well as problems that may exist in the neural network itself, including the DNN model. Therefore, effective learning image data and supervised data can be applied through deep learning or the like, making it possible to enhance the versatility and robustness of the model learning dictionary 320.
  • Further, since the robustness verification means 500 calculates, for each detected object, the IOU distribution and class identification accuracy rate distribution showing the variation due to position shifts, together with the average IOU value, average class identification accuracy rate, histograms of the IOU value and class identification accuracy rate, and their standard deviations, maximum values, and minimum values, it becomes possible to accurately grasp weaknesses in, and reinforcement policies for, the versatility and robustness against various fluctuation conditions caused by the model learning dictionary 320 created by deep learning or the like, as well as problems that may exist in the neural network itself, including the DNN model. Therefore, more effective learning image data and supervised data can be applied through deep learning or the like, making it possible to enhance the versatility and robustness of the model learning dictionary.
  • Furthermore, the model preprocessing means 200 generates enlarged or reduced images at L (an arbitrary integer) types of arbitrary magnification as various processing parameters 510 and then applies the position shift described above. The robustness verification means 500, including the probability statistical calculation means 520, can thereby check, for each of the L sizes, the likelihood distribution 540 for the position of each detected object, the average likelihood 501 of the valid region of the likelihood, the likelihood histogram 550, the standard deviation of likelihood 502, the maximum likelihood 503, the minimum likelihood 504, and the IOU value 505.
  • Furthermore, since the model post-processing means 400 includes an individual identification means 410, abnormal data can be eliminated and the position information including the detection frame and the likelihood information can be corrected to suitable information for each detected object. The likelihood distribution 540 for the position of each detected object, the average likelihood 501 of the valid region of the likelihood, the likelihood histogram 550, the standard deviation of likelihood 502, the maximum likelihood 503, the minimum likelihood 504, and the IOU value 505 can therefore be calculated more accurately. It also becomes possible to check more accurately the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU values for the position of each detected object, and of the class identification accuracy rate. As a result, the DNN model can be improved and the versatility and robustness of the model learning dictionary 320 can be enhanced more accurately.
  • Since the model post-processing means uses the individual identification means 410 to eliminate abnormal data and to correct the position information including the detection frame and the likelihood information to the optimal information for each detected object, the likelihood distribution 540 for the position of each detected object, the average likelihood 501 of the valid region of the likelihood, the likelihood histogram 550, the standard deviation of likelihood 502, the maximum likelihood 503, the minimum likelihood 504, and the IOU value 505 can be calculated accurately by comparison with the correct data. Likewise, the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU values for each detected object position, and of the class identification accuracy rate, can be checked for accuracy against the correct data. It therefore becomes possible to improve the DNN model and to enhance the versatility and robustness of the model learning dictionary 320 more accurately. A sketch of such individual identification follows.
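  • The text does not disclose the internal algorithm of the individual identification means 410; the following is a hedged Python sketch assuming a non-maximum-suppression style filter that discards abnormal (low-likelihood or duplicate) detection frames and keeps one suitable frame per detected object. The Box fields and both thresholds are illustrative assumptions.

        from dataclasses import dataclass

        @dataclass
        class Box:
            x1: float
            y1: float
            x2: float
            y2: float
            likelihood: float

        def iou(a, b):
            """Intersection over union of two detection frames."""
            iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
            ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
            inter = iw * ih
            union = ((a.x2 - a.x1) * (a.y2 - a.y1)
                     + (b.x2 - b.x1) * (b.y2 - b.y1) - inter)
            return inter / union if union > 0.0 else 0.0

        def identify_individuals(boxes, min_likelihood=0.25, iou_thresh=0.45):
            """Drop abnormal (low-likelihood) frames, then suppress overlapping
            duplicates so that one frame remains per detected object."""
            kept = []
            for cand in sorted((b for b in boxes if b.likelihood >= min_likelihood),
                               key=lambda b: b.likelihood, reverse=True):
                if all(iou(cand, k) < iou_thresh for k in kept):
                    kept.append(cand)
            return kept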
  • Furthermore, the model post-processing means 400 associates each output result with the various processing parameters 510 for each detected object and outputs the results to the robustness verification means 500.
  • Furthermore, for each of the various processing parameters 510, the robustness verification means 500 has a learning reinforcement necessary item extraction means 530 that includes any or all of the following: extraction of positions or regions in the likelihood distribution 540 of each detected object that fall at or below an arbitrary threshold; extraction of detected objects whose average likelihood 501 is at or below an arbitrary threshold; extraction of detected objects whose standard deviation of likelihood 502 is at or above an arbitrary threshold; extraction of detected objects whose maximum likelihood 503 is at or below an arbitrary threshold; extraction of detected objects whose minimum likelihood 504 is at or below an arbitrary threshold; and extraction of detected objects whose IOU value 505 is at or below an arbitrary threshold.
  • Furthermore, for each of the various processing parameters 510, the learning reinforcement necessary item extraction means 530 may extract positions or regions where the IOU distribution of each detected object falls at or below an arbitrary threshold, and detected objects whose class identification accuracy rate falls at or below an arbitrary threshold.
  • By having a learning reinforcement necessary item extraction means 530 that includes any or all of these extraction functions operating on the position information including the detection frame and on the class identification information, the weaknesses in versatility and robustness against various fluctuation conditions caused by the model learning dictionary 320 created by deep learning and the like, as well as the corresponding reinforcement policies, can be separated from the potential problems of the neural network itself, including the DNN model, and understood more accurately. It therefore becomes possible to apply effective learning image data and supervised data through deep learning and the like, thereby enhancing the versatility and robustness of the model learning dictionary 320. A minimal sketch of this threshold-based extraction follows.
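  • A minimal sketch of such threshold-based extraction, assuming the per-object statistics have already been collected into dictionaries keyed by processing parameter; every field name and threshold below is illustrative.

        def extract_reinforcement_items(per_object_stats, th):
            """per_object_stats: one dict per (object, parameter) pair, e.g.
            {"param": ("scale", 0.5), "avg": 0.71, "std": 0.18,
             "max": 0.93, "min": 0.12, "iou": 0.48}.
            Returns the cases that need learning reinforcement."""
            return [s for s in per_object_stats
                    if s["avg"] <= th["avg"]      # average likelihood 501 too low
                    or s["std"] >= th["std"]      # standard deviation 502 too high
                    or s["max"] <= th["max"]      # maximum likelihood 503 too low
                    or s["min"] <= th["min"]      # minimum likelihood 504 too low
                    or s["iou"] <= th["iou"]]     # IOU value 505 too low

        # usage sketch with arbitrary thresholds
        # weak = extract_reinforcement_items(stats, {"avg": 0.6, "std": 0.15,
        #                                            "max": 0.7, "min": 0.3, "iou": 0.5})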
  • Furthermore, since the probability statistical calculation means 520 and the learning reinforcement necessary item extraction means 530 of the robustness verification means 500 perform probability statistical calculation based on the likelihood, the IOU value, and the class identification accuracy rate, the performance and characteristics of the object detection model 300 and the versatility and robustness of the model learning dictionary 320 can be verified accurately even when part of the effective range of the object to be detected is missing, depending on the position of the object in the reference image used for verification and on its position after processing with the various processing parameters 510 of the model preprocessing means 200. It therefore becomes possible to improve the DNN model with respect to detected object size and to enhance the versatility and robustness of the model learning dictionary 320.
  • When processing the plurality of model input images 210 input to the object detection model 300, the model preprocessing means 200 may further be characterized by generating images whose brightness level is changed to arbitrary values using P (an arbitrary integer) types of contrast correction curves or gradation conversion curves as various processing parameters 510. After the gradation change, a position shift of N (an arbitrary integer) steps horizontally and M (an arbitrary integer) steps vertically in S (an arbitrary decimal) pixel increments may be applied, so that the position shift function 220 generates a total of N × M × P gradation-converted and position-shifted model input images 210. A function of cutting out an arbitrary area may also be provided. Note that the brightness-level change using a contrast correction curve or gradation conversion curve may be implemented by execution on the image processing processor 290.
  • For example, three types of gradation-converted images (P = 3) may be generated, including a low-brightness-level image 261 whose brightness level is lowered to simulate low-illuminance conditions, and a high-brightness-level image 263 whose brightness level is raised by applying a gradation conversion curve 266 that simulates high-illuminance conditions such as clear weather, backlighting, overexposure, or a shooting studio illuminated with strong light.
  • For each of the three gradation-converted images, N × M position-shifted images are generated in S-pixel steps, so that a total of 3 × N × M model input images 210 may be generated and processed. A minimal sketch of such gradation conversion follows.
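  • A minimal sketch of the gradation conversion, assuming simple gamma curves stand in for the P contrast correction or gradation conversion curves (the actual curve shapes 264 to 266 are not specified numerically in the text):

        import numpy as np

        def tone_convert(image_u8, gamma):
            """Apply a gamma-style gradation conversion curve via a lookup table."""
            lut = ((np.linspace(0.0, 1.0, 256) ** gamma) * 255.0).astype(np.uint8)
            return lut[image_u8]

        # P = 3 example: gamma > 1 darkens (low-illuminance simulation),
        # gamma < 1 brightens (high-illuminance / backlight simulation).
        # converted = [tone_convert(img, g) for g in (2.2, 1.0, 0.45)]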
  • The plurality of model input images 210 processed by the position shift function 220 and the gradation conversion function 260 of the model preprocessing means 200, as shown in FIGS. 8 and 13, may be input to the object detection model 300 shown in FIG. 4, and the model post-processing means 400 may calculate the position information 401 including the second detection frame and the second likelihood information 402 for each of the plurality of model input images 210. These results, together with the various processing parameters 510, may then be input to the robustness verification means 500, which verifies the versatility and robustness of the object detection model 300. For each of the P (an arbitrary integer) types of contrast correction curves or gradation conversion curves constituting the various processing parameters 510, the probability statistical calculation means 520, as explained with reference to FIGS. 10 and 11, may calculate a likelihood distribution 540 indicating the variation due to the position shift of one person, an average likelihood 501 that is the average value of the valid region of the likelihood, a likelihood histogram 550, a standard deviation of likelihood 502 that is the standard deviation of the valid region, a maximum likelihood 503 that is the maximum value of the valid region, and a minimum likelihood 504 that is the minimum value of the valid region.
  • Furthermore, the IOU value 505 may be calculated.
  • The distribution, histogram, standard deviation, maximum value, and minimum value of the IOU values for the position of each detected object, and of the class identification accuracy rate, may also be calculated for each of the P (an arbitrary integer) types of contrast correction curves or gradation conversion curves.
  • The learning reinforcement necessary item extraction means 530 of the robustness verification means 500 described above may also be provided.
  • Since the model preprocessing means 200 is equipped with the gradation conversion function 260, it becomes possible to improve the DNN model with respect to the brightness levels of the detected object and background, which change with weather conditions, shooting time, and the illuminance conditions of the shooting environment, and to enhance the versatility and robustness of the model learning dictionary.
  • When processing the plurality of model input images 210 input to the object detection model 300, the model preprocessing means 200 may further be characterized by generating images whose aspect ratio is changed using Q (an arbitrary integer) types of aspect ratios as various processing parameters 510. After the aspect ratio change, a position shift of N (an arbitrary integer) steps horizontally and M (an arbitrary integer) steps vertically in S (an arbitrary decimal) pixel increments may be applied, so that the position shift function 220 generates a total of N × M × Q aspect-ratio-changed and position-shifted model input images 210. A function of cutting out an arbitrary area may also be provided. Note that the aspect ratio change using the Q types of aspect ratios may be realized by executing the affine transformation function 291 or the projective transformation function 292 in the image processing processor 290.
  • For example, when three aspect ratios are used, N × M position-shifted images are generated in S-pixel steps for each, so that a total of 3 × N × M model input images 210 may be generated and processed.
  • The plurality of model input images 210 processed by the position shift function 220 and the aspect ratio change function 250 of the model preprocessing means 200, as shown in FIGS. 8 and 14, may be input to the object detection model 300 shown in FIG. 4, and the model post-processing means 400 may calculate the position information 401 including the second detection frame and the second likelihood information 402 for each of the plurality of model input images 210. These results, together with the various processing parameters 510, may then be input to the robustness verification means 500, which verifies the versatility and robustness of the object detection model 300, and the probability statistical calculation means 520, as explained with reference to FIGS. 10 and 11, may calculate a likelihood distribution 540 showing the variation due to the position shift of one person, an average likelihood 501 that is the average value of the valid region of the likelihood, a likelihood histogram 550, a standard deviation of likelihood 502 that is the standard deviation of the valid region, a maximum likelihood 503 that is the maximum value of the valid region, and a minimum likelihood 504 that is the minimum value of the valid region.
  • Furthermore, the IOU value 505 may be calculated.
  • The distribution, histogram, standard deviation, maximum value, and minimum value of the IOU values for the position of each detected object, and of the class identification accuracy rate, may also be calculated for each of the Q (an arbitrary integer) types of aspect ratios.
  • The learning reinforcement necessary item extraction means 530 of the robustness verification means 500 described above may also be provided.
  • When processing the plurality of model input images 210 input to the object detection model 300, the model preprocessing means 200 may further be characterized by generating images whose rotation angle is changed using R (an arbitrary integer) types of angles as various processing parameters 510. After the rotation angle change, a position shift of N (an arbitrary integer) steps horizontally and M (an arbitrary integer) steps vertically in S (an arbitrary decimal) pixel increments may be applied, so that the position shift function 220 generates a total of N × M × R rotation-angle-changed and position-shifted model input images 210. A function of cutting out an arbitrary area may also be provided. Note that the rotation angle change using the R types of angles may be realized by executing the affine transformation function 291 or the projective transformation function 292 in the image processing processor 290.
  • For example, when three rotation angles are used, N × M position-shifted images are generated in S-pixel steps for each, so that a total of 3 × N × M model input images 210 may be generated and processed.
  • The plurality of model input images 210 processed by the position shift function 220 and the rotation function 240 of the model preprocessing means 200, as shown in FIGS. 8 and 15, may be input to the object detection model 300 shown in FIG. 4, and the model post-processing means 400 may calculate the position information 401 including the second detection frame and the second likelihood information 402 for each of the plurality of model input images 210. These results, together with the various processing parameters 510, may then be input to the robustness verification means 500, which verifies the versatility and robustness of the object detection model 300, and the probability statistical calculation means 520, as explained with reference to FIGS. 10 and 11, may calculate a likelihood distribution 540 showing the variation due to the position shift of one person, an average likelihood 501 that is the average value of the valid region of the likelihood, a likelihood histogram 550, a standard deviation of likelihood 502 that is the standard deviation of the valid region, a maximum likelihood 503 that is the maximum value of the valid region, and a minimum likelihood 504 that is the minimum value of the valid region.
  • Furthermore, the IOU value 505 may be calculated.
  • The distribution, histogram, standard deviation, maximum value, and minimum value of the IOU values for the position of each detected object, and of the class identification accuracy rate, may also be calculated for each of the R (an arbitrary integer) types of angles.
  • The learning reinforcement necessary item extraction means 530 of the robustness verification means 500 described above may also be provided.
  • By equipping the model preprocessing means 200 with the rotation function 240, it becomes possible to improve the DNN model with respect to various rotation angles of the detected object and to enhance the versatility and robustness of the model learning dictionary. A sketch of such affine-transform-based processing follows.
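  • A minimal sketch of the rotation and aspect ratio change, assuming OpenCV's affine transformation as a stand-in for the affine transformation function 291 on the image processing processor 290 (the library choice, interpolation, and output sizing are assumptions):

        import cv2
        import numpy as np

        def rotate(image, angle_deg):
            """Rotate about the image center by one of the R types of angles."""
            h, w = image.shape[:2]
            m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
            return cv2.warpAffine(image, m, (w, h))

        def change_aspect(image, sx, sy):
            """Change the aspect ratio by scaling the axes independently
            (one of the Q types of aspect ratios)."""
            h, w = image.shape[:2]
            m = np.float32([[sx, 0.0, 0.0], [0.0, sy, 0.0]])
            return cv2.warpAffine(image, m, (int(w * sx), int(h * sy)))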
  • When the model preprocessing means 200 processes the plurality of model input images 210 input to the object detection model 300, it may further include a margin padding function 280 that, as in the processing examples 281 to 288 shown in FIGS. 8, 9, 14, and 15, calculates the average brightness level of the valid image and pastes that uniform average brightness level into the blank regions where no valid image exists as a result of the position shift, resize, aspect ratio change, or rotation processing.
  • Alternatively, the blank regions may be interpolated using the valid image area present in the output image of the image processing means 100.
  • Alternatively, the blank regions may be filled with images that do not affect learning or inference.
  • Since the model preprocessing means 200 is equipped with the margin padding function 280, the influence on the inference accuracy of the object detection model 300 of features originating in the margins can be reduced. This makes it possible to calculate more accurately the likelihood distribution 540 for the position of each detected object, the average likelihood 501 of the valid region of the likelihood, the likelihood histogram 550, the standard deviation of likelihood 502, the maximum likelihood 503, the minimum likelihood 504, and the IOU value 505, and likewise to check more accurately the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU values for the position of each detected object, and of the class identification accuracy rate. It therefore becomes possible to improve the DNN model and to enhance the versatility and robustness of the model learning dictionary more accurately. A minimal sketch of margin padding follows.
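  • A minimal sketch of the margin padding function 280, filling the blank region left by shifting, resizing, aspect ratio change, or rotation with the average brightness level of the valid image; representing the valid area as a boolean mask is an assumption made for illustration.

        import numpy as np

        def pad_margins_with_mean(image, valid_mask):
            """Fill pixels outside the valid image area with the average
            brightness of the valid area so that margins contribute no
            spurious features at inference time."""
            out = image.copy()
            mean_level = image[valid_mask].mean(axis=0)   # per-channel average
            out[~valid_mask] = mean_level.astype(image.dtype)
            return out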
  • The various processing parameters 510 used for processing by the model preprocessing means 200 may also combine multiple operations in which the resize function 230, the rotation function 240, the aspect ratio change function 250, and the gradation conversion function 260, each involving the position shift function 220 described above, are intertwined with one another. The probability statistical calculation means 520 and the learning reinforcement necessary item extraction means 530 of the robustness verification means 500 may then be used as a method for analyzing the interdependence of these various processing parameters 510. Furthermore, although a detailed explanation was omitted in Embodiment 1, by also using processing such as the dewarp function 270 for distortion correction, it becomes possible to improve the DNN model with respect to distortion of the detected object and background and to enhance the versatility and robustness of the model learning dictionary.
  • As described above, by verifying the detection performance and detection accuracy of the object detection model 300 and the model learning dictionary 320, including their variations and imperfections, as well as issues in their versatility and robustness, and by improving the object detection model 300 and repeatedly performing deep learning using the dictionary learning means 600 described later in Embodiment 2 on the basis of the verification results, it becomes possible to realize object detection that is highly versatile and robust even under various conditions.
  • FIG. 16 is a block diagram showing a performance indexing device 20 for object detection in an image according to Embodiment 2 of the present invention.
  • Each means, function, process, and step, as well as each device, method, and program for realizing them, are the same as in Embodiment 1, so their explanation is omitted in the text of Embodiment 2.
  • Each means, function, process, step, device, method, and program of the other embodiments described in Embodiment 1 may also be used and implemented.
  • Each means, function, and process described below in Embodiment 2 of the present invention may be replaced with a step, and each device may be replaced with a method.
  • Each means and each device described in Embodiment 2 of the present invention may be realized by a program operated by a computer.
  • In Embodiment 2, the dictionary learning means 600, which performs deep learning for creating the model learning dictionary 320, one of the components of the object detection model 300, will be described.
  • First, learning material data considered appropriate for the purpose of use is extracted from the learning material database storage means 610, in which material data (image data) for deep learning is stored.
  • The learning material data stored in the learning material database storage means 610 may make use of a large-scale open-source dataset such as COCO (Common Objects in Context) or the Pascal VOC Dataset.
  • Alternatively, image data obtained by having the image processing means 100 display necessary images according to the purpose of use via the image output control means 110 and storing them in the data storage means 120 may be utilized.
  • Next, the annotation means 620 adds class identification information and a ground truth BBox, which is the correct answer frame, to the learning material data extracted from the learning material database storage means 610 to create supervised data.
  • If open-source datasets such as COCO and the Pascal VOC Dataset have already been annotated, they may be used directly as supervised data without using the annotation means 620.
  • The supervised data is then augmented as learning images 631 by the Augment means 630 to enhance versatility and robustness.
  • The Augment means 630 includes, for example, a means for shifting an image to an arbitrary position in the horizontal and vertical directions, a means for enlarging or reducing an image to an arbitrary magnification, a means for rotating an image to an arbitrary angle, a means for changing the aspect ratio, and a dewarping means for performing distortion correction, cylindrical conversion, and the like; the training data is inflated by combining these means according to the purpose of use. A sketch of such an augmentation pipeline follows.
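  • A minimal sketch of composing such augmentation means, reusing the rotate and change_aspect helpers sketched earlier; the parameter ranges drawn here are illustrative, and the wrap-around shift is a simplification.

        import random
        import numpy as np

        def shift(image, dx, dy):
            """Shift horizontally and vertically (wrap-around for brevity)."""
            return np.roll(image, (dy, dx), axis=(0, 1))

        def augment(image, rotate, change_aspect):
            """Inflate one training image by combining rotation, aspect ratio
            change, and position shift with randomly drawn parameters."""
            # in practice the ground truth BBox must be transformed in step
            # with the image; that bookkeeping is omitted here
            img = rotate(image, random.uniform(-10.0, 10.0))
            img = change_aspect(img, random.uniform(0.9, 1.1), 1.0)
            return shift(img, random.randint(-8, 8), random.randint(-8, 8))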
  • The learning images 631 inflated by the Augment means 630 are input to the deep learning means 640 to calculate the weighting coefficients of the DNN model 310, and the calculated weighting coefficients are converted into, for example, the ONNX format to create the model learning dictionary 320. Note that the model learning dictionary 320 may also be created by converting into a format other than the ONNX format. A conversion sketch follows.
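  • The conversion into the ONNX format depends on the training framework; the following is a hedged sketch assuming a PyTorch re-implementation of the DNN model 310 (darknet weights would instead go through an external converter; the model object and input size are assumptions).

        import torch

        def export_dictionary(model, path="model_learning_dictionary.onnx",
                              input_size=(1, 3, 416, 416)):
            """Serialize the trained weighting coefficients to ONNX format."""
            model.eval()
            dummy = torch.randn(*input_size)   # dummy input fixes the graph shape
            torch.onnx.export(model, dummy, path,
                              input_names=["image"], output_names=["detections"])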
  • In Embodiment 2, the deep learning means 640 is realized by an open-source learning environment called darknet and an arithmetic processor (including a personal computer or a supercomputer).
  • darknet has learning parameters called hyperparameters; appropriate hyperparameters can be set according to the usage and purpose, and versatility and robustness can also be strengthened in conjunction with the Augment means 630.
  • The model learning dictionary 320 created by the deep learning means 640 may also be configured as an electronic circuit.
  • Alternatively, a learning environment configured using a programming language may be used, depending on the DNN model 310 to be applied.
  • Next, validation material data for verifying the detection accuracy, detection performance, versatility, and robustness required for the purpose of use is extracted from the aforementioned learning material database storage means 610.
  • The validation image data stored in the learning material database storage means 610 may make use of a large-scale open-source validation image dataset such as COCO or the Pascal VOC Dataset.
  • Alternatively, image data obtained by displaying, from the image processing means 100 via the image output control means 110, images for verifying the detection accuracy, detection performance, versatility, and robustness necessary for the purpose of use, and storing them in the data storage means 120, may be used.
  • Next, the annotation means 620 adds class identification information and a ground truth BBox, which is the correct answer frame, to the validation material data extracted from the learning material database storage means 610 to create validation data 623.
  • If open-source datasets such as COCO and the Pascal VOC Dataset have already been annotated, they may be used directly as validation data 623 without using the annotation means 620.
  • The validation data 623 is input to a second mAP calculation means 650 equipped with a model post-processing means 400 having inference (prediction) capability equivalent to that of the object detection model 300 and with the individual identification means 410 described in Embodiment 1. The second mAP calculation means 650 may calculate the IOU value 653 by comparing the ground truth BBox, which is the correct answer frame, with the Predicted BBox calculated as a result of inference (prediction); Precision 654, which indicates, among all prediction results for all validation data 623, the percentage of IOU values 653 that were correctly predicted at or above an arbitrary threshold; Recall 655, which indicates the proportion of the actual correct results for which the IOU value 653 was at or above an arbitrary threshold and a BBox close to the correct position was predicted; the AP (Average Precision) value 651 for each class, as an index for comparing the accuracy and performance of object detection mentioned above; and the mAP (mean Average Precision) value 652 averaged over all classes. A minimal sketch of these calculations follows.
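  • A minimal sketch of the Precision, Recall, and AP calculation, reusing the iou helper sketched earlier; the confidence-sorted one-to-one matching of Predicted BBox to ground truth BBox follows the common Pascal VOC convention and is an assumption here.

        def precision_recall(preds, gts, iou_thresh=0.5):
            """preds: list of (confidence, Box) for one class over all
            validation data; gts: list of ground truth Box.
            Returns cumulative (precision, recall) points."""
            matched, points = set(), []
            tp = fp = 0
            for conf, box in sorted(preds, key=lambda p: p[0], reverse=True):
                best_j, best_iou = -1, 0.0
                for j, gt in enumerate(gts):
                    if j not in matched:
                        v = iou(box, gt)           # cf. IOU value 653
                        if v > best_iou:
                            best_j, best_iou = j, v
                if best_iou >= iou_thresh:
                    matched.add(best_j)
                    tp += 1                        # BBox predicted near the correct result
                else:
                    fp += 1
                points.append((tp / (tp + fp),     # cf. Precision 654
                               tp / len(gts)))     # cf. Recall 655
            return points

        def average_precision(points):
            """AP value 651 as the area under the precision-recall curve
            (simple rectangular sum; interpolated variants also exist)."""
            ap, prev_recall = 0.0, 0.0
            for precision, recall in points:
                ap += precision * (recall - prev_recall)
                prev_recall = recall
            return ap

        # the mAP value 652 is the mean of average_precision over all classes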
  • In Embodiment 2, the second mAP calculation means 650 uses an open-source inference environment called darknet and an arithmetic processor (a personal computer or a supercomputer), and it is desirable that it have the same inference (prediction) performance as the object detection model 300.
  • Alternatively, means for calculating the IOU value 653, Precision 654, Recall 655, AP value 651, and mAP value 652 using an inference environment configured with a programming language may be provided, depending on the DNN model 310 to be applied.
  • Since the individual identification means 410 of the model post-processing means 400, as described with reference to FIGS. 6A and 6B of Embodiment 1, is provided in the second mAP calculation means of Embodiment 2, abnormal data can be excluded and the position information including the detection frame and the likelihood information can be corrected to the optimal information for each detected object. The likelihood distribution 540 for the position of each detected object, the average likelihood 501 of the valid region of the likelihood, the likelihood histogram 550, the standard deviation of likelihood 502, the maximum likelihood 503, the minimum likelihood 504, and the IOU value 505 can therefore be calculated accurately by comparison with the correct data, making it possible to improve the DNN model and to enhance the versatility and robustness of the model learning dictionary more accurately.
  • For each of the various processing parameters 510, the robustness verification means 500 extracts positions or regions where the likelihood distribution of each detected object falls at or below an arbitrary threshold, extracts detected objects whose average likelihood 501 is at or below an arbitrary threshold, and evaluates against arbitrary thresholds the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU values for the position of each detected object, and of the class identification accuracy rate. Learning images may then be prepared on the basis of the results of the learning reinforcement necessary item extraction means 530, and re-learning may be performed by the built-in or external dictionary learning means 600.
  • By re-learning with learning images prepared for the various processing parameters 510 other than position shift, such as the left, right, top, bottom, and depth position of the object within an arbitrary range near the detected object position on the screen, as well as object size, contrast, gradation, aspect ratio, and rotation, the weaknesses in versatility and robustness against various fluctuating conditions caused by the model learning dictionary 320 created by deep learning and the like, as well as the corresponding reinforcement policies, can be separated from the potential issues of the neural network itself, including the DNN model, and understood accurately. It therefore becomes possible to apply effective learning image data and supervised data through deep learning and the like, thereby enhancing the versatility and robustness of the model learning dictionary 320.
  • Furthermore, since the probability statistical calculation means 520 calculates the distribution, histogram, standard deviation, maximum value, and minimum value of the IOU values for the position of each detected object, and of the class identification accuracy rate, the detection performance, detection accuracy, versatility, and robustness of the object detection model 300 and the model learning dictionary 320, including their variations and imperfections, can be verified. On the basis of these verification results, the object detection model 300 is improved and the dictionary learning means 600 repeatedly performs deep learning to resolve the identified issues and reinforce the model, making it possible to realize object detection with higher detection capability and with versatility and robustness under various fluctuating conditions.
  • FIG. 18 is a diagram illustrating a summary of the object detection model performance indexing device of the present invention.
  • As summarized there, the performance indexing device, method, and program for the object detection model of the present invention apply the model preprocessing means to image data generated by an image processing means that acquires an image including a detection target and processes it appropriately, and the robustness verification means calculates, for each of the various processing parameters, performance indices such as the average likelihood and the standard deviation of likelihood with respect to object position fluctuation. Furthermore, based on the results of this performance indexing, the dictionary learning means performs robustness reinforcement of the model learning dictionary.
  • In the above embodiments, each component is configured with dedicated hardware, but it may also be realized by executing a software program suitable for each component.
  • Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.
  • The software that implements the performance indexing device and the like of the above embodiments is a program that causes a computer to execute the performance indexing method described above.
  • The present invention is useful in the technical field of identifying the position and class of an object in an image using an object detection model. It is particularly useful in the technical field of reducing the size, power consumption, and cost of cameras and the like for detecting objects.
Reference signs:
20 Second performance indexing device
100 Image processing means
101 Lens
102 Image sensor
103, 290 Image processing processor
110 Image output control means
120 Display and data storage means
200 Model preprocessing means
201, 202, 203, 204, 205, 206, 210, 221, 222, 223, 224, 231, 232, 233, 241, 242, 243, 251, 252, 253, 261, 262, 263, 311, 440, 470, 526 Model input images
207, 208, 209, 211, 212, 213, 401, 451, 452, 490, 491 Position information including second detection frame
214, 215, 216, 217, 218, 219, 453, 454, 492, 493 Likelihood in second likelihood information
220 Position shift function
230 Resize function
240 Rotation function
250 Aspect ratio change function
260 Tone conversion function
264, 265, 266 Tone conversion curve
270 Dewarp function
280 Margin padding function
281, 282, 283, 284, 285, 286, 287

Abstract

This performance indexing device comprises: an image processing means that acquires and processes an image; a model preprocessing means that processes the acquired image into a plurality of images according to various processing parameters; an object detection model that includes a model learning dictionary for inferring an object position and a likelihood from the input of the plurality of processed images; a model post-processing means that, for each detection object in the plurality of images, corrects position information including a first detection frame and first likelihood information into position information including a second detection frame and second likelihood information, on the basis of an inference result of the object detection model; and a robustness verification means for verifying the robustness of the object detection model on the basis of the various processing parameters and of the position information including the second detection frame and the second likelihood information output from the model post-processing means.
PCT/JP2023/012736 2022-03-31 2023-03-29 Dispositif d'indexation de performance, procédé d'indexation de performance et programme WO2023190644A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022-059640 2022-03-31
JP2022059640 2022-03-31

Publications (1)

Publication Number Publication Date
WO2023190644A1 true WO2023190644A1 (fr) 2023-10-05

Family

ID=88202015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/012736 WO2023190644A1 (fr) 2022-03-31 2023-03-29 Dispositif d'indexation de performance, procédé d'indexation de performance et programme

Country Status (1)

Country Link
WO (1) WO2023190644A1 (fr)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002208011A (ja) * 2001-01-12 2002-07-26 Fujitsu Ltd Image matching processing system and image matching method
JP2021162892A (ja) * 2020-03-30 2021-10-11 Hitachi, Ltd. Evaluation device, evaluation method, and storage medium
CN113255526A (zh) * 2021-05-28 2021-08-13 Huazhong University of Science and Technology Momentum-based adversarial example generation method and system for crowd counting models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23780653

Country of ref document: EP

Kind code of ref document: A1