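WO2024085114A1 - Image classification learning device, image classification learning method, image classification learning program, and image classification trained model - Google Patents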

Publication number
WO2024085114A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
classification
learning
image data
loss
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/037394
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
修一 鶴田
悠太 中島
良知 李
博文 王
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Osaka NUC
Original Assignee
Osaka University NUC
Application filed by Osaka University NUC
Priority to JP2024551795A
Publication of WO2024085114A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present invention relates to an image classification learning device, an image classification learning method, an image classification learning program, and an image classification trained model that can help humans understand the process of performing image classification.
  • XAI: explainable artificial intelligence
  • understanding the behavior of neural networks is a major challenge, especially for medical applications (see Non-Patent Document 1) and for identifying biases in neural networks (see Non-Patent Document 2).
  • for this reason, a great deal of research effort has been devoted to providing post-hoc explanations of artificial intelligence models after they have been generated by machine learning (see Non-Patent Document 3).
  • This kind of explanation successfully provides a low-level (or pixel-by-pixel) relationship between the image and the model's judgment by highlighting some regions in the image as a heat map, but the interpretation of these relationships remains a problem.
  • an attention matrix (attention weights) is generated from the similarity between a query (Q) generated from a weight matrix generation model consisting of multiple weight vector columns (slots) and a key (K) representing the image features, and this attention matrix is used to extract areas from the image that are used for object detection.
  • This representation of image features in “object detection” is also called “object-centric representation” (see non-patent document 5).
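The attention-matrix computation described above can be illustrated with a minimal pure-Python sketch. It assumes scaled dot-product similarity between queries and keys and, following the slot attention of Locatello et al. (NeurIPS 2020), a softmax taken over the slot axis so that slots compete for each feature position. The function names are hypothetical, and the learned projections that would produce Q and K are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_matrix(queries, keys):
    """Return A[j][p]: attention of slot j on feature position p.

    queries: k slot vectors (query Q), keys: n position vectors (key K),
    all of the same dimension d. Similarity is a scaled dot product; the
    softmax normalizes over slots for each position (slots compete).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    k, n = len(queries), len(keys)
    # logits[j][p] = <q_j, k_p> / sqrt(d)
    logits = [[scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
              for q in queries]
    attn = [[0.0] * n for _ in range(k)]
    for p in range(n):
        column = softmax([logits[j][p] for j in range(k)])
        for j in range(k):
            attn[j][p] = column[j]
    return attn


# Usage: two slots attending over three feature positions.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
A = attention_matrix(Q, K)
```

Position 0 is claimed mostly by slot 0, position 1 by slot 1, and position 2, which is equally similar to both, is split evenly between them.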
  • a simple way to predefine concepts is to use human knowledge (see non-patent document 7).
  • Other methods use a manually created set of concepts and quantify the importance of each concept to the decision using directional derivatives, while the Broden dataset unifies several densely labeled image datasets to provide a large concept corpus that can be used to directly and automatically match Convolutional Neural Network (CNN) representations with labeled interpretations (see non-patent document 8).
  • CNN: Convolutional Neural Network
  • SENN: Self-Explaining Neural Networks
  • Alvarez-Melis et al. use a concept bottleneck and treat the concept activations as input to a regression model (see Non-Patent Document 9).
  • (Application of image recognition processing to determine the soundness of concrete structures, etc.)
  • the Ministry of Land, Infrastructure, Transport and Tourism's Guidelines for Periodic Bridge Inspection (see Non-Patent Document 10) state that the damage level of concrete walls is classified based on the crack width, whether the cracks form a lattice pattern, and the occurrence of water leakage and free lime.
  • Inspection of concrete structures requires close visual inspection by technicians with specialized knowledge, and this is done based on a comprehensive judgment that takes into account various aspects such as the state of deterioration, type, location, and traffic volume. In other words, judging the soundness of concrete structures relies heavily on the know-how (tacit knowledge) of experienced technicians, which cannot be put into a manual.
  • Patent Document 1 discloses a configuration in which cracks are detected as deformed areas using a feature map created with a CNN (Convolutional Neural Network) and the crack width is determined as attribute information of the deformed area.
  • Patent Document 2 also discloses a configuration that uses deep learning to provide a performance evaluation system for concrete structures that makes it possible to efficiently carry out a series of maintenance and management tasks, from inputting deformations to performance inspections.
  • the deep learning unit performs machine learning using artificial intelligence based on the discrepancy between the results automatically calculated by the performance evaluation system, which are accumulated for each inspection, and the results corrected by the inspector.
  • in the disclosed configuration, the results of this machine learning are then reflected in subsequent judgments and predictions.
  • Patent Document 3 discloses the following technology:
  • Patent Document 3 makes it possible to determine the condition of a wide area based on a local area and a wide area, and to determine the degree of damage to the concrete wall surface of an infrastructure structure.
  • concept learning is guided by a learning process using an autoencoder structure to reconstruct the original image. It is not yet clear whether such a structure can be applied to learning from natural images.
  • the present invention has been made to solve the above problems, and aims to provide an image classification learning device, an image classification learning method, an image classification learning program, and an image classification trained model that can learn the "concepts" that a trained model uses to make judgments through learning on a given task, so that the judgment process can be compared with that of a human.
  • the present invention also aims to provide an image classification learning device, an image classification learning method, an image classification learning program, and an image classification trained model that can assist or replace the judgment of the soundness of concrete structures by using a trained model of artificial intelligence.
  • an image classification learning device includes a storage device for storing learning data including a plurality of image data and image labels corresponding to the image data, and a calculation processing means for reading out the learning data stored in the storage device and executing a process of machine learning a plurality of concepts in the image data for classifying the image data.
  • the calculation processing means includes: an image recognition means that extracts a set of features representing the image data, learns and generates a classification model that identifies and classifies the image labels of the image data based on the extracted set of features, and stores the classification model in the storage device; an attention mechanism processing means that converts each slot vector of a concept matrix according to the image features defined by that slot vector, where each slot vector corresponds to one of the plurality of concepts and defines an image region in which a feature emphasized in the classification processing of the image recognition means appears, and stores the slot vectors in the storage device; a loss evaluation means that calculates a loss based on a classification loss, which is calculated by evaluating the classification rate of the image recognition means and decreases as the classification rate increases, and a separation loss, which is calculated by evaluating the degree to which the features corresponding to the plurality of concepts are separated from each other in the feature space and decreases as the degree of separation increases; and a learning processing means that executes machine learning on the classification model and the concept matrix stored in the storage device so as to reduce the loss.
  • the attention mechanism processing means includes an attention matrix learning means for learning an attention matrix for extracting image regions in the set of features to which attention is directed in the classification process of the image classification means in accordance with the degree of similarity with the concept matrix
  • the image classification means includes a concept occurrence calculation means for generating an activity vector based on the attention matrix, the activity vector having elements representing the degree to which each concept corresponding to the slot vector appears in the image data, and a classifier for performing classification of image labels using the activity vector corresponding to the image data as input.
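The path from attention matrix to activity vector to classifier can be sketched as follows. The text above states only that the attention matrix follows from the similarity with the concept matrix and that the activity vector is derived from it. The dot-product similarity, the sigmoid, the tanh-of-sum pooling over positions, and the linear classifier below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def concept_attention(features, prototypes):
    """A[j][p] in (0, 1): similarity of position feature p to concept j.

    features: n position vectors (each c-dim) from the feature map.
    prototypes: k concept vectors (columns of the concept matrix W, each c-dim).
    Assumption: dot-product similarity squashed by a sigmoid.
    """
    return [[sigmoid(sum(w_i * f_i for w_i, f_i in zip(w, f))) for f in features]
            for w in prototypes]


def activity_vector(attn):
    """t[j] in (0, 1): degree to which concept j appears anywhere in the image.

    Assumption: tanh of the attention summed over all positions.
    """
    return [math.tanh(sum(row)) for row in attn]


def classify(t, weights, bias):
    """Hypothetical linear classifier: s[y] = sum_j weights[y][j] * t[j] + bias[y]."""
    return [sum(u * tj for u, tj in zip(row, t)) + b for row, b in zip(weights, bias)]


# Usage: 3 positions with c = 2 channels, k = 2 concepts, 2 classes.
F = [[3.0, 0.0], [2.5, 0.5], [0.0, -3.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
A = concept_attention(F, W)
t = activity_vector(A)
s = classify(t, weights=[[1.0, -1.0], [-1.0, 1.0]], bias=[0.0, 0.0])
predicted = max(range(len(s)), key=s.__getitem__)
```

Because the classifier sees only t, each class score can be read as a weighted vote of concept activities, which is what makes the decision inspectable concept by concept.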
  • the set of features representing the image data is a feature map output from a convolutional neural network image recognition model.
  • the separation loss includes a consistency loss, which decreases as the features of a single concept occupy a smaller volume in the feature space, and a discrimination loss, which decreases as pairs of different concepts become less likely to occupy the same region in the feature space.
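The combined loss (classification loss plus a separation loss made of consistency and discrimination terms) can be sketched as below. The concrete forms are assumptions, not the formulas of the patent: cross-entropy for the classification loss, mean (1 - cosine similarity) between same-concept features for the consistency loss, mean cosine similarity between the mean features of different concepts for the discrimination loss, and hypothetical weights `lam_con` and `lam_dis`.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cross_entropy(scores, label):
    """Classification loss: -log softmax(scores)[label]; falls as the true class wins."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


def consistency_loss(per_concept_feats):
    """Mean (1 - cosine) over pairs of features belonging to the same concept.

    per_concept_feats[j] lists the features of concept j taken from several
    images; a tight cluster (small volume) gives a loss near 0.
    """
    terms = [1.0 - cosine(a, b)
             for feats in per_concept_feats
             for a, b in combinations(feats, 2)]
    return sum(terms) / len(terms) if terms else 0.0


def mean_vector(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]


def discrimination_loss(per_concept_feats):
    """Mean cosine similarity between the mean features of different concepts."""
    centers = [mean_vector(feats) for feats in per_concept_feats]
    terms = [cosine(a, b) for a, b in combinations(centers, 2)]
    return sum(terms) / len(terms) if terms else 0.0


def total_loss(scores, label, per_concept_feats, lam_con=1.0, lam_dis=1.0):
    """Classification loss plus weighted separation loss (weights are hypothetical)."""
    separation = (lam_con * consistency_loss(per_concept_feats)
                  + lam_dis * discrimination_loss(per_concept_feats))
    return cross_entropy(scores, label) + separation


# Usage: two concepts, two images each; "separated" concepts vs. "mixed" ones.
separated = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
mixed = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]
```

Both separation terms are lower for `separated`, so minimizing the total loss pushes the concept features toward compact, mutually distinct clusters while the cross-entropy keeps them useful for classification.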
  • the image data is data of images of the surfaces of multiple concrete structures captured by a camera
  • the image labels are labels indicating the soundness of the concrete structures that each correspond to the image data.
  • an image classification learning method in which a computer learns multiple concepts in image data for classifying image data based on learning data including multiple image data and image labels corresponding to the image data, the computer including a storage device for storing the learning data and a calculation device for executing a machine learning process, the computer including a step of extracting a set of features expressing the image data, and learning and generating a classification model that identifies and classifies image labels for the image data based on the extracted set of features, a step of converting a slot vector in a concept matrix including slot vectors that correspond to each of the multiple concepts and define an image region in which a feature value that is emphasized in the classification process appears, according to an image feature defined by the slot vector, a step of calculating a loss based on a classification loss calculated by evaluating a classification rate in the classification of image data and decreasing as the classification rate increases, and a separation loss calculated by evaluating the degree to which features corresponding to the multiple concepts are separated from each other
  • the image data is data of images of the surfaces of multiple concrete structures captured by a camera
  • the image labels are labels indicating the soundness of the concrete structures that each correspond to the image data.
  • an image classification learning program for machine learning a plurality of concepts in image data for classifying image data, based on learning data including a plurality of image data and image labels corresponding to the image data, by a computer.
  • the computer includes a calculation device and a storage device, and includes the steps of: for image data stored in the storage device, the calculation device extracts a set of features expressing the image data, and learns and generates a classification model that identifies and classifies image labels for the image data based on the extracted set of features; the calculation device converts slot vectors in a concept matrix including slot vectors that correspond to each of the plurality of concepts and define image regions in which features that are emphasized in the classification process appear, according to image features defined by the slot vectors; the calculation device calculates a loss based on a classification loss calculated by evaluating a classification rate in the classification of image data and that decreases as the classification rate increases, and a separation loss calculated by evaluating the degree to which features corresponding to the plurality of concepts are separated from each other in a feature space and that decreases as the degree of separation increases; and the calculation device learns the classification model and the concept matrix so as to reduce the loss.
  • the computer-readable non-transitory recording medium stores an image classification learning program.
  • an image classification trained model generated by an image classification learning method that machine-learns a plurality of concepts in image data for classifying image data based on training data including a plurality of image data and image labels corresponding to the image data, the image classification trained model having a configuration of a classifier model that uses as input an activation vector whose elements are the degree to which each of the concepts appears in the image data and classifies the image data based on the co-occurrence relationship of the elements, the image classification trained model includes a step of extracting a set of features that express the image data, and updating by learning a classifier model that identifies and classifies an image label for the image data based on the extracted set of features, and a step of converting a slot vector in a concept matrix composed of slot vectors that correspond to each of the plurality of concepts and define an image area in which a feature that is emphasized in the classification process appears, according to an image feature defined by the slot vector.
  • the method is generated by a step of calculating a loss based on a classification loss calculated by evaluating the classification rate in classifying image data and decreasing as the classification rate increases, and a separation loss calculated by evaluating the degree to which features corresponding to multiple concepts are separated from each other in feature space and decreasing as the degree of separation increases, and a step of training a model and a concept matrix so as to reduce the loss
  • the step of converting the slot vector includes a step of training an attention matrix for extracting image regions in the set of features to which attention is directed in the classification process according to the similarity with the concept matrix
  • the step of updating the classifier model by training includes a step of generating an activation vector based on the attention matrix, the elements of which are the degree to which each of the concepts corresponding to the slot vector appears in the image data, and a step of training parameters of the classifier model to perform classification for image labels using the activation vector corresponding to the image data as an input.
  • an image classification learning device includes a storage device for storing learning data including image data of the surfaces of a plurality of concrete structures captured by a camera and image labels indicating the soundness of the concrete structures corresponding to the image data, and a calculation device for executing a process of machine learning a plurality of concepts in the image data for classifying the image data in terms of soundness based on the learning data stored in the storage device, the calculation device performing an image identification step of extracting a set of features representing the image data, learning and generating a classification model that identifies and classifies image labels for the image data based on the extracted set of features, and storing the classification model in the storage device;
  • the system executes an attention mechanism processing step in which the slot vectors are converted and stored in a storage device according to the image features defined by the slot vectors in a concept matrix consisting of slot vectors that correspond to each of the concepts and define the image regions in which the features that are emphasized in the classification model's classification process appear; a loss evaluation step in
  • the attention mechanism processing step includes an attention matrix learning step of learning an attention matrix for extracting image regions in the set of features to which attention is directed in the classification model identification process according to the degree of similarity with the concept matrix
  • the image identification step includes a concept occurrence calculation step of generating an activation vector based on the attention matrix, the activation vector having elements representing the degree to which each concept corresponding to the slot vector appears in the image data, and a step of generating a classifier that performs classification on image labels using the activation vector corresponding to the image data as input.
  • the learning process step includes a step of generating a treatment discrimination model that learns to discriminate treatment labels using the activity vector and treatment labels of repair measures corresponding to image data of the surface of the concrete structure as input.
  • an image classification trained model generated by an image classification learning method that machine-learns a plurality of concepts in image data for classifying image data regarding soundness, based on learning data including image data of the surfaces of a plurality of concrete structures and image labels indicating the soundness of the concrete structures corresponding to the image data, the image classification trained model having a configuration of a classifier model that uses as input an activity vector having elements representing the degree to which each of the concepts appears in the image data, and classifies the image data based on a co-occurrence relationship of the elements, the image classification trained model comprising the steps of: extracting a set of features that represent the image data, and updating by learning a classifier model that identifies and classifies an image label for the image data based on the extracted set of features; and in a concept matrix consisting of slot vectors that correspond to each of the plurality of concepts and define image regions in which features that are emphasized in the discrimination process appear, according to the image features defined by the slot vectors,
  • the image classification trained model comprising the steps of:
  • the image classification learning device, image classification learning method, and image classification learning program of the present invention enable humans to understand what image feature regions are used as the basis for classification by a trained model generated by artificial intelligence learning how to classify images.
  • the feature regions of this image are separated to minimize overlap between different classification classes, so even in classification tasks involving natural images, the activity of the feature regions during the separation process can be displayed and visualized in a way that allows comparison with the "concepts" humans use for classification.
  • when the image classification learning device, image classification learning method, and image classification learning program of the present invention are applied to determining the soundness of concrete structures, it becomes possible to determine soundness by utilizing the accumulated judgment know-how of engineers and experts.
  • FIG. 1 is a functional block diagram for explaining the configuration of an image classification learning device 1000 according to a first embodiment.
  • FIG. 2 is a functional block diagram for explaining the configuration of a concept regularization unit 300.
  • FIG. 2 is a conceptual diagram showing the concept of processing by the concept regularizer 300.
  • FIG. 1 is a block diagram for explaining the hardware configuration of an image classification learning device 1000.
  • A further figure is a flowchart for explaining the learning process of the image classification learning device 1000.
  • FIG. 4 is a functional block diagram for explaining the configuration of the image classification device 4000 when performing classification processing for a new image.
  • FIG. 4 is a conceptual diagram for explaining the processing performed by the classifier 400.
  • FIG. 13 shows the classification performance of the classifier 400 for CUB200 and ImageNet.
  • FIG. 13 is a diagram for explaining the validity of a concept represented by concept activity level t.
  • FIG. 1 is a diagram showing the attention levels of the five most important concepts (based on "importance" to be described later) for an input image of a black bird with a yellow head.
  • FIG. 13 is a diagram for explaining a concept represented by concept activity level t for a natural image.
  • FIG. 1 is a diagram showing the importance of each concept in the CUB200 dataset.
  • FIG. 13 is a diagram showing the magnitude of each hyperparameter and the accuracy rate, consistency, and discriminability.
  • FIG. 11 is a conceptual diagram for explaining the operation of the concrete soundness classification device of the second embodiment.
  • FIG. 15 is a conceptual diagram showing the configuration of learning data for generating a trained model of artificial intelligence such as that shown in FIG. 14.
  • FIG. 1 is a diagram showing an example of a system configuration for determining the soundness of concrete.
  • FIG. 5 is a functional block diagram showing the configuration of a terminal 500.1.
  • A further figure is a block diagram for explaining the hardware configuration of the terminal 500.1.
  • FIG. 1 is a diagram for explaining the configuration of learning data based on image data, health level labels corresponding to the images, and corrective action labels.
  • FIG. 13 is a functional block diagram for explaining the configurations of an image classification learning device 1000 and a classification device 4000 according to a third embodiment.
  • the image classification learning device of the present invention will be described as a computer program that is installed on a standalone computer device and executes the image classification learning method.
  • the processing of the image classification learning device may be distributed among multiple computer devices, and the arithmetic device that executes the computer processing may be single or multiple.
  • the processing of the image classification learning device is not limited to a program installed in such a computer device, and may generally be realized as an arithmetic processing device such as a microcomputer that combines an arithmetic device and a storage device, or may be implemented in a dedicated IC circuit, an FPGA (Field-Programmable Gate Array), or other electronic circuit.
  • Embodiment 1: Concept-based image classification
  • the term “concept” refers to a feature region in an image of the training dataset to which the classifier attends when classifying, in the machine learning of an image classifier using a neural network, and which is separated from other such regions to the extent that it satisfies a predetermined condition.
  • the method of “classification based on concepts” is also called “concept-based classification.”
  • predetermined conditions refer to conditions that enable the trained model to learn concepts so that the original image can be reconstructed or identified from the activation vector alone, while making feature values of feature regions (in different images) corresponding to the same concept as similar as possible, and making feature values of feature regions corresponding to different concepts as dissimilar as possible, regardless of the correct label.
  • the image classifier described below is an artificial intelligence learning model that, using only the training images and the labels indicating their image classes, learns the optimal bottleneck “concepts” for the target image classification task in parallel with learning the classification task itself.
  • the model structure (mathematical configuration, parameter configuration) before learning is called the “learning model,” and after the model parameter values are determined by the learning process, it is called the “trained model.”
  • the “trained model” functions as part of a program by being installed on a computer.
  • the "trained model (classifier)" may be recorded as a program or as part of a program on a computer-readable recording medium and installed on a computer other than the one that performed the learning process.
  • Such a “learning model” includes a “(self-)attention mechanism” (described later), which makes it possible to identify the regions in which each of the above-mentioned concepts is discovered during the machine learning process. By displaying together the training images that share a detected “concept,” humans can easily understand what each learned concept represents, which provides clues for interpreting the classification and judgment processes.
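Gathering the training images that share a concept can be sketched as ranking images by their activity for each concept. This is a minimal illustration assuming that each training image already has a concept activity vector t; the function name and the choice of "top n by activity" are hypothetical.

```python
def top_images_per_concept(activities, n=3):
    """For each concept j, return the indices of the n images with the highest t[j].

    activities[i] is the concept activity vector t of training image i
    (one element per concept). Ties keep the original image order.
    """
    k = len(activities[0])
    return [sorted(range(len(activities)), key=lambda i: activities[i][j], reverse=True)[:n]
            for j in range(k)]


# Usage: four images, two concepts.
activities = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6], [0.1, 0.95]]
galleries = top_images_per_concept(activities, n=2)
# galleries == [[0, 2], [3, 1]]
```

Displaying each gallery side by side, possibly with the attention maps overlaid, is one way to let a person name what each learned concept represents.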
  • the “attention mechanism” gates the channels of the “feature map” extracted from the input training “images,” so that map information considered noteworthy passes through largely intact while map information considered not noteworthy is largely suppressed.
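Channel gating of this kind can be sketched as multiplying each channel of the feature map by a gate in (0, 1). The sketch assumes a generic sigmoid gate per channel (similar in spirit to squeeze-and-excitation); how the gate logits are computed is not specified here, so they are simply passed in, and the function name is hypothetical.

```python
import math


def gate_channels(feature_map, gate_logits):
    """Scale each channel of a c x h x w feature map by sigmoid(gate_logit).

    Channels with large logits pass almost unchanged (gate near 1);
    channels with very negative logits are suppressed (gate near 0).
    """
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    return [[[g * v for v in row] for row in channel]
            for g, channel in zip(gates, feature_map)]


# Usage: c = 2 channels, h = 1, w = 2; keep channel 0, suppress channel 1.
F = [[[1.0, 2.0]], [[3.0, 4.0]]]
gated = gate_channels(F, [10.0, -10.0])
```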
  • the following embodiments aim to provide an image classification learning device, an image classification learning method, and an image classification learning program.
  • the "trained model (image classifier)" of the embodiments uses the activation level of each concept as input to characterize and classify images.
  • [Embodiment 1] (Configuration of an image classification learning device that learns concepts)
  • FIG. 1 is a functional block diagram for explaining the configuration of an image classification learning device 1000 according to the first embodiment.
  • the image classification learning device 1000 uses as input learning data consisting of multiple pieces of image data and image labels (indicating the class to be classified) associated with each piece of image data, and generates a trained model for image classification.
  • the image dataset that serves as the input learning data is $D = \{(x_i, y_i) \mid i = 1, 2, \dots, N\}$, where
  • $x_i$ is an image, and
  • $y_i$ is the object class, in the label set $\Omega$, associated with $x_i$.
  • the image classification learning apparatus 1000 learns a set of k concepts using only the labels of the images.
  • the image classification learning device 1000 includes a convolutional neural network (hereinafter, referred to as a CNN backbone) 100 that serves as a backbone for generating a feature map from input image data, a concept learner 200, a concept regularizer 300, a classifier 400, a quantization error calculator 500, a loss calculator 600 that calculates the amount of loss during learning as described below, and a learning process controller 700 that controls the learning process according to the loss calculated by the loss calculator 600.
  • the CNN backbone 100, the concept learner 200, the concept regularizer 300, the classifier 400, the quantization error calculator 500, the loss calculator 600 that calculates the amount of loss during learning as described below, and the learning process control unit 700 that controls the learning process according to the loss calculated by the loss calculator 600 correspond to functions realized by a computing device that operates based on a program, and in this program, for example, each can be configured to be implemented as a program module.
  • the concept learner 200, concept regularizer 300, classifier 400, and quantization error calculator 500 can be configured as modules in separate neural networks, with parameters adjusted by the learning process control unit 700 based on the loss calculated by the loss calculator 600.
  • the CNN backbone 100 can also be included in the learning target, resulting in a so-called "end-to-end" configuration, and the configuration of the neural network/artificial intelligence is not limited to this configuration.
  • the CNN backbone 100 extracts a feature map $F \in \mathbb{R}^{c \times h \times w}$ from input image data $x$, where
  • $c$ is the number of channels (feature maps).
  • the CNN backbone 100 divides the input image into $h \times w$ regions, and each region is represented by a vector with $c$ elements, which makes $F$ a $c \times h \times w$ feature map.
  • the concept prototype processing unit 2100 learns the concept matrix W according to a procedure described below, and each column vector of the matrix W is referred to in this specification as a "concept prototype" to be learned.
  • the concept learner 200 generates a concept activity t indicating the presence of each concept, and an image feature V from the region where each concept exists in x.
  • the concept activity t is used as an input to the classifier 400, which learns to calculate a score s indicating the classification result of the image class.
  • Here, t is a vector with one element for each of the k concepts, V is a matrix whose row v_k is the image feature for concept k, and s is a vector of scores, one for each image class.
  • The concept regularization unit 300 receives the concept activity t and the image features V as input and, in the concept prototype update process described below, imposes constraints that keep each concept individually consistent and different concepts mutually distinguishable; it also performs self-supervised learning.
(Concept Learner 200)
  • The concept learner 200 uses "slot attention", a technique based on the self-attention mechanism, to learn "concepts" for the image dataset D that can be traced back to the features serving as the basis for recognition in human visual perception.
  • To retain spatial information, the position information encoding unit 2002 performs position embedding (position information encoding) by adding position embedding information P to the input feature map F: $F' = F + P$.
  • Position information encoding is disclosed, for example, in the following published literature: Francesco Locatello, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, and Thomas Kipf. Object-centric learning with slot attention. Proc. NeurIPS, 2020.
  • the feature map F' with embedded position information is processed in the shaping processing unit 2004 to flatten the spatial dimensions.
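  • The position embedding (F' = F + P) and the flattening of the spatial dimensions can be sketched as follows. This is a minimal pure-Python sketch that assumes F and P are already available as c × h × w nested lists; how P itself is produced (learned or fixed) is not specified here, and the helper names are hypothetical.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed as [channel][row][column]
Matrix = List[List[float]]         # indexed as [row][column]


def add_position_embedding(F: Tensor3, P: Tensor3) -> Tensor3:
    """Return F' = F + P element-wise; P must have the same c x h x w shape as F."""
    return [
        [[f + p for f, p in zip(f_row, p_row)] for f_row, p_row in zip(f_ch, p_ch)]
        for f_ch, p_ch in zip(F, P)
    ]


def flatten_spatial(F: Tensor3) -> Matrix:
    """Flatten a c x h x w map into a c x (h*w) matrix, scanning rows then columns."""
    return [[value for row in channel for value in row] for channel in F]
```

Each of the c rows of the flattened result holds the h·w spatial positions of one channel, which is the layout the later attention step compares against the concept queries.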
  • The similarity calculation unit 2010 calculates the dot-product similarity between a query Q(W) and a key K(F'). The query Q(W) is obtained when the nonlinear processing unit 2008 applies a nonlinear transformation to the concept matrix W, which represents the concept prototypes and is updated step by step by the concept prototype processing unit 2100; the key K(F') is obtained when the nonlinear processing unit 2006 applies a nonlinear transformation to the feature map F'.
  • The concept prototypes (concept matrix) W are not limited to any particular form, but can, for example, be generated and updated by a GRU (Gated Recurrent Unit), a neural network model capable of learning time-series data, as described in the following literature.
  • Published literature: Liangzhi Li, Bowen Wang, Manisha Verma, Yuta Nakashima, Ryo Kawasaki, and Hajime Nagahara. Scouter: Slot attention-based classifier for explainable image recognition. Proc. ICCV, pages 1046-1055, 2021.
  • In that literature, the slot matrix is updated by the GRU from U^(t), a weighted sum of features over the spatial dimension, and the slot matrix at the previous time step.
  • In this embodiment, the GRU can likewise update the concept matrix W for the next time step from the image features V, described later, and the concept matrix W at the previous time step.
  • the method of converting the concept matrix W is not limited to this method.
  • The normalization unit 2012 calculates the "attention matrix A" as $A = \phi\left(Q(W)^{\top} K(F')\right)$ (equation (1)), where φ is a normalization function.
  • The attention matrix A indicates where in the image each of the k concepts is located, as illustrated in Figure 7.
  • The normalization function φ determines the spatial distribution of each concept, and its appropriate form depends on the target domain of the classification.
  • images in handwritten digit recognition datasets are typically black and white, and only the shapes formed by the strokes are important. In this case, concepts are unlikely to overlap spatially.
  • In contrast, natural images have color, texture, and shape, so concepts may overlap at the same spatial location.
  • Accordingly, φ can be designed as $\phi(X) = \sigma(X) \circ \mathrm{softmax}(X)$, where σ is a sigmoid function and ∘ denotes the Hadamard (element-wise) product.
  • The softmax function is applied across concepts (i.e., to each column vector; hereafter in this specification, such a column vector is referred to as a "slot vector") so that different concepts are not detected at the same spatial location.
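  • Equation (1) and the normalization φ can be sketched as below. This minimal pure-Python sketch assumes that the query Q(W) is a c × k matrix (one column per concept) and the key K(F') is a c × hw matrix, both already produced by the nonlinear processing units, whose exact transforms are not shown. The `softmax_over_concepts` flag is a hypothetical switch: when on, φ is the Hadamard product of the sigmoid and a softmax taken across concepts at each spatial position; turning it off (sigmoid only) is our assumption for domains where concepts may overlap.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul_transpose_a(Q: Matrix, K: Matrix) -> Matrix:
    """Return Q^T K for Q of shape c x k and K of shape c x l (result: k x l)."""
    c, k, l = len(Q), len(Q[0]), len(K[0])
    return [[sum(Q[i][a] * K[i][m] for i in range(c)) for m in range(l)] for a in range(k)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def phi(S: Matrix, softmax_over_concepts: bool = True) -> Matrix:
    """Normalize the k x l similarity matrix S into attention values in (0, 1).

    The sigmoid scores each concept at each position independently. When
    softmax_over_concepts is True, the result is multiplied element-wise
    (Hadamard product) by a softmax over the concept axis, i.e. over each
    column of S, so concepts compete for every spatial position.
    """
    k, l = len(S), len(S[0])
    A = [[sigmoid(S[a][m]) for m in range(l)] for a in range(k)]
    if softmax_over_concepts:
        for m in range(l):
            col_max = max(S[a][m] for a in range(k))  # shift for numerical stability
            exps = [math.exp(S[a][m] - col_max) for a in range(k)]
            total = sum(exps)
            for a in range(k):
                A[a][m] *= exps[a] / total
    return A


def attention_matrix(Q: Matrix, K: Matrix, softmax_over_concepts: bool = True) -> Matrix:
    """A = phi(Q(W)^T K(F')): row a is concept a, column m is spatial position m."""
    return phi(matmul_transpose_a(Q, K), softmax_over_concepts)
```

With the softmax term, a position where two concepts score equally gives each of them only half of its sigmoid value, which is how overlapping detections are discouraged.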
  • The concept occurrence calculation unit 2030 calculates the concept activity vector t by summing the attention matrix A over the spatial dimension (formula (4)).
  • Each element of the concept activity vector indicates whether or not a corresponding concept has appeared, and each element is called concept activity.
  • The shaping processor 2020 flattens the spatial dimensions of the feature map F to obtain the flattened feature map F*.
  • The similarity calculation unit 2040 extracts the image features V from the flattened feature map F* using formula (5): the row v_k for concept k is obtained by weighting F* with a_k, the attention of concept k, which gives the attention-weighted average of the image features over the spatial dimension.
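  • Formulas (4) and (5) can be sketched as follows, assuming A is a k × hw attention matrix and F* is a c × hw matrix. The text describes t only as the spatial sum of A; the tanh applied here to keep each t_k between 0 and 1 is an assumption of this sketch, as is the small eps guard in the weighted average.

```python
import math
from typing import List

Matrix = List[List[float]]


def concept_activity(A: Matrix) -> List[float]:
    """t_k: sum of row k of A over all spatial positions, squashed by tanh (assumed)."""
    return [math.tanh(sum(a_k)) for a_k in A]


def concept_features(A: Matrix, F_star: Matrix, eps: float = 1e-8) -> Matrix:
    """V (k x c): row v_k is the a_k-weighted average of the columns of F* (c x l)."""
    V = []
    for a_k in A:
        weight = sum(a_k) + eps  # eps avoids division by zero for an all-zero row
        V.append([sum(a * f for a, f in zip(a_k, channel)) / weight for channel in F_star])
    return V
```

Because each v_k is normalized by the total attention of concept k, it stays on the scale of a single image feature regardless of how large the attended region is.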
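  • The concept activity t described above is an index of whether each concept is present, and ideally takes binary values; the quantization error calculator 500 calculates a quantization error loss on t (equation (6)) that pushes it toward binary values.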
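  • The quantization error calculator 500 computes a quantization error loss on the concept activity t (equation (6)). The exact form of equation (6) is not reproduced here, so the sketch below uses one plausible penalty, the sum over concepts of (|t_k - 0.5| - 0.5)², averaged over the mini-batch; it is zero only when every t_k is exactly 0 or 1. Treat this formula as an assumption.

```python
from typing import List


def quantization_loss(T: List[List[float]]) -> float:
    """Mean over a mini-batch of sum_k (|t_k - 0.5| - 0.5)^2.

    Each term equals min(t_k, 1 - t_k)^2: zero at t_k = 0 or 1 and largest
    (0.25) at t_k = 0.5, so minimizing it pushes activities toward binary values.
    """
    per_image = [sum((abs(t_k - 0.5) - 0.5) ** 2 for t_k in t) for t in T]
    return sum(per_image) / len(T)
```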
  • Even so, the concept learner may not consistently capture meaningful features. The concept regularization unit 300 therefore performs a concept regularization process so that meaningful "concepts" are learned.
  • FIG. 2 is a functional block diagram for explaining the configuration of the concept regularization unit 300.
  • Figures 3A to 3C are conceptual diagrams showing the processing performed by the concept regularization unit 300 in Figure 2.
  • The discrimination loss calculation unit 3010, which ensures the individual consistency of concepts, aims to prevent each learned concept from having many variations, so that a concept extracted through the learning process of the concept learner 200 is easy for humans to interpret.
  • The concept learner 200 performs so-called "mini-batch learning", in which a subset of n samples is randomly selected from the N training samples and the parameters are updated using that subset.
  • The k-th element t_k of the concept activity t can be used to identify the images in a mini-batch that contain concept k.
  • The discrimination loss calculation unit 3010 calculates the "consistency loss" as follows.
  • The image feature v_k, the k-th row vector of the image features V, contains image features from the region corresponding to concept k if t_k is close to 1.
  • Let H_k denote the set of all pairs of image features v_k in the mini-batch for which t_k exceeds a threshold set empirically in advance.
  • The consistency loss is a loss term used in mini-batch learning to make the image features of different images that belong to the same concept more similar to each other.
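  • The consistency loss can be sketched as follows. The threshold value of 0.5 and the specific form used here (one minus the mean cosine similarity over the pairs in H_k, averaged over concepts) are assumptions for illustration, not the formula of this specification.

```python
import math
from itertools import combinations
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)) + eps)


def consistency_loss(T: List[Vector], Vs: List[List[Vector]], threshold: float = 0.5) -> float:
    """T[n][k] is t_k of image n; Vs[n][k] is its image feature v_k.

    For each concept k, H_k holds every pair of images whose t_k exceeds the
    threshold. The loss is 1 minus the mean cosine similarity over H_k, averaged
    over the concepts that have at least one pair; lower means more consistent.
    """
    terms = []
    for k in range(len(T[0])):
        active = [n for n in range(len(T)) if T[n][k] > threshold]
        H_k = list(combinations(active, 2))
        if H_k:
            mean_sim = sum(cosine(Vs[i][k], Vs[j][k]) for i, j in H_k) / len(H_k)
            terms.append(1.0 - mean_sim)
    return sum(terms) / len(terms) if terms else 0.0
```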
  • The discrimination loss calculation unit 3010 also calculates a "discriminability loss" as a loss term.
  • This loss uses the average image feature of each concept k over the mini-batch.
  • The set M consists of all pairs of these average image features. Concept k is excluded from M if no image in the mini-batch contains concept k.
  • The discriminability loss is a loss term used in mini-batch learning to make the average image features of different concepts more different from each other.
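  • The discriminability loss can be sketched as follows. Here the average feature of concept k is taken as the plain mean of v_k over the images whose t_k exceeds a threshold, and the loss is the mean cosine similarity over the pairs in M; both choices, and the threshold of 0.5, are assumptions of this sketch.

```python
import math
from itertools import combinations
from typing import List, Optional

Vector = List[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)) + eps)


def mean_concept_feature(T: List[Vector], Vs: List[List[Vector]], k: int,
                         threshold: float) -> Optional[Vector]:
    """Average of v_k over the images whose t_k exceeds the threshold (assumed form)."""
    rows = [Vs[n][k] for n in range(len(T)) if T[n][k] > threshold]
    if not rows:
        return None  # concept k is absent from this mini-batch, so it is left out of M
    return [sum(column) / len(rows) for column in zip(*rows)]


def discriminability_loss(T: List[Vector], Vs: List[List[Vector]],
                          threshold: float = 0.5) -> float:
    """Mean cosine similarity over M, the set of all pairs of average concept features.

    Lower values mean the concepts are more distinct from each other.
    """
    means = [mean_concept_feature(T, Vs, k, threshold) for k in range(len(T[0]))]
    M = list(combinations([m for m in means if m is not None], 2))
    return sum(cosine(u, v) for u, v in M) / len(M) if M else 0.0
```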
  • Non-Patent Document 8 uses an autoencoder structure for self-supervised learning. This is effective, for example, in handwritten digit recognition tasks, where different visual elements (line patterns) are strongly associated with their positions.
  • For example, a cross formed by horizontal and vertical lines appears only in the digit 4, which is typically placed near the center of the image.
  • In addition to a loss based on image reconstruction, the concept regularization unit 300 of this embodiment introduces self-supervised learning with a loss based on image retrieval, which suits natural images.
  • Depending on the type of target in the classification task (for example, according to an external preset), the reconstruction-based loss calculator 3020 shown in FIG. 3B and/or the search-based loss calculator 3030 shown in FIG. 3A execute the processes described below, either selectively or in parallel, to calculate a loss term for the learning of the concept learner 200.
  • For images such as handwritten digits, the concept activity t is assumed to contain enough information to reconstruct the original image.
  • the reconstruction-based loss calculation unit 3020 includes a concept decoder D, which receives the concept activity t as input and reconstructs the original image so that the image x and the output D(t) of the concept decoder D are similar to each other.
  • For natural images, however, the concept activity t is insufficient to reconstruct the original image x: each concept may appear at an arbitrary position, and the spatial information required for reconstruction is lost in t.
  • Instead of reconstructing the original image, the search-based loss calculator 3030 performs a simple retrieval task: finding images of the same class in the mini-batch B using the concept activity t. For any pair (t, t') computed from images x, x' ∈ B with image labels y and y', respectively, a function J is defined as follows.
  • If x and x' have the same class label, similar sets of visual elements should appear in both images, so t and t' should be similar to each other; if they do not have the same class label, t and t' should be different.
  • The search-based loss calculation unit 3030 defines the search-based loss l_ret for the self-supervised learning described above.
  • With the concept activity t used as the supervisory signal for self-supervised learning, the search-based loss decreases as the concept activities of different images with the same class label become more similar, and as the concept activities of images with different class labels become more dissimilar.
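  • A retrieval-style loss with the behavior just described can be sketched as follows. The function J and the exact form of l_ret are not reproduced here; this sketch assumes that the cosine similarity of two non-negative concept activity vectors serves as the probability that two images share a class, and applies binary cross-entropy over all pairs in the mini-batch.

```python
import math
from itertools import combinations
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)) + eps)


def retrieval_loss(T: List[Vector], labels: List[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy over all pairs (t, t') in a mini-batch.

    The cosine similarity of two non-negative activity vectors lies in [0, 1] and
    is used as the predicted probability that the two images share a class label.
    """
    pairs = list(combinations(range(len(T)), 2))
    total = 0.0
    for i, j in pairs:
        p = min(max(cosine(T[i], T[j]), eps), 1.0 - eps)  # clamp to keep log finite
        if labels[i] == labels[j]:
            total -= math.log(p)        # same class: penalize dissimilar activities
        else:
            total -= math.log(1.0 - p)  # different class: penalize similar activities
    return total / len(pairs) if pairs else 0.0
```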
  • The influence of each concept can be visualized by showing a human the top-ranked images according to the similarity t^T t′ between a revised concept activity t′ and the concept activity t calculated for every image in the training image dataset D.
(Loss for the classification performance of the classifier)
  • The classifier 400, a simple single fully connected layer, can be interpreted as capturing the co-occurrence between the activity of each concept and each class to be classified.
  • The total loss of the image classification learning device 1000 is defined as a combination of the loss terms above (equation (15)).
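  • Assuming the total loss is a weighted sum of the classification performance loss and the regularization terms, with weights such as the hyperparameters for the quantization, consistency, and discriminability terms examined in Figure 13, the combination can be sketched as below. The dictionary keys are hypothetical names, and the default weight of 1.0 for unlisted terms is an assumption.

```python
from typing import Dict


def total_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum L = l_cls + sum over regularizers of weight * loss.

    losses must contain "cls" (the classification performance loss); every other
    entry is treated as a regularization term. A term with no entry in weights
    gets weight 1.0, which is an assumption of this sketch.
    """
    total = losses["cls"]
    for name, value in losses.items():
        if name != "cls":
            total += weights.get(name, 1.0) * value
    return total
```

For example, `total_loss({"cls": 1.0, "qua": 2.0, "con": 0.5}, {"qua": 0.1})` weights the quantization term by 0.1 (the default reported for that hyperparameter) and the consistency term by 1.0.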
  • The learning process control unit 700 controls the learning process in accordance with the loss calculated by the loss calculation unit 600.
  • FIG. 4 is a block diagram for explaining the hardware configuration of the image classification learning device 1000 shown in FIG. 1.
  • The image classification learning device 1000 may be configured so that a computing device (CPU: Central Processing Unit) in its own housing performs the computational processing, or the program processing itself may be executed on a server. The following description assumes that a computing device in the device's own housing performs the processing.
  • the image classification learning device 1000 includes a computer device 6010, a network communication unit 6300 for communicating with a network, a camera 6400 for providing captured image data to the computer device 6010 as necessary, and a recording medium (e.g., a memory card) 6210 for recording the captured image data and providing it to the computer device 6010.
  • the recording medium 6210 may be a USB memory, a memory card, or an external storage device.
  • the network communication unit 6300 may be a wired LAN router or a wireless LAN access point.
  • the image data may be provided to the computer device 6010 via the network communication unit 6300.
  • The computer main body constituting the computer device 6010 includes a disk drive 6030, a memory drive 6020, a CPU (Central Processing Unit) 6040, memory including a ROM (Read Only Memory) 6060 and a RAM (Random Access Memory) 6070, a non-volatile rewritable storage device such as an SSD (Solid State Drive) 6080, and an input/output interface 6090 for communicating over a network and exchanging data with the outside, all connected to a bus 6050.
  • An optical disk can be mounted in the disk drive 6030.
  • a memory card 6210 can be attached to the memory drive 6020.
  • The RAM 6070 also serves as working memory when the CPU 6040 performs calculations: data and parameters used during calculation are stored in and read from it as needed while the CPU 6040 executes the calculations.
  • The non-transitory computer-readable recording medium from which a program or other information to be installed in the computer main unit is read may be, for example, a DVD-ROM (Digital Versatile Disc Read-Only Memory), a memory card, or a USB memory.
  • the computer main unit 6200 is provided with a drive device (memory drive 6020, disk drive 6030) capable of reading these media.
  • the main components of the computer device 6010 are computer hardware and software executed by the CPU 6040.
  • Such software is distributed either stored in a computer-readable non-transitory storage medium or via a network; it is obtained via the disk drive 6030 or the network communication unit 6410 and temporarily stored in the SSD 6080. It is then read from the SSD 6080 into the RAM 6070 and executed by the CPU 6040. When connected to a network, the software may also be loaded directly into the RAM and executed without being stored in the SSD 6080.
  • A program that causes the computer to function as the computer device 6010 described below need not include an operating system (OS) that causes the computer main body 6010 to execute functions such as those of an information processing device.
  • The program need only include instructions that call appropriate functions (modules) in a controlled manner to obtain the desired results. How the computer system 6010 operates is well known, so a detailed explanation is omitted.
  • The CPU 6040 may be either a single-core processor or a multi-core processor.
  • FIG. 5 is a flowchart for explaining the learning process of the image classification learning device 1000 shown in FIG. 1.
  • The training image data selected for the mini-batch is input (S100), and the CNN backbone 100 extracts a feature map (S102).
  • the location information encoding processing unit 2002 encodes the location information of the feature map (S104), and the shaping processing unit 2020 performs flattening processing of the feature map (S106).
  • the shaping processor 2004 performs flattening processing on the feature map F' with the encoded positional information (S108), and the nonlinear processor 2008 performs nonlinear processing on the concept matrix W output from the concept prototype processor 2100 to generate a query Q(W) (S110).
  • The nonlinear processing unit 2006 generates a key K(F') by applying nonlinear processing to the feature map F' (S112).
  • The similarity calculation unit 2010 calculates the dot product between the query Q(W) and the key K(F'), and the normalization unit 2012 normalizes the dot product to generate the attention matrix A (S114).
  • the concept occurrence calculation unit 2030 calculates the concept activity t and inputs it to the classifier 400 (S116). Meanwhile, the similarity calculation unit 2040 generates image features V as shown in equation (5) from the dot product of the attention matrix A and the flattened feature map.
  • The concept regularizer 300 calculates the "consistency loss," "discriminability loss," "reconstruction-based loss," and "search-based loss" from the concept activity t and the image features V, while the quantization error calculator 500 calculates the quantization error loss of the concept activity t according to equation (6).
  • The loss calculator 600 calculates the "classification performance loss" by equation (14), and calculates the total loss L for the learning process of the concept learner 200 by equation (15) (S120).
  • the learning process control unit 700 updates the parameters of the concept learner 200, which is composed of a neural network, based on the total loss L, for example, by the gradient descent method. Note that the method of updating the model parameters is not limited to this method.
  • If the learning process control unit 700 determines that the mini-batch learning process satisfies the specified termination conditions, it ends the learning process; otherwise, it returns the process to step S100.
  • The process of FIG. 5 can be executed, for example, by the CPU 6040 in the hardware configuration shown in FIG. 4, using a computer program stored in the non-volatile storage device 6080.
  • Each process can be distributed, or a cloud-type configuration in which the processes are executed by a server device can be used.
  • FIG. 6 is a functional block diagram for explaining the configuration of an image classification device 4000 including a classifier 400 generated by learning in the image classification learning device 1000 shown in FIG. 1 when performing classification processing on a new image.
  • When the classification process is performed after the learning process is completed, the concept matrix W of the concept prototype processing unit 2100 is fixed to its value at the end of learning and is not updated; it is therefore shown as the concept prototype storage unit 4100 in Figure 6.
  • the learned parameters are stored in memory.
  • the parameters of the other components in Figure 6, including the classifier 400, are also fixed at the time when learning is completed.
  • FIG. 7 is a conceptual diagram for explaining the processing performed by the classifier 400 in FIG. 6.
  • the concept occurrence calculation unit 2030 calculates the concept activity t for each concept. This is called a "concept bottleneck" in the sense that the diversity of the original image is consolidated into a small number of features.
  • In the classifier 400, the co-occurrence pattern of concepts is learned for each label of the training images.
  • When the concept bottleneck of a new image is input to the classifier 400, its similarity to the concept co-occurrence pattern of each label is calculated, and the label with the highest similarity is output as the classification result.
  • FIG. 7 illustrates the case where natural images of birds are learned.
  • For example, concepts such as "yellow head" and "black body" are generated for birds with label 1, and in calculating the classification result, the degree to which each of these concepts co-occurs with the concepts in the target image is determined.
  • The image classification learning device 1000 learns concepts such that, for example, an image of a black bird with a yellow head is interpreted in terms of a "yellow head" and a "body with black feathers."
  • the classifier 400 is a single fully connected (FC) layer that encodes the co-occurrence of each concept with each class.
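  • A single fully connected layer over the concept activity t, followed by choosing the highest-scoring label, can be sketched as below. The weight matrix H (one row per class, read as that class's concept co-occurrence pattern) and the bias b are hypothetical names, and whether the layer uses a bias at all is an assumption.

```python
from typing import List


def class_scores(H: List[List[float]], b: List[float], t: List[float]) -> List[float]:
    """s = H t + b; row H[j] acts as the concept co-occurrence pattern of class j."""
    return [sum(h * x for h, x in zip(row, t)) + bias for row, bias in zip(H, b)]


def predict(H: List[List[float]], b: List[float], t: List[float]) -> int:
    """Return the index of the class whose pattern scores highest against t."""
    s = class_scores(H, b, t)
    return max(range(len(s)), key=lambda j: s[j])
```

Because each score is a dot product between t and one row of H, a large positive weight can be read directly as "this concept tends to co-occur with this class", which is what makes the single-layer design interpretable.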
  • The image classification learning device 1000 learns the "classifier" and the "bottleneck concepts" simultaneously.
  • concepts are constrained to be individually consistent (i.e., single concepts occupy a smaller volume in feature space) and mutually distinctive (i.e., pairs of concepts do not occupy the same region in feature space, or are separated in such a way that they are less likely to do so).
  • the image classification learning device 1000 can also simultaneously learn classifiers and concepts in an end-to-end manner.
  • the configuration is not necessarily limited to an end-to-end configuration; for example, at least a portion of the CNN backbone 100 may be fixed and only other portions may be trained.
  • the image classification learning device 1000 has a configuration that can essentially explain the classification process to humans in two ways.
  • For example, the classifier provides a prototype of each target class in terms of concepts.
  • On the other hand, the image classification learning device 1000 is more likely to fail when trained on a large number of classes.
  • By default, the number of concepts k is set to 20 for MNIST and 50 for the other datasets.
  • Figures 8A and 8B show the classification performance of classifier 400 for CUB200 and ImageNet.
  • In these figures, "BotCL" denotes the classifier 400.
  • Compared with the baseline model, the classifier 400 shows a performance degradation of approximately 3 points.
  • Figure 8B also shows the change in performance versus the number of classes for CUB200 (number of classes: 20-200) and ImageNet (number of classes: 50-300).
  • Although the classifier 400 consists of a single fully connected layer that takes the concept activity t as input, there is almost no degradation in classification performance compared with the baseline model. In other words, the "concepts" corresponding to the concept activity t sufficiently capture the features needed for image classification.
(Interpretability)
(Validity of detected concepts)
  • the image classification learning device 1000 calculates the concept activity t, which indicates the presence of each concept.
  • Each element t_k of the concept activity corresponds to the sum of the spatial attention a_k for concept k. By visualizing a_k, humans can qualitatively confirm whether a concept is present.
  • Figures 9A to 9C are diagrams for explaining the validity of concepts expressed by concept activity t.
  • Figure 9A shows MNIST images on which the attention a_k of the five most frequently activated concepts (i.e., those with t_k > 0.5) is superimposed.
  • In the overlay, the regions attended to by each concept appear brighter.
  • the obvious difference between the two is the activation of concept 2 (Cpt.2), and the attention point is the lower vertical line.
  • Figure 9B shows the top five most frequently activated concepts.
  • Figure 9C shows the images reconstructed by each concept.
  • the image classification learning device 1000 is designed so that the learned concepts are individually consistent and distinctive from each other.
  • Figure 10 shows the attention levels of the five most important concepts (based on the "importance" described below) for an input image of a black bird with a yellow head.
  • Figure 11 is a diagram to explain the concepts expressed by concept activity t for natural images.
  • Figure 11 shows a similar overlay display to that shown in Figure 9B.
  • the classifier 400 consists of one fully connected layer and can be interpreted as learning the co-occurrence of concepts.
  • Figure 12 shows the importance of each concept in the CUB200 dataset.
  • disabling concept 1 results in more images of black-headed birds appearing in the search results.
  • the search task is more robust because the output (highly similar samples) is determined by multiple concepts, and changing one concept does not significantly affect the overall similarity.
  • Figure 12 shows the percentage of samples in the ground truth class among the retrieved samples, allowing us to measure the importance of each concept in this search task.
  • Figure 13 shows the magnitude of each hyperparameter and the accuracy rate (Accuracy: circles), individual consistency (squares: the higher the better), and mutual distinctiveness (triangles: the lower the better).
  • The parameter λ_qua controls how close the concept activity t is to a binary value. An appropriate value regularizes the activation and suppresses ambiguous concepts; however, an extreme value can cause vanishing gradients, which can lead to poor learning. Within the scope of this experiment, the default λ_qua of about 0.1 was optimal. The impact of λ_con and λ_dis
  • the image classification learning device 1000 of this embodiment can learn the features used in classification as "concepts" that humans can understand through learning on classification tasks.
  • The image classification learning device 1000 can provide not only the learned concepts but also interpretability of its judgments.
[Embodiment 2]
  • the degree of partial damage to the structure is determined based on factors such as the occurrence of cracks on the concrete wall surface.
  • FIG. 14 is a conceptual diagram for explaining the operation of the concrete soundness classification device of embodiment 2.
  • the classification process for the soundness of concrete structures can also be configured so that learning classification and classification processes are performed within an integrated computer device.
  • The learning process is executed using the image data and a discrimination index associated with the image data (here, the soundness as correct answer data for the image data), and a trained artificial intelligence model is generated.
  • image data capturing the surface of the concrete structure is sent to server 1010.
  • In server 1010, a classification process is performed using the trained artificial intelligence model, and a soundness level (for example, soundness level III) is output.
  • the area corresponding to the "concept" used in the classification process in server 1010 is displayed in a frame or the like so that it can be visually recognized by humans.
  • A skilled professional viewing an image classified in this way can understand not only the artificial intelligence's classification result but also which areas of the image the attention focused on in determining its soundness. The professional can then decide how to respond based on those areas.
  • the soundness of concrete structures can be classified into four levels, from soundness I to soundness IV, as disclosed in the Ministry of Land, Infrastructure, Transport and Tourism's "Guidelines for Periodic Bridge Inspection” (Non-Patent Document 9).
  • Non-Patent Document 9 defines each soundness level as follows, taking road bridges and the like as an example.
  • Soundness I (Sound): The road bridge's functionality is not impaired.
  • Soundness II (Preventive maintenance stage): The road bridge's functionality is not impaired, but preventive maintenance measures are desirable.
  • Soundness III (Early action stage): The road bridge's functionality may be impaired, and action should be taken at an early stage.
  • Soundness IV (Emergency action stage): The road bridge's functionality is impaired or is very likely to become impaired, and urgent action is required.
  • For example, in response to soundness level III, the expert technician decides: "This part is a crack, so let's repair it by injecting resin."
  • FIG. 15 is a conceptual diagram showing the configuration of training data for generating a trained artificial intelligence model like that shown in FIG. 14.
  • Image data is prepared with soundness labels corresponding to soundness levels I to IV.
  • Figure 15 shows soundness level III as an example; similar images are assumed to be prepared for the other soundness levels.
  • Figure 16 shows an example of a system configuration for determining the soundness of concrete.
  • an image of the surface of the concrete structure is captured by the inspector terminal 500.1 and transmitted to the server 1010.
  • the inspector terminal 500.1 transmits, for example, the structure's location information (e.g., latitude and longitude information obtained by a positioning means) and the structure name data entered by the inspector, together with the image data, to the server 1010.
  • The server 1010 returns to the inspector terminal 500.1 the soundness assessment result and information indicating the areas attended to in the image sent from the inspector terminal 500.1 (e.g., data with those areas marked by a frame).
  • the trained artificial intelligence model to be used for classification and judgment processing in server 1010 can be configured to undergo training processing, for example, in another server 1020, and then be transmitted to server 1010, stored, and operated.
  • server 1020 which is in charge of the learning process, collects data such as "position data,” “image data,” and "structure name” from other concrete structures from multiple terminals 500.2 to 500.n (n: natural number). At this time, for example, if each terminal 500.2 to 500.n is operated by a skilled professional, the data collected by server 1020 can be configured so that when an image is captured on the terminal, the soundness is associated with the image data as correct answer data and transmitted to server 1020. Learning data such as that described in FIG. 15 can be generated from the data collected in this way.
  • Alternatively, a specialized technician on the server side may associate the soundness, as correct answer data, with the image data.
  • server 1020 can use the accumulated learning data to retrain the AI's trained model and improve classification performance.
  • FIG. 17 is a functional block diagram showing the configuration of terminal 500.1 shown in FIG. 16.
  • terminals 500.2 to 500.n have a similar configuration, so their explanation will not be repeated.
  • the terminal 500.1 of this embodiment includes a control unit 5010 for controlling the communication operation and input/output operation of the terminal, a communication processing unit 5040 for generating baseband signals for wireless LAN and mobile communication and sending them to a modulation/demodulation circuit/device, and for obtaining original data or signals from received baseband signals, an imaging sensor 5050 for capturing still images or videos, an image acquisition unit 5060 for converting signals from the imaging sensor 5050 into electrical signals in a predetermined format, a display control unit 5070 for controlling image display on the terminal side, a display unit 5080 for displaying images under the control of the display control unit 5070, a position acquisition unit 5090 for measuring and acquiring the position of the terminal 500.1, and an input interface unit 5100 for receiving input of information from the outside.
  • the imaging sensor 5050 may be a module that combines a lens and a CCD (Charge-Coupled Device) sensor, or a module that combines a lens and a CMOS sensor.
  • The position acquisition unit 5090 is not limited to a positioning device that uses GPS (Global Positioning System) for outdoor positioning, a positioning device that additionally uses signals from quasi-zenith satellites, or a device that enables indoor positioning using beacon signals; it may be any device capable of acquiring the location of terminal 500.1.
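One way to express "any device capable of acquiring location information" is an interface that several positioning back-ends can implement. The sketch below assumes a hypothetical `PositionSource` protocol, a stand-in `FixedSource` class, and a first-fix-wins order (for example, GPS first, then an indoor beacon); the patent does not prescribe any particular fallback order.

```python
from typing import Optional, Protocol, Sequence, Tuple

Position = Tuple[float, float]  # (latitude, longitude)


class PositionSource(Protocol):
    """Anything that can report the terminal's position (GPS, QZSS-assisted GPS, beacons, ...)."""

    def read(self) -> Optional[Position]: ...


class FixedSource:
    """Hypothetical stand-in source for illustration; returns a preset fix or None."""

    def __init__(self, position: Optional[Position]) -> None:
        self._position = position

    def read(self) -> Optional[Position]:
        return self._position


def acquire_position(sources: Sequence[PositionSource]) -> Optional[Position]:
    # Try each source in order and return the first fix obtained.
    for source in sources:
        fix = source.read()
        if fix is not None:
            return fix
    return None
```

Because `PositionSource` is a structural protocol, any class with a matching `read()` method can be plugged in without inheriting from it.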
  • The input interface unit 5100 converts external input, received through a touch panel or as speech processed by voice recognition, into text data.
  • The control unit 5010 includes an acquired image transmission processing unit 5020 and a judgment table generation unit 5030. The acquired image transmission processing unit 5020 combines the image data from the image acquisition unit 5060, the position data from the position acquisition unit 5090, the structure name data from the input interface unit 5100, and similar information, and transmits the combined information to server 1010 via the communication processing unit 5040. The judgment table generation unit 5030 takes the structure name, the soundness data, and the data indicating the image and the area of interest, all received from server 1010 via the communication processing unit 5040, and generates image data representing a judgment table, which the display control unit 5070 displays on the display unit 5080.
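The two units of the control unit 5010 described above can be sketched as a pair of functions: one bundles the captured image, position, and structure name into a single upload message, and the other lays out the server's reply as a judgment table. The JSON message format, its field names (including `area_of_interest` as an `(x, y, w, h)` pixel box), and the function names are all assumptions. The patent's unit 5030 produces image data for display, whereas this sketch only produces the table's text content.

```python
import base64
import json
from typing import Tuple


def build_upload(image: bytes, position: Tuple[float, float], structure_name: str) -> str:
    """Bundle image, position, and structure name into one JSON message (hypothetical format)."""
    return json.dumps({
        "image": base64.b64encode(image).decode("ascii"),  # binary image as ASCII text
        "position": {"lat": position[0], "lon": position[1]},
        "structure_name": structure_name,
    })


def build_judgment_table(response: str) -> str:
    """Lay out the server's reply as a plain-text judgment table."""
    reply = json.loads(response)
    x, y, w, h = reply["area_of_interest"]
    rows = [
        ("Structure", reply["structure_name"]),
        ("Soundness", reply["soundness"]),
        ("Area of interest", f"x={x}, y={y}, w={w}, h={h}"),
    ]
    width = max(len(key) for key, _ in rows)  # align the value column
    return "\n".join(f"{key.ljust(width)} | {value}" for key, value in rows)
```

Encoding the image as Base64 keeps the whole upload a single text message; a multipart HTTP request would be an equally valid design.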
  • FIG. 18 is a block diagram for explaining the hardware configuration of terminal 500.1 shown in FIG. 17.
  • The control unit 5010 includes a processing device 501, such as an MPU (Micro Processing Unit) or CPU (Central Processing Unit), and a storage device consisting of RAM 502, ROM 503, and the like. By executing a predetermined basic OS, middleware, and so on, it controls each part and provides a native platform environment and an application execution environment in the software configuration.
  • As the imaging device 505, a camera module such as that described above is used, and as the positioning device 509, a GPS receiver or another positioning device such as those described above is used.
  • The display device 508 may be a liquid crystal panel or an organic EL panel, and the operation device 510 may be a touch panel integrated with the display panel, or a voice recognition device.
  • The storage device of the control unit 5010 includes, for example, semiconductor memory such as RAM as temporary (volatile) storage and flash memory as non-volatile storage.
  • This non-volatile storage device stores driver programs, operating system programs, application programs, data, etc. used for processing in each unit.
  • Specifically, the non-volatile storage device stores driver programs such as a communication driver program that implements a wireless communication scheme conforming to the IEEE 802.11 standard or a mobile (cellular) communication scheme, an input device driver program that controls the operation device 510, and an output device driver program that controls the display device 508.
  • The non-volatile storage device also stores operating system programs, such as a basic OS like Android (registered trademark) or iOS (registered trademark), and connection control programs that perform authentication for wireless communication schemes such as IEEE 802.11 and for mobile (cellular) communication schemes.
  • The communication interface 504 performs wireless LAN communication and mobile communication via a base station (not shown) of a cellular mobile communication network.
  • With this configuration, information for judging the soundness of concrete can be obtained on the inspector's terminal from the input surface image data of a concrete structure, and the image area that the trained AI model used for classification can be confirmed.
  • The "trained model (classifier)" may be recorded, as a program or as part of a program, on a computer-readable recording medium and installed on another computer.
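Storing the learned parameters and reloading them elsewhere is one common way to move a trained classifier to another computer. The sketch below assumes the model is a set of named NumPy arrays and uses NumPy's `.npz` archive format; the helper names `save_model` and `load_model` are hypothetical, and the patent does not specify a storage format.

```python
import io

import numpy as np


def save_model(params: dict, fileobj) -> None:
    """Write trained parameters (name -> array) to a path or file-like object as .npz."""
    np.savez(fileobj, **params)


def load_model(fileobj) -> dict:
    """Read the parameters back, e.g. on another computer where the classifier is installed."""
    with np.load(fileobj) as data:
        # Indexing an NpzFile reads the array into memory, so it survives the close.
        return {name: data[name] for name in data.files}


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for a file on a recording medium
    save_model({"W": np.ones((2, 3)), "b": np.zeros(3)}, buf)
    buf.seek(0)
    print(sorted(load_model(buf)))
```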
  • When the image classification learning device 1000 described in FIG. 1 executes the learning process using learning data such as that shown in FIG. 15, the soundness of a concrete structure can not only be classified, but a human can also determine which characteristic parts (characteristic areas) the artificial intelligence focused on in judging the soundness.
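To let a human see which characteristic area the model focused on, a per-concept attention map can be reduced to a bounding box. This is a minimal sketch under stated assumptions: the attention map is a non-negative 2-D array of per-patch weights for one concept (as an attention-based concept extractor might produce), the area of interest is every patch whose weight is at least a fixed fraction of the map's maximum, and the function name `area_of_interest` and the 0.5 default are hypothetical.

```python
import numpy as np


def area_of_interest(attention: np.ndarray, patch: int, ratio: float = 0.5):
    """Return (x, y, w, h) in pixels covering the patches whose attention
    is at least `ratio` times the map's maximum.

    attention: non-negative 2-D array of per-patch weights (rows x cols).
    patch:     side length of one square patch in pixels.
    """
    mask = attention >= ratio * attention.max()
    rows, cols = np.nonzero(mask)  # indices of the selected patches
    y0, y1 = rows.min(), rows.max() + 1
    x0, x1 = cols.min(), cols.max() + 1
    return int(x0 * patch), int(y0 * patch), int((x1 - x0) * patch), int((y1 - y0) * patch)
```

A single bounding box is the simplest summary; overlaying the thresholded mask itself on the image would preserve irregular shapes such as long cracks.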
  • Next, a configuration of the image classification learning device 1000 and the classification device 4000 is described in which the image classification learning device 1000 executes the learning process so that, given image data as input, it not only classifies the soundness of the corresponding concrete structure but also outputs a countermeasure for dealing with that structure.
  • FIG. 19 is a diagram explaining the composition of learning data that includes image data, the soundness label corresponding to each image, and a countermeasure label.
  • In addition to collecting image data of concrete structures and the corresponding soundness information, data on the countermeasures applied to the concrete structures shown in the image data can also be accumulated.
  • FIG. 20 is a functional block diagram for explaining the configuration of the image classification learning device 1000 and classification device 4000 according to the third embodiment.
  • The concept activity t is input not only to the classifier 400 but also to the action discriminator 410, and the countermeasure label shown in FIG. 19 is also supplied to the action discriminator 410 when the learning process is executed.
  • The learning process control unit 700 executes the learning process so that the countermeasure output from the action discriminator 410 matches the teacher data (the countermeasure label).
  • Once trained in this way, the action discriminator 410 outputs a countermeasure.
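The Embodiment 3 training setup can be sketched as two heads that share the concept activity t. The sketch below assumes that both the classifier 400 and the action discriminator 410 are single linear layers with softmax outputs, and that the learning process control unit 700 minimizes the sum of the two cross-entropy losses by plain gradient descent; the patent does not state the head architecture or how the losses are combined. The extraction of t from image data is not shown, and `TwoHeadModel` is a hypothetical name.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class TwoHeadModel:
    """Hypothetical linear soundness head (classifier 400) and countermeasure
    head (action discriminator 410), both fed the concept activity t."""

    def __init__(self, n_concepts: int, n_soundness: int, n_actions: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.Wc = rng.normal(0.0, 0.01, (n_concepts, n_soundness))
        self.bc = np.zeros(n_soundness)
        self.Wa = rng.normal(0.0, 0.01, (n_concepts, n_actions))
        self.ba = np.zeros(n_actions)

    def forward(self, t: np.ndarray):
        """Return (soundness probabilities, countermeasure probabilities)."""
        return softmax(t @ self.Wc + self.bc), softmax(t @ self.Wa + self.ba)

    def step(self, t, y_sound, y_action, lr: float = 0.5) -> float:
        """One gradient-descent step on the summed cross-entropy losses.
        Returns the loss measured before the update."""
        p_c, p_a = self.forward(t)
        idx = np.arange(len(t))
        loss = -(np.log(p_c[idx, y_sound]).mean() + np.log(p_a[idx, y_action]).mean())
        # For softmax + cross-entropy, d(loss)/d(logits) = (p - onehot) / batch size.
        g_c = p_c.copy()
        g_c[idx, y_sound] -= 1.0
        g_c /= len(t)
        g_a = p_a.copy()
        g_a[idx, y_action] -= 1.0
        g_a /= len(t)
        self.Wc -= lr * (t.T @ g_c)
        self.bc -= lr * g_c.sum(axis=0)
        self.Wa -= lr * (t.T @ g_a)
        self.ba -= lr * g_a.sum(axis=0)
        return float(loss)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = rng.random((200, 4))           # toy concept activities
    y_sound = t[:, :2].argmax(axis=1)  # toy soundness labels
    y_action = t[:, 2:].argmax(axis=1) # toy countermeasure labels
    model = TwoHeadModel(4, 2, 2)
    for _ in range(500):
        loss = model.step(t, y_sound, y_action)
    print(round(loss, 3))
```

Because both heads read the same concept activity t, each output can be traced back to the same set of concepts.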
  • According to the image classification learning device 1000 and classification device 4000 of Embodiment 3, or to the learning program and classification program of Embodiment 3, information for assessing the soundness of concrete can be obtained from the input surface image data of a concrete structure, and a human, in particular a skilled professional, can confirm both the image area that the trained AI model used for classification and the information on countermeasures.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
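PCT/JP2023/037394 2022-10-18 2023-10-16 Image classification learning device, image classification learning method, image classification learning program, and image classification trained model Ceased WO2024085114A1 (ja)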

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2024551795A JPWO2024085114A1 2022-10-18 2023-10-16

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022167046 2022-10-18
JP2022-167046 2022-10-18

Publications (1)

Publication Number Publication Date
WO2024085114A1 (ja) 2024-04-25

Family

ID=90737809

Family Applications (1)

Application Number Title Priority Date Filing Date
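PCT/JP2023/037394 Ceased WO2024085114A1 (ja) 2022-10-18 2023-10-16 Image classification learning device, image classification learning method, image classification learning program, and image classification trained model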

Country Status (2)

Country Link
JP (1) JPWO2024085114A1
WO (1) WO2024085114A1

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021165888A (ja) * 2020-04-06 2021-10-14 Canon Inc. Information processing device, information processing method of information processing device, and program


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GODAI FUJITA: "Slot Attention", 30 July 2021 (2021-07-30), XP093161797, Retrieved from the Internet <URL:https://qiita.com/fujitagodai4/items/7964a07561e6fe5cbbb3> *
JU HE, JIE-NENG CHEN, SHUAI LIU, ADAM KORTYLEWSKI, CHENG YANG, YUTONG BAI, CHANGHU WANG: "TransFG: A Transformer Architecture for Fine-Grained Recognition", PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, vol. 36, no. 1, 28 June 2022 (2022-06-28), pages 852 - 860, XP093161801, ISSN: 2159-5399, DOI: 10.1609/aaai.v36i1.19967 *

Also Published As

Publication number Publication date
JPWO2024085114A1 2024-04-25

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 23879757; country of ref document: EP; kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 2024551795; country of ref document: JP
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 23879757; country of ref document: EP; kind code of ref document: A1