WO2023281904A1 - Assessment system, assessment method, generation system, generation method, inference system, inference method, trained model, program, and information processing system - Google Patents


Info

Publication number: WO2023281904A1
Authority: WIPO (PCT)
Application number: PCT/JP2022/018907
Other languages: French (fr), Japanese (ja)
Prior art keywords: feature vector, trained model, feature, inference, target image
Inventors: Yusuke Kato, Shunsuke Yasuki, Takumi Kojima
Applicant: Panasonic Intellectual Property Management Co., Ltd.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Description

  • the present disclosure relates to evaluation systems, evaluation methods, generation systems, generation methods, inference systems, inference methods, trained models, programs, and information processing systems.
  • Patent Document 1 discloses an image search method.
  • The image search method disclosed in Patent Document 1 performs dimensionality reduction on each convolutional-layer feature of the image to be retrieved to obtain dimensionality-reduced features, clusters those features to obtain a plurality of cluster features, fuses the cluster features into a global feature, and retrieves the target image from a database based on that global feature.
  • the image search method disclosed in Patent Document 1 uses a trained model.
  • A large amount of data is required to generate a trained model. Because preparing a large amount of data is costly, public data that anyone can use is sometimes used.
  • A trained model trained on public data is suited to the general environments in which it is likely to be used, but in a special environment where the input data is biased, its inference accuracy tends to decrease.
  • The present disclosure provides an evaluation system, evaluation method, generation system, generation method, inference system, inference method, trained model, program, and information processing system that enable inference accuracy to be improved without additional learning.
  • An evaluation system includes a storage device that stores a trained model that outputs an inference result regarding an object in response to an input of a target image that captures the object, and an arithmetic circuit that evaluates the trained model.
  • the trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The arithmetic circuit executes a first acquisition process, a second acquisition process, and an evaluation process. In the first acquisition process, a first target image in which a first target is captured is input to the trained model to acquire a first feature vector corresponding to the first target.
  • In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the trained model, and a second feature vector corresponding to the second target is acquired.
  • The evaluation process evaluates the change in each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison of the first feature vector and the second feature vector.
  • An evaluation method according to one aspect of the present disclosure evaluates a trained model that outputs an inference result regarding an object in response to the input of a target image in which the object is captured.
  • the trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The evaluation method includes a first acquisition process, a second acquisition process, and an evaluation process. In the first acquisition process, a first target image in which a first target is captured is input to the trained model to acquire a first feature vector corresponding to the first target.
  • In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the trained model, and a second feature vector corresponding to the second target is acquired.
  • The evaluation process evaluates the change in each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison of the first feature vector and the second feature vector.
  • A generation system includes a storage device that stores a first trained model, which outputs an inference result regarding an object in response to the input of a target image in which the object is captured, and evaluation information of the first trained model, and an arithmetic circuit that generates a second trained model from the first trained model based on the evaluation information.
  • the first trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  • the arithmetic circuit executes determination processing and generation processing.
  • The determination process determines, for a target feature among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information.
  • The generation process generates the second trained model from the first trained model by modifying the first trained model so that the feature vector extracted from the input target image is corrected based on the determined effectiveness of each of the plurality of components and the inference result regarding the object is output based on the corrected feature vector.
  • A generation method according to one aspect of the present disclosure generates a second trained model from a first trained model, which outputs an inference result regarding an object in response to the input of a target image in which the object is captured, based on evaluation information of the first trained model.
  • the first trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  • The generation method includes a determination process and a generation process. The determination process determines, for a target feature among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information.
  • The generation process generates the second trained model from the first trained model by modifying the first trained model so that the feature vector extracted from the input target image is corrected based on the determined effectiveness of each of the plurality of components and the inference result regarding the object is output based on the corrected feature vector.
  • An inference system includes a storage device that stores a trained model that outputs an inference result regarding an object in response to an input of a target image showing the object, and an arithmetic circuit.
  • The trained model is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector.
  • The effectiveness of each of the plurality of components of the feature vector is set based on the change in each of those components with respect to a change in a predetermined feature of the object.
  • the arithmetic circuit executes acquisition processing and inference processing. Acquisition processing acquires a predetermined target image. In the inference process, the predetermined target image acquired in the acquisition process is input to the trained model stored in the storage device, and an inference result regarding the target object appearing in the predetermined target image is acquired.
  • An inference method of one aspect of the present disclosure uses a trained model that outputs an inference result regarding an object in response to an input of a target image in which the object is captured.
  • The trained model is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector.
  • The effectiveness of each of the plurality of components of the feature vector is set based on the change in each of those components with respect to a change in a predetermined feature of the object.
  • the inference method includes an acquisition process and an inference process. Acquisition processing acquires a predetermined target image. In the inference process, the predetermined target image acquired in the acquisition process is input to the trained model, and an inference result regarding the object appearing in the predetermined target image is acquired.
  • a trained model of one aspect of the present disclosure outputs an inference result regarding an object in response to an input of a target image in which the object is shown.
  • The trained model is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector.
  • The effectiveness of each of the plurality of components of the feature vector is set based on the change in each of those components with respect to a change in a predetermined feature of the object.
  • a program of one aspect of the present disclosure is a program for causing an arithmetic circuit to execute at least one of the evaluation method, the generation method, and the inference method.
  • An information processing system of one aspect of the present disclosure includes an evaluation system, a generation system, and an inference system.
  • the evaluation system generates evaluation information of a first trained model that outputs an inference result regarding an object in response to an input of a target image showing the object.
  • a generation system generates a second trained model from the first trained model based on the evaluation information.
  • the inference system uses the second trained model to output an inference result regarding the object in response to the input of the target image in which the object appears.
  • the first trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation system executes a first acquisition process, a second acquisition process, and an evaluation process.
  • a first target image including a first target is input to the first trained model to acquire a first feature vector corresponding to the first target.
  • In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the first trained model, and a second feature vector corresponding to the second target is acquired.
  • the evaluation process evaluates changes in each of the plurality of components of the feature vector for changes in the predetermined feature based on a comparison of the first feature vector and the second feature vector.
  • The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  • The generation system executes a determination process and a generation process.
  • the determining process determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information for the target feature among the one or more predetermined features.
  • The generation process generates the second trained model from the first trained model by modifying the first trained model so that the feature vector extracted from the input target image is corrected based on the determined effectiveness of each of the plurality of components and the inference result regarding the object is output based on the corrected feature vector.
  • aspects of the present disclosure enable inference accuracy to be improved without additional learning.
  • FIG. 1 is a block diagram of a configuration example of an information processing system according to a first embodiment.
  • FIG. 2 is a block diagram of a configuration example of an evaluation system of the information processing system in FIG. 1.
  • FIG. 3 is a schematic diagram of an example of a first trained model evaluated by the evaluation system of FIG. 2.
  • FIG. 4 is a flowchart of an example of an evaluation method executed by the evaluation system of FIG. 2.
  • FIG. 5 is a schematic explanatory diagram of the evaluation method of FIG. 4.
  • FIG. 6 is a block diagram of a configuration example of a generation system of the information processing system in FIG. 1.
  • FIG. 7 is a schematic diagram of an example of a second trained model generated by the generation system of FIG. 6.
  • FIG. 8 is a flowchart of an example of a generation method executed by the generation system of FIG. 6.
  • FIG. 9 is a block diagram of a configuration example of an inference system of the information processing system in FIG. 1.
  • FIG. 10 is a flowchart of an example of an inference method executed by the inference system of FIG. 9.
  • FIG. 1 is a block diagram of an information processing system 1 according to this embodiment.
  • The information processing system 1 enables re-matching of objects.
  • Re-matching of objects is the task of searching a large number of images for images in which the same object appears as in an image prepared in advance.
  • the target object is, for example, a person.
  • the information processing system 1 can execute a task of searching a large number of images for an image in which the same person as the image prepared in advance appears.
  • a trained model is used for rematching.
  • A large amount of data is required to generate a trained model. Because preparing a large amount of data is costly, public data that anyone can use is sometimes used.
  • A trained model trained on public data is suited to the general environments in which it is likely to be used, but in a special environment where the input data is biased, its inference accuracy tends to decrease.
  • The information processing system 1 in FIG. 1 newly generates, from a trained model prepared in advance, a trained model suited to the environment in which re-matching is performed, and enables re-matching using the newly generated trained model.
  • the information processing system 1 in FIG. 1 includes an evaluation system 2, a generation system 3, and an inference system 4.
  • the evaluation system 2 evaluates a prepared trained model (first trained model) LM1 (see FIG. 3) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured.
  • the generating system 3 generates a trained model (second trained model) LM2 (see FIG. 7) from the first trained model LM1 (see FIG. 3) based on the evaluation information D1 of the evaluation system 2.
  • the inference system 4 uses the second trained model LM2 to make an inference about the object for the input of the target image showing the object. In this embodiment, the inference system 4 outputs the result of re-matching.
  • The evaluation system 2 is communicably connected to the generation system 3 via the communication network 51.
  • The generation system 3 is communicably connected to the inference system 4 via the communication network 52.
  • FIG. 2 is a block diagram of the evaluation system 2. The evaluation system 2 evaluates the first trained model LM1 prepared in advance, which outputs an inference result regarding an object in response to the input of a target image in which the object is captured.
  • the evaluation system 2 includes an interface (input/output device 21 and communication device 22), a storage device 23, and an arithmetic circuit 24.
  • The evaluation system 2 is realized by, for example, one terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers) and mobile terminals (smartphones, tablet terminals, wearable terminals, etc.).
  • The input/output device 21 functions as an input device for inputting information from the user and as an output device for outputting information to the user. That is, the input/output device 21 is used for inputting information to the evaluation system 2 and for outputting information from the evaluation system 2.
  • the input/output device 21 has one or more human-machine interfaces. Examples of human-machine interfaces include keyboards, pointing devices (mouse, trackball, etc.), input devices such as touch pads, output devices such as displays and speakers, and input/output devices such as touch panels.
  • the communication device 22 is communicably connected to an external device or system.
  • The communication device 22 is used for communication with the generation system 3 through the communication network 51.
  • The communication device 22 has one or more communication interfaces.
  • The communication device 22 is connectable to the communication network 51 and has a function of communicating through the communication network 51.
  • the communication device 22 complies with a predetermined communication protocol.
  • the predetermined communication protocol may be selected from various known wired and wireless communication standards.
  • The storage device 23 is used to store information used by the arithmetic circuit 24 and information generated by the arithmetic circuit 24.
  • The storage device 23 includes one or more storages (non-transitory storage media).
  • The storage can be, for example, a hard disk drive, an optical drive, or a solid-state drive (SSD).
  • The storage may be any of a built-in type, an external type, and a NAS (network-attached storage) type.
  • The evaluation system 2 may include a plurality of storage devices 23. Information may be distributed and stored in the plurality of storage devices 23.
  • The information stored in the storage device 23 includes the first trained model LM1 and the evaluation information D1.
  • FIG. 2 shows a state in which the storage device 23 stores all of the first trained model LM1 and the evaluation information D1.
  • the first trained model LM1 and the evaluation information D1 need not always be stored in the storage device 23, and may be stored in the storage device 23 when the arithmetic circuit 24 needs them.
  • the first trained model LM1 outputs the result of inference regarding the object in response to the input of the target image in which the object is shown.
  • the result of the inference indicates whether an object matches a particular object.
  • a target object is a person.
  • the first trained model LM1 is used, for example, to search for a target image in which a specific target is captured from multiple target images. That is, the first trained model LM1 is a model for person re-matching. Person re-matching is a task of searching a large number of images for an image in which the same person as the image prepared in advance appears.
  • The first trained model LM1 serves as the base of the second trained model LM2 generated by the generation system 3.
  • the first trained model LM1 may be generated, for example, by an external system different from the information processing system 1 and provided to the information processing system 1 (in particular, the evaluation system 2 and the generation system 3).
  • FIG. 3 is a schematic diagram of an example of the first trained model LM1.
  • The first trained model LM1 is obtained from a trained model generated by taking a model having a neural network structure and performing machine learning (supervised learning) on it using a training data set in which target images showing objects are the inputs and the inference results regarding those objects are the correct answers.
  • the first trained model LM1 is configured to extract a feature vector V of an object appearing in an input target image, and output an inference result regarding the object based on the extracted feature vector V. More specifically, the first trained model LM1 in FIG. 3 includes a feature extraction unit F1 and a determination unit F2.
  • The feature extraction unit F1 extracts, in response to the input of a target image, a feature vector (feature quantity) V of the object appearing in the input target image.
  • the feature extraction unit F1 in FIG. 3 is configured including an input layer F11, a plurality of intermediate layers (hidden layers) F12 and F13, and an output layer F14.
  • FIG. 3 shows a simplified structure of the first trained model LM1; an actual neural network, for example a convolutional neural network (CNN), has an arbitrary number of intermediate layers between the input layer and the output layer, such as convolutional layers, pooling layers, activation functions, and fully connected layers (see the sketch below).
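As a rough illustration only (not the disclosure's own implementation), a feature extraction unit of this kind can be sketched in PyTorch. Every layer size below, and the choice of a 5-dimensional feature vector, is a hypothetical assumption made to match the toy numbers used later in this description.

```python
import torch
import torch.nn as nn

# Hypothetical feature extraction unit F1: an input layer, intermediate
# layers (convolution, activation, pooling), and an output layer that
# produces an n-dimensional feature vector (n = 5 here).
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # input layer F11
    nn.ReLU(),
    nn.MaxPool2d(2),                              # intermediate layer F12
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # intermediate layer F13
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 5),                             # output layer F14: feature vector V
)

image = torch.rand(1, 3, 128, 64)  # a person crop: (batch, channels, height, width)
v = feature_extractor(image)       # feature vector V, shape (1, 5)
```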
  • the determination unit F2 outputs a result of inference regarding the object based on the feature vector V extracted by the feature extraction unit F1.
  • the determination unit F2 is, for example, a discriminator. Algorithms such as the K nearest neighbor method (KNN) and the support vector machine (SVM) can be used for the discriminator.
  • the result of inference indicates whether an object matches a particular object. Therefore, the determination unit F2 outputs the result of matching of the object (matching result) in response to the input of the feature vector V of the object from the feature extraction unit F1.
  • the output from the determination unit F2 indicates whether or not the object appearing in the target image input to the first trained model LM1 is a specific object.
  • The determination unit F2 compares the feature vector V extracted by the feature extraction unit F1 with the feature vector of the specific target object to obtain the degree of matching (similarity) between the two, and matches the object appearing in the target image against the specific target object based on this similarity. Since the feature vector is an n-dimensional vector, the degree of matching can be evaluated by cosine similarity, Euclidean distance, or the like.
  • the determination unit F2 outputs a result that the target matches the specific target when the degree of matching is equal to or higher than the determination value.
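As a minimal sketch of such a determination unit using cosine similarity (the determination value 0.9 and all vectors are assumptions for illustration, not values from the disclosure):

```python
import numpy as np

def cosine_similarity(a, b):
    # Degree of matching between two n-dimensional feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def determine(v, v_specific, determination_value=0.9):
    # Determination unit F2: True if the object matches the specific object.
    return cosine_similarity(v, v_specific) >= determination_value

v = np.array([0.7, 0.4, 0.2, 0.8, 0.3])           # feature vector of the object
v_specific = np.array([0.6, 0.5, 0.2, 0.9, 0.2])  # feature vector of the specific object
print(determine(v, v_specific))  # True (similarity is about 0.99)
```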
  • the arithmetic circuit 24 is a circuit that controls the operation of the evaluation system 2.
  • The arithmetic circuit 24 is connected to the input/output device 21 and the communication device 22 and can access the storage device 23.
  • The arithmetic circuit 24 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories.
  • One or more processors execute a program (stored in the one or more memories or in the storage device 23) to realize the functions of the arithmetic circuit 24.
  • Although the program is pre-recorded in the storage device 23 here, it may instead be provided through an electric communication line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
  • the arithmetic circuit 24 evaluates the first trained model LM1.
  • The arithmetic circuit 24 executes, for example, the evaluation method shown in FIG. 4. FIG. 4 is a flowchart of an example of an evaluation method executed by the evaluation system 2.
  • FIG. 5 is a schematic illustration of the evaluation method of FIG. 4.
  • the evaluation method in FIG. 4 includes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13.
  • In the first acquisition process S11, the first target image 61 in which the first target object 71 is captured is input to the first trained model LM1 (in particular, the feature extraction unit F1) to obtain the first feature vector V1 corresponding to the first target object 71.
  • the first target image 61 is a target image in which a first target object 71 is captured as a target object.
  • In the second acquisition process S12, the second target image 62 in which a second target object 72 having a predetermined feature different from that of the first target object 71 is captured is input to the first trained model LM1 (in particular, the feature extraction unit F1) to obtain the second feature vector V2 corresponding to the second target object 72.
  • the second target image 62 is a target image in which the second target object 72 is captured as the target object.
  • The second target object 72 is a target object having a predetermined feature different from that of the first target object 71.
  • the predetermined characteristic is the aspect ratio of the human head.
  • the head 71a of the first object 71 and the head 72a of the second object 72 have different aspect ratios.
  • The clothes 71b of the first object 71 and the clothes 72b of the second object 72 are similar in form and have the same color.
  • The second target image 62 may be generated by changing the first target image 61 so that the predetermined feature of the first target object 71 is different, as in the sketch below.
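For instance, when the predetermined feature is the aspect ratio of the head, one crude way to synthesize such a pair is to stretch the image. The following Pillow sketch stretches the whole crop, which changes the apparent aspect ratio of the head while leaving colors unchanged; the file names and the factor 1.2 are hypothetical, and a real pipeline might warp only the head region.

```python
from PIL import Image

# Generate a hypothetical second target image from the first target image
# by stretching it vertically, altering the aspect ratio of the person.
first_image = Image.open("first_target.png")          # hypothetical file
w, h = first_image.size
second_image = first_image.resize((w, int(h * 1.2)))  # 1.2: arbitrary factor
second_image.save("second_target.png")
```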
  • the predetermined feature may be set based on whether it affects the inference result of the first trained model LM1 with respect to the object.
  • the predetermined feature includes at least one of a color feature of the object and a shape feature of the object.
  • Color-related features of objects include hue, brightness, saturation, and contrast.
  • the features related to the color of the object include the color of the hair, the color of the clothes, the color of the shoes, and the color of the inner shirt (partially visible).
  • Features relating to the shape of the object include the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object.
  • Features relating to the shape of an object can be said to be geometric features.
  • the evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison between the first feature vector V1 and the second feature vector V2.
  • Evaluation processing S13 generates evaluation information D1 indicating the result of this evaluation.
  • A difference between the first feature vector V1 and the second feature vector V2 reflects the difference between the first object 71 and the second object 72, that is, the change in the predetermined feature.
  • Here the predetermined feature is the aspect ratio of the human head, so it is considered that the difference between the head 71a of the first object 71 and the head 72a of the second object 72 appears in the difference between the first feature vector V1 and the second feature vector V2.
  • This makes it possible to evaluate the change in each of the plurality of components of the feature vector, for example (v1, v2, v3, v4, v5) in FIG. 5, with respect to the change in the predetermined feature.
  • the evaluation process S13 will be further explained.
  • the evaluation process S13 of FIG. 4 includes a first extraction process S131 and a second extraction process S132.
  • the first extraction process S131 extracts a component whose value in the first feature vector is equal to or greater than a threshold from multiple components of the feature vector.
  • the threshold is set based on a representative value of values in the first feature vector of the plurality of components of the feature vector.
  • the representative value is obtained, for example, from a histogram of the values of the plurality of components of the feature vector in the first feature vector. Representative values include mean, mode, and median.
  • For example, let the values of the plurality of components of the feature vector in the first feature vector be (0.7, 0.4, 0.2, 0.8, 0.3).
  • If the threshold is the median, the threshold is 0.4.
  • In this case, the first extraction process S131 extracts the components v1, v2, and v4 of the feature vector. Through the first extraction process S131, a first set is obtained, consisting of the components of the feature vector whose values in the first feature vector are equal to or greater than the threshold.
  • the second extraction process S132 extracts a component, among the components extracted in the first extraction process S131, for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  • the predetermined value is set to extract a component that significantly changes in the feature vector of the object when the predetermined feature of the object shown in the target image is changed.
  • the predetermined value is a value used to determine whether or not the predetermined feature changed by the component of the feature vector of the first trained model LM1 is emphasized.
  • the predetermined value may be the same as the threshold in the first extraction process S131, for example. For example, assume that the values of the plurality of components of the feature vector in the second feature vector are (0.1, 0.3, 0.2, 0.4, 0.2).
  • The changes in the components v1, v2, and v4 extracted in the first extraction process S131 are 0.6, 0.1, and 0.4, respectively. If the predetermined value equals the threshold, it is 0.4. In this case, the second extraction process S132 extracts the components v1 and v4 of the feature vector. Through the second extraction process S132, a second set is obtained, consisting of those components extracted in the first extraction process S131 for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value. The second set is a subset of the first set. The sketch below reproduces this calculation.
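The two-stage extraction can be written down directly. The following sketch reproduces the worked numbers above; the function and variable names are ours, not the disclosure's.

```python
import numpy as np

def evaluate(v1, v2, predetermined_value=None):
    """Evaluation process S13: return the indices forming the second set.

    S131: keep components whose value in the first feature vector is at or
    above a threshold (here the median, used as the representative value).
    S132: among those, keep components whose change between the two feature
    vectors is at or above a predetermined value (here equal to the threshold).
    """
    threshold = np.median(v1)                 # representative value
    first_set = np.where(v1 >= threshold)[0]  # first extraction S131
    if predetermined_value is None:
        predetermined_value = threshold
    diff = np.abs(v1 - v2)
    return first_set[diff[first_set] >= predetermined_value]  # second extraction S132

v1 = np.array([0.7, 0.4, 0.2, 0.8, 0.3])  # first feature vector
v2 = np.array([0.1, 0.3, 0.2, 0.4, 0.2])  # second feature vector
print(evaluate(v1, v2))  # [0 3] -> components v1 and v4
```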
  • the evaluation information D1 is generated by the evaluation process S13.
  • the evaluation information D1 indicates the evaluation of the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature of the object.
  • the evaluation information D1 indicates the second set obtained by the second extraction processing S132.
  • the evaluation information D1 is obtained for the first trained model LM1.
  • the evaluation information D1 indicates the evaluation of the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature.
  • classification by machine learning inference programs such as neural networks is a black box, and there is no unified opinion on the interpretation of inference results.
  • For example, a component that changes significantly when the tint, a feature related to color, is changed can be interpreted as a component that emphasizes the tint.
  • With the evaluation system 2, it can be understood how strongly the feature vector of the first trained model LM1 reacts to (how much importance it places on) features regarding the color of the object and features regarding the shape of the object. Since the inference decisions of the first trained model LM1 can be explained using this information, it can be expected to improve the explainability of the black box.
  • FIG. 6 is a block diagram of the generation system 3.
  • the generation system 3 generates a second trained model LM2 from the first trained model LM1.
  • the generation system 3 uses the evaluation information D1 generated by the evaluation system 2.
  • the evaluation information D1 indicates the evaluation of the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature of the object.
  • the generation system 3 generates the second trained model LM2 from the first trained model LM1 so as to obtain a more accurate inference result for the predetermined features.
  • The generation system 3 includes an interface (input/output device 31 and communication device 32), a storage device 33, and an arithmetic circuit 34.
  • the generation system 3 is implemented by, for example, one terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers), mobile terminals (smartphones, tablet terminals, wearable terminals, etc.), and the like.
  • The input/output device 31 functions as an input device for inputting information from the user and as an output device for outputting information to the user. In other words, the input/output device 31 is used for inputting information to the generation system 3 and for outputting information from the generation system 3.
  • the input/output device 31 has one or more human-machine interfaces. Examples of human-machine interfaces include keyboards, pointing devices (mouse, trackball, etc.), input devices such as touch pads, output devices such as displays and speakers, and input/output devices such as touch panels.
  • the communication device 32 is communicably connected to an external device or system.
  • The communication device 32 is used for communication with the evaluation system 2 through the communication network 51 and for communication with the inference system 4 through the communication network 52.
  • The communication device 32 has one or more communication interfaces.
  • The communication device 32 is connectable to the communication networks 51 and 52 and has a function of communicating through the communication networks 51 and 52.
  • the communication device 32 complies with a predetermined communication protocol.
  • the predetermined communication protocol may be selected from various known wired and wireless communication standards.
  • The storage device 33 is used to store information used by the arithmetic circuit 34 and information generated by the arithmetic circuit 34.
  • The storage device 33 includes one or more storages (non-transitory storage media).
  • The storage can be, for example, a hard disk drive, an optical drive, or a solid-state drive (SSD).
  • The storage may be any of a built-in type, an external type, and a NAS type.
  • The generation system 3 may include a plurality of storage devices 33. Information may be distributed and stored in the plurality of storage devices 33.
  • The information stored in the storage device 33 includes the first trained model LM1, the evaluation information D1, and the second trained model LM2.
  • FIG. 6 shows a state in which the storage device 33 stores all of the first trained model LM1, the evaluation information D1, and the second trained model LM2.
  • The first trained model LM1, the evaluation information D1, and the second trained model LM2 need not always be stored in the storage device 33; they may be stored in the storage device 33 when the arithmetic circuit 34 requires them. In this embodiment, the evaluation information D1 is provided from the evaluation system 2 to the generation system 3.
  • the second trained model LM2 outputs the result of inference regarding the object in response to the input of the target image in which the object is shown.
  • the second trained model LM2 is generated using the first trained model LM1.
  • the second trained model LM2 is generated from the first trained model LM1 so as to obtain more accurate inference results for a given feature.
  • FIG. 7 is a schematic diagram of the second trained model LM2.
  • the second trained model LM2 in FIG. 7 includes a feature extraction unit F1, a determination unit F2, and a correction unit F3.
  • the feature extraction unit F1 extracts a feature vector V of an object appearing in an input target image.
  • the correction unit F3 is located between the feature extraction unit F1 and the determination unit F2.
  • The correction unit F3 corrects the feature vector V extracted by the feature extraction unit F1 based on the effectiveness of each of the plurality of components of the feature vector V.
  • The effectiveness of each of the plurality of components of the feature vector V is set based on the change in each of those components with respect to a change in the predetermined feature of the object.
  • A component of the feature vector V that emphasizes the predetermined feature is set to a high effectiveness (for example, "1"), and a component that does not emphasize the predetermined feature is set to a low effectiveness (for example, "0"). Effectiveness will be described in detail later.
  • the determination unit F2 outputs a result of inference regarding the object based on the feature vector VA corrected by the correction unit F3. In this way, the second trained model LM2 differs from the first trained model LM1 in that it includes the corrector F3.
  • the arithmetic circuit 34 is a circuit that controls the operation of the generation system 3.
  • The arithmetic circuit 34 is connected to the input/output device 31 and the communication device 32 and can access the storage device 33.
  • The arithmetic circuit 34 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories.
  • One or more processors execute a program (stored in the one or more memories or in the storage device 33) to realize the functions of the arithmetic circuit 34.
  • Although the program is pre-recorded in the storage device 33 here, it may instead be provided through an electric communication line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
  • the arithmetic circuit 34 generates the second trained model LM2. More specifically, the arithmetic circuit 34 generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1.
  • The arithmetic circuit 34 executes, for example, the generation method shown in FIG. 8. FIG. 8 is a flowchart of an example of a generation method executed by the generation system 3.
  • The generation method of FIG. 8 includes a determination process S21 and a generation process S22.
  • The determination process S21 determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1.
  • The effectiveness is, for example, a value multiplied by the corresponding component of the feature vector. The effectiveness is determined according to the degree to which a component of the feature vector emphasizes the target feature.
  • The evaluation information D1 is used to determine whether a component of the feature vector emphasizes the target feature.
  • The evaluation information D1 indicates the second set obtained by the second extraction process S132. For example, the effectiveness is set to "1" for components included in the second set and to "0" for components not included in the second set. An effectiveness of "1" means that the component is used, and an effectiveness of "0" means that the component is not used.
  • The generation process S22 generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that the feature vector V extracted from the input target image is corrected based on the effectiveness of each of the plurality of components of the feature vector V determined in the determination process S21, and an inference result regarding the object is output based on the corrected feature vector VA.
  • Specifically, the generation process S22 adds, between the feature extraction unit F1 and the determination unit F2 of the first trained model LM1, a correction unit F3 that corrects the feature vector V extracted by the feature extraction unit F1, and changes the determination unit F2 so that it outputs the inference result regarding the object based on the feature vector VA corrected by the correction unit F3, thereby generating the second trained model LM2 from the first trained model LM1.
  • the generation process S22 generates a trained model without additional learning.
  • The correction unit F3 performs correction based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process S21.
  • For example, suppose the feature vector components v1, v2, v3, v4, and v5 are 0.7, 0.4, 0.2, 0.8, and 0.3, and their effectiveness values are 1, 1, 0, 1, and 0.
  • Then the corrected feature vector components v1, v2, v3, v4, and v5 are 0.7, 0.4, 0.0, 0.8, and 0.0, as in the sketch below.
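A minimal sketch of the determination process S21 and the correction unit F3, assuming element-wise multiplication by a 0/1 effectiveness mask (the function names and the example second set are ours):

```python
import numpy as np

def determine_effectiveness(second_set, n_components):
    # Determination process S21: effectiveness 1 for components in the
    # second set indicated by the evaluation information D1, 0 otherwise.
    effectiveness = np.zeros(n_components)
    effectiveness[list(second_set)] = 1.0
    return effectiveness

def correct(v, effectiveness):
    # Correction unit F3: multiply each component by its effectiveness.
    return v * effectiveness

effectiveness = determine_effectiveness({0, 1, 3}, n_components=5)  # -> (1, 1, 0, 1, 0)
v = np.array([0.7, 0.4, 0.2, 0.8, 0.3])  # feature vector V from F1
va = correct(v, effectiveness)           # corrected feature vector VA
print(va)  # [0.7 0.4 0.  0.8 0. ]
```

In this sketch the second trained model LM2 is simply the composition of the feature extraction unit F1, this correction, and the determination unit F2, which is why no additional learning is needed.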
  • the second trained model LM2 is obtained from the first trained model LM1 without additional learning.
  • In the second trained model LM2, the correction unit F3 corrects the plurality of components of the feature vector V extracted by the feature extraction unit F1 based on the effectiveness of each of those components for the target feature.
  • By setting the effectiveness in this way, components that emphasize the target feature are weighted more heavily than components that do not, so an improvement in inference accuracy can be expected. For example, in a usage environment with many similar clothes, such as an office with many suits or a factory with many work uniforms, the first trained model LM1 trained on public data may not exhibit sufficient performance.
  • If the first trained model LM1 has been trained to emphasize the color of clothes, it may not be able to distinguish people well in an environment where many people wear similar clothes.
  • In such a case, using the evaluation information D1 for features that are not similar between objects (face, body shape, shoes, color of accessories, etc.), the components included in the second set for those features are emphasized.
  • A second trained model LM2 is generated by adding the correction unit F3 to the first trained model LM1.
  • The second trained model LM2 including such a correction unit F3 enables inference using features that differ between objects, and thus improved performance can be expected.
  • FIG. 9 is a block diagram of the inference system 4.
  • The inference system 4 uses the second trained model LM2 to output an inference result regarding an object in response to the input of a target image in which the object appears.
  • The inference system 4 includes an interface (input/output device 41 and communication device 42), a storage device 43, and an arithmetic circuit 44.
  • the inference system 4 is implemented by, for example, one terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers), mobile terminals (smartphones, tablet terminals, wearable terminals, etc.), and the like.
  • the input/output device 41 functions as an input device for inputting information from the user and as an output device for outputting information to the user.
  • The input/output device 41 is used for inputting information to the inference system 4 and for outputting information from the inference system 4.
  • the input/output device 41 has one or more human-machine interfaces. Examples of human-machine interfaces include keyboards, pointing devices (mouse, trackball, etc.), input devices such as touch pads, output devices such as displays and speakers, and input/output devices such as touch panels.
  • the communication device 42 is communicably connected to an external device or system.
  • The communication device 42 is used for communication with the generation system 3 through the communication network 52.
  • The communication device 42 has one or more communication interfaces.
  • The communication device 42 is connectable to the communication network 52 and has a function of communicating through the communication network 52.
  • the communication device 42 complies with a predetermined communication protocol.
  • the predetermined communication protocol may be selected from various known wired and wireless communication standards.
  • The storage device 43 is used to store information used by the arithmetic circuit 44 and information generated by the arithmetic circuit 44.
  • The storage device 43 includes one or more storages (non-transitory storage media).
  • The storage can be, for example, a hard disk drive, an optical drive, or a solid-state drive (SSD).
  • The storage may be any of a built-in type, an external type, and a NAS type.
  • The inference system 4 may include a plurality of storage devices 43. Information may be distributed and stored in the plurality of storage devices 43.
  • the information stored in the storage device 43 includes the second trained model LM2.
  • FIG. 9 shows a state in which the storage device 43 stores the second trained model LM2.
  • The second trained model LM2 need not always be stored in the storage device 43; it may be stored in the storage device 43 when the arithmetic circuit 44 requires it. In this embodiment, the second trained model LM2 is provided from the generation system 3 to the inference system 4.
  • The arithmetic circuit 44 is a circuit that controls the operation of the inference system 4.
  • The arithmetic circuit 44 is connected to the input/output device 41 and the communication device 42 and can access the storage device 43.
  • The arithmetic circuit 44 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories.
  • One or more processors execute a program (stored in the one or more memories or in the storage device 43) to realize the functions of the arithmetic circuit 44.
  • Although the program is pre-recorded in the storage device 43 here, it may instead be provided through an electric communication line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
  • the arithmetic circuit 44 makes an inference using the second trained model LM2.
  • The arithmetic circuit 44 executes, for example, the inference method shown in FIG. 10. FIG. 10 is a flowchart of an example of an inference method executed by the inference system 4.
  • the inference method of FIG. 10 includes acquisition processing S31 and inference processing S32.
  • Acquisition processing S31 acquires a predetermined target image.
  • the predetermined target image is a target image in which an object to be inferred by the inference system 4 is shown.
  • Acquisition processing S31 acquires a predetermined target image by the input/output device 41, for example.
  • a screen for inputting a predetermined target image is presented by the input/output device 41, and the user can input the predetermined target image according to instructions on the screen.
  • Inputting the predetermined target image may include not only inputting the predetermined target image to the inference system 4 from an external device but also designating, from among images stored in the inference system 4, an image to be used as the predetermined target image.
  • the inference processing S32 inputs the predetermined target image acquired in the acquisition processing S31 to the second trained model LM2 stored in the storage device 43, and acquires the result of inference regarding the object appearing in the predetermined target image.
  • the result of the inference processing S32 indicates whether or not the target appearing in the predetermined target image acquired in the acquisition processing S31 matches the predetermined target.
  • inference is executed using the second trained model LM2.
  • In the second trained model LM2, the correction unit F3 corrects the plurality of components of the feature vector extracted by the feature extraction unit F1 based on the effectiveness of each of those components for the target feature.
  • By setting the effectiveness in this way, components that emphasize the target feature are weighted more heavily than components that do not, so an improvement in inference accuracy can be expected. A sketch of such an inference follows.
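Putting the pieces together, a re-matching query with the second trained model might look like the following sketch; the `extract` callable, the gallery structure, and the determination value 0.9 are assumptions for illustration.

```python
import numpy as np

def infer(query_image, gallery_images, extract, effectiveness,
          determination_value=0.9):
    # Inference process S32 for re-matching: return the indices of gallery
    # images judged to show the same object as the query image.
    def corrected_feature(image):
        return extract(image) * effectiveness  # F1 followed by correction F3

    vq = corrected_feature(query_image)
    matches = []
    for i, image in enumerate(gallery_images):
        vg = corrected_feature(image)
        similarity = np.dot(vq, vg) / (np.linalg.norm(vq) * np.linalg.norm(vg))
        if similarity >= determination_value:  # determination unit F2
            matches.append(i)
    return matches
```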
  • The evaluation system 2 described above includes the storage device 23 that stores the trained model LM1, which outputs an inference result regarding an object in response to the input of a target image in which the object is captured, and the arithmetic circuit 24 that evaluates the trained model LM1.
  • the trained model LM1 is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the arithmetic circuit 24 executes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13.
  • a first acquisition process S11 acquires a first feature vector corresponding to the first target by inputting a first target image in which the first target is shown to the trained model LM1.
  • The second acquisition process S12 acquires a second feature vector corresponding to the second target by inputting, to the trained model LM1, a second target image showing a second target having a predetermined feature different from that of the first target.
  • the evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison of the first feature vector and the second feature vector.
  • With the evaluation system 2, an evaluation of the change in each of the plurality of components of the feature vector with respect to the change in the predetermined feature is obtained for the trained model LM1.
  • The predetermined features include the head-to-body ratio, body shape, hair color, clothing color, shoe color, inner-shirt color, and the like. Therefore, components that are effective for a given feature can be identified within the feature vector. Inference can then be made using the components effective for the predetermined feature among the plurality of components of the feature vector, and an improvement in inference accuracy can be expected.
  • the evaluation system 2 enables improvement of inference accuracy without additional learning.
  • The evaluation process S13 includes a first extraction process S131 that extracts, from the plurality of components of the feature vector, components whose values in the first feature vector are equal to or greater than a threshold, and a second extraction process S132 that extracts, from the components extracted in the first extraction process S131, components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  • This configuration allows for improved accuracy in evaluating changes in each of the multiple components of the feature vector.
  • the threshold is set based on the representative value of the values in the first feature vector of the plurality of components of the feature vector. This configuration allows for improved accuracy in evaluating changes in each of the multiple components of the feature vector.
  • the predetermined features include at least one of features related to the color of the object and features related to the shape of the object. This configuration allows for improved inference accuracy.
  • the features related to the color of the object include hue, brightness, saturation, and contrast.
  • Features relating to the shape of the object include the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object. This configuration allows for improved inference accuracy.
  • the inference result indicates whether or not the object shown in the target image matches the specific object.
  • the evaluation system 2 executes the following method (evaluation method). That is, the evaluation method evaluates the trained model LM1 that outputs the result of inference regarding the object in response to the input of the target image in which the object is captured.
  • the trained model LM1 is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation method includes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13.
  • a first acquisition process S11 acquires a first feature vector corresponding to the first target by inputting a first target image in which the first target is shown to the trained model LM1.
  • The second acquisition process S12 acquires a second feature vector corresponding to the second target by inputting, to the trained model LM1, a second target image showing a second target having a predetermined feature different from that of the first target.
  • the evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison of the first feature vector and the second feature vector. This configuration allows for improved inference accuracy without additional learning.
  • the evaluation system 2 is implemented using an arithmetic circuit 24. That is, the method (evaluation method) executed by the evaluation system 2 can be realized by the arithmetic circuit 24 executing the program.
  • This program is a computer program for causing the arithmetic circuit 24 to execute the evaluation method described above. This configuration allows for improved inference accuracy without additional learning.
  • The generation system 3 described above includes the storage device 33, which stores the first trained model LM1 that outputs an inference result regarding an object in response to the input of a target image in which the object is captured and the evaluation information D1 of the first trained model LM1, and the arithmetic circuit 34, which generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1.
  • the first trained model LM1 is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The evaluation information D1 indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  • the arithmetic circuit 34 executes a determination process S21 and a generation process S22.
  • the determination processing S21 determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1 for the target feature among the one or more predetermined features.
  • the generation process S22 modifies the first trained model LM1 so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process S21, and the result of inference regarding the object is output based on the corrected feature vector; a second trained model LM2 is thereby generated from the first trained model LM1.
  • the generation system 3 adds to the first trained model LM1 a process of correcting the feature vector based on the effectiveness of each of the plurality of components of the feature vector, thereby generating the second trained model LM2 from the first trained model LM1.
  • With the second trained model LM2, it becomes possible to make an inference using components that are effective for the predetermined feature among the plurality of components of the feature vector, so an improvement in inference accuracy can be expected.
  • the generation system 3 executes the following method (generation method). That is, the generation method generates a second trained model LM2 from the first trained model LM1 based on the evaluation information D1 of the first trained model LM1.
  • the first trained model LM1 is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation information D1 indicates the evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the generation method includes determination processing S21 and generation processing S22.
  • the determination process S21 determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information for the target feature among the one or more predetermined features.
  • the generation process S22 modifies the first trained model LM1 so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process S21 and the result of inference regarding the object is output based on the corrected feature vector, thereby generating the second trained model LM2 from the first trained model LM1. This configuration allows for improved inference accuracy without additional learning.
  • the generation system 3 is implemented using an arithmetic circuit 34. That is, the method (generation method) executed by the generation system 3 can be realized by the arithmetic circuit 34 executing the program.
  • This program is a computer program for causing the arithmetic circuit 34 to execute the above generation method. This configuration allows for improved inference accuracy without additional learning.
  • the inference system 4 described above includes a storage device 43 that stores a learned model LM2 that outputs an inference result regarding an object in response to an input of a target image in which the object appears, and an arithmetic circuit 44.
  • the trained model LM2 is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of feature vector components is set based on the change of the plurality of feature vector components with respect to the predetermined feature change in the object.
  • the arithmetic circuit 44 executes an acquisition process S31 and an inference process S32.
  • Acquisition processing S31 acquires a predetermined target image.
  • the inference processing S32 inputs the predetermined target image acquired in the acquisition processing S31 to the learned model LM2 stored in the storage device 43, and acquires the result of inference regarding the object appearing in the predetermined target image.
  • the learned model LM2 used by the inference system 4 includes processing for correcting the feature vector based on the effectiveness of each of the multiple components of the feature vector.
  • With the trained model LM2, it becomes possible to make an inference using components that are effective for the predetermined feature among the plurality of components of the feature vector, so an improvement in inference accuracy can be expected.
  • the effectiveness of each of the plurality of feature vector components is set based on the change of the plurality of feature vector components with respect to the predetermined feature change in the object. Therefore, it is not necessary to perform additional learning such as emphasizing predetermined features in the trained model itself. Therefore, the inference system 4 enables improvement in inference accuracy without additional learning.
  • the inference system 4 executes the following method (inference method). That is, the inference method uses a trained model LM2 that outputs an inference result regarding an object in response to an input of a target image in which the object appears.
  • the trained model LM2 is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of feature vector components is set based on the change of the plurality of feature vector components with respect to the predetermined feature change in the object.
  • the inference method includes an acquisition process S31 and an inference process S32.
  • Acquisition processing S31 acquires a predetermined target image.
  • the inference processing S32 inputs the predetermined target image acquired in the acquisition processing S31 to the learned model LM2 and acquires the result of inference regarding the target object appearing in the predetermined target image. This configuration allows for improved inference accuracy without additional learning.
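A minimal sketch of this acquisition/inference flow; `acquire_image` and the callable `lm2` are assumed interfaces, not names from the patent.

```python
def run_inference(lm2, acquire_image):
    # S31 (acquisition process): obtain a predetermined target image;
    # `acquire_image` is an assumed interface, e.g. reading a camera frame.
    image = acquire_image()
    # S32 (inference process): input the image to the trained model LM2
    # and return the inference result for the object in the image.
    return lm2(image)
```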
  • the inference system 4 is implemented using an arithmetic circuit 44. That is, the method (inference method) executed by the inference system 4 can be realized by the arithmetic circuit 44 executing the program.
  • This program is a computer program for causing the arithmetic circuit 44 to execute the inference method described above. This configuration allows for improved inference accuracy without additional learning.
  • the learned model LM2 described above outputs the result of inference regarding the object in response to the input of the target image in which the object is captured.
  • the trained model LM2 is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector.
  • the effectiveness of each of the multiple components of the feature vector is set based on the change of each of the multiple components of the feature vector with respect to the change of the predetermined feature in the object. This configuration allows for improved inference accuracy without additional learning.
  • FIG. 11 is a block diagram of a configuration example of an information processing system 1A according to the second embodiment.
  • the information processing system 1A of FIG. 11 enables re-matching of the target object in the same manner as the information processing system 1 described above.
  • the information processing system 1A of FIG. 11 newly generates, from a trained model prepared in advance, a trained model suitable for the environment in which re-matching is performed, and uses the newly generated trained model for the re-matching.
  • the information processing system 1A of FIG. 11 includes an evaluation system 2A, a generation system 3A, and an inference system 4.
  • FIG. 12 is a block diagram of the evaluation system 2A.
  • the evaluation system 2A evaluates a first trained model LM1 prepared in advance that outputs an inference result regarding an object in response to an input of a target image showing the object.
  • the evaluation system 2A includes an interface (input/output device 21 and communication device 22), a storage device 23A, and an arithmetic circuit 24A.
  • the information stored in the storage device 23A includes the first learned model LM1, the database DB1, and the evaluation information D1A.
  • FIG. 12 shows a state in which the storage device 23A stores all of the first trained model LM1, the database DB1, and the evaluation information D1A.
  • the first trained model LM1, the database DB1, and the evaluation information D1A need not always be stored in the storage device 23A; they only need to be stored in the storage device 23A when required by the arithmetic circuit 24A.
  • the database DB1 contains data used for evaluating the first trained model LM1.
  • Database DB1 includes a plurality of first target images and a plurality of second target images. For one first object, there can be a plurality of second objects that differ from the first object in a plurality of predetermined characteristics.
  • a plurality of first target images each showing a plurality of different first targets are registered in the database DB1.
  • for each of a plurality of mutually different predetermined features, a plurality of second target images are registered, each showing a second target object whose predetermined feature differs from that of the corresponding first target object.
  • the number of images registered in the database DB1 is smaller than, for example, the number of images required to generate a reuse model by additional learning of the first trained model LM1.
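One conceivable in-memory layout for the database DB1 is sketched below; the concrete structure, keys, and file names are assumptions for illustration only.

```python
# A possible layout for DB1: each first target image is paired, per
# predetermined feature, with a second target image that differs from it
# only in that feature.
db1 = {
    "first_object_001": {
        "first_image": "first_001.png",
        "second_images": {
            "hue": "first_001_hue.png",
            "brightness": "first_001_brightness.png",
            "aspect_ratio": "first_001_aspect_ratio.png",
        },
    },
    # ... further first objects; far fewer images in total than would be
    # needed for additional learning of the first trained model LM1
}
```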
  • the arithmetic circuit 24A evaluates the first trained model LM1. The arithmetic circuit 24A performs, for example, the evaluation method shown in FIG. 13.
  • FIG. 13 is a flow chart of an example of an evaluation method executed by the evaluation system 2A.
  • the evaluation method of FIG. 13 includes a first acquisition process S11A, a second acquisition process S12A, and an evaluation process S13A.
  • FIG. 14 is a schematic illustration of the evaluation method of FIG. 13.
  • in the first acquisition process S11A, the first target image 61 in which the first target object 71 is captured is input to the first trained model LM1 (specifically, the feature extraction unit F1) to obtain the first feature vector V1 corresponding to the first target object 71.
  • the first target image 61 is acquired from the database DB1, for example.
  • a plurality of first target images 61 each showing a plurality of different first targets 71 are registered in the database DB1.
  • a plurality of first target images 61, each showing a different first target object 71, are input to the first trained model LM1 to obtain a plurality of first feature vectors V1 corresponding to the plurality of first target objects 71. The first acquisition process S11A thus yields a plurality of first feature vectors V1, one for each of the mutually different first target objects 71.
  • the second acquisition process S12A obtains the second feature vector V2 corresponding to the second target object 72 by inputting, to the first trained model LM1 (specifically, the feature extraction unit F1), the second target image 62 in which the second target object 72, whose predetermined feature differs from that of the first target object 71, is captured.
  • the predetermined characteristic is the aspect ratio of a person's head.
  • the head 71a of the first object 71 and the head 72a of the second object 72 have different aspect ratios.
  • the clothes 71b of the first object 71 and the clothes 72b of the second object 72 are similar in style and have the same color.
  • the second target image 62 is acquired from the database DB1, for example.
  • in the database DB1, for each of a plurality of mutually different predetermined features, a plurality of second target images 62 are registered, each showing a second target object 72 whose predetermined feature differs from that of the corresponding first target object 71.
  • a second acquisition process S12A acquires a second feature vector V2 by inputting the second target image 62 to the first trained model LM1 for each of a plurality of different predetermined features.
  • a plurality of second target images 62 including a plurality of second targets 72 having predetermined characteristics different from the plurality of first targets 71 are input to the first trained model LM1.
  • a plurality of second feature vectors V2 corresponding to a plurality of second objects 72 are acquired.
  • that is, for each first feature vector V1 of a first target object 71, second feature vectors V2 are obtained for a plurality of second target objects 72 that differ from that first target object 71 in each of the plurality of predetermined features.
  • the evaluation process S13A evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison between the first feature vector V1 and the second feature vector V2.
  • the evaluation process S13A generates evaluation information D1 indicating the result of this evaluation.
  • the evaluation process S13A of FIG. 13 includes a first extraction process S131A, a second extraction process S132A, and an arithmetic process S133A.
  • the first extraction process S131A extracts a component whose value in the first feature vector is equal to or greater than a threshold from multiple components of the feature vector.
  • a plurality of first feature vectors respectively corresponding to a plurality of different first objects are obtained by the first acquisition processing S11A. Therefore, the first extraction process S131A extracts a component whose value in the first feature vector is equal to or greater than the threshold from the plurality of components of the feature vector for each of the plurality of first target images.
  • the threshold is set based on a representative value of values in the first feature vector of the plurality of components of the feature vector.
  • the second extraction process S132A extracts a component whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value among the components extracted in the first extraction process S131A.
  • the second acquisition process S12A yields, for each first feature vector of a first target object, second feature vectors of a plurality of second target objects that differ from the first target object in each of the plurality of predetermined features. Therefore, for each such second feature vector, the components whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value are extracted from the components extracted in the first extraction process S131A.
  • in other words, the components extracted in the first extraction process S131A form a first set, and the second extraction process S132A obtains from it a second set of components whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value.
  • the second set is a subset of the first set.
  • the arithmetic process S133A obtains, for each of the plurality of components of the feature vector, the ratio of the number of times the component is extracted in the second extraction process S132A to the number of times it is extracted in the first extraction process S131A, as the response rate to a change in the predetermined feature.
  • the number of times extracted in the first extraction process S131A is the number of times the component is included in the first set, and the number of times extracted in the second extraction process S132A is the number of times it is included in the second set. For example, assume that, for the component v1 of the feature vector, the number of extractions in the first extraction process S131A is 100 and the number of extractions in the second extraction process S132A is 10; the response rate of v1 is then 10/100 = 0.1.
  • the response rate for each of the plurality of predetermined features is thus obtained for each component of the feature vector. This makes it easy to grasp which component of the feature vector responds well to which predetermined feature.
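The response-rate computation of S131A-S133A can be sketched as follows; the use of the mean as the representative value and the NumPy data layout are assumptions.

```python
import numpy as np

def response_rates(v1_list, v2_list, delta):
    """Sketch of S131A-S133A for one predetermined feature.

    v1_list : first feature vectors V1, one per first target image
    v2_list : matching second feature vectors V2 for the same feature
    delta   : the predetermined value for the difference test in S132A
    """
    dim = len(v1_list[0])
    first_counts = np.zeros(dim)   # times each component enters the first set
    second_counts = np.zeros(dim)  # times it also enters the second set
    for v1, v2 in zip(v1_list, v2_list):
        # S131A: keep components at or above a threshold derived from a
        # representative value of V1 (the mean is one possible choice).
        in_first_set = v1 >= np.mean(v1)
        first_counts += in_first_set
        # S132A: of those, keep components whose value changed by at
        # least `delta` between the first and second feature vectors.
        second_counts += in_first_set & (np.abs(v1 - v2) >= delta)
    # S133A: response rate = extractions in S132A / extractions in S131A,
    # e.g. 10 / 100 = 0.1 for the component v1 in the text's example.
    with np.errstate(invalid="ignore"):
        rates = np.where(first_counts > 0, second_counts / first_counts, 0.0)
    return rates
```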
  • the evaluation information D1A is generated by the evaluation processing S13A.
  • the evaluation information D1A indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the evaluation information D1A indicates the response rate to a change in a predetermined feature for each of the plurality of components of the feature vector.
  • Table 1 below is an example of the evaluation information D1A, listing the response rate of each component of the feature vector to each predetermined feature. In Table 1, the predetermined features are hue, brightness, contrast, aspect ratio, and head-to-body ratio.
  • FIG. 15 is a block diagram of the generating system 3A.
  • the generation system 3A generates the second trained model LM2 from the first trained model LM1.
  • the generation system 3A uses the evaluation information D1A generated by the evaluation system 2A.
  • the evaluation information D1A indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the generation system 3A generates the second trained model LM2 from the first trained model LM1 so as to obtain a more accurate inference result for the predetermined feature.
  • the generation system 3A includes an interface (input/output device 31 and communication device 32), a storage device 33A, and an arithmetic circuit 34A.
  • the information stored in the storage device 33A includes the first learned model LM1, the evaluation information D1A, and the second learned model LM2.
  • the first trained model LM1, the evaluation information D1A, and the second trained model LM2 need not always be stored in the storage device 33A; they only need to be stored in the storage device 33A when required by the arithmetic circuit 34A.
  • the arithmetic circuit 34A generates the second trained model LM2. More specifically, the arithmetic circuit 34A generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1A.
  • the arithmetic circuit 34A performs, for example, the generation method shown in FIG. 16.
  • FIG. 16 is a flow chart of an example of a generation method executed by the generation system 3A. The generation method of FIG. 16 includes determination processing S21A and generation processing S22A.
  • the determination processing S21A determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1A for the target feature among the one or more predetermined features.
  • the target feature is selected from the one or more predetermined features based on whether it affects the inference result of the second trained model LM2 in the usage environment of the second trained model LM2. For example, if the usage environment of the second trained model LM2 is an office or a factory, the objects appearing in the target images input to the second trained model LM2 are highly likely to be people wearing the same or similar clothes. In such a case, the color of the clothing, such as the green of work clothes or the black of a suit, is a feature common to a plurality of objects and does not affect the inference result of the second trained model LM2.
  • On the other hand, features such as the color of the shoes, the color of the inner shirt visible at the neck, the texture of the face, or the accessories worn are more specific to each object than the color of the clothes, and are likely to affect the inference result of the second trained model LM2.
  • the target feature may be selected from the plurality of predetermined features by a human or determined automatically.
  • the effectiveness is a coefficient by which the corresponding component of the feature vector is multiplied. The effectiveness is determined according to the degree to which each component of the feature vector emphasizes the target feature.
  • the evaluation information D1A is used to determine whether a component of the feature vector emphasizes the target feature.
  • the evaluation information D1A indicates, for each of the plurality of components of the feature vector, the response rate to a change in the target feature. For example, whether or not a component emphasizes the target feature is determined by whether its response rate is equal to or greater than a reference value.
  • the effectiveness is set to "1" for components whose response rate is equal to or greater than the reference value, and to "0" for components whose response rate is less than the reference value.
  • the reference value may be a fixed value, or may be set in consideration of the performance of the second trained model LM2. Since the response rate is a value between 0 and 1, changing the reference value between 0 and 1 changes the effectiveness of each of the plurality of components of the feature vector. Therefore, the effectiveness of each of the plurality of components of the feature vector can be determined using the reference value at which the performance of the second trained model LM2 is best.
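A sketch of this effectiveness determination, including a sweep of the reference value; `evaluate` is an assumed validation routine that scores the second trained model under a given effectiveness mask.

```python
import numpy as np

def effectiveness_mask(rates: np.ndarray, reference: float) -> np.ndarray:
    # Effectiveness is "1" for components whose response rate to the
    # target feature is at or above the reference value, "0" otherwise.
    return (rates >= reference).astype(float)

def best_reference(rates, evaluate, candidates=None):
    # Sweep reference values over [0, 1] and keep the one for which the
    # second trained model performs best; `evaluate(mask) -> score` is an
    # assumed validation routine supplied by the user.
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 21)
    return max(candidates, key=lambda r: evaluate(effectiveness_mask(rates, r)))
```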
  • the generation process S22A modifies the first trained model LM1 so that the feature vector V extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process S21A, and the result of inference regarding the object is output based on the resulting corrected feature vector VA; the second trained model LM2 is thereby generated from the first trained model LM1.
  • the generation processing S22A adds a correction unit F3 that corrects the feature vector extracted by the feature extraction unit F1 between the feature extraction unit F1 and the determination unit F2 of the first trained model LM1.
  • the second trained model LM2 is generated from the first trained model LM1 by changing the determination unit F2 to output the result of inference regarding the object based on the feature vector corrected by the correction unit F3.
  • the generation processing S22A generates a trained model without additional learning.
  • features that are not similar among objects are selected as target features from among the features of the objects, and components with a high response rate to those features are made effective.
  • the second trained model LM2 including such a correction unit F3 enables inference using features unique to each object that are not similar among objects, and thus improves performance.
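A sketch of how the second trained model LM2 can be assembled from the parts of LM1 without any additional learning; `extract` (F1) and `decide` (F2) are assumed callables taken unchanged from the first trained model.

```python
import numpy as np

class SecondTrainedModel:
    """Sketch of LM2: the feature extraction unit F1 and the determination
    unit F2 are reused unchanged from LM1; only the correction unit F3
    (an element-wise multiplication by the effectiveness) is inserted
    between them, so no additional learning takes place."""

    def __init__(self, extract, decide, effectiveness: np.ndarray):
        self.extract = extract              # F1 of the first trained model
        self.decide = decide                # F2 of the first trained model
        self.effectiveness = effectiveness  # mask from decision process S21A

    def __call__(self, image):
        v = self.extract(image)               # feature vector V
        v_corrected = v * self.effectiveness  # correction unit F3
        return self.decide(v_corrected)       # inference from corrected VA
```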
  • as described above, the first acquisition process S11A inputs, to the first trained model LM1, a plurality of first target images each showing a different first target object, and obtains a plurality of first feature vectors respectively corresponding to the plurality of first targets. The second acquisition process S12A inputs, to the first trained model LM1, a plurality of second target images each showing a second target object whose predetermined feature differs from that of the corresponding first target object, and obtains a plurality of second feature vectors respectively corresponding to the plurality of second targets.
  • the evaluation process S13A includes a first extraction process S131A, a second extraction process S132A, and an arithmetic process S133A.
  • the first extraction processing S131A extracts a component whose value in the first feature vector is equal to or greater than a threshold from the plurality of components of the feature vector for each of the plurality of first target images.
  • the second extraction process S132A extracts a component whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value among the components extracted in the first extraction process S131A.
  • the arithmetic processing S133A obtains the ratio of the number of times of extraction in the second extraction process to the number of times of extraction in the first extraction process as a response rate to a change in a predetermined feature for each of the plurality of components of the feature vector. This configuration allows obtaining an estimate of the change in each of the multiple components of the feature vector.
  • the second acquisition process S12A acquires a second feature vector by inputting the second target image to the first trained model LM1 for each of a plurality of different predetermined features.
  • the evaluation processing S13A evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature for each of the plurality of predetermined features. This configuration allows obtaining an estimate of the change in each of the multiple components of the feature vector for multiple predetermined features.
  • the threshold is set based on the representative value of the values in the first feature vector of the plurality of components of the feature vector. This configuration makes it possible to improve the accuracy of evaluation of changes in each of the plurality of components of the feature vector.
  • Embodiments of the present disclosure are not limited to the above embodiments.
  • the above-described embodiment can be modified in various ways according to the design and the like, as long as the object of the present disclosure can be achieved. Modifications of the above embodiment are listed below. The modifications described below can be applied in combination as appropriate.
  • the information processing system 1 may include at least one of the evaluation system 2, the generation system 3, and the inference system 4.
  • the program may be a program for causing an arithmetic circuit to execute at least one of an evaluation method, a generation method, and an inference method. This point also applies to the information processing system 1A.
  • the result of inference is not particularly limited.
  • the result of the inference may be the result of classification of objects appearing in the target image.
  • the second acquisition process S12 may acquire a second feature vector by inputting the second target image to the first trained model LM1 for each of a plurality of different predetermined features. That is, a plurality of second target images having different predetermined characteristics may be set for one first target image.
  • the evaluation system 2, the generation system 3, and the inference system 4 are implemented by different computer systems. At least two of the evaluation system 2, generation system 3, and reasoning system 4 may be implemented in a single computer system. This point also applies to the information processing system 1A.
  • the evaluation system 2, the generation system 3, and the inference system 4 need not each include both the input/output devices 21, 31, 41 and the communication devices 22, 32, 42. This point is the same for the evaluation system 2A and the generation system 3A.
  • each of the evaluation system 2, the generation system 3, and the inference system 4 may be implemented in multiple computer systems. In other words, it is not essential that the functions (components) of each of the evaluation system 2, the generation system 3, and the inference system 4 be integrated in one housing; the components of each system may be distributed over a plurality of housings. Furthermore, at least some functions of each of the evaluation system 2, the generation system 3, and the inference system 4, for example, some functions of the arithmetic circuits 24, 34, and 44, may be realized by the cloud (cloud computing) or the like. This point is the same for the evaluation system 2A and the generation system 3A.
  • a first aspect is an evaluation system (2; 2A) comprising: a storage device (23; 23A) that stores a trained model (LM1) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured; and an arithmetic circuit (24; 24A) that evaluates the trained model (LM1).
  • the trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the arithmetic circuit (24; 24A) executes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A).
  • the first acquisition process (S11; S11A) acquires a first feature vector corresponding to the first target by inputting a first target image showing the first target to the learned model (LM1).
  • the second acquisition process (S12; S12A) acquires a second feature vector corresponding to a second target by inputting, to the trained model (LM1), a second target image in which the second target, having a predetermined feature different from that of the first target, is captured.
  • the evaluation process (S13; S13A) evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on a comparison between the first feature vector and the second feature vector. This aspect allows for improved inference accuracy without additional learning.
  • the second aspect is the evaluation system (2) based on the first aspect.
  • the evaluation process (S13) includes a first extraction process (S131) and a second extraction process (S132).
  • the first extraction process (S131) extracts a component whose value in the first feature vector is equal to or greater than a threshold from a plurality of components of the feature vector.
  • the second extraction process (S132) extracts, from the components extracted in the first extraction process (S131), a component whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  • the third aspect is an evaluation system (2; 2A) based on the first aspect.
  • the second acquisition process (S12; S12A) inputs the second target image to the first trained model (LM1) for each of the plurality of predetermined features different from each other, and Obtain a second feature vector.
  • the evaluation process (S13; S13A) evaluates changes in each of the plurality of components of the feature vector with respect to changes in the predetermined feature for each of the plurality of predetermined features. This aspect allows obtaining an estimate of the change in each of the multiple components of the feature vector for multiple predetermined features.
  • the fourth aspect is the evaluation system (2A) based on the third aspect.
  • the first acquisition process (S11A) inputs, to the trained model (LM1), a plurality of first target images each showing a different first target object, and obtains a plurality of first feature vectors respectively corresponding to the plurality of first objects.
  • the second acquisition process (S12A) inputs, to the trained model (LM1), the plurality of second target images in which the plurality of second target objects having the predetermined features different from the plurality of first target objects are captured, and obtains a plurality of second feature vectors corresponding to the plurality of second objects.
  • the evaluation process (S13A) executes a first extraction process (S131A), a second extraction process (S132A), and an arithmetic process (S133).
  • the first extraction process (S131A) extracts a component whose value in the first feature vector is equal to or greater than a threshold from a plurality of components of the feature vector for each of the plurality of first target images.
  • the second extraction process (S132A) extracts, from the components extracted in the first extraction process (S131A), a component whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  • the fifth aspect is an evaluation system (2; 2A) based on the second or fourth aspect.
  • the threshold is set based on a representative value of values in the first feature vector of the plurality of components of the feature vector. According to this aspect, it is possible to improve the accuracy of evaluating the change in each of the plurality of components of the feature vector.
  • the sixth aspect is an evaluation system (2; 2A) based on any one of the first to fifth aspects.
  • the predetermined characteristic includes at least one of a color characteristic of the object and a shape characteristic of the object. According to this aspect, it is possible to improve the inference accuracy.
  • the seventh aspect is an evaluation system (2; 2A) based on the sixth aspect.
  • the color-related features of the object include hue, brightness, saturation, and contrast.
  • the features related to the shape of the object include the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object. According to this aspect, it is possible to improve the inference accuracy.
  • the eighth aspect is an evaluation system (2; 2A) based on any one of the first to seventh aspects.
  • the inference result indicates whether or not the object appearing in the target image matches a specific object. According to this aspect, it is possible to improve the inference accuracy as to whether or not the object appearing in the target image matches the specific object.
  • a ninth aspect is an evaluation method for evaluating a trained model (LM1) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured.
  • the trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation method includes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A).
  • the first acquisition process (S11; S11A) acquires a first feature vector corresponding to the first target by inputting a first target image showing the first target to the learned model (LM1).
  • the second acquisition process (S12; S12A) acquires a second feature vector corresponding to a second target by inputting, to the trained model (LM1), a second target image in which the second target, having a predetermined feature different from that of the first target, is captured.
  • the evaluation process (S13; S13A) evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison between the first feature vector and the second feature vector. This aspect allows for improved inference accuracy without additional learning.
  • a tenth aspect is a generation system (3; 3A) comprising: a storage device (33; 33A) that stores a first trained model (LM1) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured, and evaluation information (D1; D1A) of the first trained model (LM1); and an arithmetic circuit (34; 34A) that generates a second trained model (LM2) from the first trained model (LM1) based on the evaluation information (D1; D1A).
  • the first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation information (D1; D1A) indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the arithmetic circuit (34; 34A) executes a determination process (S21; S21A) and a generation process (S22; S22A).
  • the determination processing (S21; S21A) determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information (D1; D1A) for the target feature among the one or more predetermined features.
  • the generation process (S22; S22A) modifies the first trained model (LM1) so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process (S21; S21A) and the inference result regarding the object is output based on the corrected feature vector, thereby generating the second trained model (LM2) from the first trained model (LM1). This aspect allows for improved inference accuracy without additional learning.
  • an eleventh aspect is a generation method for generating a second trained model (LM2) from a first trained model (LM1), which outputs an inference result regarding an object in response to an input of a target image in which the object is captured, based on evaluation information (D1; D1A) of the first trained model (LM1).
  • the first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation information (D1; D1A) indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the generation method includes a determination process (S21; S21A) and a generation process (S22; S22A).
  • the determination processing determines effectiveness of a plurality of components of the feature vector based on the evaluation information for a target feature among the plurality of predetermined features.
  • the generation process modifies the first trained model (LM1) so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process (S21; S21A) and the inference result regarding the object is output based on the corrected feature vector, thereby generating the second trained model (LM2) from the first trained model (LM1). This aspect allows for improved inference accuracy without additional learning.
  • a twelfth aspect is an inference system (4) comprising: a storage device (43) that stores a trained model (LM2) that outputs an inference result regarding an object in response to an input of a target image showing the object; and an arithmetic circuit (44).
  • the trained model (LM2) is configured to extract a feature vector of an object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference about the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components of the feature vector with respect to the change of a predetermined feature of the object.
  • the arithmetic circuit (44) performs an acquisition process (S31) for acquiring a predetermined target image, and an inference process (S32) for inputting the predetermined target image acquired in the acquisition process (S31) to the trained model (LM2) stored in the storage device (43) and obtaining an inference result regarding the object appearing in the predetermined target image.
  • This aspect allows for improved inference accuracy without additional learning.
  • a thirteenth aspect is an inference method using a learned model (LM2) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured.
  • the trained model (LM2) is configured to extract a feature vector of an object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference about the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components of the feature vector with respect to the change of a predetermined feature of the object.
  • the inference method includes an acquisition process (S31) for acquiring a predetermined target image, and an inference process (S32) for inputting the predetermined target image acquired in the acquisition process (S31) to the trained model (LM2) and obtaining an inference result regarding the object appearing in the predetermined target image. This aspect allows for improved inference accuracy without additional learning.
  • a fourteenth aspect is a trained model (LM2) that outputs a result of inference regarding an object in response to an input of a target image in which the object is shown. The trained model (LM2) is configured to extract the feature vector of the object shown in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference about the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of components of the feature vector is set based on the change of each of the plurality of components of the feature vector with respect to the change of a predetermined feature of the object. This aspect allows for improved inference accuracy without additional learning.
  • a fifteenth aspect is a program for causing an arithmetic circuit (24; 24A; 34; 34A; 44) to execute at least one of the evaluation method based on the ninth aspect, the generation method based on the eleventh aspect, and the inference method based on the thirteenth aspect. This aspect allows for improved inference accuracy without additional learning.
  • a sixteenth aspect is an information processing system (1; 1A) comprising an evaluation system (2; 2A), a generation system (3; 3A), and an inference system (4).
  • the evaluation system (2; 2A) generates evaluation information (D1; D1A) of a first trained model (LM1) that outputs an inference result regarding the object in response to an input of a target image showing the object.
  • the generating system (3; 3A) generates a second trained model (LM2) from the first trained model (LM1) based on the evaluation information (D1; D1A).
  • the inference system (4) uses the second trained model (LM2) to output an inference result regarding the object in response to an input of a target image in which the object is shown.
  • the first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation system (2; 2A) executes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A).
  • the first acquisition process (S11; S11A) acquires a first feature vector corresponding to a first target by inputting, to the first trained model (LM1), a first target image showing the first target. The second acquisition process (S12; S12A) acquires a second feature vector corresponding to a second target, having a predetermined feature different from that of the first target, by inputting a second target image showing the second target to the first trained model (LM1).
  • the evaluation process (S13; S13A) evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on a comparison between the first feature vector and the second feature vector, and generates the evaluation information (D1; D1A).
  • the evaluation information (D1; D1A) indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the generation system (3; 3A) executes a determination process (S21; S21A) and a generation process (S22; S22A).
  • the determination processing determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information (D1; D1A) for the target feature among the one or more predetermined features.
  • the generation process modifies the first trained model (LM1) so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process and the inference result regarding the object is output based on the corrected feature vector, thereby generating the second trained model (LM2) from the first trained model (LM1). This aspect allows for improved inference accuracy without additional learning.
  • a “learned model” is an “inference program” that incorporates “learned parameters”.
  • “Learned parameters” refer to parameters (coefficients) obtained as a result of learning using a learning data set.
  • a learned parameter is generated by inputting a learning data set to a learning program and mechanically adjusting it for a certain purpose.
  • Although the learned parameters are adjusted according to the purpose of learning, they are merely parameters (information such as numerical values) by themselves; they function as a trained model only when incorporated into an inference program. For example, in the case of deep learning, the parameters used for weighting the links between nodes correspond to learned parameters.
  • “Inference program” refers to a program that can output certain results for an input by applying the incorporated learned parameters. For example, it is a program that defines a series of calculation procedures for applying learned parameters obtained as a result of learning to an image given as an input and outputting a result (authentication or judgment) for the image.
  • A “training data set” is also known as a learning data set. It refers to secondary processed data generated from raw data so that it can be analyzed by the target learning method, through preprocessing such as removal of missing values and outliers, addition of separate data such as label information (correct data), or a combination of such conversion and processing steps.
  • the training data set may also contain data that has been “augmented” by applying certain transformations to the raw data.
  • Raw data refers to data that is primarily obtained by users, vendors, other business operators, research institutions, etc., and that has been converted and processed so that it can be read into the database.
  • Training program refers to a program that finds certain rules from a learning data set and executes an algorithm to generate a model that expresses those rules. Specifically, this corresponds to a program that defines a procedure to be executed by a computer in order to realize learning by the adopted learning method.
  • Additional learning means generating new learned parameters by applying a different training data set to an existing trained model and performing further learning.
  • Reused model means an inference program that incorporates learned parameters newly generated by additional learning.
  • the present disclosure relates to evaluation systems, evaluation methods, generation systems, generation methods, inference systems, inference methods, trained models, and programs. Specifically, the present disclosure is applicable to: an evaluation system and an evaluation method for evaluating a trained model, prepared in advance, that outputs the result of inference about an object in response to an input of a target image in which the object is captured; a generation system and a generation method for generating a new trained model from a trained model that outputs the result of inference about an object in response to an input of a target image in which the object is captured; an inference system and an inference method for outputting the result of inference about an object shown in a target image using a trained model; a trained model that outputs the result of inference about an object in response to an input of a target image in which the object is captured; and programs for executing the evaluation method, the generation method, and the inference method.

Abstract

Provided are an assessment system, an assessment method, a generation system, a generation method, an inference system, an inference method, a trained model, a program, and an information processing system that enable improvement in inference accuracy without additional training. An assessment system (2) assesses a trained model (LM1). The trained model (LM1) is configured to: extract a feature vector of a target object included in an inputted target image; and output a result of inference concerning the target object on the basis of the extracted feature vector. On the basis of a comparison between a first feature vector that corresponds to a first target object and that is obtained by inputting, to the trained model (LM1), a first target image in which a first target object is included and a second feature vector that corresponds to a second target object having a predetermined feature different from the first target object and that is obtained by inputting, to the trained model (LM1), a second target image in which the second target object is included, the assessment system (2) assesses respective changes of a plurality of components of the feature vectors with respect to changes of predetermined features.

Description

Evaluation system, evaluation method, generation system, generation method, inference system, inference method, trained model, program, and information processing system
The present disclosure relates to evaluation systems, evaluation methods, generation systems, generation methods, inference systems, inference methods, trained models, programs, and information processing systems.
Patent Document 1 discloses an image search method. The image retrieval method disclosed in Patent Document 1 includes: performing dimensionality reduction on each convolutional-layer feature of an image to be retrieved to obtain dimensionality-reduced features; clustering based on the dimensionality-reduced features to obtain a plurality of cluster features; fusing the plurality of cluster features to obtain a global feature; and retrieving the image to be searched from a database based on the global feature.
Japanese Patent Publication No. 2020-525908
The image search method disclosed in Patent Document 1 uses a trained model. In general, a large amount of data is required to generate a trained model. Since it is costly to prepare a large amount of data, public data that anyone can use may be used instead. However, a trained model trained on public data, while usable in a general environment, tends to lose inference accuracy in a special environment where the input data is biased. To suppress the deterioration of inference accuracy, additional learning could be performed on the trained model so that it can handle the special environment, but such additional learning requires a large amount of data corresponding to the special environment and is costly.
The present disclosure provides an evaluation system, an evaluation method, a generation system, a generation method, an inference system, an inference method, a trained model, a program, and an information processing system that enable improvement of inference accuracy without additional learning.
An evaluation system according to one aspect of the present disclosure includes a storage device that stores a trained model that outputs an inference result regarding an object in response to an input of a target image in which the object is captured, and an arithmetic circuit that evaluates the trained model. The trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector. The arithmetic circuit executes a first acquisition process, a second acquisition process, and an evaluation process. In the first acquisition process, a first target image including a first target is input to the trained model to acquire a first feature vector corresponding to the first target. In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the trained model, and a second feature vector corresponding to the second target is acquired. The evaluation process evaluates changes in each of the plurality of components of the feature vector with respect to changes in the predetermined feature based on a comparison of the first feature vector and the second feature vector.
An evaluation method according to one aspect of the present disclosure evaluates a trained model that outputs an inference result regarding an object in response to an input of a target image in which the object is captured. The trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector. The evaluation method includes a first acquisition process, a second acquisition process, and an evaluation process. In the first acquisition process, a first target image including a first target is input to the trained model to acquire a first feature vector corresponding to the first target. In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the trained model, and a second feature vector corresponding to the second target is acquired. The evaluation process evaluates changes in each of the plurality of components of the feature vector with respect to changes in the predetermined feature based on a comparison of the first feature vector and the second feature vector.
A generation system according to one aspect of the present disclosure includes a storage device that stores a first trained model, which outputs an inference result regarding an object in response to an input of a target image in which the object is captured, together with evaluation information of the first trained model, and an arithmetic circuit that generates a second trained model from the first trained model based on the evaluation information. The first trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector. The evaluation information indicates an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in each of one or more predetermined features of the object. The arithmetic circuit executes a determination process and a generation process. The determination process determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information for a target feature among the one or more predetermined features. The generation process modifies the first trained model so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process and the inference result regarding the object is output based on the corrected feature vector, thereby generating the second trained model from the first trained model.
 A generation method according to one aspect of the present disclosure generates a second trained model, based on evaluation information of a first trained model, from the first trained model, which outputs an inference result regarding an object in response to the input of a target image showing the object. The first trained model is configured to extract a feature vector of the object appearing in the input target image and to output an inference result regarding the object based on the extracted feature vector. The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of a plurality of components of the feature vector with respect to a change in that predetermined feature. The generation method includes a determination process and a generation process. The determination process determines, for a target feature among the predetermined features, the effectiveness of the plurality of components of the feature vector based on the evaluation information. The generation process generates the second trained model from the first trained model by modifying the first trained model so that it outputs an inference result regarding the object based on a corrected feature vector, obtained by correcting the feature vector extracted from the input target image according to the effectiveness of each of the plurality of components determined in the determination process.
 An inference system according to one aspect of the present disclosure includes a storage device that stores a trained model that outputs an inference result regarding an object in response to the input of a target image showing the object, and an arithmetic circuit. The trained model is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The arithmetic circuit executes an acquisition process and an inference process. The acquisition process acquires a predetermined target image. The inference process inputs the predetermined target image acquired in the acquisition process to the trained model stored in the storage device, and acquires an inference result regarding the object appearing in the predetermined target image.
 An inference method according to one aspect of the present disclosure uses a trained model that outputs an inference result regarding an object in response to the input of a target image showing the object. The trained model is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The inference method includes an acquisition process and an inference process. The acquisition process acquires a predetermined target image. The inference process inputs the predetermined target image acquired in the acquisition process to the trained model and acquires an inference result regarding the object appearing in the predetermined target image.
 A trained model according to one aspect of the present disclosure outputs an inference result regarding an object in response to the input of a target image showing the object. The trained model is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change in each of the plurality of components of the feature vector with respect to a change in a predetermined feature of the object.
 A program according to one aspect of the present disclosure causes an arithmetic circuit to execute at least one of the above evaluation method, the above generation method, and the above inference method.
 An information processing system according to one aspect of the present disclosure includes an evaluation system, a generation system, and an inference system. The evaluation system generates evaluation information of a first trained model that outputs an inference result regarding an object in response to the input of a target image showing the object. The generation system generates a second trained model from the first trained model based on the evaluation information. The inference system uses the second trained model to output an inference result regarding the object in response to the input of a target image showing the object. The first trained model is configured to extract a feature vector of the object appearing in the input target image and to output an inference result regarding the object based on the extracted feature vector. The evaluation system executes a first acquisition process, a second acquisition process, and an evaluation process. The first acquisition process inputs a first target image showing a first object to the first trained model to acquire a first feature vector corresponding to the first object. The second acquisition process inputs a second target image showing a second object that differs from the first object in a predetermined feature to the first trained model to acquire a second feature vector corresponding to the second object. The evaluation process evaluates the change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector. The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature. The generation system executes a determination process and a generation process. The determination process determines, for a target feature among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information. The generation process generates the second trained model from the first trained model by modifying the first trained model so that it outputs an inference result regarding the object based on a corrected feature vector, obtained by correcting the feature vector extracted from the input target image according to the effectiveness of each of the plurality of components determined in the determination process.
 Aspects of the present disclosure enable inference accuracy to be improved without additional learning.
FIG. 1 is a block diagram of a configuration example of an information processing system according to Embodiment 1.
FIG. 2 is a block diagram of a configuration example of an evaluation system of the information processing system in FIG. 1.
FIG. 3 is a schematic diagram of an example of a first trained model evaluated by the evaluation system of FIG. 2.
FIG. 4 is a flowchart of an example of an evaluation method executed by the evaluation system of FIG. 2.
FIG. 5 is a schematic explanatory diagram of the evaluation method of FIG. 4.
FIG. 6 is a block diagram of a configuration example of a generation system of the information processing system in FIG. 1.
FIG. 7 is a schematic diagram of an example of a second trained model generated by the generation system of FIG. 6.
FIG. 8 is a flowchart of an example of a generation method executed by the generation system of FIG. 6.
FIG. 9 is a block diagram of a configuration example of an inference system of the information processing system in FIG. 1.
FIG. 10 is a flowchart of an example of an inference method executed by the inference system of FIG. 9.
FIG. 11 is a block diagram of a configuration example of an information processing system according to Embodiment 2.
FIG. 12 is a block diagram of a configuration example of an evaluation system of the information processing system in FIG. 11.
FIG. 13 is a flowchart of an example of an evaluation method executed by the evaluation system of FIG. 12.
FIG. 14 is a schematic explanatory diagram of the evaluation method of FIG. 13.
FIG. 15 is a block diagram of a configuration example of a generation system of the information processing system of FIG. 11.
FIG. 16 is a flowchart of an example of a generation method executed by the generation system of FIG. 15.
[1. Embodiment]
[1.1 Embodiment 1]
[1.1.1 Configuration]
 FIG. 1 is a block diagram of an information processing system 1 according to the present embodiment. The information processing system 1 enables re-matching of objects. Re-matching of an object is the task of searching a large number of images for an image showing the same object as an image prepared in advance. The object is, for example, a person. In other words, the information processing system 1 makes it possible to execute the task of searching a large number of images for an image showing the same person as an image prepared in advance.
 A trained model is used for re-matching. In general, a large amount of data is required to generate a trained model. Since preparing a large amount of data is costly, publicly available data that anyone can use is sometimes employed. However, while a trained model trained on public data may work well in a general environment, its inference accuracy tends to decrease when it is used in a special environment where the input data is biased in some way.
 The information processing system 1 in FIG. 1 is used to newly generate, from a trained model prepared in advance, a trained model adapted to the environment in which re-matching is performed, and to enable re-matching with the newly generated trained model.
 The information processing system 1 in FIG. 1 includes an evaluation system 2, a generation system 3, and an inference system 4. The evaluation system 2 evaluates a trained model prepared in advance (first trained model) LM1 (see FIG. 3) that outputs an inference result regarding an object in response to the input of a target image showing the object. The generation system 3 generates a trained model (second trained model) LM2 (see FIG. 7) from the first trained model LM1 (see FIG. 3) based on evaluation information D1 produced by the evaluation system 2. The inference system 4 uses the second trained model LM2 to make inferences about an object in response to the input of a target image showing the object. In the present embodiment, the inference system 4 outputs the result of re-matching.
 In the information processing system 1 of FIG. 1, the evaluation system 2 is communicably connected to the generation system 3 via a communication network 51. The generation system 3 is communicably connected to the inference system 4 via a communication network 52.
[1.1.1.1 Evaluation system]
 FIG. 2 is a block diagram of the evaluation system 2. The evaluation system 2 evaluates the first trained model LM1, prepared in advance, which outputs an inference result regarding an object in response to the input of a target image showing the object. The evaluation system 2 includes an interface (an input/output device 21 and a communication device 22), a storage device 23, and an arithmetic circuit 24. The evaluation system 2 is realized by, for example, a single terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers) and mobile terminals (smartphones, tablet terminals, wearable terminals, etc.).
 The input/output device 21 functions as an input device for inputting information from the user and as an output device for outputting information to the user. That is, the input/output device 21 is used for inputting information to the evaluation system 2 and for outputting information from the evaluation system 2. The input/output device 21 includes one or more human-machine interfaces. Examples of human-machine interfaces include input devices such as keyboards, pointing devices (mouse, trackball, etc.), and touch pads; output devices such as displays and speakers; and input/output devices such as touch panels.
 The communication device 22 is communicably connected to an external device or system. The communication device 22 is used for communication with the generation system 3 through the communication network 51. The communication device 22 includes one or more communication interfaces. The communication device 22 is connectable to the communication network 51 and has a function of communicating through the communication network 51. The communication device 22 complies with a predetermined communication protocol, which may be selected from various well-known wired and wireless communication standards.
 The storage device 23 is used to store information used by the arithmetic circuit 24 and information generated by the arithmetic circuit 24. The storage device 23 includes one or more storages (non-transitory storage media). Each storage may be, for example, a hard disk drive, an optical drive, or a solid state drive (SSD), and may be of a built-in, external, or NAS (network-attached storage) type. Note that the evaluation system 2 may include a plurality of storage devices 23, and information may be distributed among and stored in the plurality of storage devices 23.
 The information stored in the storage device 23 includes the first trained model LM1 and the evaluation information D1. FIG. 2 shows a state in which the storage device 23 stores both the first trained model LM1 and the evaluation information D1. The first trained model LM1 and the evaluation information D1 need not always be stored in the storage device 23; it suffices that they are stored in the storage device 23 when the arithmetic circuit 24 needs them.
 The first trained model LM1 outputs an inference result regarding an object in response to the input of a target image showing the object. In the present embodiment, the inference result indicates whether the object matches a specific object. The object is a person. The first trained model LM1 is used, for example, to search a plurality of target images for a target image showing a specific object. That is, the first trained model LM1 is a model for person re-matching, the task of searching a large number of images for an image showing the same person as an image prepared in advance. The first trained model LM1 serves as the base of the second trained model LM2 generated by the generation system 3. The first trained model LM1 may be generated, for example, by an external system separate from the information processing system 1 and provided to the information processing system 1 (in particular, the evaluation system 2 and the generation system 3).
 FIG. 3 is a schematic diagram of an example of the first trained model LM1. The first trained model LM1 is obtained, for example, from a trained model generated by performing machine learning (supervised learning) on a model having a neural network structure, using a training data set whose inputs are target images showing objects and whose correct answers are inference results regarding those objects. The first trained model LM1 is configured to extract a feature vector V of the object appearing in the input target image and to output an inference result regarding the object based on the extracted feature vector V. More specifically, the first trained model LM1 in FIG. 3 includes a feature extraction unit F1 and a determination unit F2. The feature extraction unit F1 extracts, in response to the input of a target image, a feature vector (feature quantity) V of the object appearing in the input target image. The feature extraction unit F1 in FIG. 3 includes an input layer F11, a plurality of intermediate layers (hidden layers) F12 and F13, and an output layer F14. Note that FIG. 3 shows the structure of the first trained model LM1 in simplified form; the structure of an actual neural network, for example a convolutional neural network (CNN), has an appropriate number of intermediate layers, such as convolutional layers, pooling layers, activation functions, and fully connected layers, between the input layer and the output layer. The determination unit F2 outputs an inference result regarding the object based on the feature vector V extracted by the feature extraction unit F1. The determination unit F2 is, for example, a classifier, for which an algorithm such as k-nearest neighbors (KNN) or a support vector machine (SVM) can be used. As described above, in the present embodiment the inference result indicates whether the object matches a specific object. Accordingly, the determination unit F2 outputs a matching result for the object in response to the input of the feature vector V of the object from the feature extraction unit F1. The output from the determination unit F2 indicates whether or not the object appearing in the target image input to the first trained model LM1 is the specific object. For example, the determination unit F2 compares the feature vector V extracted by the feature extraction unit F1 with the feature vector of the specific object to obtain the degree of matching (similarity) between the two, and performs matching of the object appearing in the target image based on the similarity. Since the feature vector is an n-dimensional vector, the degree of matching can be evaluated by cosine similarity, Euclidean distance, or the like. As an example, the determination unit F2 outputs a result indicating that the object matches the specific object when the degree of matching is equal to or greater than a determination value.
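 For illustration, the matching step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of the determination unit F2: the function names and the determination value of 0.8 are introduced here for illustration, and NumPy is assumed available.

```python
import numpy as np

def cosine_similarity(v: np.ndarray, w: np.ndarray) -> float:
    # Degree of matching between two n-dimensional feature vectors.
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

def is_same_object(v: np.ndarray, reference: np.ndarray,
                   determination_value: float = 0.8) -> bool:
    # The object is judged to match the specific object when the degree
    # of matching is equal to or greater than the determination value.
    return cosine_similarity(v, reference) >= determination_value
```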
 The arithmetic circuit 24 is a circuit that controls the operation of the evaluation system 2. The arithmetic circuit 24 is connected to the input/output device 21 and the communication device 22 and can access the storage device 23. The arithmetic circuit 24 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories. The one or more processors implement the functions of the arithmetic circuit 24 by executing a program (stored in the one or more memories or the storage device 23). Here, the program is recorded in advance in the storage device 23, but it may also be provided through a telecommunications line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
 The arithmetic circuit 24 evaluates the first trained model LM1. The arithmetic circuit 24 executes, for example, the evaluation method shown in FIG. 4. FIG. 4 is a flowchart of an example of the evaluation method executed by the evaluation system 2. FIG. 5 is a schematic explanatory diagram of the evaluation method of FIG. 4.
 The evaluation method of FIG. 4 includes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13.
 In the first acquisition process S11, as shown in FIG. 5, a first target image 61 showing a first object 71 is input to the first trained model LM1 (in particular, the feature extraction unit F1) to acquire a first feature vector V1 corresponding to the first object 71. The first target image 61 is a target image in which the first object 71 appears as the object.
 In the second acquisition process S12, as shown in FIG. 5, a second target image 62 showing a second object 72 that differs from the first object 71 in a predetermined feature is input to the first trained model LM1 (in particular, the feature extraction unit F1) to acquire a second feature vector V2 corresponding to the second object 72. The second target image 62 is a target image in which the second object 72 appears as the object. The second object 72 is an object that differs from the first object 71 in the predetermined feature. In FIG. 5, the predetermined feature is the aspect ratio of the person's head: the head 71a of the first object 71 and the head 72a of the second object 72 have different aspect ratios. The clothes 71b of the first object 71 and the clothes 72b of the second object 72 are similar in style and identical in color.
 The second target image 62 may be generated by transforming the first object 71 in the first target image 61 so that the predetermined feature differs. The predetermined feature may be set based on whether it affects the inference result of the first trained model LM1 regarding the object. The predetermined feature includes at least one of a feature relating to the color of the object and a feature relating to the shape of the object. Features relating to the color of the object include hue, brightness, saturation, and contrast; concrete examples include the color of the hair, the clothes, the shoes, and a partially visible inner shirt. Features relating to the shape of the object, which can be regarded as geometric features, include the aspect ratio of the object, the head-to-body proportion of the object, and the build of the object.
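 As one hedged illustration of how a second target image might be derived from the first, the following Python sketch stretches the whole image vertically with Pillow so that a geometric feature (the aspect ratio) changes. This transformation is an assumption introduced for illustration only; the description does not specify it, and changing only the head region, as in FIG. 5, would additionally require locating the head.

```python
from PIL import Image

def make_second_image(first_image: Image.Image,
                      y_scale: float = 1.15) -> Image.Image:
    # Stretch the image vertically so that the aspect ratio of the
    # depicted object (including the head) differs from the original.
    w, h = first_image.size
    return first_image.resize((w, int(h * y_scale)))
```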
 As shown in FIG. 5, the evaluation process S13 evaluates the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector V1 and the second feature vector V2, and generates evaluation information D1 indicating the result of this evaluation. The differences between the first feature vector V1 and the second feature vector V2 reflect the differences between the first object 71 and the second object 72, that is, the change in the predetermined feature. In the case of FIG. 5, the predetermined feature is the aspect ratio of the person's head, and the differences between the first feature vector V1 and the second feature vector V2 are considered to reflect the difference between the head 71a of the first object 71 and the head 72a of the second object 72. Therefore, from the comparison between the first feature vector V1 and the second feature vector V2, the change in each of the plurality of components of the feature vector (for example, (v1, v2, v3, v4, v5) in FIG. 5) with respect to the change in the predetermined feature can be evaluated.
 The evaluation process S13 will be described further. The evaluation process S13 of FIG. 4 includes a first extraction process S131 and a second extraction process S132.
 The first extraction process S131 extracts, from the plurality of components of the feature vector, the components whose values in the first feature vector are equal to or greater than a threshold. The threshold is set based on a representative value of the values that the plurality of components of the feature vector take in the first feature vector. The representative value is obtained, for example, from a histogram of those values; examples of representative values include the mean, the mode, and the median. Suppose the plurality of components of the feature vector are (v1, v2, v3, v4, v5) and their values in the first feature vector are (0.7, 0.4, 0.2, 0.8, 0.3). The median is 0.4. If the threshold is the median, the first extraction process S131 extracts the components v1, v2, and v4. The first extraction process S131 thus yields a first set of feature-vector components whose values in the first feature vector are equal to or greater than the threshold.
 The second extraction process S132 extracts, from the components extracted in the first extraction process S131, the components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. The predetermined value is set so as to extract the components that change significantly in the feature vector of the object when the predetermined feature of the object shown in the target image is changed. That is, the predetermined value serves as a criterion for judging whether a component of the feature vector of the first trained model LM1 emphasizes the changed predetermined feature. The predetermined value may, for example, be the same as the threshold of the first extraction process S131. For example, suppose the values of the plurality of components in the second feature vector are (0.1, 0.3, 0.2, 0.4, 0.2). The changes of the components v1, v2, and v4 extracted in the first extraction process S131 are then 0.6, 0.1, and 0.4, respectively. If the predetermined value equals the threshold, it is 0.4, and the second extraction process S132 extracts the components v1 and v4. The second extraction process S132 thus yields a second set of components, among those extracted in the first extraction process S131, for which the difference between the values in the first and second feature vectors is equal to or greater than the predetermined value. The second set is a subset of the first set.
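 The two extraction processes can be traced with the numerical example above. The following Python sketch is one minimal way to write them, assuming NumPy and taking the predetermined value equal to the median threshold, as in the example; it reproduces the first set {v1, v2, v4} and the second set {v1, v4}.

```python
import numpy as np

def second_set(v1: np.ndarray, v2: np.ndarray) -> np.ndarray:
    threshold = np.median(v1)               # representative value (median)
    first = v1 >= threshold                 # first extraction process S131
    changed = np.abs(v1 - v2) >= threshold  # predetermined value = threshold
    return first & changed                  # second extraction process S132

v1 = np.array([0.7, 0.4, 0.2, 0.8, 0.3])
v2 = np.array([0.1, 0.3, 0.2, 0.4, 0.2])
print(second_set(v1, v2))  # [ True False False  True False] -> components v1 and v4
```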
 The evaluation information D1 is generated by the evaluation process S13. The evaluation information D1 indicates the evaluation of the change in each of the plurality of components of the feature vector with respect to the change in the predetermined feature of the object. In particular, in the present embodiment, the evaluation information D1 indicates the second set obtained by the second extraction process S132.
 According to the evaluation system 2 described above, the evaluation information D1 is obtained for the first trained model LM1. The evaluation information D1 indicates the evaluation of the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature. In general, classification by a machine learning inference program such as a neural network is a black box, and there is no unified view on how to interpret inference results. With the evaluation system 2, however, it is possible to identify the components that change significantly in the feature vector of the object when the predetermined feature of the object shown in the target image is changed. This provides one basis for judging whether each component of the feature vector of the first trained model LM1 emphasizes the predetermined feature, so that performing inference with the components that change significantly gives the inference result one possible interpretation. For example, a component that changes greatly with tint, a color-related feature, can be interpreted as a component that emphasizes tint. Using the evaluation system 2, it can be seen to what degree the feature vector of the first trained model LM1 reacts to (that is, emphasizes) features relating to the color of the object and features relating to the shape of the object. Since these can be used to explain the inference decisions of the first trained model LM1, the evaluation system 2 can be expected to play a role in improving the explainability of the black box.
[1.1.1.2 Generation system]
 FIG. 6 is a block diagram of the generation system 3. The generation system 3 generates the second trained model LM2 from the first trained model LM1. In particular, the generation system 3 uses the evaluation information D1 generated by the evaluation system 2, which indicates the evaluation of the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature of the object. The generation system 3 generates the second trained model LM2 from the first trained model LM1 so that more accurate inference results are obtained for the predetermined feature. The generation system 3 includes an interface (an input/output device 31 and a communication device 32), a storage device 33, and an arithmetic circuit 34. The generation system 3 is realized by, for example, a single terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers) and mobile terminals (smartphones, tablet terminals, wearable terminals, etc.).
 The input/output device 31 functions as an input device for inputting information from the user and as an output device for outputting information to the user. That is, the input/output device 31 is used for inputting information to the generation system 3 and for outputting information from the generation system 3. The input/output device 31 includes one or more human-machine interfaces. Examples of human-machine interfaces include input devices such as keyboards, pointing devices (mouse, trackball, etc.), and touch pads; output devices such as displays and speakers; and input/output devices such as touch panels.
 The communication device 32 is communicably connected to an external device or system. The communication device 32 is used for communication with the evaluation system 2 through the communication network 51 and for communication with the inference system 4 through the communication network 52. The communication device 32 includes one or more communication interfaces. The communication device 32 is connectable to the communication networks 51 and 52 and has a function of communicating through them. The communication device 32 complies with a predetermined communication protocol, which may be selected from various well-known wired and wireless communication standards.
 The storage device 33 is used to store information used by the arithmetic circuit 34 and information generated by the arithmetic circuit 34. The storage device 33 includes one or more storages (non-transitory storage media). Each storage may be, for example, a hard disk drive, an optical drive, or a solid state drive (SSD), and may be of a built-in, external, or NAS type. Note that the generation system 3 may include a plurality of storage devices 33, and information may be distributed among and stored in the plurality of storage devices 33.
 The information stored in the storage device 33 includes the first trained model LM1, the evaluation information D1, and the second trained model LM2. FIG. 6 shows a state in which the storage device 33 stores all of the first trained model LM1, the evaluation information D1, and the second trained model LM2. These need not always be stored in the storage device 33; it suffices that they are stored in the storage device 33 when the arithmetic circuit 34 needs them. In the present embodiment, the evaluation information D1 is provided from the evaluation system 2 to the generation system 3.
 The second trained model LM2 outputs an inference result regarding an object in response to the input of a target image showing the object. The second trained model LM2 is generated using the first trained model LM1; in particular, it is generated from the first trained model LM1 so that more accurate inference results are obtained for the predetermined feature. FIG. 7 is a schematic diagram of the second trained model LM2. The second trained model LM2 in FIG. 7 includes the feature extraction unit F1, the determination unit F2, and a correction unit F3. The feature extraction unit F1 extracts the feature vector V of the object appearing in the input target image. The correction unit F3 is located between the feature extraction unit F1 and the determination unit F2, and corrects the feature vector V extracted by the feature extraction unit F1 based on the effectiveness of each of the plurality of components of the feature vector V. The effectiveness of each of the plurality of components of the feature vector V is set based on the change in each of the plurality of components of the feature vector V with respect to a change in the predetermined feature of the object. A component of the feature vector V that emphasizes the predetermined feature is given a high effectiveness (for example, "1"), and a component that does not emphasize the predetermined feature is given a low effectiveness (for example, "0"). The effectiveness will be described in detail later. The determination unit F2 outputs an inference result regarding the object based on the feature vector VA corrected by the correction unit F3. Thus, the second trained model LM2 differs from the first trained model LM1 in that it includes the correction unit F3.
 The arithmetic circuit 34 is a circuit that controls the operation of the generation system 3. The arithmetic circuit 34 is connected to the input/output device 31 and the communication device 32 and can access the storage device 33. The arithmetic circuit 34 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories. The one or more processors implement the functions of the arithmetic circuit 34 by executing a program (stored in the one or more memories or the storage device 33). Here, the program is recorded in advance in the storage device 33, but it may also be provided through a telecommunications line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
 The arithmetic circuit 34 generates the second trained model LM2. More specifically, the arithmetic circuit 34 generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1. The arithmetic circuit 34 executes, for example, the generation method shown in FIG. 8. FIG. 8 is a flowchart of an example of the generation method executed by the generation system 3. The generation method of FIG. 8 includes a determination process S21 and a generation process S22.
 The determination process S21 determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1. The effectiveness is, for example, multiplied by the corresponding component of the feature vector. The effectiveness is determined by the degree to which a component of the feature vector emphasizes the target feature, and the evaluation information D1 is used to judge whether a component of the feature vector emphasizes the target feature. In the present embodiment, the evaluation information D1 indicates the second set obtained by the second extraction process S132. For example, the effectiveness is set to "1" for the components included in the second set and to "0" for the components not included in the second set. An effectiveness of "1" means that the component is used, and an effectiveness of "0" means that the component is not used.
 The generation process S22 generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that it outputs an inference result regarding the object based on the corrected feature vector VA, obtained by correcting the feature vector V extracted from the input target image according to the effectiveness of each of the plurality of components of the feature vector V determined in the determination process S21. In the present embodiment, the generation process S22 adds, between the feature extraction unit F1 and the determination unit F2 of the first trained model LM1, the correction unit F3 that corrects the feature vector V extracted by the feature extraction unit F1, and modifies the model so that the determination unit F2 outputs an inference result regarding the object based on the feature vector VA corrected by the correction unit F3, thereby generating the second trained model LM2 from the first trained model LM1. The generation process S22 generates a trained model without additional learning. The correction unit F3 performs the correction based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process S21. For example, suppose the components v1, v2, v3, v4, and v5 of the feature vector are 0.7, 0.4, 0.2, 0.8, and 0.3, and their effectiveness values are 1, 1, 0, 1, and 0. In this case, the components v1, v2, v3, v4, and v5 of the corrected feature vector are 0.7, 0.4, 0.0, 0.8, and 0.0.
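 The effectiveness-based correction of the generation process can be sketched as follows. This is a minimal sketch assuming NumPy; the mask values follow the numerical example above.

```python
import numpy as np

def correct_feature_vector(v: np.ndarray, effectiveness: np.ndarray) -> np.ndarray:
    # Correction unit F3: each component is multiplied by its effectiveness,
    # so components with effectiveness 0 are not used by the determination unit.
    return v * effectiveness

v = np.array([0.7, 0.4, 0.2, 0.8, 0.3])
effectiveness = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
print(correct_feature_vector(v, effectiveness))  # [0.7 0.4 0.  0.8 0. ]
```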
 According to the generation system 3 described above, the second trained model LM2 is obtained from the first trained model LM1 without additional learning. In the second trained model LM2, the correction unit F3 corrects the plurality of components of the feature vector V extracted by the feature extraction unit F1 based on the effectiveness of each component with respect to the target feature. By setting the effectiveness with respect to the target feature in this way, components that emphasize the target feature can be emphasized over components that do not, and an improvement in the inference accuracy of the second trained model LM2 can be expected. For example, in a usage environment with much similar clothing, such as an office with many suits or a factory with many work uniforms, the first trained model LM1 trained on public data may not exhibit sufficient performance; if the first trained model LM1 has been trained to emphasize the color of clothes, it may fail to distinguish people well in an environment with much similar clothing. In such a case, using the evaluation information D1 about features that are not similar between objects (face, build, shoes, the color of accessories, etc.), the correction unit F3 that emphasizes the components included in the second set is added to the first trained model LM1 to generate the second trained model LM2. The second trained model LM2 including such a correction unit F3 can perform inference using features unique to each object rather than features shared between objects, so its performance improves.
[1.1.1.3 Inference system]
 FIG. 9 is a block diagram of the inference system 4. The inference system 4 uses the second trained model LM2 to output an inference result regarding an object in response to the input of a target image showing the object. The inference system 4 includes an interface (an input/output device 41 and a communication device 42), a storage device 43, and an arithmetic circuit 44. The inference system 4 is realized by, for example, a single terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers) and mobile terminals (smartphones, tablet terminals, wearable terminals, etc.).
 The input/output device 41 functions as an input device for inputting information from the user and as an output device for outputting information to the user. That is, the input/output device 41 is used for inputting information to the inference system 4 and for outputting information from the inference system 4. The input/output device 41 includes one or more human-machine interfaces. Examples of human-machine interfaces include input devices such as keyboards, pointing devices (mouse, trackball, etc.), and touch pads; output devices such as displays and speakers; and input/output devices such as touch panels.
 The communication device 42 is communicably connected to an external device or system. The communication device 42 is used for communication with the generation system 3 through the communication network 52. The communication device 42 includes one or more communication interfaces. The communication device 42 is connectable to the communication network 52 and has a function of communicating through the communication network 52. The communication device 42 complies with a predetermined communication protocol, which may be selected from various well-known wired and wireless communication standards.
 The storage device 43 is used to store information used by the arithmetic circuit 44 and information generated by the arithmetic circuit 44. The storage device 43 includes one or more storages (non-transitory storage media). Each storage may be, for example, a hard disk drive, an optical drive, or a solid state drive (SSD), and may be of a built-in, external, or NAS type. Note that the inference system 4 may include a plurality of storage devices 43, and information may be distributed among and stored in the plurality of storage devices 43.
 The information stored in the storage device 43 includes the second trained model LM2. FIG. 9 shows a state in which the storage device 43 stores the second trained model LM2. The second trained model LM2 need not always be stored in the storage device 43; it suffices that it is stored in the storage device 43 when the arithmetic circuit 44 needs it. In the present embodiment, the second trained model LM2 is provided from the generation system 3 to the inference system 4.
 The arithmetic circuit 44 is a circuit that controls the operation of the inference system 4. The arithmetic circuit 44 is connected to the input/output device 41 and the communication device 42 and can access the storage device 43. The arithmetic circuit 44 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories. The one or more processors implement the functions of the arithmetic circuit 44 by executing a program (stored in the one or more memories or the storage device 43). Here, the program is recorded in advance in the storage device 43, but it may also be provided through a telecommunications line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
 The arithmetic circuit 44 performs inference using the second trained model LM2. The arithmetic circuit 44 executes, for example, the inference method shown in FIG. 10. FIG. 10 is a flowchart of an example of the inference method executed by the inference system 4. The inference method of FIG. 10 includes an acquisition process S31 and an inference process S32.
 The acquisition process S31 acquires a predetermined target image. The predetermined target image is a target image in which an object to be inferred by the inference system 4 appears. The acquisition process S31 acquires the predetermined target image through, for example, the input/output device 41. For instance, the acquisition process S31 may present a screen for inputting the predetermined target image via the input/output device 41, and the user can then input the predetermined target image in accordance with instructions on the screen. Inputting the predetermined target image includes not only inputting the predetermined target image to the inference system 4 from an external device, but also designating, from among images stored in the inference system 4, an image to be used as the predetermined target image.
 The inference process S32 inputs the predetermined target image acquired in the acquisition process S31 to the second trained model LM2 stored in the storage device 43, and acquires the result of inference regarding the object appearing in the predetermined target image. In this embodiment, the result of the inference process S32 indicates whether the object appearing in the predetermined target image acquired in the acquisition process S31 matches a predetermined object.
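 The two-step flow of FIG. 10 can be illustrated with a minimal sketch. The function names, the use of PIL for image loading, and the boolean `infer` interface of LM2 are assumptions for illustration, not part of the disclosure:

import numpy as np
from PIL import Image

def acquisition_process_s31(path: str) -> np.ndarray:
    # S31: acquire the predetermined target image (here loaded from a file
    # path designated by the user, standing in for input/output device 41).
    return np.asarray(Image.open(path).convert("RGB"))

def inference_process_s32(lm2, image: np.ndarray) -> bool:
    # S32: input the target image to the second trained model LM2 and obtain
    # the inference result, i.e. whether the pictured object matches the
    # predetermined object (`infer` is an assumed interface of LM2).
    return lm2.infer(image)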
 According to the inference system 4 described above, inference is performed using the second trained model LM2. In the second trained model LM2, the correction unit F3 corrects the plurality of components of the feature vector extracted by the feature extraction unit F1, based on the effectiveness of each of the plurality of components with respect to the feature of interest. By setting the effectiveness with respect to the feature of interest in this way, components that emphasize the feature of interest can be weighted more heavily than components that do not, so an improvement in the inference accuracy of the second trained model LM2 can be expected.
 [1.1.2 Effects, etc.]
 The evaluation system 2 described above includes a storage device 23 that stores a trained model LM1 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, and an arithmetic circuit 24 that evaluates the trained model LM1. The trained model LM1 is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The arithmetic circuit 24 executes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13. The first acquisition process S11 inputs a first target image in which a first object appears to the trained model LM1 and acquires a first feature vector corresponding to the first object. The second acquisition process S12 inputs a second target image in which a second object differing from the first object in a predetermined feature appears to the trained model LM1 and acquires a second feature vector corresponding to the second object. The evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector.
 According to the evaluation system 2, for the trained model LM1, an evaluation of the change of each of the plurality of components of the feature vector with respect to a change in the predetermined feature is obtained. For a person, for example, the predetermined feature may be the head-to-body ratio, body shape, hair color, clothing color, shoe color, inner-shirt color, and the like. It is therefore possible to identify which components of the feature vector are effective for the predetermined feature. Inference using the components of the feature vector that are effective for the predetermined feature becomes possible, and an improvement in inference accuracy can be expected. Furthermore, it is only necessary to use the components of the feature vector that are effective for the predetermined feature; there is no need to perform additional training that makes the trained model itself emphasize the predetermined feature. The evaluation system 2 therefore enables inference accuracy to be improved without additional training.
 In the evaluation system 2, the evaluation process S13 includes a first extraction process S131, which extracts, from the plurality of components of the feature vector, components whose values in the first feature vector are equal to or greater than a threshold, and a second extraction process S132, which extracts, from the components extracted in the first extraction process S131, components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. This configuration makes it possible to improve the accuracy of evaluating the change of each of the plurality of components of the feature vector.
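 A minimal sketch of the two extraction stages, assuming NumPy arrays for the feature vectors and, as one possible reading of the threshold rule described next, the mean of the first feature vector as its representative value:

import numpy as np

def first_extraction_s131(v1: np.ndarray) -> np.ndarray:
    # S131: indices of components whose value in the first feature vector V1
    # is at or above a threshold. The mean of V1 is used here as the
    # representative value; this is an assumption, not the only choice.
    threshold = v1.mean()
    return np.flatnonzero(v1 >= threshold)

def second_extraction_s132(v1: np.ndarray, v2: np.ndarray,
                           idx: np.ndarray, delta: float) -> np.ndarray:
    # S132: among the components kept by S131, indices whose values differ
    # between V1 and V2 by at least the predetermined value `delta`.
    return idx[np.abs(v1[idx] - v2[idx]) >= delta]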
 In the evaluation system 2, the threshold is set based on a representative value of the values of the plurality of components in the first feature vector. This configuration makes it possible to improve the accuracy of evaluating the change of each of the plurality of components of the feature vector.
 In the evaluation system 2, the predetermined feature includes at least one of a feature relating to the color of the object and a feature relating to the shape of the object. This configuration enables an improvement in inference accuracy.
 In the evaluation system 2, the features relating to the color of the object include hue, brightness, saturation, and contrast. The features relating to the shape of the object include the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object. This configuration enables an improvement in inference accuracy.
 In the evaluation system 2, the result of inference indicates whether the object appearing in the target image matches a specific object. This configuration enables an improvement in the accuracy of inferring whether the object appearing in the target image matches the specific object.
 The evaluation system 2 can be said to execute the following method (evaluation method). That is, the evaluation method evaluates a trained model LM1 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears. The trained model LM1 is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The evaluation method includes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13. The first acquisition process S11 inputs a first target image in which a first object appears to the trained model LM1 and acquires a first feature vector corresponding to the first object. The second acquisition process S12 inputs a second target image in which a second object differing from the first object in a predetermined feature appears to the trained model LM1 and acquires a second feature vector corresponding to the second object. The evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector. This configuration enables inference accuracy to be improved without additional training.
 The evaluation system 2 is realized using the arithmetic circuit 24. That is, the method (evaluation method) executed by the evaluation system 2 can be realized by the arithmetic circuit 24 executing a program. This program is a computer program for causing the arithmetic circuit 24 to execute the evaluation method described above. This configuration enables inference accuracy to be improved without additional training.
 The generation system 3 described above includes a storage device 33 that stores a first trained model LM1 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, together with evaluation information D1 for the first trained model LM1, and an arithmetic circuit 34 that generates a second trained model LM2 from the first trained model LM1 based on the evaluation information D1. The first trained model LM1 is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The evaluation information D1 indicates, for each of one or more predetermined features of the object, an evaluation of the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature. The arithmetic circuit 34 executes a determination process S21 and a generation process S22. The determination process S21 determines, for a feature of interest among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1. The generation process S22 generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that it outputs the result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components determined in the determination process S21.
 The generation system 3 generates the second trained model LM2 from the first trained model LM1 by adding, to the first trained model LM1, a process that corrects the feature vector based on the effectiveness of each of its plurality of components. As a result, the second trained model LM2 can perform inference using the components of the feature vector that are effective for the predetermined feature, and an improvement in inference accuracy can be expected. Furthermore, generating the second trained model LM2 only requires adding the above process to the first trained model LM1; there is no need to perform additional training that makes the second trained model LM2 itself emphasize the predetermined feature. The generation system 3 therefore enables inference accuracy to be improved without additional training.
 The generation system 3 can be said to execute the following method (generation method). That is, the generation method generates a second trained model LM2 from a first trained model LM1 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, based on evaluation information D1 for the first trained model LM1. The first trained model LM1 is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The evaluation information D1 indicates, for each of one or more predetermined features of the object, an evaluation of the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature. The generation method includes a determination process S21 and a generation process S22. The determination process S21 determines, for a feature of interest among the predetermined features, the effectiveness of the plurality of components of the feature vector based on the evaluation information. The generation process S22 generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that it outputs the result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components determined in the determination process S21. This configuration enables inference accuracy to be improved without additional training.
 The generation system 3 is realized using the arithmetic circuit 34. That is, the method (generation method) executed by the generation system 3 can be realized by the arithmetic circuit 34 executing a program. This program is a computer program for causing the arithmetic circuit 34 to execute the generation method described above. This configuration enables inference accuracy to be improved without additional training.
 The inference system 4 described above includes a storage device 43 that stores a trained model LM2 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, and an arithmetic circuit 44. The trained model LM2 is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output the result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The arithmetic circuit 44 executes an acquisition process S31 and an inference process S32. The acquisition process S31 acquires a predetermined target image. The inference process S32 inputs the predetermined target image acquired in the acquisition process S31 to the trained model LM2 stored in the storage device 43 and acquires the result of inference regarding the object appearing in the predetermined target image.
 The trained model LM2 used by the inference system 4 includes a process that corrects the feature vector based on the effectiveness of each of the plurality of components of the feature vector. As a result, the trained model LM2 can perform inference using the components of the feature vector that are effective for the predetermined feature, and an improvement in inference accuracy can be expected. Furthermore, the effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components with respect to a change in the predetermined feature of the object. There is therefore no need to perform additional training that makes the trained model itself emphasize the predetermined feature. The inference system 4 therefore enables inference accuracy to be improved without additional training.
 The inference system 4 can be said to execute the following method (inference method). That is, the inference method uses a trained model LM2 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears. The trained model LM2 is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output the result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components with respect to a change in a predetermined feature of the object. The inference method includes an acquisition process S31 and an inference process S32. The acquisition process S31 acquires a predetermined target image. The inference process S32 inputs the predetermined target image acquired in the acquisition process S31 to the trained model LM2 and acquires the result of inference regarding the object appearing in the predetermined target image. This configuration enables inference accuracy to be improved without additional training.
 The inference system 4 is realized using the arithmetic circuit 44. That is, the method (inference method) executed by the inference system 4 can be realized by the arithmetic circuit 44 executing a program. This program is a computer program for causing the arithmetic circuit 44 to execute the inference method described above. This configuration enables inference accuracy to be improved without additional training.
 The trained model LM2 described above outputs the result of inference regarding an object in response to the input of a target image in which the object appears. The trained model LM2 is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output the result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change of each of the plurality of components with respect to a change in a predetermined feature of the object. This configuration enables inference accuracy to be improved without additional training.
 [1.2. Embodiment 2]
 [1.2.1 Configuration]
 FIG. 11 is a block diagram of a configuration example of an information processing system 1A according to Embodiment 2. Like the information processing system 1 of FIG. 1, the information processing system 1A of FIG. 11 makes it possible to execute re-matching of an object. The information processing system 1A of FIG. 11 is used to newly generate, from a trained model prepared in advance, a trained model adapted to the environment in which re-matching is performed, and to enable re-matching using the newly generated trained model.
 The information processing system 1A of FIG. 11 includes an evaluation system 2A, a generation system 3A, and an inference system 4.
 FIG. 12 is a block diagram of the evaluation system 2A. The evaluation system 2A evaluates a first trained model LM1, prepared in advance, that outputs the result of inference regarding an object in response to the input of a target image in which the object appears. The evaluation system 2A includes interfaces (an input/output device 21 and a communication device 22), a storage device 23A, and an arithmetic circuit 24A.
 The information stored in the storage device 23A includes the first trained model LM1, a database DB1, and evaluation information D1A. FIG. 12 shows a state in which the storage device 23A stores all of the first trained model LM1, the database DB1, and the evaluation information D1A. The first trained model LM1, the database DB1, and the evaluation information D1A need not always be stored in the storage device 23A; it suffices that they are stored in the storage device 23A when required by the arithmetic circuit 24A.
 The database DB1 contains data used for evaluating the first trained model LM1. The database DB1 includes a plurality of first target images and a plurality of second target images. For one first object, there can be a plurality of second objects each differing from the first object in one of a plurality of mutually different predetermined features. In this embodiment, a plurality of first target images, each showing one of a plurality of mutually different first objects, are registered in the database DB1. Also registered in the database DB1, for each of the plurality of mutually different predetermined features, are a plurality of second target images showing a plurality of second objects each differing from one of the plurality of first objects in that predetermined feature. Note that the number of images registered in the database DB1 is small compared with, for example, the number of images required to generate a reuse model by additional training of the first trained model LM1.
 The arithmetic circuit 24A evaluates the first trained model LM1. The arithmetic circuit 24A executes, for example, the evaluation method shown in FIG. 13. FIG. 13 is a flowchart of an example of the evaluation method executed by the evaluation system 2A.
 The evaluation method of FIG. 13 includes a first acquisition process S11A, a second acquisition process S12A, and an evaluation process S13A. FIG. 14 is a schematic illustration of the evaluation method of FIG. 13.
 As shown in FIG. 14, the first acquisition process S11A inputs a first target image 61 in which a first object 71 appears to the first trained model LM1 (specifically, the feature extraction unit F1) and acquires a first feature vector V1 corresponding to the first object 71. The first target image 61 is acquired from, for example, the database DB1. In this embodiment, a plurality of first target images 61, each showing one of a plurality of mutually different first objects 71, are registered in the database DB1. The first acquisition process S11A inputs the plurality of first target images 61 to the first trained model LM1 and acquires a plurality of first feature vectors V1 respectively corresponding to the plurality of first objects 71. The first acquisition process S11A thus yields a plurality of first feature vectors V1 respectively corresponding to the plurality of mutually different first objects 71.
 As shown in FIG. 14, the second acquisition process S12A inputs a second target image 62, in which a second object 72 differing from the first object 71 in a predetermined feature appears, to the first trained model LM1 (specifically, the feature extraction unit F1) and acquires a second feature vector V2 corresponding to the second object 72. In FIG. 14, as an example, the predetermined feature is the aspect ratio of a person's head. The head 71a of the first object 71 and the head 72a of the second object 72 differ in aspect ratio, while the clothes 71b of the first object 71 and the clothes 72b of the second object 72 are similar in style and identical in color. The second target image 62 is acquired from, for example, the database DB1. In this embodiment, registered in the database DB1, for each of a plurality of mutually different predetermined features, are a plurality of second target images 62 showing a plurality of second objects 72 each differing from one of the plurality of first objects 71 in that predetermined feature. The second acquisition process S12A inputs a second target image 62 to the first trained model LM1 and acquires a second feature vector V2 for each of the plurality of mutually different predetermined features. The second acquisition process S12A inputs the plurality of second target images 62 to the first trained model LM1 and acquires a plurality of second feature vectors V2 corresponding to the plurality of second objects 72. The second acquisition process S12A thus yields, for each first feature vector V1 of a first object 71, the second feature vectors V2 of a plurality of second objects 72 that each differ from the first object 71 in one of the plurality of predetermined features. A sketch of the two acquisition processes follows.
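 A minimal sketch of the acquisition processes S11A and S12A, assuming the database DB1 is represented as in-memory lists of image arrays and `f1` is the feature extraction unit F1 of LM1 exposed as a callable (both assumptions for illustration):

def first_acquisition_s11a(f1, first_images):
    # S11A: one first feature vector V1 per first-object image in DB1.
    return [f1(img) for img in first_images]

def second_acquisition_s12a(f1, second_images_by_feature):
    # S12A: for each predetermined feature, the second feature vectors V2 of
    # the second objects that differ from the first objects in that feature.
    return {feature: [f1(img) for img in imgs]
            for feature, imgs in second_images_by_feature.items()}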
 As shown in FIG. 14, the evaluation process S13A evaluates the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison between the first feature vector V1 and the second feature vector V2. The evaluation process S13A generates evaluation information D1A indicating the result of this evaluation. The evaluation process S13A of FIG. 13 includes a first extraction process S131A, a second extraction process S132A, and an arithmetic process S133A.
 The first extraction process S131A extracts, from the plurality of components of the feature vector, components whose values in the first feature vector are equal to or greater than a threshold. In this embodiment, the first acquisition process S11A yields a plurality of first feature vectors respectively corresponding to a plurality of mutually different first objects. The first extraction process S131A therefore extracts, for each of the plurality of first target images, the components of the feature vector whose values in the corresponding first feature vector are equal to or greater than the threshold. The threshold is set based on a representative value of the values of the plurality of components in the first feature vector. The first extraction process S131A thus yields a first set of feature-vector components whose values in the first feature vector are equal to or greater than the threshold.
 The second extraction process S132A extracts, from the components extracted in the first extraction process S131A, components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. In this embodiment, the second acquisition process S12A yields, for each first feature vector of a first object, the second feature vectors of a plurality of second objects that each differ from the first object in one of the plurality of predetermined features. Therefore, for each second feature vector of the plurality of second objects, the components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value are extracted from the components extracted in the first extraction process S131A. The second extraction process S132A thus yields a second set of components, among those extracted in the first extraction process S131A, for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value. The second set is a subset of the first set.
 For each of the plurality of components of the feature vector, the arithmetic process S133A obtains, as the reaction rate to the change in the predetermined feature, the ratio of the number of times the component was extracted in the second extraction process S132A to the number of times it was extracted in the first extraction process S131A. The number of times extracted in the first extraction process S131A is the number of times the component is included in the first set, and the number of times extracted in the second extraction process S132A is the number of times it is included in the second set. For example, suppose that a component v1 of the feature vector was extracted 100 times in the first extraction process S131A and 10 times in the second extraction process S132A. In this case, the reaction rate of the component v1 to the change in the predetermined feature is 10/100 = 0.1. The arithmetic process S133A thus obtains, for each component of the feature vector, the reaction rate for each of the plurality of predetermined features. This makes it easy to grasp which component of the feature vector reacts strongly to which predetermined feature.
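 A minimal sketch of the loop over S131A, S132A, and S133A, assuming each first feature vector V1 is paired with one second feature vector V2 per predetermined feature, the mean of V1 as the representative value for the threshold, and a default difference value of 0.5 (the pairing scheme and parameter values are assumptions, not prescribed by the disclosure):

import numpy as np

def reaction_rates(pairs_by_feature: dict, dim: int, delta: float = 0.5) -> dict:
    # pairs_by_feature maps a predetermined feature name (e.g. "hue") to a
    # list of (V1, V2) pairs in which only that feature differs.
    rates = {}
    for feature, pairs in pairs_by_feature.items():
        first_counts = np.zeros(dim)   # times each component enters the first set
        second_counts = np.zeros(dim)  # times each component enters the second set
        for v1, v2 in pairs:
            threshold = v1.mean()                        # S131A: representative value
            first = v1 >= threshold                      # S131A: first set
            second = first & (np.abs(v1 - v2) >= delta)  # S132A: second set (subset)
            first_counts += first
            second_counts += second
        # S133A: reaction rate = (times in second set) / (times in first set)
        with np.errstate(divide="ignore", invalid="ignore"):
            rates[feature] = np.where(first_counts > 0,
                                      second_counts / first_counts, 0.0)
    return rates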
 The evaluation information D1A is generated by the evaluation process S13A. The evaluation information D1A indicates, for each of one or more predetermined features of the object, an evaluation of the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature. In particular, in this embodiment, the evaluation information D1A indicates, for each of the plurality of components of the feature vector, the reaction rate to the change in each predetermined feature. Table 1 below is an example of the evaluation information D1A. In Table 1, the predetermined features are hue, brightness, contrast, aspect ratio, and head-to-body ratio.
 [Table 1: reaction rate of each feature-vector component to changes in hue, brightness, contrast, aspect ratio, and head-to-body ratio; table values not reproduced in this text]
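 Concretely, the evaluation information D1A can be thought of as a table of reaction rates indexed by feature and component. The sketch below shows one hypothetical representation; all numeric values are invented for illustration and do not reproduce Table 1:

# Hypothetical shape of evaluation information D1A: for each predetermined
# feature, the reaction rate of every feature-vector component (values invented).
evaluation_info_d1a = {
    "hue":          [0.10, 0.72, 0.05, 0.31],
    "brightness":   [0.08, 0.65, 0.12, 0.27],
    "contrast":     [0.02, 0.11, 0.58, 0.09],
    "aspect_ratio": [0.44, 0.03, 0.07, 0.61],
    "head_to_body": [0.39, 0.06, 0.02, 0.55],
}
# Reading example: component 1 reacts strongly to hue (0.72) but barely to
# aspect ratio (0.03), so it likely encodes color rather than shape.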
 FIG. 15 is a block diagram of the generation system 3A. The generation system 3A generates a second trained model LM2 from the first trained model LM1. In particular, the generation system 3A uses the evaluation information D1A generated by the evaluation system 2A. The evaluation information D1A indicates, for each of one or more predetermined features of the object, an evaluation of the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature. The generation system 3A generates the second trained model LM2 from the first trained model LM1 so that more accurate inference results are obtained for the predetermined feature. The generation system 3A includes interfaces (an input/output device 31 and a communication device 32), a storage device 33A, and an arithmetic circuit 34A.
 The information stored in the storage device 33A includes the first trained model LM1, the evaluation information D1A, and the second trained model LM2. The first trained model LM1, the evaluation information D1A, and the second trained model LM2 need not always be stored in the storage device 33A; it suffices that they are stored in the storage device 33A when required by the arithmetic circuit 34A.
 The arithmetic circuit 34A generates the second trained model LM2. More specifically, the arithmetic circuit 34A generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1A. The arithmetic circuit 34A executes, for example, the generation method shown in FIG. 16. FIG. 16 is a flowchart of an example of the generation method executed by the generation system 3A. The generation method of FIG. 16 includes a determination process S21A and a generation process S22A.
 The determination process S21A determines, for a feature of interest among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1A.
 The feature of interest is selected from the one or more predetermined features based on whether it affects the result of inference of the second trained model LM2 in the environment in which the second trained model LM2 is used. For example, if the second trained model LM2 is used in an office or a factory, the objects appearing in the target images input to the second trained model LM2 are highly likely to be people wearing identical or similar clothes. In such a case, a feature such as the color of the object's clothes, for example the green of work clothes or the black of a suit, is common to many objects and is unlikely to affect the result of inference of the second trained model LM2. On the other hand, features such as the color of the shoes, the color of the inner shirt visible at the neck, the texture of the face, or the accessories worn are, compared with the color of the clothes, specific to the individual object and are likely to affect the result of inference of the second trained model LM2. By setting the effectiveness with respect to such object-specific features, components that emphasize the object-specific features can be weighted more heavily than components that do not, and an improvement in the inference accuracy of the second trained model LM2 can be expected. The feature of interest may be decided by a person, for example by visual inspection, or may be decided automatically from the plurality of predetermined features.
 The effectiveness is, for example, multiplied by the corresponding component of the feature vector. The effectiveness is determined by the degree to which the component of the feature vector emphasizes the feature of interest. The evaluation information D1A is used to judge whether a component of the feature vector emphasizes the feature of interest. In this embodiment, the evaluation information D1A indicates, for each of the plurality of components of the feature vector, the reaction rate to a change in the feature of interest. For example, whether a component emphasizes the predetermined feature is decided by whether its reaction rate is equal to or greater than a reference value. The effectiveness is set to "1" for components whose reaction rate is equal to or greater than the reference value, and to "0" for components whose reaction rate is less than the reference value. An effectiveness of "1" means that the component is used, and an effectiveness of "0" means that the component is not used. The reference value may be a fixed value, but it may also be set in consideration of the performance of the second trained model LM2. Since the reaction rate takes a value between 0 and 1, varying the reference value from 0 to 1 changes the effectiveness of each of the plurality of components of the feature vector. The effectiveness of each of the plurality of components of the feature vector can therefore be determined by the reference value at which the performance of the second trained model LM2 is best.
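 A minimal sketch of the determination process S21A under these rules; `evaluate_performance` is a hypothetical scoring function for a candidate second trained model, and the 21-point grid of reference values is an assumption:

import numpy as np

def determine_effectiveness(rates: np.ndarray, reference: float) -> np.ndarray:
    # S21A: effectiveness 1 for components whose reaction rate to the feature
    # of interest is at or above the reference value, 0 otherwise.
    return (rates >= reference).astype(float)

def best_effectiveness(rates: np.ndarray, evaluate_performance) -> np.ndarray:
    # Sweep the reference value over [0, 1] and keep the effectiveness vector
    # that gives the best LM2 performance (evaluate_performance is assumed to
    # map an effectiveness vector to a scalar score).
    candidates = (determine_effectiveness(rates, r) for r in np.linspace(0, 1, 21))
    return max(candidates, key=evaluate_performance)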
 The generation process S22A generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that it outputs the result of inference regarding the object based on a corrected feature vector VA obtained by correcting the feature vector V extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector V determined in the determination process S21A. In this embodiment, the generation process S22A generates the second trained model LM2 from the first trained model LM1 by adding, between the feature extraction unit F1 and the determination unit F2 of the first trained model LM1, a correction unit F3 that corrects the feature vector extracted by the feature extraction unit F1, and by modifying the determination unit F2 so that it outputs the result of inference regarding the object based on the feature vector corrected by the correction unit F3. The generation process S22A generates the trained model without additional training.
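 A minimal sketch of the generation process S22A, assuming LM1 exposes its feature extraction unit F1 and determination unit F2 as callables (the class name and interfaces are assumptions for illustration; the `infer` method matches the interface assumed in the inference sketch above):

import numpy as np

class SecondTrainedModel:
    # LM2 = F1 -> F3 -> F2, assembled from LM1 without any additional training.
    def __init__(self, f1, f2, effectiveness: np.ndarray):
        self.f1 = f1                        # feature extraction unit F1 of LM1
        self.f2 = f2                        # determination unit F2 of LM1
        self.effectiveness = effectiveness  # from determination process S21A

    def correct_f3(self, v: np.ndarray) -> np.ndarray:
        # Correction unit F3: weight each component of the feature vector V
        # by its effectiveness to obtain the corrected feature vector VA.
        return v * self.effectiveness

    def infer(self, image: np.ndarray):
        va = self.correct_f3(self.f1(image))  # V -> VA
        return self.f2(va)                    # inference from the corrected vector

def generation_process_s22a(lm1_f1, lm1_f2, effectiveness):
    # S22A: build LM2 by inserting F3 between F1 and F2 of LM1.
    return SecondTrainedModel(lm1_f1, lm1_f2, effectiveness)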
 In the generation system 3A, features in which objects are not similar to one another (face, body shape, shoes, the color of accessories, etc.) can be selected as the features of interest, and a correction unit F3 that emphasizes the components with high reaction rates to the features of interest can be added to the first trained model LM1 to generate the second trained model LM2. The second trained model LM2 including such a correction unit F3 can perform inference using object-specific features that are not similar between objects, so its performance improves.
 [1.2.2 Effects, etc.]
 In the evaluation system 2A described above, the first acquisition process S11A inputs a plurality of first target images, each showing one of a plurality of mutually different first objects, to the first trained model LM1 and acquires a plurality of first feature vectors respectively corresponding to the plurality of first objects. The second acquisition process S12A inputs a plurality of second target images, showing a plurality of second objects each differing from one of the plurality of first objects in a predetermined feature, to the first trained model LM1 and acquires a plurality of second feature vectors corresponding to the plurality of second objects. The evaluation process S13A includes a first extraction process S131A, a second extraction process S132A, and an arithmetic process S133A. The first extraction process S131A extracts, for each of the plurality of first target images, the components of the feature vector whose values in the first feature vector are equal to or greater than a threshold. The second extraction process S132A extracts, from the components extracted in the first extraction process S131A, components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. For each of the plurality of components of the feature vector, the arithmetic process S133A obtains, as the reaction rate to the change in the predetermined feature, the ratio of the number of times the component was extracted in the second extraction process to the number of times it was extracted in the first extraction process. This configuration makes it possible to obtain an evaluation of the change of each of the plurality of components of the feature vector.
 In the evaluation system 2A, the second acquisition process S12A inputs a second target image to the first trained model LM1 and acquires a second feature vector for each of a plurality of mutually different predetermined features. The evaluation process S13A evaluates, for each of the plurality of predetermined features, the change of each of the plurality of components of the feature vector with respect to the change in that predetermined feature. This configuration makes it possible to obtain, for the plurality of predetermined features, an evaluation of the change of each of the plurality of components of the feature vector.
 In the evaluation system 2A, the threshold is set based on a representative value of the values of the plurality of components in the first feature vector. This configuration makes it possible to improve the accuracy of evaluating the change of each of the plurality of components of the feature vector.
 [2. Modifications]
 Embodiments of the present disclosure are not limited to the above embodiments. The above embodiments can be modified in various ways according to the design and the like, as long as the object of the present disclosure can be achieved. Modifications of the above embodiments are listed below. The modifications described below can be applied in appropriate combination.
 In one modification, the information processing system 1 may include at least one of the evaluation system 2, the generation system 3, and the inference system 4. The program may be a program for causing an arithmetic circuit to execute at least one of the evaluation method, the generation method, and the inference method. The same applies to the information processing system 1A.
 In one modification, the result of inference is not particularly limited. The result of inference may be the result of classifying the object appearing in the target image.
 In one modification, the second acquisition process S12 may input a second target image to the first trained model LM1 and acquire a second feature vector for each of a plurality of mutually different predetermined features. That is, a plurality of second target images differing from one another in the predetermined feature may be set for a single first target image.
 In one modification, it is not essential in the information processing system 1 that the evaluation system 2, the generation system 3, and the inference system 4 be realized by different computer systems. At least two of the evaluation system 2, the generation system 3, and the inference system 4 may be realized by a single computer system. The same applies to the information processing system 1A.
 In one modification, the evaluation system 2 (2A), the generation system 3 (3A), and the inference system 4 need not each include both an input/output device 21, 31, 41 and a communication device 22, 32, 42. This also applies to the evaluation system 2A and the generation system 3A.
 In one modification, each of the evaluation system 2, the generation system 3, and the inference system 4 may be realized by a plurality of computer systems. That is, it is not essential that the plurality of functions (components) of each of the evaluation system 2, the generation system 3, and the inference system 4 be integrated in a single housing; the components of each of the evaluation system 2, the generation system 3, and the inference system 4 may be distributed over a plurality of housings. Furthermore, at least some of the functions of each of the evaluation system 2, the generation system 3, and the inference system 4, for example some of the functions of the arithmetic circuits 24, 34, and 44, may be realized by a cloud (cloud computing) or the like. The same applies to the evaluation system 2A and the generation system 3A.
 [3. Aspects]
 As is clear from the above embodiments and modifications, the present disclosure includes the following aspects. In the following, reference numerals are given in parentheses only to clarify the correspondence with the embodiments.
 A first aspect is an evaluation system (2; 2A) including a storage device (23; 23A) that stores a trained model (LM1) that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, and an arithmetic circuit (24; 24A) that evaluates the trained model (LM1). The trained model (LM1) is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The arithmetic circuit (24; 24A) executes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A). The first acquisition process (S11; S11A) inputs a first target image in which a first object appears to the trained model (LM1) and acquires a first feature vector corresponding to the first object. The second acquisition process (S12; S12A) inputs a second target image in which a second object differing from the first object in a predetermined feature appears to the trained model (LM1) and acquires a second feature vector corresponding to the second object. The evaluation process (S13; S13A) evaluates the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector. This aspect enables inference accuracy to be improved without additional training.
 A second aspect is an evaluation system (2) based on the first aspect. In the second aspect, the evaluation process (S13) includes a first extraction process (S131) and a second extraction process (S132). The first extraction process (S131) extracts, from the plurality of components of the feature vector, a component whose value in the first feature vector is equal to or greater than a threshold. The second extraction process (S132) extracts, from the components extracted in the first extraction process (S131), a component for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. This aspect makes it possible to improve the accuracy of evaluating the change in each of the plurality of components of the feature vector.
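 A minimal sketch of the two-stage extraction, written under the assumption that the threshold is derived from a representative value (here, the mean) of the first feature vector, anticipating the fifth aspect below; the function name and the constant `diff_min` are illustrative.

```python
import numpy as np

def extract_responsive_components(v1, v2, diff_min=0.1):
    """Two-stage extraction over a first feature vector v1 and a second
    feature vector v2; diff_min plays the role of the predetermined value."""
    threshold = v1.mean()  # representative-value threshold (an assumption)
    # First extraction process (S131): components whose value in the first
    # feature vector is at or above the threshold.
    active = np.flatnonzero(v1 >= threshold)
    # Second extraction process (S132): of those, components whose value
    # changed by at least diff_min between the two feature vectors.
    responded = active[np.abs(v1[active] - v2[active]) >= diff_min]
    return active, responded
```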
 A third aspect is an evaluation system (2; 2A) based on the first aspect. In the third aspect, the second acquisition process (S12; S12A) acquires the second feature vector by inputting the second target image to the trained model (LM1) for each of a plurality of mutually different predetermined features. The evaluation process (S13; S13A) evaluates, for each of the plurality of predetermined features, a change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature. This aspect makes it possible to obtain, for each of the plurality of predetermined features, an evaluation of the change in each of the plurality of components of the feature vector.
 A fourth aspect is an evaluation system (2A) based on the third aspect. In the fourth aspect, the first acquisition process (S11A) acquires a plurality of first feature vectors respectively corresponding to a plurality of mutually different first objects by inputting, to the trained model (LM1), a plurality of first target images in which the plurality of first objects respectively appear. The second acquisition process (S12A) acquires a plurality of second feature vectors corresponding to a plurality of second objects by inputting, to the trained model (LM1), a plurality of second target images in which the plurality of second objects, each differing in the predetermined feature from a corresponding one of the plurality of first objects, appear. The evaluation process (S13A) executes a first extraction process (S131A), a second extraction process (S132A), and a calculation process (S133A). The first extraction process (S131A) extracts, for each of the plurality of first target images, a component whose value in the first feature vector is equal to or greater than a threshold from the plurality of components of the feature vector. The second extraction process (S132A) extracts, from the components extracted in the first extraction process (S131A), a component for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. The calculation process (S133A) determines, for each of the plurality of components of the feature vector, the ratio of the number of times the component is extracted in the second extraction process to the number of times it is extracted in the first extraction process, as a response rate to the change in the predetermined feature. This aspect makes it possible to obtain an evaluation of the change in each of the plurality of components of the feature vector.
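 Continuing in the same vein, the response rate of the fourth aspect could be tallied over many first/second image pairs as sketched below; the pair representation and the mean-based threshold are, again, assumptions.

```python
import numpy as np

def response_rates(pairs, dim, diff_min=0.1):
    """pairs: iterable of (v1, v2) feature-vector pairs, one pair per
    first/second target image pair; dim: number of feature components.
    Returns, per component, the ratio of second-extraction counts (S132A)
    to first-extraction counts (S131A), i.e. the response rate computed
    by the calculation process (S133A)."""
    count1 = np.zeros(dim)  # times each component passed the first extraction
    count2 = np.zeros(dim)  # times it also passed the second extraction
    for v1, v2 in pairs:
        threshold = v1.mean()  # representative-value threshold (fifth aspect)
        active = v1 >= threshold
        responded = active & (np.abs(v1 - v2) >= diff_min)
        count1 += active
        count2 += responded
    return np.divide(count2, count1, out=np.zeros(dim), where=count1 > 0)
```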
 A fifth aspect is an evaluation system (2; 2A) based on the second or fourth aspect. In the fifth aspect, the threshold is set based on a representative value of the values, in the first feature vector, of the plurality of components of the feature vector. According to this aspect, the accuracy of evaluating the change in each of the plurality of components of the feature vector can be improved.
 A sixth aspect is an evaluation system (2; 2A) based on any one of the first to fifth aspects. In the sixth aspect, the predetermined feature includes at least one of a feature relating to the color of the object and a feature relating to the shape of the object. According to this aspect, the inference accuracy can be improved.
 A seventh aspect is an evaluation system (2; 2A) based on the sixth aspect. In the seventh aspect, the feature relating to the color of the object includes hue, brightness, saturation, and contrast. The feature relating to the shape of the object includes the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object. According to this aspect, the inference accuracy can be improved.
 An eighth aspect is an evaluation system (2; 2A) based on any one of the first to seventh aspects. In the eighth aspect, the result of the inference indicates whether the object appearing in the target image matches a specific object. According to this aspect, the accuracy of inferring whether the object appearing in the target image matches a specific object can be improved.
 A ninth aspect is an evaluation method for evaluating a trained model (LM1) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears. The trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector. The evaluation method includes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A). The first acquisition process (S11; S11A) inputs, to the trained model (LM1), a first target image in which a first object appears, and acquires a first feature vector corresponding to the first object. The second acquisition process (S12; S12A) inputs, to the trained model (LM1), a second target image in which a second object differing from the first object in a predetermined feature appears, and acquires a second feature vector corresponding to the second object. The evaluation process (S13; S13A) evaluates a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector. This aspect enables the inference accuracy to be improved without additional learning.
 A tenth aspect is a generation system (3; 3A) including: a storage device (33; 33A) that stores a first trained model (LM1) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears, and evaluation information (D1; D1A) of the first trained model (LM1); and an arithmetic circuit (34; 34A) that generates a second trained model (LM2) from the first trained model (LM1) based on the evaluation information (D1; D1A). The first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector. The evaluation information (D1; D1A) indicates, for each of one or more predetermined features of the object, an evaluation of a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature. The arithmetic circuit (34; 34A) executes a determination process (S21; S21A) and a generation process (S22; S22A). The determination process (S21; S21A) determines, for a target feature among the one or more predetermined features, an effectiveness of each of the plurality of components of the feature vector based on the evaluation information (D1; D1A). The generation process (S22; S22A) generates the second trained model (LM2) from the first trained model (LM1) by modifying the first trained model (LM1) so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process (S21; S21A). This aspect enables the inference accuracy to be improved without additional learning.
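 One simple way the generation process could realize the correction is sketched below, under the assumption that effectiveness is a binary per-component weight that zeroes out components reacting strongly to a nuisance feature; the cutoff `rate_max` and the wrapper name are illustrative, not part of the disclosure.

```python
import numpy as np

def make_corrected_extractor(extract_features, response_rate, rate_max=0.5):
    """Wrap the first trained model's feature extractor so that its output
    is corrected by per-component effectiveness (generation process, S22)."""
    # Determination process (S21): components whose response rate to the
    # target feature is high are judged ineffective and weighted to zero.
    effectiveness = (response_rate < rate_max).astype(float)
    def corrected_extract(image):
        v = extract_features(image)
        return v * effectiveness  # corrected feature vector
    return corrected_extract
```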
 An eleventh aspect is a generation method for generating a second trained model (LM2) from a first trained model (LM1) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears, based on evaluation information (D1; D1A) of the first trained model (LM1). The first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector. The evaluation information (D1; D1A) indicates, for each of one or more predetermined features of the object, an evaluation of a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature. The generation method includes a determination process (S21; S21A) and a generation process (S22; S22A). The determination process (S21; S21A) determines, for a target feature among the one or more predetermined features, the effectiveness of the plurality of components of the feature vector based on the evaluation information. The generation process (S22; S22A) generates the second trained model (LM2) from the first trained model (LM1) by modifying the first trained model (LM1) so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process (S21; S21A). This aspect enables the inference accuracy to be improved without additional learning.
 A twelfth aspect is an inference system (4) including: a storage device (43) that stores a trained model (LM2) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; and an arithmetic circuit (44). The trained model (LM2) is configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on a change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The arithmetic circuit (44) executes an acquisition process (S31) of acquiring a predetermined target image, and an inference process (S32) of inputting the predetermined target image acquired in the acquisition process (S31) to the trained model (LM2) stored in the storage device (43) to acquire a result of inference regarding an object appearing in the predetermined target image. This aspect enables the inference accuracy to be improved without additional learning.
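 The inference process might then, for example, match corrected feature vectors by cosine similarity against a stored vector of the specific object; the matching rule and the threshold `sim_min` are assumptions for illustration.

```python
import numpy as np

def infer_match(corrected_extract, target_image, reference_vector, sim_min=0.8):
    """Inference process (S32): decide whether the object in target_image
    matches the specific object represented by reference_vector."""
    v = corrected_extract(target_image)  # acquisition (S31) plus correction
    sim = float(np.dot(v, reference_vector)) / (
        np.linalg.norm(v) * np.linalg.norm(reference_vector) + 1e-12)
    return sim >= sim_min
```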
 A thirteenth aspect is an inference method using a trained model (LM2) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears. The trained model (LM2) is configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on a change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The inference method includes an acquisition process (S31) of acquiring a predetermined target image, and an inference process (S32) of inputting the predetermined target image acquired in the acquisition process (S31) to the trained model (LM2) to acquire a result of inference regarding an object appearing in the predetermined target image. This aspect enables the inference accuracy to be improved without additional learning.
 A fourteenth aspect is a trained model (LM2) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears. The trained model (LM2) is configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on a change in each of the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. This aspect enables the inference accuracy to be improved without additional learning.
 A fifteenth aspect is a program for causing an arithmetic circuit (24, 24A, 34, 34A, 44) to execute at least one of the evaluation method according to the ninth aspect, the generation method according to the eleventh aspect, and the inference method according to the thirteenth aspect. This aspect enables the inference accuracy to be improved without additional learning.
 A sixteenth aspect is an information processing system (1; 1A) including an evaluation system (2; 2A), a generation system (3; 3A), and an inference system (4). The evaluation system (2; 2A) generates evaluation information (D1; D1A) of a first trained model (LM1) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears. The generation system (3; 3A) generates a second trained model (LM2) from the first trained model (LM1) based on the evaluation information (D1; D1A). The inference system (4) uses the second trained model (LM2) to output a result of inference regarding the object in response to an input of a target image in which the object appears. The first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector. The evaluation system (2; 2A) executes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A). The first acquisition process (S11; S11A) inputs, to the first trained model (LM1), a first target image in which a first object appears, and acquires a first feature vector corresponding to the first object. The second acquisition process (S12; S12A) inputs, to the first trained model (LM1), a second target image in which a second object differing from the first object in a predetermined feature appears, and acquires a second feature vector corresponding to the second object. The evaluation process (S13; S13A) evaluates a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature based on a comparison between the first feature vector and the second feature vector, and generates the evaluation information (D1; D1A). The evaluation information (D1; D1A) indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature. The generation system (3; 3A) executes a determination process (S21; S21A) and a generation process (S22; S22A). The determination process (S21; S21A) determines, for a target feature among the one or more predetermined features, an effectiveness of each of the plurality of components of the feature vector based on the evaluation information (D1; D1A). The generation process (S22; S22A) generates the second trained model (LM2) from the first trained model (LM1) by modifying the first trained model (LM1) so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process. This aspect enables the inference accuracy to be improved without additional learning.
 [4. Terms]
 In the present disclosure, terms related to machine learning are defined and used as follows.
 "Trained model" refers to an "inference program" into which "learned parameters" have been incorporated.
 "Learned parameters" refers to parameters (coefficients) obtained as a result of learning using a learning data set. Learned parameters are generated by inputting a learning data set into a learning program and mechanically adjusting the parameters for a given purpose. Although learned parameters are adjusted to suit the purpose of learning, on their own they are merely parameters (information such as numerical values); they function as a trained model only when incorporated into an inference program. In the case of deep learning, for example, the principal learned parameters include those used to weight the links between nodes.
 "Inference program" refers to a program that, by applying the incorporated learned parameters, can output a certain result in response to an input. For example, it is a program that defines a series of computation procedures for applying learned parameters, obtained as a result of learning, to an image given as input, and for outputting a result (such as an authentication or a judgment) for that image.
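 As a toy illustration of how these terms relate, the following is a minimal sketch assuming a trivially small linear model; the parameter values and names are illustrative only and are unrelated to the model of this disclosure.

```python
import numpy as np

# "Learned parameters": plain numbers produced by a learning program.
learned_parameters = {"w": np.array([0.2, 0.8]), "b": 0.1}

# "Inference program": applies the incorporated parameters to an input.
def inference_program(x, params=learned_parameters):
    return float(np.dot(params["w"], x) + params["b"])

# Together, the program with its parameters constitutes a "trained model".
```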
 "Learning data set", also called a training data set, refers to secondary processed data generated from raw data in order to facilitate analysis by the intended learning method, by preprocessing such as removing missing values and outliers, by adding separate data such as label information (ground-truth data), or by a combination of such conversion and processing operations. A learning data set may also include data that has been "augmented" by applying certain transformations to the raw data.
 "Raw data" refers to data primarily acquired by users, vendors, other business operators, research institutions, or the like, which has been converted and processed so that it can be read into a database.
 "Learning program" refers to a program that executes an algorithm for finding certain rules in a learning data set and generating a model that expresses those rules. Specifically, it corresponds to a program that defines the procedures to be executed by a computer in order to realize learning by the adopted learning method.
 "Additional learning" means generating new learned parameters by applying a different learning data set to an existing trained model and performing further learning.
 "Reused model" means an inference program into which learned parameters newly generated by additional learning have been incorporated.
 The present disclosure relates to evaluation systems, evaluation methods, generation systems, generation methods, inference systems, inference methods, trained models, and programs. Specifically, the present disclosure is applicable to: an evaluation system and an evaluation method for evaluating a previously prepared trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; a generation system and a generation method for generating a new trained model from such a trained model; an inference system and an inference method that use a trained model to output a result of inference regarding an object in response to an input of a target image in which the object appears; a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; and a program for the evaluation method, the generation method, and the inference method.
 1, 1A Information processing system
 2, 2A Evaluation system
 23, 23A Storage device
 24, 24A Arithmetic circuit
 3, 3A Generation system
 33, 33A Storage device
 34, 34A Arithmetic circuit
 4 Inference system
 43 Storage device
 44 Arithmetic circuit
 61 First target image (target image)
 62 Second target image (target image)
 71 First object (object)
 72 Second object (object)
 LM1 Trained model (first trained model)
 LM2 Trained model (second trained model)
 D1, D1A Evaluation information
 S11, S11A First acquisition process
 S12, S12A Second acquisition process
 S13, S13A Evaluation process
 S131, S131A First extraction process
 S132, S132A Second extraction process
 S133A Calculation process
 S21, S21A Determination process
 S22 Generation process
 S31 Acquisition process
 S32 Inference process

Claims (16)

  1. An evaluation system comprising:
     a storage device that stores a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; and
     an arithmetic circuit that evaluates the trained model,
     wherein the trained model is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector, and
     the arithmetic circuit executes:
     a first acquisition process of inputting, to the trained model, a first target image in which a first object appears to acquire a first feature vector corresponding to the first object;
     a second acquisition process of inputting, to the trained model, a second target image in which a second object differing from the first object in a predetermined feature appears to acquire a second feature vector corresponding to the second object; and
     an evaluation process of evaluating a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector.
  2. The evaluation system according to claim 1, wherein the evaluation process includes:
     a first extraction process of extracting, from the plurality of components of the feature vector, a component whose value in the first feature vector is equal to or greater than a threshold; and
     a second extraction process of extracting, from the components extracted in the first extraction process, a component for which a difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  3. The evaluation system according to claim 1, wherein
     the second acquisition process acquires the second feature vector by inputting the second target image to the trained model for each of a plurality of mutually different predetermined features, and
     the evaluation process evaluates, for each of the plurality of predetermined features, a change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  4. The evaluation system according to claim 3, wherein
     the first acquisition process acquires a plurality of first feature vectors respectively corresponding to a plurality of mutually different first objects by inputting, to the trained model, a plurality of first target images in which the plurality of first objects respectively appear,
     the second acquisition process acquires a plurality of second feature vectors corresponding to a plurality of second objects by inputting, to the trained model, a plurality of second target images in which the plurality of second objects, each differing in the predetermined feature from a corresponding one of the plurality of first objects, appear, and
     the evaluation process includes:
     a first extraction process of extracting, for each of the plurality of first target images, a component whose value in the first feature vector is equal to or greater than a threshold from the plurality of components of the feature vector;
     a second extraction process of extracting, from the components extracted in the first extraction process, a component for which a difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value; and
     a calculation process of determining, for each of the plurality of components of the feature vector, a ratio of the number of times the component is extracted in the second extraction process to the number of times the component is extracted in the first extraction process, as a response rate to the change in the predetermined feature.
  5. The evaluation system according to claim 2 or 4, wherein the threshold is set based on a representative value of the values, in the first feature vector, of the plurality of components of the feature vector.
  6. The evaluation system according to any one of claims 1 to 5, wherein the predetermined feature includes at least one of a feature relating to a color of the object and a feature relating to a shape of the object.
  7. The evaluation system according to claim 6, wherein
     the feature relating to the color of the object includes hue, brightness, saturation, and contrast, and
     the feature relating to the shape of the object includes an aspect ratio of the object, a head-to-body ratio of the object, and a body shape of the object.
  8. The evaluation system according to any one of claims 1 to 7, wherein the result of the inference indicates whether the object appearing in the target image matches a specific object.
  9. An evaluation method for evaluating a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears,
     the trained model being configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector,
     the evaluation method comprising:
     a first acquisition process of inputting, to the trained model, a first target image in which a first object appears to acquire a first feature vector corresponding to the first object;
     a second acquisition process of inputting, to the trained model, a second target image in which a second object differing from the first object in a predetermined feature appears to acquire a second feature vector corresponding to the second object; and
     an evaluation process of evaluating a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector.
  10. A generation system comprising:
     a storage device that stores a first trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears, and evaluation information of the first trained model; and
     an arithmetic circuit that generates a second trained model from the first trained model based on the evaluation information,
     wherein the first trained model is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector,
     the evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, and
     the arithmetic circuit executes:
     a determination process of determining, for a target feature among the one or more predetermined features, an effectiveness of each of the plurality of components of the feature vector based on the evaluation information; and
     a generation process of generating the second trained model from the first trained model by modifying the first trained model so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process.
  11. A generation method for generating a second trained model from a first trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears, based on evaluation information of the first trained model,
     the first trained model being configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector,
     the evaluation information indicating, for each of one or more predetermined features of the object, an evaluation of a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature,
     the generation method comprising:
     a determination process of determining, for a target feature among the one or more predetermined features, an effectiveness of the plurality of components of the feature vector based on the evaluation information; and
     a generation process of generating the second trained model from the first trained model by modifying the first trained model so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process.
  12. An inference system comprising:
     a storage device that stores a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; and
     an arithmetic circuit,
     wherein the trained model is configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector,
     the effectiveness of each of the plurality of components of the feature vector is set based on a change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object, and
     the arithmetic circuit executes:
     an acquisition process of acquiring a predetermined target image; and
     an inference process of inputting the predetermined target image acquired in the acquisition process to the trained model stored in the storage device to acquire a result of inference regarding an object appearing in the predetermined target image.
  13. An inference method using a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears,
     the trained model being configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector,
     the effectiveness of each of the plurality of components of the feature vector being set based on a change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object,
     the inference method comprising:
     an acquisition process of acquiring a predetermined target image; and
     an inference process of inputting the predetermined target image acquired in the acquisition process to the trained model to acquire a result of inference regarding an object appearing in the predetermined target image.
  14. A trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears,
     the trained model being configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector,
     wherein the effectiveness of each of the plurality of components of the feature vector is set based on a change in each of the plurality of components of the feature vector with respect to a change in a predetermined feature of the object.
  15. A program for causing an arithmetic circuit to execute at least one of the evaluation method according to claim 9, the generation method according to claim 11, and the inference method according to claim 13.
  16. An information processing system comprising:
     an evaluation system that generates evaluation information of a first trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears;
     a generation system that generates a second trained model from the first trained model based on the evaluation information; and
     an inference system that uses the second trained model to output a result of inference regarding the object in response to an input of a target image in which the object appears,
     wherein the first trained model is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector,
     the evaluation system executes: a first acquisition process of inputting, to the first trained model, a first target image in which a first object appears to acquire a first feature vector corresponding to the first object; a second acquisition process of inputting, to the first trained model, a second target image in which a second object differing from the first object in a predetermined feature appears to acquire a second feature vector corresponding to the second object; and an evaluation process of evaluating a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature based on a comparison between the first feature vector and the second feature vector, to generate the evaluation information,
     the evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature, and
     the generation system executes: a determination process of determining, for a target feature among the one or more predetermined features, an effectiveness of each of the plurality of components of the feature vector based on the evaluation information; and a generation process of generating the second trained model from the first trained model by modifying the first trained model so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process.
PCT/JP2022/018907 2021-07-09 2022-04-26 Assessment system, assessment method, generation system, generation method, inference system, inference method, trained model, program, and information processing system WO2023281904A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-114126 2021-07-09
JP2021114126 2021-07-09

Publications (1)

Publication Number Publication Date
WO2023281904A1 true WO2023281904A1 (en) 2023-01-12

Family

ID=84801537

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/018907 WO2023281904A1 (en) 2021-07-09 2022-04-26 Assessment system, assessment method, generation system, generation method, inference system, inference method, trained model, program, and information processing system

Country Status (1)

Country Link
WO (1) WO2023281904A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020525908A (en) * 2017-09-27 2020-08-27 シェンチェン センスタイム テクノロジー カンパニー リミテッドShenzhen Sensetime Technology Co.,Ltd Image search method, device, device and readable storage medium
JP2021060692A (en) * 2019-10-03 2021-04-15 株式会社東芝 Inference result evaluation system, inference result evaluation device, and method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22837325

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE