WO2023281904A1 - Assessment system, assessment method, generation system, generation method, inference system, inference method, trained model, program, and information processing system - Google Patents


Info

Publication number: WO2023281904A1
Authority: WIPO (PCT)
Application number: PCT/JP2022/018907
Other languages: French (fr), Japanese (ja)
Prior art keywords: feature vector, trained model, feature, inference, target image
Inventors: Yusuke Kato, Shunsuke Yasuki, Takumi Kojima
Applicant: Panasonic Intellectual Property Management Co., Ltd.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Description

  • the present disclosure relates to evaluation systems, evaluation methods, generation systems, generation methods, inference systems, inference methods, trained models, programs, and information processing systems.
  • Patent Document 1 discloses an image search method.
  • The image search method disclosed in Patent Document 1 performs dimensionality reduction on each convolutional-layer feature of the image to be retrieved to obtain dimensionality-reduced features, clusters those features to obtain a plurality of cluster features, fuses the cluster features into a global feature, and retrieves the target image from a database based on that global feature.
  • the image search method disclosed in Patent Document 1 uses a trained model.
  • A large amount of data is required to generate a trained model. Because preparing a large amount of data is costly, public data that anyone can use is sometimes used.
  • A trained model trained on public data is suited to the general environments in which it is likely to be used, but in a special environment where the input data is biased, its inference accuracy tends to decrease.
  • The present disclosure provides an evaluation system, evaluation method, generation system, generation method, inference system, inference method, trained model, program, and information processing system that enable inference accuracy to be improved without additional learning.
  • An evaluation system includes a storage device that stores a trained model that outputs an inference result regarding an object in response to an input of a target image that captures the object, and an arithmetic circuit that evaluates the trained model.
  • the trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The arithmetic circuit executes a first acquisition process, a second acquisition process, and an evaluation process. In the first acquisition process, a first target image in which a first target is captured is input to the trained model to acquire a first feature vector corresponding to the first target.
  • In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the trained model, and a second feature vector corresponding to the second target is acquired.
  • The evaluation process evaluates the change in each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison of the first feature vector and the second feature vector.
  • An evaluation method according to one aspect of the present disclosure evaluates a trained model that outputs an inference result regarding an object in response to the input of a target image in which the object is captured.
  • the trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The evaluation method includes a first acquisition process, a second acquisition process, and an evaluation process. In the first acquisition process, a first target image in which a first target is captured is input to the trained model to acquire a first feature vector corresponding to the first target.
  • In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the trained model, and a second feature vector corresponding to the second target is acquired.
  • The evaluation process evaluates the change in each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison of the first feature vector and the second feature vector.
  • A generation system includes a storage device that stores a first trained model, which outputs an inference result regarding an object in response to the input of a target image in which the object is captured, and evaluation information of the first trained model, and an arithmetic circuit that generates a second trained model from the first trained model based on the evaluation information.
  • the first trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  • the arithmetic circuit executes determination processing and generation processing.
  • The determination process determines, for a target feature among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information.
  • The generation process generates the second trained model from the first trained model by modifying the first trained model so that the feature vector extracted from the input target image is corrected based on the determined effectiveness of each of the plurality of components and the inference result regarding the object is output based on the corrected feature vector.
  • A generation method according to one aspect of the present disclosure generates a second trained model from a first trained model, which outputs an inference result regarding an object in response to the input of a target image in which the object is captured, based on evaluation information of the first trained model.
  • the first trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  • The generation method includes a determination process and a generation process. The determination process determines, for a target feature among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information.
  • The generation process generates the second trained model from the first trained model by modifying the first trained model so that the feature vector extracted from the input target image is corrected based on the determined effectiveness of each of the plurality of components and the inference result regarding the object is output based on the corrected feature vector.
  • An inference system includes a storage device that stores a trained model that outputs an inference result regarding an object in response to an input of a target image showing the object, and an arithmetic circuit.
  • The trained model is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector.
  • The effectiveness of each of the plurality of components of the feature vector is set based on the change in each of those components with respect to a change in a predetermined feature of the object.
  • the arithmetic circuit executes acquisition processing and inference processing. Acquisition processing acquires a predetermined target image. In the inference process, the predetermined target image acquired in the acquisition process is input to the trained model stored in the storage device, and an inference result regarding the target object appearing in the predetermined target image is acquired.
  • An inference method of one aspect of the present disclosure uses a trained model that outputs an inference result regarding an object in response to an input of a target image in which the object is captured.
  • The trained model is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector.
  • The effectiveness of each of the plurality of components of the feature vector is set based on the change in each of those components with respect to a change in a predetermined feature of the object.
  • the inference method includes an acquisition process and an inference process. Acquisition processing acquires a predetermined target image. In the inference process, the predetermined target image acquired in the acquisition process is input to the trained model, and an inference result regarding the object appearing in the predetermined target image is acquired.
  • a trained model of one aspect of the present disclosure outputs an inference result regarding an object in response to an input of a target image in which the object is shown.
  • The trained model is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector.
  • The effectiveness of each of the plurality of components of the feature vector is set based on the change in each of those components with respect to a change in a predetermined feature of the object.
  • a program of one aspect of the present disclosure is a program for causing an arithmetic circuit to execute at least one of the evaluation method, the generation method, and the inference method.
  • An information processing system of one aspect of the present disclosure includes an evaluation system, a generation system, and an inference system.
  • the evaluation system generates evaluation information of a first trained model that outputs an inference result regarding an object in response to an input of a target image showing the object.
  • a generation system generates a second trained model from the first trained model based on the evaluation information.
  • the inference system uses the second trained model to output an inference result regarding the object in response to the input of the target image in which the object appears.
  • the first trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation system executes a first acquisition process, a second acquisition process, and an evaluation process.
  • a first target image including a first target is input to the first trained model to acquire a first feature vector corresponding to the first target.
  • In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the first trained model, and a second feature vector corresponding to the second target is acquired.
  • the evaluation process evaluates changes in each of the plurality of components of the feature vector for changes in the predetermined feature based on a comparison of the first feature vector and the second feature vector.
  • The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  • The generation system executes a determination process and a generation process.
  • the determining process determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information for the target feature among the one or more predetermined features.
  • The generation process generates the second trained model from the first trained model by modifying the first trained model so that the feature vector extracted from the input target image is corrected based on the determined effectiveness of each of the plurality of components and the inference result regarding the object is output based on the corrected feature vector.
  • aspects of the present disclosure enable inference accuracy to be improved without additional learning.
  • FIG. 1 is a block diagram of a configuration example of an information processing system according to a first embodiment.
  • FIG. 2 is a block diagram of a configuration example of an evaluation system of the information processing system in FIG. 1.
  • FIG. 3 is a schematic diagram of an example of a first trained model evaluated by the evaluation system of FIG. 2.
  • FIG. 4 is a flowchart of an example of an evaluation method executed by the evaluation system of FIG. 2.
  • FIG. 5 is a schematic explanatory diagram of the evaluation method of FIG. 4.
  • FIG. 6 is a block diagram of a configuration example of a generation system of the information processing system in FIG. 1.
  • FIG. 7 is a schematic diagram of an example of a second trained model generated by the generation system of FIG. 6.
  • FIG. 8 is a flowchart of an example of a generation method executed by the generation system of FIG. 6.
  • FIG. 9 is a block diagram of a configuration example of an inference system of the information processing system in FIG. 1.
  • FIG. 10 is a flowchart of an example of an inference method executed by the inference system of FIG. 9.
  • FIG. 1 is a block diagram of an information processing system 1 according to this embodiment.
  • The information processing system 1 enables re-matching of objects.
  • Re-matching of objects is the task of searching a large number of images for images in which the same object appears as in an image prepared in advance.
  • the target object is, for example, a person.
  • the information processing system 1 can execute a task of searching a large number of images for an image in which the same person as the image prepared in advance appears.
  • a trained model is used for rematching.
  • A large amount of data is required to generate a trained model. Because preparing a large amount of data is costly, public data that anyone can use is sometimes used.
  • A trained model trained on public data is suited to the general environments in which it is likely to be used, but in a special environment where the input data is biased, its inference accuracy tends to decrease.
  • The information processing system 1 in FIG. 1 newly generates, from a trained model prepared in advance, a trained model suited to the environment in which re-matching is performed, and enables re-matching using the newly generated trained model.
  • the information processing system 1 in FIG. 1 includes an evaluation system 2, a generation system 3, and an inference system 4.
  • the evaluation system 2 evaluates a prepared trained model (first trained model) LM1 (see FIG. 3) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured.
  • the generating system 3 generates a trained model (second trained model) LM2 (see FIG. 7) from the first trained model LM1 (see FIG. 3) based on the evaluation information D1 of the evaluation system 2.
  • the inference system 4 uses the second trained model LM2 to make an inference about the object for the input of the target image showing the object. In this embodiment, the inference system 4 outputs the result of re-matching.
  • The evaluation system 2 is communicably connected to the generation system 3 via the communication network 51.
  • The generation system 3 is communicably connected to the inference system 4 via the communication network 52.
  • FIG. 2 is a block diagram of the evaluation system 2. The evaluation system 2 evaluates the first trained model LM1 prepared in advance, which outputs an inference result regarding an object in response to the input of a target image in which the object is captured.
  • the evaluation system 2 includes an interface (input/output device 21 and communication device 22), a storage device 23, and an arithmetic circuit 24.
  • The evaluation system 2 is realized by, for example, one terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers) and mobile terminals (smartphones, tablet terminals, wearable terminals, etc.).
  • The input/output device 21 functions as an input device for inputting information from the user and as an output device for outputting information to the user. That is, the input/output device 21 is used for inputting information to the evaluation system 2 and for outputting information from the evaluation system 2.
  • the input/output device 21 has one or more human-machine interfaces. Examples of human-machine interfaces include keyboards, pointing devices (mouse, trackball, etc.), input devices such as touch pads, output devices such as displays and speakers, and input/output devices such as touch panels.
  • the communication device 22 is communicably connected to an external device or system.
  • The communication device 22 is used for communication with the generation system 3 through the communication network 51.
  • The communication device 22 has one or more communication interfaces.
  • The communication device 22 is connectable to the communication network 51 and has a function of communicating through the communication network 51.
  • the communication device 22 complies with a predetermined communication protocol.
  • the predetermined communication protocol may be selected from various known wired and wireless communication standards.
  • The storage device 23 is used to store information used by the arithmetic circuit 24 and information generated by the arithmetic circuit 24.
  • The storage device 23 includes one or more storages (non-transitory storage media).
  • The storage can be, for example, a hard disk drive, an optical drive, or a solid-state drive (SSD).
  • The storage may be any of a built-in type, an external type, and a NAS (network-attached storage) type.
  • The evaluation system 2 may include a plurality of storage devices 23. Information may be distributed and stored in the plurality of storage devices 23.
  • The information stored in the storage device 23 includes the first trained model LM1 and the evaluation information D1.
  • FIG. 2 shows a state in which the storage device 23 stores all of the first trained model LM1 and the evaluation information D1.
  • the first trained model LM1 and the evaluation information D1 need not always be stored in the storage device 23, and may be stored in the storage device 23 when the arithmetic circuit 24 needs them.
  • the first trained model LM1 outputs the result of inference regarding the object in response to the input of the target image in which the object is shown.
  • the result of the inference indicates whether an object matches a particular object.
  • a target object is a person.
  • the first trained model LM1 is used, for example, to search for a target image in which a specific target is captured from multiple target images. That is, the first trained model LM1 is a model for person re-matching. Person re-matching is a task of searching a large number of images for an image in which the same person as the image prepared in advance appears.
  • The first trained model LM1 serves as the base of the second trained model LM2 generated by the generation system 3.
  • the first trained model LM1 may be generated, for example, by an external system different from the information processing system 1 and provided to the information processing system 1 (in particular, the evaluation system 2 and the generation system 3).
  • FIG. 3 is a schematic diagram of an example of the first trained model LM1.
  • The first trained model LM1 is obtained from a trained model generated by taking a model having a neural network structure and performing machine learning (supervised learning) on it using a training data set in which target images showing objects are the inputs and the inference results regarding those objects are the correct answers.
  • the first trained model LM1 is configured to extract a feature vector V of an object appearing in an input target image, and output an inference result regarding the object based on the extracted feature vector V. More specifically, the first trained model LM1 in FIG. 3 includes a feature extraction unit F1 and a determination unit F2.
  • The feature extraction unit F1 extracts, in response to the input of a target image, a feature vector (feature quantity) V of the object appearing in the input target image.
  • the feature extraction unit F1 in FIG. 3 is configured including an input layer F11, a plurality of intermediate layers (hidden layers) F12 and F13, and an output layer F14.
  • FIG. 3 shows a simplified structure of the first trained model LM1; an actual neural network, for example a convolutional neural network (CNN), has an arbitrary number of intermediate layers between the input layer and the output layer, such as convolutional layers, pooling layers, activation functions, and fully connected layers (see the sketch below).
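As a rough illustration only (not the disclosure's own implementation), a feature extraction unit of this kind can be sketched in PyTorch. Every layer size below, and the choice of a 5-dimensional feature vector, is a hypothetical assumption made to match the toy numbers used later in this description.

```python
import torch
import torch.nn as nn

# Hypothetical feature extraction unit F1: an input layer, intermediate
# layers (convolution, activation, pooling), and an output layer that
# produces an n-dimensional feature vector (n = 5 here).
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # input layer F11
    nn.ReLU(),
    nn.MaxPool2d(2),                              # intermediate layer F12
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # intermediate layer F13
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 5),                             # output layer F14: feature vector V
)

image = torch.rand(1, 3, 128, 64)  # a person crop: (batch, channels, height, width)
v = feature_extractor(image)       # feature vector V, shape (1, 5)
```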
  • the determination unit F2 outputs a result of inference regarding the object based on the feature vector V extracted by the feature extraction unit F1.
  • the determination unit F2 is, for example, a discriminator. Algorithms such as the K nearest neighbor method (KNN) and the support vector machine (SVM) can be used for the discriminator.
  • the result of inference indicates whether an object matches a particular object. Therefore, the determination unit F2 outputs the result of matching of the object (matching result) in response to the input of the feature vector V of the object from the feature extraction unit F1.
  • the output from the determination unit F2 indicates whether or not the object appearing in the target image input to the first trained model LM1 is a specific object.
  • The determination unit F2 compares the feature vector V extracted by the feature extraction unit F1 with the feature vector of the specific target object to obtain the degree of matching (similarity) between the two, and matches the object appearing in the target image against the specific target object based on this similarity. Since the feature vector is an n-dimensional vector, the degree of matching can be evaluated by cosine similarity, Euclidean distance, or the like.
  • the determination unit F2 outputs a result that the target matches the specific target when the degree of matching is equal to or higher than the determination value.
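As a minimal sketch of such a determination unit using cosine similarity (the determination value 0.9 and all vectors are assumptions for illustration, not values from the disclosure):

```python
import numpy as np

def cosine_similarity(a, b):
    # Degree of matching between two n-dimensional feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def determine(v, v_specific, determination_value=0.9):
    # Determination unit F2: True if the object matches the specific object.
    return cosine_similarity(v, v_specific) >= determination_value

v = np.array([0.7, 0.4, 0.2, 0.8, 0.3])           # feature vector of the object
v_specific = np.array([0.6, 0.5, 0.2, 0.9, 0.2])  # feature vector of the specific object
print(determine(v, v_specific))  # True (similarity is about 0.99)
```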
  • the arithmetic circuit 24 is a circuit that controls the operation of the evaluation system 2.
  • The arithmetic circuit 24 is connected to the input/output device 21 and the communication device 22 and can access the storage device 23.
  • The arithmetic circuit 24 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories.
  • One or more processors execute a program (stored in the one or more memories or in the storage device 23) to realize the functions of the arithmetic circuit 24.
  • Although the program is pre-recorded in the storage device 23 here, it may instead be provided through an electric communication line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
  • the arithmetic circuit 24 evaluates the first trained model LM1.
  • The arithmetic circuit 24 executes, for example, the evaluation method shown in FIG. 4. FIG. 4 is a flowchart of an example of an evaluation method executed by the evaluation system 2.
  • FIG. 5 is a schematic illustration of the evaluation method of FIG. 4.
  • the evaluation method in FIG. 4 includes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13.
  • In the first acquisition process S11, the first target image 61 in which the first target object 71 is captured is input to the first trained model LM1 (in particular, the feature extraction unit F1) to obtain the first feature vector V1 corresponding to the first target object 71.
  • the first target image 61 is a target image in which a first target object 71 is captured as a target object.
  • In the second acquisition process S12, the second target image 62 in which a second target object 72 having a predetermined feature different from that of the first target object 71 is captured is input to the first trained model LM1 (in particular, the feature extraction unit F1) to obtain the second feature vector V2 corresponding to the second target object 72.
  • the second target image 62 is a target image in which the second target object 72 is captured as the target object.
  • The second target object 72 is a target object having a predetermined feature different from that of the first target object 71.
  • the predetermined characteristic is the aspect ratio of the human head.
  • the head 71a of the first object 71 and the head 72a of the second object 72 have different aspect ratios.
  • The clothes 71b of the first object 71 and the clothes 72b of the second object 72 are similar in form and have the same color.
  • The second target image 62 may be generated by changing the first target image 61 so that the predetermined feature of the first target object 71 is different, as in the sketch below.
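For instance, when the predetermined feature is the aspect ratio of the head, one crude way to synthesize such a pair is to stretch the image. The following Pillow sketch stretches the whole crop, which changes the apparent aspect ratio of the head while leaving colors unchanged; the file names and the factor 1.2 are hypothetical, and a real pipeline might warp only the head region.

```python
from PIL import Image

# Generate a hypothetical second target image from the first target image
# by stretching it vertically, altering the aspect ratio of the person.
first_image = Image.open("first_target.png")          # hypothetical file
w, h = first_image.size
second_image = first_image.resize((w, int(h * 1.2)))  # 1.2: arbitrary factor
second_image.save("second_target.png")
```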
  • the predetermined feature may be set based on whether it affects the inference result of the first trained model LM1 with respect to the object.
  • the predetermined feature includes at least one of a color feature of the object and a shape feature of the object.
  • Color-related features of objects include hue, brightness, saturation, and contrast.
  • the features related to the color of the object include the color of the hair, the color of the clothes, the color of the shoes, and the color of the inner shirt (partially visible).
  • Features relating to the shape of the object include the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object.
  • Features relating to the shape of an object can be said to be geometric features.
  • the evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison between the first feature vector V1 and the second feature vector V2.
  • Evaluation processing S13 generates evaluation information D1 indicating the result of this evaluation.
  • A difference between the first feature vector V1 and the second feature vector V2 reflects the difference between the first object 71 and the second object 72, that is, the change in the predetermined feature.
  • Here the predetermined feature is the aspect ratio of the human head, so it is considered that the difference between the head 71a of the first object 71 and the head 72a of the second object 72 appears in the difference between the first feature vector V1 and the second feature vector V2.
  • This makes it possible to evaluate the change in each of the plurality of components of the feature vector, for example (v1, v2, v3, v4, v5) in FIG. 5, with respect to the change in the predetermined feature.
  • the evaluation process S13 will be further explained.
  • the evaluation process S13 of FIG. 4 includes a first extraction process S131 and a second extraction process S132.
  • the first extraction process S131 extracts a component whose value in the first feature vector is equal to or greater than a threshold from multiple components of the feature vector.
  • the threshold is set based on a representative value of values in the first feature vector of the plurality of components of the feature vector.
  • the representative value is obtained, for example, from a histogram of the values of the plurality of components of the feature vector in the first feature vector. Representative values include mean, mode, and median.
  • For example, let the values of the plurality of components of the feature vector in the first feature vector be (0.7, 0.4, 0.2, 0.8, 0.3).
  • If the threshold is the median, the threshold is 0.4.
  • In this case, the first extraction process S131 extracts the components v1, v2, and v4 of the feature vector. Through the first extraction process S131, a first set is obtained, consisting of the components of the feature vector whose values in the first feature vector are equal to or greater than the threshold.
  • the second extraction process S132 extracts a component, among the components extracted in the first extraction process S131, for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  • the predetermined value is set to extract a component that significantly changes in the feature vector of the object when the predetermined feature of the object shown in the target image is changed.
  • the predetermined value is a value used to determine whether or not the predetermined feature changed by the component of the feature vector of the first trained model LM1 is emphasized.
  • the predetermined value may be the same as the threshold in the first extraction process S131, for example. For example, assume that the values of the plurality of components of the feature vector in the second feature vector are (0.1, 0.3, 0.2, 0.4, 0.2).
  • The changes in the components v1, v2, and v4 extracted in the first extraction process S131 are 0.6, 0.1, and 0.4, respectively. If the predetermined value equals the threshold, it is 0.4. In this case, the second extraction process S132 extracts the components v1 and v4 of the feature vector. Through the second extraction process S132, a second set is obtained, consisting of those components extracted in the first extraction process S131 for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value. The second set is a subset of the first set. The sketch below reproduces this calculation.
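The two-stage extraction can be written down directly. The following sketch reproduces the worked numbers above; the function and variable names are ours, not the disclosure's.

```python
import numpy as np

def evaluate(v1, v2, predetermined_value=None):
    """Evaluation process S13: return the indices forming the second set.

    S131: keep components whose value in the first feature vector is at or
    above a threshold (here the median, used as the representative value).
    S132: among those, keep components whose change between the two feature
    vectors is at or above a predetermined value (here equal to the threshold).
    """
    threshold = np.median(v1)                 # representative value
    first_set = np.where(v1 >= threshold)[0]  # first extraction S131
    if predetermined_value is None:
        predetermined_value = threshold
    diff = np.abs(v1 - v2)
    return first_set[diff[first_set] >= predetermined_value]  # second extraction S132

v1 = np.array([0.7, 0.4, 0.2, 0.8, 0.3])  # first feature vector
v2 = np.array([0.1, 0.3, 0.2, 0.4, 0.2])  # second feature vector
print(evaluate(v1, v2))  # [0 3] -> components v1 and v4
```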
  • the evaluation information D1 is generated by the evaluation process S13.
  • the evaluation information D1 indicates the evaluation of the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature of the object.
  • the evaluation information D1 indicates the second set obtained by the second extraction processing S132.
  • the evaluation information D1 is obtained for the first trained model LM1.
  • the evaluation information D1 indicates the evaluation of the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature.
  • classification by machine learning inference programs such as neural networks is a black box, and there is no unified opinion on the interpretation of inference results.
  • For example, a component that changes significantly when the tint, a feature related to color, is changed can be interpreted as a component that emphasizes the tint.
  • With the evaluation system 2, it can be understood how strongly the feature vector of the first trained model LM1 reacts to (how much importance it places on) features regarding the color of the object and features regarding the shape of the object. Since the inference decisions of the first trained model LM1 can be explained using this information, it can be expected to improve the explainability of the black box.
  • FIG. 6 is a block diagram of the generation system 3.
  • the generation system 3 generates a second trained model LM2 from the first trained model LM1.
  • the generation system 3 uses the evaluation information D1 generated by the evaluation system 2.
  • the evaluation information D1 indicates the evaluation of the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature of the object.
  • the generation system 3 generates the second trained model LM2 from the first trained model LM1 so as to obtain a more accurate inference result for the predetermined features.
  • The generation system 3 includes an interface (input/output device 31 and communication device 32), a storage device 33, and an arithmetic circuit 34.
  • the generation system 3 is implemented by, for example, one terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers), mobile terminals (smartphones, tablet terminals, wearable terminals, etc.), and the like.
  • The input/output device 31 functions as an input device for inputting information from the user and as an output device for outputting information to the user. In other words, the input/output device 31 is used for inputting information to the generation system 3 and for outputting information from the generation system 3.
  • the input/output device 31 has one or more human-machine interfaces. Examples of human-machine interfaces include keyboards, pointing devices (mouse, trackball, etc.), input devices such as touch pads, output devices such as displays and speakers, and input/output devices such as touch panels.
  • the communication device 32 is communicably connected to an external device or system.
  • The communication device 32 is used for communication with the evaluation system 2 through the communication network 51 and for communication with the inference system 4 through the communication network 52.
  • The communication device 32 has one or more communication interfaces.
  • The communication device 32 is connectable to the communication networks 51 and 52 and has a function of communicating through the communication networks 51 and 52.
  • the communication device 32 complies with a predetermined communication protocol.
  • the predetermined communication protocol may be selected from various known wired and wireless communication standards.
  • The storage device 33 is used to store information used by the arithmetic circuit 34 and information generated by the arithmetic circuit 34.
  • The storage device 33 includes one or more storages (non-transitory storage media).
  • The storage can be, for example, a hard disk drive, an optical drive, or a solid-state drive (SSD).
  • The storage may be any of a built-in type, an external type, and a NAS type.
  • The generation system 3 may include a plurality of storage devices 33. Information may be distributed and stored in the plurality of storage devices 33.
  • The information stored in the storage device 33 includes the first trained model LM1, the evaluation information D1, and the second trained model LM2.
  • FIG. 6 shows a state in which the storage device 33 stores all of the first trained model LM1, the evaluation information D1, and the second trained model LM2.
  • The first trained model LM1, the evaluation information D1, and the second trained model LM2 need not always be stored in the storage device 33; they may be stored in the storage device 33 when the arithmetic circuit 34 requires them. In this embodiment, the evaluation information D1 is provided from the evaluation system 2 to the generation system 3.
  • the second trained model LM2 outputs the result of inference regarding the object in response to the input of the target image in which the object is shown.
  • the second trained model LM2 is generated using the first trained model LM1.
  • the second trained model LM2 is generated from the first trained model LM1 so as to obtain more accurate inference results for a given feature.
  • FIG. 7 is a schematic diagram of the second trained model LM2.
  • the second trained model LM2 in FIG. 7 includes a feature extraction unit F1, a determination unit F2, and a correction unit F3.
  • the feature extraction unit F1 extracts a feature vector V of an object appearing in an input target image.
  • the correction unit F3 is located between the feature extraction unit F1 and the determination unit F2.
  • The correction unit F3 corrects the feature vector V extracted by the feature extraction unit F1 based on the effectiveness of each of the plurality of components of the feature vector V.
  • The effectiveness of each of the plurality of components of the feature vector V is set based on the change in each of those components with respect to a change in the predetermined feature of the object.
  • A component of the feature vector V that emphasizes the predetermined feature is set to a high effectiveness (for example, "1"), and a component that does not emphasize the predetermined feature is set to a low effectiveness (for example, "0"). Effectiveness will be described in detail later.
  • the determination unit F2 outputs a result of inference regarding the object based on the feature vector VA corrected by the correction unit F3. In this way, the second trained model LM2 differs from the first trained model LM1 in that it includes the corrector F3.
  • the arithmetic circuit 34 is a circuit that controls the operation of the generation system 3.
  • The arithmetic circuit 34 is connected to the input/output device 31 and the communication device 32 and can access the storage device 33.
  • The arithmetic circuit 34 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories.
  • One or more processors execute a program (stored in the one or more memories or in the storage device 33) to realize the functions of the arithmetic circuit 34.
  • Although the program is pre-recorded in the storage device 33 here, it may instead be provided through an electric communication line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
  • the arithmetic circuit 34 generates the second trained model LM2. More specifically, the arithmetic circuit 34 generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1.
  • The arithmetic circuit 34 executes, for example, the generation method shown in FIG. 8. FIG. 8 is a flowchart of an example of a generation method executed by the generation system 3.
  • The generation method of FIG. 8 includes a determination process S21 and a generation process S22.
  • The determination process S21 determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1.
  • The effectiveness is, for example, a value multiplied by the corresponding component of the feature vector. The effectiveness is determined according to the degree to which a component of the feature vector emphasizes the target feature.
  • The evaluation information D1 is used to determine whether a component of the feature vector emphasizes the target feature.
  • The evaluation information D1 indicates the second set obtained by the second extraction process S132. For example, the effectiveness is set to "1" for components included in the second set and to "0" for components not included in the second set. An effectiveness of "1" means that the component is used, and an effectiveness of "0" means that the component is not used.
  • The generation process S22 generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that the feature vector V extracted from the input target image is corrected based on the effectiveness of each of the plurality of components of the feature vector V determined in the determination process S21, and an inference result regarding the object is output based on the corrected feature vector VA.
  • Specifically, the generation process S22 adds, between the feature extraction unit F1 and the determination unit F2 of the first trained model LM1, a correction unit F3 that corrects the feature vector V extracted by the feature extraction unit F1, and changes the determination unit F2 so that it outputs the inference result regarding the object based on the feature vector VA corrected by the correction unit F3, thereby generating the second trained model LM2 from the first trained model LM1.
  • the generation process S22 generates a trained model without additional learning.
  • The correction unit F3 performs correction based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process S21.
  • For example, suppose the feature vector components v1, v2, v3, v4, and v5 are 0.7, 0.4, 0.2, 0.8, and 0.3, and their effectiveness values are 1, 1, 0, 1, and 0.
  • Then the corrected feature vector components v1, v2, v3, v4, and v5 are 0.7, 0.4, 0.0, 0.8, and 0.0, as in the sketch below.
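A minimal sketch of the determination process S21 and the correction unit F3, assuming element-wise multiplication by a 0/1 effectiveness mask (the function names and the example second set are ours):

```python
import numpy as np

def determine_effectiveness(second_set, n_components):
    # Determination process S21: effectiveness 1 for components in the
    # second set indicated by the evaluation information D1, 0 otherwise.
    effectiveness = np.zeros(n_components)
    effectiveness[list(second_set)] = 1.0
    return effectiveness

def correct(v, effectiveness):
    # Correction unit F3: multiply each component by its effectiveness.
    return v * effectiveness

effectiveness = determine_effectiveness({0, 1, 3}, n_components=5)  # -> (1, 1, 0, 1, 0)
v = np.array([0.7, 0.4, 0.2, 0.8, 0.3])  # feature vector V from F1
va = correct(v, effectiveness)           # corrected feature vector VA
print(va)  # [0.7 0.4 0.  0.8 0. ]
```

In this sketch the second trained model LM2 is simply the composition of the feature extraction unit F1, this correction, and the determination unit F2, which is why no additional learning is needed.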
  • the second trained model LM2 is obtained from the first trained model LM1 without additional learning.
  • In the second trained model LM2, the correction unit F3 corrects the plurality of components of the feature vector V extracted by the feature extraction unit F1 based on the effectiveness of each of those components for the target feature.
  • By setting the effectiveness in this way, components that emphasize the target feature are weighted more heavily than components that do not, so an improvement in inference accuracy can be expected. For example, in a usage environment with many similar clothes, such as an office with many suits or a factory with many work uniforms, the first trained model LM1 trained on public data may not exhibit sufficient performance.
  • If the first trained model LM1 has been trained to emphasize the color of clothes, it may not be able to distinguish people well in an environment where many people wear similar clothes.
  • In such a case, using the evaluation information D1 for features that are not similar between objects (face, body shape, shoes, color of accessories, etc.), the components included in the second set for those features are emphasized.
  • A second trained model LM2 is generated by adding the correction unit F3 to the first trained model LM1.
  • The second trained model LM2 including such a correction unit F3 enables inference using features that differ between objects, and thus improved performance can be expected.
  • FIG. 9 is a block diagram of the inference system 4.
  • The inference system 4 uses the second trained model LM2 to output an inference result regarding an object in response to the input of a target image in which the object appears.
  • The inference system 4 includes an interface (input/output device 41 and communication device 42), a storage device 43, and an arithmetic circuit 44.
  • the inference system 4 is implemented by, for example, one terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers), mobile terminals (smartphones, tablet terminals, wearable terminals, etc.), and the like.
  • the input/output device 41 functions as an input device for inputting information from the user and as an output device for outputting information to the user.
  • The input/output device 41 is used for inputting information to the inference system 4 and for outputting information from the inference system 4.
  • the input/output device 41 has one or more human-machine interfaces. Examples of human-machine interfaces include keyboards, pointing devices (mouse, trackball, etc.), input devices such as touch pads, output devices such as displays and speakers, and input/output devices such as touch panels.
  • the communication device 42 is communicably connected to an external device or system.
  • The communication device 42 is used for communication with the generation system 3 through the communication network 52.
  • The communication device 42 has one or more communication interfaces.
  • The communication device 42 is connectable to the communication network 52 and has a function of communicating through the communication network 52.
  • the communication device 42 complies with a predetermined communication protocol.
  • the predetermined communication protocol may be selected from various known wired and wireless communication standards.
  • The storage device 43 is used to store information used by the arithmetic circuit 44 and information generated by the arithmetic circuit 44.
  • The storage device 43 includes one or more storages (non-transitory storage media).
  • The storage can be, for example, a hard disk drive, an optical drive, or a solid-state drive (SSD).
  • The storage may be any of a built-in type, an external type, and a NAS type.
  • The inference system 4 may include a plurality of storage devices 43. Information may be distributed and stored in the plurality of storage devices 43.
  • the information stored in the storage device 43 includes the second trained model LM2.
  • FIG. 9 shows a state in which the storage device 43 stores the second trained model LM2.
  • The second trained model LM2 need not always be stored in the storage device 43; it may be stored in the storage device 43 when the arithmetic circuit 44 requires it. In this embodiment, the second trained model LM2 is provided from the generation system 3 to the inference system 4.
  • The arithmetic circuit 44 is a circuit that controls the operation of the inference system 4.
  • The arithmetic circuit 44 is connected to the input/output device 41 and the communication device 42 and can access the storage device 43.
  • The arithmetic circuit 44 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories.
  • One or more processors execute a program (stored in the one or more memories or in the storage device 43) to realize the functions of the arithmetic circuit 44.
  • Although the program is pre-recorded in the storage device 43 here, it may instead be provided through an electric communication line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
  • the arithmetic circuit 44 makes an inference using the second trained model LM2.
  • The arithmetic circuit 44 executes, for example, the inference method shown in FIG. 10. FIG. 10 is a flowchart of an example of an inference method executed by the inference system 4.
  • the inference method of FIG. 10 includes acquisition processing S31 and inference processing S32.
  • Acquisition processing S31 acquires a predetermined target image.
  • the predetermined target image is a target image in which an object to be inferred by the inference system 4 is shown.
  • Acquisition processing S31 acquires a predetermined target image by the input/output device 41, for example.
  • a screen for inputting a predetermined target image is presented by the input/output device 41, and the user can input the predetermined target image according to instructions on the screen.
  • Inputting the predetermined target image may include not only inputting the predetermined target image to the inference system 4 from an external device but also designating, from among images stored in the inference system 4, an image to be used as the predetermined target image.
  • the inference processing S32 inputs the predetermined target image acquired in the acquisition processing S31 to the second trained model LM2 stored in the storage device 43, and acquires the result of inference regarding the object appearing in the predetermined target image.
  • the result of the inference processing S32 indicates whether or not the target appearing in the predetermined target image acquired in the acquisition processing S31 matches the predetermined target.
  • inference is executed using the second trained model LM2.
  • In the second trained model LM2, the correction unit F3 corrects the plurality of components of the feature vector extracted by the feature extraction unit F1 based on the effectiveness of each of those components for the target feature.
  • By setting the effectiveness in this way, components that emphasize the target feature are weighted more heavily than components that do not, so an improvement in inference accuracy can be expected. A sketch of such an inference follows.
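Putting the pieces together, a re-matching query with the second trained model might look like the following sketch; the `extract` callable, the gallery structure, and the determination value 0.9 are assumptions for illustration.

```python
import numpy as np

def infer(query_image, gallery_images, extract, effectiveness,
          determination_value=0.9):
    # Inference process S32 for re-matching: return the indices of gallery
    # images judged to show the same object as the query image.
    def corrected_feature(image):
        return extract(image) * effectiveness  # F1 followed by correction F3

    vq = corrected_feature(query_image)
    matches = []
    for i, image in enumerate(gallery_images):
        vg = corrected_feature(image)
        similarity = np.dot(vq, vg) / (np.linalg.norm(vq) * np.linalg.norm(vg))
        if similarity >= determination_value:  # determination unit F2
            matches.append(i)
    return matches
```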
  • The evaluation system 2 described above includes the storage device 23 that stores the trained model LM1, which outputs an inference result regarding an object in response to the input of a target image in which the object is captured, and the arithmetic circuit 24 that evaluates the trained model LM1.
  • the trained model LM1 is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the arithmetic circuit 24 executes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13.
  • a first acquisition process S11 acquires a first feature vector corresponding to the first target by inputting a first target image in which the first target is shown to the trained model LM1.
  • The second acquisition process S12 acquires a second feature vector corresponding to the second target by inputting, to the trained model LM1, a second target image showing a second target having a predetermined feature different from that of the first target.
  • the evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison of the first feature vector and the second feature vector.
  • With the evaluation system 2, an evaluation of the change in each of the plurality of components of the feature vector with respect to the change in the predetermined feature is obtained for the trained model LM1.
  • The predetermined features include the head-to-body ratio, body shape, hair color, clothing color, shoe color, inner-shirt color, and the like. Therefore, components that are effective for a given feature can be identified within the feature vector. Inference can then be made using the components effective for the predetermined feature among the plurality of components of the feature vector, and an improvement in inference accuracy can be expected.
  • the evaluation system 2 enables improvement of inference accuracy without additional learning.
  • The evaluation process S13 includes a first extraction process S131 that extracts, from the plurality of components of the feature vector, components whose values in the first feature vector are equal to or greater than a threshold, and a second extraction process S132 that extracts, from the components extracted in the first extraction process S131, components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  • This configuration allows for improved accuracy in evaluating changes in each of the multiple components of the feature vector.
  • the threshold is set based on the representative value of the values in the first feature vector of the plurality of components of the feature vector. This configuration allows for improved accuracy in evaluating changes in each of the multiple components of the feature vector.
  • the predetermined features include at least one of features related to the color of the object and features related to the shape of the object. This configuration allows for improved inference accuracy.
  • the features related to the color of the object include hue, brightness, saturation, and contrast.
  • Features relating to the shape of the object include the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object. This configuration allows for improved inference accuracy.
  • the inference result indicates whether or not the object shown in the target image matches the specific object.
  • the evaluation system 2 executes the following method (evaluation method). That is, the evaluation method evaluates the trained model LM1 that outputs the result of inference regarding the object in response to the input of the target image in which the object is captured.
  • the trained model LM1 is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation method includes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13.
  • a first acquisition process S11 acquires a first feature vector corresponding to the first target by inputting a first target image in which the first target is shown to the trained model LM1.
  • The second acquisition process S12 acquires a second feature vector corresponding to the second target by inputting, to the trained model LM1, a second target image showing a second target having a predetermined feature different from that of the first target.
  • the evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison of the first feature vector and the second feature vector. This configuration allows for improved inference accuracy without additional learning.
  • the evaluation system 2 is implemented using an arithmetic circuit 24. That is, the method (evaluation method) executed by the evaluation system 2 can be realized by the arithmetic circuit 24 executing the program.
  • This program is a computer program for causing the arithmetic circuit 24 to execute the evaluation method described above. This configuration allows for improved inference accuracy without additional learning.
  • The generation system 3 described above includes the storage device 33, which stores the first trained model LM1 that outputs an inference result regarding an object in response to the input of a target image in which the object is captured and the evaluation information D1 of the first trained model LM1, and the arithmetic circuit 34, which generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1.
  • the first trained model LM1 is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • The evaluation information D1 indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  • the arithmetic circuit 34 executes a determination process S21 and a generation process S22.
  • the determination processing S21 determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1 for the target feature among the one or more predetermined features.
  • the generation process S22 modifies the first trained model LM1 so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process S21, and the result of inference regarding the object is output based on the corrected feature vector; a second trained model LM2 is thereby generated from the first trained model LM1.
  • the generation system 3 adds to the first trained model LM1 a process of correcting the feature vector based on the effectiveness of each of the plurality of components of the feature vector, thereby generating the second trained model LM2 from the first trained model LM1.
  • With the second trained model LM2, it becomes possible to make an inference using components that are effective for the predetermined feature among the plurality of components of the feature vector, so an improvement in inference accuracy can be expected.
  • the generation system 3 executes the following method (generation method). That is, the generation method generates a second trained model LM2 from the first trained model LM1 based on the evaluation information D1 of the first trained model LM1.
  • the first trained model LM1 is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation information D1 indicates the evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the generation method includes determination processing S21 and generation processing S22.
  • the determination process S21 determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information for the target feature among the one or more predetermined features.
  • the generation process S22 modifies the first trained model LM1 so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process S21 and the result of inference regarding the object is output based on the corrected feature vector, thereby generating the second trained model LM2 from the first trained model LM1. This configuration allows for improved inference accuracy without additional learning.
  • the generation system 3 is implemented using an arithmetic circuit 34. That is, the method (generation method) executed by the generation system 3 can be realized by the arithmetic circuit 34 executing the program.
  • This program is a computer program for causing the arithmetic circuit 34 to execute the above generation method. This configuration allows for improved inference accuracy without additional learning.
  • the inference system 4 described above includes a storage device 43 that stores a learned model LM2 that outputs an inference result regarding an object in response to an input of a target image in which the object appears, and an arithmetic circuit 44.
  • the trained model LM2 is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of feature vector components is set based on the change of the plurality of feature vector components with respect to the predetermined feature change in the object.
  • the arithmetic circuit 44 executes an acquisition process S31 and an inference process S32.
  • Acquisition processing S31 acquires a predetermined target image.
  • the inference processing S32 inputs the predetermined target image acquired in the acquisition processing S31 to the learned model LM2 stored in the storage device 43, and acquires the result of inference regarding the object appearing in the predetermined target image.
  • the learned model LM2 used by the inference system 4 includes processing for correcting the feature vector based on the effectiveness of each of the multiple components of the feature vector.
  • With the trained model LM2, it becomes possible to make an inference using components that are effective for the predetermined feature among the plurality of components of the feature vector, so an improvement in inference accuracy can be expected.
  • the effectiveness of each of the plurality of feature vector components is set based on the change of the plurality of feature vector components with respect to the predetermined feature change in the object. Therefore, it is not necessary to perform additional learning such as emphasizing predetermined features in the trained model itself. Therefore, the inference system 4 enables improvement in inference accuracy without additional learning.
  • the inference system 4 executes the following method (inference method). That is, the inference method uses a trained model LM2 that outputs an inference result regarding an object in response to an input of a target image in which the object appears.
  • the trained model LM2 is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of feature vector components is set based on the change of the plurality of feature vector components with respect to the predetermined feature change in the object.
  • the inference method includes an acquisition process S31 and an inference process S32.
  • Acquisition processing S31 acquires a predetermined target image.
  • the inference processing S32 inputs the predetermined target image acquired in the acquisition processing S31 to the learned model LM2 and acquires the result of inference regarding the target object appearing in the predetermined target image. This configuration allows for improved inference accuracy without additional learning.
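A minimal sketch of this acquisition/inference flow; `acquire_image` and the callable `lm2` are assumed interfaces, not names from the patent.

```python
def run_inference(lm2, acquire_image):
    # S31 (acquisition process): obtain a predetermined target image;
    # `acquire_image` is an assumed interface, e.g. reading a camera frame.
    image = acquire_image()
    # S32 (inference process): input the image to the trained model LM2
    # and return the inference result for the object in the image.
    return lm2(image)
```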
  • the inference system 4 is implemented using an arithmetic circuit 44. That is, the method (inference method) executed by the inference system 4 can be realized by the arithmetic circuit 44 executing the program.
  • This program is a computer program for causing the arithmetic circuit 44 to execute the inference method described above. This configuration allows for improved inference accuracy without additional learning.
  • the learned model LM2 described above outputs the result of inference regarding the object in response to the input of the target image in which the object is captured.
  • the trained model LM2 is configured to extract the feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector.
  • the effectiveness of each of the multiple components of the feature vector is set based on the change of each of the multiple components of the feature vector with respect to the change of the predetermined feature in the object. This configuration allows for improved inference accuracy without additional learning.
  • FIG. 11 is a block diagram of a configuration example of an information processing system 1A according to the second embodiment.
  • the information processing system 1A of FIG. 11 enables re-matching of the target object in the same manner as the information processing system 1 described above.
  • the information processing system 1A of FIG. 11 newly generates, from a trained model prepared in advance, a trained model suitable for the environment in which re-matching is performed, and uses the newly generated trained model for the re-matching.
  • the information processing system 1A of FIG. 11 includes an evaluation system 2A, a generation system 3A, and an inference system 4.
  • FIG. 12 is a block diagram of the evaluation system 2A.
  • the evaluation system 2A evaluates a first trained model LM1 prepared in advance that outputs an inference result regarding an object in response to an input of a target image showing the object.
  • the evaluation system 2A includes an interface (input/output device 21 and communication device 22), a storage device 23A, and an arithmetic circuit 24A.
  • the information stored in the storage device 23A includes the first learned model LM1, the database DB1, and the evaluation information D1A.
  • FIG. 12 shows a state in which the storage device 23A stores all of the first trained model LM1, the database DB1, and the evaluation information D1A.
  • the first trained model LM1, the database DB1, and the evaluation information D1A need not always be stored in the storage device 23A; they only need to be stored in the storage device 23A when required by the arithmetic circuit 24A.
  • the database DB1 contains data used for evaluating the first trained model LM1.
  • Database DB1 includes a plurality of first target images and a plurality of second target images. For one first object, there can be a plurality of second objects that differ from the first object in a plurality of predetermined characteristics.
  • a plurality of first target images each showing a plurality of different first targets are registered in the database DB1.
  • for each of a plurality of mutually different predetermined features, a plurality of second target images are registered, each showing a second target object whose predetermined feature differs from that of the corresponding first target object.
  • the number of images registered in the database DB1 is smaller than, for example, the number of images required to generate a reuse model by additional learning of the first trained model LM1.
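One conceivable in-memory layout for the database DB1 is sketched below; the concrete structure, keys, and file names are assumptions for illustration only.

```python
# A possible layout for DB1: each first target image is paired, per
# predetermined feature, with a second target image that differs from it
# only in that feature.
db1 = {
    "first_object_001": {
        "first_image": "first_001.png",
        "second_images": {
            "hue": "first_001_hue.png",
            "brightness": "first_001_brightness.png",
            "aspect_ratio": "first_001_aspect_ratio.png",
        },
    },
    # ... further first objects; far fewer images in total than would be
    # needed for additional learning of the first trained model LM1
}
```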
  • the arithmetic circuit 24A evaluates the first trained model LM1. The arithmetic circuit 24A performs, for example, the evaluation method shown in FIG. 13.
  • FIG. 13 is a flow chart of an example of an evaluation method executed by the evaluation system 2A.
  • the evaluation method of FIG. 13 includes a first acquisition process S11A, a second acquisition process S12A, and an evaluation process S13A.
  • FIG. 14 is a schematic illustration of the evaluation method of FIG. 13.
  • in the first acquisition process S11A, the first target image 61 in which the first target object 71 is captured is input to the first trained model LM1 (specifically, the feature extraction unit F1) to obtain the first feature vector V1 corresponding to the first target object 71.
  • the first target image 61 is acquired from the database DB1, for example.
  • a plurality of first target images 61 each showing a plurality of different first targets 71 are registered in the database DB1.
  • a plurality of first target images 61, each showing a different first target object 71, are input to the first trained model LM1 to obtain a plurality of first feature vectors V1 corresponding to the plurality of first target objects 71. The first acquisition process S11A thus yields a plurality of first feature vectors V1, one for each of the mutually different first target objects 71.
  • the second acquisition process S12A obtains the second feature vector V2 corresponding to the second target object 72 by inputting, to the first trained model LM1 (specifically, the feature extraction unit F1), the second target image 62 in which the second target object 72, whose predetermined feature differs from that of the first target object 71, is captured.
  • the predetermined characteristic is the aspect ratio of a person's head.
  • the head 71a of the first object 71 and the head 72a of the second object 72 have different aspect ratios.
  • the clothes 71b of the first object 71 and the clothes 72b of the second object 72 are similar in style and have the same color.
  • the second target image 62 is acquired from the database DB1, for example.
  • in the database DB1, for each of a plurality of mutually different predetermined features, a plurality of second target images 62 are registered, each showing a second target object 72 whose predetermined feature differs from that of the corresponding first target object 71.
  • a second acquisition process S12A acquires a second feature vector V2 by inputting the second target image 62 to the first trained model LM1 for each of a plurality of different predetermined features.
  • a plurality of second target images 62 including a plurality of second targets 72 having predetermined characteristics different from the plurality of first targets 71 are input to the first trained model LM1.
  • a plurality of second feature vectors V2 corresponding to a plurality of second objects 72 are acquired.
  • that is, for each first feature vector V1 of a first target object 71, second feature vectors V2 are obtained for a plurality of second target objects 72 that differ from that first target object 71 in each of the plurality of predetermined features.
  • the evaluation process S13A evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison between the first feature vector V1 and the second feature vector V2.
  • the evaluation process S13A generates evaluation information D1 indicating the result of this evaluation.
  • the evaluation process S13A of FIG. 13 includes a first extraction process S131A, a second extraction process S132A, and an arithmetic process S133A.
  • the first extraction process S131A extracts a component whose value in the first feature vector is equal to or greater than a threshold from multiple components of the feature vector.
  • a plurality of first feature vectors respectively corresponding to a plurality of different first objects are obtained by the first acquisition processing S11A. Therefore, the first extraction process S131A extracts a component whose value in the first feature vector is equal to or greater than the threshold from the plurality of components of the feature vector for each of the plurality of first target images.
  • the threshold is set based on a representative value of values in the first feature vector of the plurality of components of the feature vector.
  • the second extraction process S132A extracts a component whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value among the components extracted in the first extraction process S131A.
  • the second acquisition process S12A yields, for each first feature vector of a first target object, second feature vectors of a plurality of second target objects that differ from the first target object in each of the plurality of predetermined features. Therefore, for each such second feature vector, the components whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value are extracted from the components extracted in the first extraction process S131A.
  • in other words, the components extracted in the first extraction process S131A form a first set, and the second extraction process S132A obtains from it a second set of components whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value.
  • the second set is a subset of the first set.
  • the arithmetic process S133A obtains, for each of the plurality of components of the feature vector, the ratio of the number of times the component is extracted in the second extraction process S132A to the number of times it is extracted in the first extraction process S131A, as the response rate to a change in the predetermined feature.
  • the number of times extracted in the first extraction process S131A is the number of times the component is included in the first set, and the number of times extracted in the second extraction process S132A is the number of times it is included in the second set. For example, assume that, for the component v1 of the feature vector, the number of extractions in the first extraction process S131A is 100 and the number of extractions in the second extraction process S132A is 10; the response rate of v1 is then 10/100 = 0.1.
  • the response rate for each of the plurality of predetermined features is thus obtained for each component of the feature vector. This makes it easy to grasp which component of the feature vector responds well to which predetermined feature.
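The response-rate computation of S131A-S133A can be sketched as follows; the use of the mean as the representative value and the NumPy data layout are assumptions.

```python
import numpy as np

def response_rates(v1_list, v2_list, delta):
    """Sketch of S131A-S133A for one predetermined feature.

    v1_list : first feature vectors V1, one per first target image
    v2_list : matching second feature vectors V2 for the same feature
    delta   : the predetermined value for the difference test in S132A
    """
    dim = len(v1_list[0])
    first_counts = np.zeros(dim)   # times each component enters the first set
    second_counts = np.zeros(dim)  # times it also enters the second set
    for v1, v2 in zip(v1_list, v2_list):
        # S131A: keep components at or above a threshold derived from a
        # representative value of V1 (the mean is one possible choice).
        in_first_set = v1 >= np.mean(v1)
        first_counts += in_first_set
        # S132A: of those, keep components whose value changed by at
        # least `delta` between the first and second feature vectors.
        second_counts += in_first_set & (np.abs(v1 - v2) >= delta)
    # S133A: response rate = extractions in S132A / extractions in S131A,
    # e.g. 10 / 100 = 0.1 for the component v1 in the text's example.
    with np.errstate(invalid="ignore"):
        rates = np.where(first_counts > 0, second_counts / first_counts, 0.0)
    return rates
```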
  • the evaluation information D1A is generated by the evaluation processing S13A.
  • the evaluation information D1A indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the evaluation information D1A indicates the response rate to a change in a predetermined feature for each of the plurality of components of the feature vector.
  • Table 1 below is an example of the evaluation information D1A, listing the response rate of each component of the feature vector to each predetermined feature. In Table 1, the predetermined features are hue, brightness, contrast, aspect ratio, and head-to-body ratio.
  • FIG. 15 is a block diagram of the generating system 3A.
  • the generation system 3A generates the second trained model LM2 from the first trained model LM1.
  • the generation system 3A uses the evaluation information D1A generated by the evaluation system 2A.
  • the evaluation information D1A indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the generation system 3A generates the second trained model LM2 from the first trained model LM1 so as to obtain a more accurate inference result for the predetermined feature.
  • the generation system 3A includes an interface (input/output device 31 and communication device 32), a storage device 33A, and an arithmetic circuit 34A.
  • the information stored in the storage device 33A includes the first learned model LM1, the evaluation information D1A, and the second learned model LM2.
  • the first trained model LM1, the evaluation information D1A, and the second trained model LM2 need not always be stored in the storage device 33A; they only need to be stored in the storage device 33A when required by the arithmetic circuit 34A.
  • the arithmetic circuit 34A generates the second trained model LM2. More specifically, the arithmetic circuit 34A generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1A.
  • the arithmetic circuit 34A performs, for example, the generation method shown in FIG. 16.
  • FIG. 16 is a flow chart of an example of a generation method executed by the generation system 3A. The generation method of FIG. 16 includes determination processing S21A and generation processing S22A.
  • the determination processing S21A determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1A for the target feature among the one or more predetermined features.
  • the target feature is selected from the one or more predetermined features based on whether it affects the inference result of the second trained model LM2 in the usage environment of the second trained model LM2. For example, if the usage environment of the second trained model LM2 is an office or a factory, the objects appearing in the target images input to the second trained model LM2 are highly likely to be people wearing the same or similar clothes. In such a case, the color of the clothing, such as the green of work clothes or the black of a suit, is a feature common to a plurality of objects and does not affect the inference result of the second trained model LM2.
  • On the other hand, features such as the color of the shoes, the color of the inner shirt visible at the neck, the texture of the face, or the accessories worn are more specific to each object than the color of the clothes, and are likely to affect the inference result of the second trained model LM2.
  • the target feature may be selected from the plurality of predetermined features by a human or determined automatically.
  • the effectiveness is a coefficient by which the corresponding component of the feature vector is multiplied. The effectiveness is determined according to the degree to which each component of the feature vector emphasizes the target feature.
  • the evaluation information D1A is used to determine whether a component of the feature vector emphasizes the target feature.
  • the evaluation information D1A indicates, for each of the plurality of components of the feature vector, the response rate to a change in the target feature. For example, whether or not a component emphasizes the target feature is determined by whether its response rate is equal to or greater than a reference value.
  • the effectiveness is set to "1" for components whose response rate is equal to or greater than the reference value, and to "0" for components whose response rate is less than the reference value.
  • the reference value may be a fixed value, or may be set in consideration of the performance of the second trained model LM2. Since the response rate is a value between 0 and 1, changing the reference value between 0 and 1 changes the effectiveness of each of the plurality of components of the feature vector. Therefore, the effectiveness of each of the plurality of components of the feature vector can be determined using the reference value at which the performance of the second trained model LM2 is best.
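A sketch of this effectiveness determination, including a sweep of the reference value; `evaluate` is an assumed validation routine that scores the second trained model under a given effectiveness mask.

```python
import numpy as np

def effectiveness_mask(rates: np.ndarray, reference: float) -> np.ndarray:
    # Effectiveness is "1" for components whose response rate to the
    # target feature is at or above the reference value, "0" otherwise.
    return (rates >= reference).astype(float)

def best_reference(rates, evaluate, candidates=None):
    # Sweep reference values over [0, 1] and keep the one for which the
    # second trained model performs best; `evaluate(mask) -> score` is an
    # assumed validation routine supplied by the user.
    if candidates is None:
        candidates = np.linspace(0.0, 1.0, 21)
    return max(candidates, key=lambda r: evaluate(effectiveness_mask(rates, r)))
```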
  • the generation process S22A modifies the first trained model LM1 so that the feature vector V extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process S21A, and the result of inference regarding the object is output based on the resulting corrected feature vector VA; the second trained model LM2 is thereby generated from the first trained model LM1.
  • the generation processing S22A adds a correction unit F3 that corrects the feature vector extracted by the feature extraction unit F1 between the feature extraction unit F1 and the determination unit F2 of the first trained model LM1.
  • the second trained model LM2 is generated from the first trained model LM1 by changing the determination unit F2 to output the result of inference regarding the object based on the feature vector corrected by the correction unit F3.
  • the generation processing S22A generates a trained model without additional learning.
  • features that are not similar among objects are selected as target features from among the features of the objects, and components with a high response rate to those features are made effective.
  • the second trained model LM2 including such a correction unit F3 enables inference using features unique to each object that are not similar among objects, and thus improves performance.
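A sketch of how the second trained model LM2 can be assembled from the parts of LM1 without any additional learning; `extract` (F1) and `decide` (F2) are assumed callables taken unchanged from the first trained model.

```python
import numpy as np

class SecondTrainedModel:
    """Sketch of LM2: the feature extraction unit F1 and the determination
    unit F2 are reused unchanged from LM1; only the correction unit F3
    (an element-wise multiplication by the effectiveness) is inserted
    between them, so no additional learning takes place."""

    def __init__(self, extract, decide, effectiveness: np.ndarray):
        self.extract = extract              # F1 of the first trained model
        self.decide = decide                # F2 of the first trained model
        self.effectiveness = effectiveness  # mask from decision process S21A

    def __call__(self, image):
        v = self.extract(image)               # feature vector V
        v_corrected = v * self.effectiveness  # correction unit F3
        return self.decide(v_corrected)       # inference from corrected VA
```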
  • as described above, the first acquisition process S11A inputs, to the first trained model LM1, a plurality of first target images each showing a different first target object, and obtains a plurality of first feature vectors respectively corresponding to the plurality of first targets. The second acquisition process S12A inputs, to the first trained model LM1, a plurality of second target images each showing a second target object whose predetermined feature differs from that of the corresponding first target object, and obtains a plurality of second feature vectors respectively corresponding to the plurality of second targets.
  • the evaluation process S13A includes a first extraction process S131A, a second extraction process S132A, and an arithmetic process S133A.
  • the first extraction processing S131A extracts a component whose value in the first feature vector is equal to or greater than a threshold from the plurality of components of the feature vector for each of the plurality of first target images.
  • the second extraction process S132A extracts a component whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value among the components extracted in the first extraction process S131A.
  • the arithmetic processing S133A obtains the ratio of the number of times of extraction in the second extraction process to the number of times of extraction in the first extraction process as a response rate to a change in a predetermined feature for each of the plurality of components of the feature vector. This configuration allows obtaining an estimate of the change in each of the multiple components of the feature vector.
  • the second acquisition process S12A acquires a second feature vector by inputting the second target image to the first trained model LM1 for each of a plurality of different predetermined features.
  • the evaluation processing S13A evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature for each of the plurality of predetermined features. This configuration allows obtaining an estimate of the change in each of the multiple components of the feature vector for multiple predetermined features.
  • the threshold is set based on the representative value of the values in the first feature vector of the plurality of components of the feature vector. This configuration makes it possible to improve the accuracy of evaluation of changes in each of the plurality of components of the feature vector.
  • Embodiments of the present disclosure are not limited to the above embodiments.
  • the above-described embodiment can be modified in various ways according to the design and the like, as long as the object of the present disclosure can be achieved. Modifications of the above embodiment are listed below. The modifications described below can be applied in combination as appropriate.
  • the information processing system 1 may include at least one of the evaluation system 2, the generation system 3, and the inference system 4.
  • the program may be a program for causing an arithmetic circuit to execute at least one of an evaluation method, a generation method, and an inference method. This point also applies to the information processing system 1A.
  • the result of inference is not particularly limited.
  • the result of the inference may be the result of classification of objects appearing in the target image.
  • the second acquisition process S12 may acquire a second feature vector by inputting the second target image to the first trained model LM1 for each of a plurality of different predetermined features. That is, a plurality of second target images having different predetermined characteristics may be set for one first target image.
  • the evaluation system 2, the generation system 3, and the inference system 4 are implemented by different computer systems. At least two of the evaluation system 2, generation system 3, and reasoning system 4 may be implemented in a single computer system. This point also applies to the information processing system 1A.
  • the evaluation system 2, the generation system 3, and the inference system 4 need not each include both the input/output devices 21, 31, 41 and the communication devices 22, 32, 42. This point is the same for the evaluation system 2A and the generation system 3A.
  • each of the evaluation system 2, the generation system 3, and the inference system 4 may be implemented in multiple computer systems. In other words, it is not essential that the functions (components) of each of the evaluation system 2, the generation system 3, and the inference system 4 be integrated in one housing; the components of each system may be distributed over a plurality of housings. Furthermore, at least some functions of each of the evaluation system 2, the generation system 3, and the inference system 4, for example, some functions of the arithmetic circuits 24, 34, and 44, may be realized by the cloud (cloud computing) or the like. This point is the same for the evaluation system 2A and the generation system 3A.
  • a first aspect is an evaluation system (2; 2A) comprising: a storage device (23; 23A) that stores a trained model (LM1) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured; and an arithmetic circuit (24; 24A) that evaluates the trained model (LM1).
  • the trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the arithmetic circuit (24; 24A) executes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A).
  • the first acquisition process (S11; S11A) acquires a first feature vector corresponding to the first target by inputting a first target image showing the first target to the learned model (LM1).
  • the second acquisition process (S12; S12A) acquires a second feature vector corresponding to a second target by inputting, to the trained model (LM1), a second target image in which the second target, having a predetermined feature different from that of the first target, is captured.
  • the evaluation process (S13; S13A) evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on a comparison between the first feature vector and the second feature vector. This aspect allows for improved inference accuracy without additional learning.
  • the second aspect is the evaluation system (2) based on the first aspect.
  • the evaluation process (S13) includes a first extraction process (S131) and a second extraction process (S132).
  • the first extraction process (S131) extracts a component whose value in the first feature vector is equal to or greater than a threshold from a plurality of components of the feature vector.
  • the second extraction process (S132) extracts, from the components extracted in the first extraction process (S131), a component whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  • the third aspect is an evaluation system (2; 2A) based on the first aspect.
  • the second acquisition process (S12; S12A) inputs the second target image to the first trained model (LM1) for each of the plurality of predetermined features different from each other, and Obtain a second feature vector.
  • the evaluation process (S13; S13A) evaluates changes in each of the plurality of components of the feature vector with respect to changes in the predetermined feature for each of the plurality of predetermined features. This aspect allows obtaining an estimate of the change in each of the multiple components of the feature vector for multiple predetermined features.
  • the fourth aspect is the evaluation system (2A) based on the third aspect.
  • the first acquisition process (S11A) inputs, to the trained model (LM1), a plurality of first target images each showing a different first target object, and obtains a plurality of first feature vectors respectively corresponding to the plurality of first objects.
  • the second acquisition process (S12A) inputs, to the trained model (LM1), the plurality of second target images in which the plurality of second target objects having the predetermined features different from the plurality of first target objects are captured, and obtains a plurality of second feature vectors corresponding to the plurality of second objects.
  • the evaluation process (S13A) executes a first extraction process (S131A), a second extraction process (S132A), and an arithmetic process (S133).
  • the first extraction process (S131A) extracts a component whose value in the first feature vector is equal to or greater than a threshold from a plurality of components of the feature vector for each of the plurality of first target images.
  • the second extraction process (S132A) extracts, from the components extracted in the first extraction process (S131A), a component whose difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  • the fifth aspect is an evaluation system (2; 2A) based on the second or fourth aspect.
  • the threshold is set based on a representative value of values in the first feature vector of the plurality of components of the feature vector. According to this aspect, it is possible to improve the accuracy of evaluating the change in each of the plurality of components of the feature vector.
  • the sixth aspect is an evaluation system (2; 2A) based on any one of the first to fifth aspects.
  • the predetermined characteristic includes at least one of a color characteristic of the object and a shape characteristic of the object. According to this aspect, it is possible to improve the inference accuracy.
  • the seventh aspect is an evaluation system (2; 2A) based on the sixth aspect.
  • the color-related features of the object include hue, brightness, saturation, and contrast.
  • the features related to the shape of the object include the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object. According to this aspect, it is possible to improve the inference accuracy.
  • the eighth aspect is an evaluation system (2; 2A) based on any one of the first to seventh aspects.
  • the inference result indicates whether or not the object appearing in the target image matches a specific object. According to this aspect, it is possible to improve the inference accuracy as to whether or not the object appearing in the target image matches the specific object.
  • a ninth aspect is an evaluation method for evaluating a trained model (LM1) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured.
  • the trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation method includes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A).
  • the first acquisition process (S11; S11A) acquires a first feature vector corresponding to the first target by inputting a first target image showing the first target to the learned model (LM1).
  • the second acquisition process (S12; S12A) acquires a second feature vector corresponding to a second target by inputting, to the trained model (LM1), a second target image in which the second target, having a predetermined feature different from that of the first target, is captured.
  • the evaluation process (S13; S13A) evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on the comparison between the first feature vector and the second feature vector. This aspect allows for improved inference accuracy without additional learning.
  • a tenth aspect is a generation system (3; 3A) comprising: a storage device (33; 33A) that stores a first trained model (LM1) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured, and evaluation information (D1; D1A) of the first trained model (LM1); and an arithmetic circuit (34; 34A) that generates a second trained model (LM2) from the first trained model (LM1) based on the evaluation information (D1; D1A).
  • the first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation information (D1; D1A) indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the arithmetic circuit (34; 34A) executes a determination process (S21; S21A) and a generation process (S22; S22A).
  • the determination processing (S21; S21A) determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information (D1; D1A) for the target feature among the one or more predetermined features.
  • the generation process (S22; S22A) modifies the first trained model (LM1) so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process (S21; S21A) and the inference result regarding the object is output based on the corrected feature vector, thereby generating the second trained model (LM2) from the first trained model (LM1). This aspect allows for improved inference accuracy without additional learning.
  • an eleventh aspect is a generation method for generating a second trained model (LM2) from a first trained model (LM1), which outputs an inference result regarding an object in response to an input of a target image in which the object is captured, based on evaluation information (D1; D1A) of the first trained model (LM1).
  • the first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation information (D1; D1A) indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the generation method includes a determination process (S21; S21A) and a generation process (S22; S22A).
  • the determination processing determines effectiveness of a plurality of components of the feature vector based on the evaluation information for a target feature among the plurality of predetermined features.
  • the generation process modifies the first trained model (LM1) so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process (S21; S21A) and the inference result regarding the object is output based on the corrected feature vector, thereby generating the second trained model (LM2) from the first trained model (LM1). This aspect allows for improved inference accuracy without additional learning.
  • a twelfth aspect is an inference system (4) comprising: a storage device (43) that stores a trained model (LM2) that outputs an inference result regarding an object in response to an input of a target image showing the object; and an arithmetic circuit (44).
  • the trained model (LM2) is configured to extract a feature vector of an object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference about the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components of the feature vector with respect to the change of a predetermined feature of the object.
  • the arithmetic circuit (44) performs an acquisition process (S31) for acquiring a predetermined target image, and an inference process (S32) for inputting the predetermined target image acquired in the acquisition process (S31) to the trained model (LM2) stored in the storage device (43) and obtaining an inference result regarding the object appearing in the predetermined target image.
  • This aspect allows for improved inference accuracy without additional learning.
  • a thirteenth aspect is an inference method using a learned model (LM2) that outputs an inference result regarding an object in response to an input of a target image in which the object is captured.
  • the trained model (LM2) is configured to extract a feature vector of an object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference about the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components of the feature vector with respect to the change of a predetermined feature of the object.
  • the inference method includes an acquisition process (S31) for acquiring a predetermined target image, and an inference process (S32) for inputting the predetermined target image acquired in the acquisition process (S31) to the trained model (LM2) and obtaining an inference result regarding the object appearing in the predetermined target image. This aspect allows for improved inference accuracy without additional learning.
  • a fourteenth aspect is a trained model (LM2) that outputs a result of inference regarding an object in response to an input of a target image in which the object is shown. The trained model (LM2) is configured to extract the feature vector of the object shown in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output a result of inference about the object based on the corrected feature vector.
  • the effectiveness of each of the plurality of components of the feature vector is set based on the change of each of the plurality of components of the feature vector with respect to the change of a predetermined feature of the object. This aspect allows for improved inference accuracy without additional learning.
  • a fifteenth aspect is a program for causing an arithmetic circuit (24; 24A; 34; 34A; 44) to execute at least one of the evaluation method based on the ninth aspect, the generation method based on the eleventh aspect, and the inference method based on the thirteenth aspect. This aspect allows for improved inference accuracy without additional learning.
  • a sixteenth aspect is an information processing system (1; 1A) comprising an evaluation system (2; 2A), a generation system (3; 3A), and an inference system (4).
  • the evaluation system (2; 2A) generates evaluation information (D1; D1A) of a first trained model (LM1) that outputs an inference result regarding the object in response to an input of a target image showing the object.
  • the generating system (3; 3A) generates a second trained model (LM2) from the first trained model (LM1) based on the evaluation information (D1; D1A).
  • the inference system (4) uses the second trained model (LM2) to output an inference result regarding the object in response to an input of a target image in which the object is shown.
  • the first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector.
  • the evaluation system (2; 2A) executes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A).
  • the first acquisition process (S11; S11A) acquires a first feature vector corresponding to a first target by inputting, to the first trained model (LM1), a first target image showing the first target. The second acquisition process (S12; S12A) acquires a second feature vector corresponding to a second target, having a predetermined feature different from that of the first target, by inputting a second target image showing the second target to the first trained model (LM1).
  • the evaluation process (S13; S13A) evaluates the change of each of the plurality of components of the feature vector with respect to the change of the predetermined feature based on a comparison between the first feature vector and the second feature vector, and generates the evaluation information (D1; D1A).
  • the evaluation information (D1; D1A) indicates an evaluation of the change of each of the plurality of components of the feature vector with respect to each of the one or more predetermined features of the object with respect to the change of the predetermined feature.
  • the generation system (3; 3A) executes a determination process (S21; S21A) and a generation process (S22; S22A).
  • the determination processing determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information (D1; D1A) for the target feature among the one or more predetermined features.
  • the generation process modifies the first trained model (LM1) so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process and the inference result regarding the object is output based on the corrected feature vector, thereby generating the second trained model (LM2) from the first trained model (LM1). This aspect allows for improved inference accuracy without additional learning.
  • a “learned model” is an “inference program” that incorporates “learned parameters”.
  • “Learned parameters” refer to parameters (coefficients) obtained as a result of learning using a learning data set.
  • a learned parameter is generated by inputting a learning data set to a learning program and mechanically adjusting it for a certain purpose.
  • Although the learned parameters are adjusted according to the purpose of learning, they are merely parameters (information such as numerical values) by themselves; they function as a trained model only when incorporated into an inference program. For example, in the case of deep learning, the parameters used for weighting the links between nodes correspond to learned parameters.
  • “Inference program” refers to a program that can output certain results for an input by applying the incorporated learned parameters. For example, it is a program that defines a series of calculation procedures for applying learned parameters obtained as a result of learning to an image given as an input and outputting a result (authentication or judgment) for the image.
  • A “training data set” is also known as a learning data set. It refers to secondary processed data generated from raw data so that it can be analyzed by the target learning method, through preprocessing such as removal of missing values and outliers, addition of separate data such as label information (correct data), or a combination of such conversion and processing steps.
  • the training data set may also contain data that has been “augmented” by applying certain transformations to the raw data.
  • Raw data refers to data that is primarily obtained by users, vendors, other business operators, research institutions, etc., and that has been converted and processed so that it can be read into the database.
  • Training program refers to a program that finds certain rules from a learning data set and executes an algorithm to generate a model that expresses those rules. Specifically, this corresponds to a program that defines a procedure to be executed by a computer in order to realize learning by the adopted learning method.
  • Additional learning means generating new learned parameters by applying a different training data set to an existing trained model and performing further learning.
  • Reused model means an inference program that incorporates learned parameters newly generated by additional learning.
  • the present disclosure relates to evaluation systems, evaluation methods, generation systems, generation methods, inference systems, inference methods, trained models, and programs. Specifically, the present disclosure is applicable to: an evaluation system and an evaluation method for evaluating a trained model, prepared in advance, that outputs the result of inference about an object in response to an input of a target image in which the object is captured; a generation system and a generation method for generating a new trained model from a trained model that outputs the result of inference about an object in response to an input of a target image in which the object is captured; an inference system and an inference method for outputting the result of inference about an object shown in a target image using a trained model; a trained model that outputs the result of inference about an object in response to an input of a target image in which the object is captured; and programs for executing the evaluation method, the generation method, and the inference method.

Abstract

Provided are an assessment system, an assessment method, a generation system, a generation method, an inference system, an inference method, a trained model, a program, and an information processing system that enable improvement in inference accuracy without additional training. An assessment system (2) assesses a trained model (LM1). The trained model (LM1) is configured to: extract a feature vector of a target object included in an inputted target image; and output a result of inference concerning the target object on the basis of the extracted feature vector. On the basis of a comparison between a first feature vector that corresponds to a first target object and that is obtained by inputting, to the trained model (LM1), a first target image in which a first target object is included and a second feature vector that corresponds to a second target object having a predetermined feature different from the first target object and that is obtained by inputting, to the trained model (LM1), a second target image in which the second target object is included, the assessment system (2) assesses respective changes of a plurality of components of the feature vectors with respect to changes of predetermined features.

Description

Evaluation system, evaluation method, generation system, generation method, inference system, inference method, trained model, program, and information processing system
The present disclosure relates to evaluation systems, evaluation methods, generation systems, generation methods, inference systems, inference methods, trained models, programs, and information processing systems.
Patent Document 1 discloses an image search method. The image retrieval method disclosed in Patent Document 1 includes: performing dimensionality reduction on each convolutional-layer feature of an image to be retrieved to obtain dimensionality-reduced features; clustering based on the dimensionality-reduced features to obtain a plurality of cluster features; fusing the plurality of cluster features to obtain a global feature; and retrieving the image to be searched from a database based on the global feature.
Japanese Patent Publication No. 2020-525908
The image search method disclosed in Patent Document 1 uses a trained model. In general, a large amount of data is required to generate a trained model. Since it is costly to prepare a large amount of data, public data that anyone can use may be used instead. However, a trained model trained on public data, while usable in a general environment, tends to lose inference accuracy in a special environment where the input data is biased. To suppress the deterioration of inference accuracy, additional learning could be performed on the trained model so that it can handle the special environment, but such additional learning requires a large amount of data corresponding to the special environment and is costly.
The present disclosure provides an evaluation system, an evaluation method, a generation system, a generation method, an inference system, an inference method, a trained model, a program, and an information processing system that enable improvement of inference accuracy without additional learning.
An evaluation system according to one aspect of the present disclosure includes a storage device that stores a trained model that outputs an inference result regarding an object in response to an input of a target image in which the object is captured, and an arithmetic circuit that evaluates the trained model. The trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector. The arithmetic circuit executes a first acquisition process, a second acquisition process, and an evaluation process. In the first acquisition process, a first target image including a first target is input to the trained model to acquire a first feature vector corresponding to the first target. In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the trained model, and a second feature vector corresponding to the second target is acquired. The evaluation process evaluates changes in each of the plurality of components of the feature vector with respect to changes in the predetermined feature based on a comparison of the first feature vector and the second feature vector.
An evaluation method according to one aspect of the present disclosure evaluates a trained model that outputs an inference result regarding an object in response to an input of a target image in which the object is captured. The trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector. The evaluation method includes a first acquisition process, a second acquisition process, and an evaluation process. In the first acquisition process, a first target image including a first target is input to the trained model to acquire a first feature vector corresponding to the first target. In the second acquisition process, a second target image in which a second target having a predetermined feature different from that of the first target is captured is input to the trained model, and a second feature vector corresponding to the second target is acquired. The evaluation process evaluates changes in each of the plurality of components of the feature vector with respect to changes in the predetermined feature based on a comparison of the first feature vector and the second feature vector.
A generation system according to one aspect of the present disclosure includes a storage device that stores a first trained model, which outputs an inference result regarding an object in response to an input of a target image in which the object is captured, together with evaluation information of the first trained model, and an arithmetic circuit that generates a second trained model from the first trained model based on the evaluation information. The first trained model is configured to extract a feature vector of an object appearing in an input target image and output an inference result regarding the object based on the extracted feature vector. The evaluation information indicates an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in each of one or more predetermined features of the object. The arithmetic circuit executes a determination process and a generation process. The determination process determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information for a target feature among the one or more predetermined features. The generation process modifies the first trained model so that the feature vector extracted from the input target image is corrected based on the effectiveness of each of the plurality of components determined in the determination process and the inference result regarding the object is output based on the corrected feature vector, thereby generating the second trained model from the first trained model.
 A generation method according to one aspect of the present disclosure generates a second trained model, based on evaluation information of a first trained model, from the first trained model, which outputs an inference result regarding an object in response to the input of a target image showing the object. The first trained model is configured to extract a feature vector of the object appearing in the input target image and to output an inference result regarding the object based on the extracted feature vector. The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of a plurality of components of the feature vector with respect to a change in that predetermined feature. The generation method includes a determination process and a generation process. The determination process determines, for a target feature among the predetermined features, the effectiveness of the plurality of components of the feature vector based on the evaluation information. The generation process generates the second trained model from the first trained model by modifying the first trained model so that it outputs an inference result regarding the object based on a corrected feature vector, obtained by correcting the feature vector extracted from the input target image according to the effectiveness of each of the plurality of components determined in the determination process.
 An inference system according to one aspect of the present disclosure includes a storage device that stores a trained model that outputs an inference result regarding an object in response to the input of a target image showing the object, and an arithmetic circuit. The trained model is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The arithmetic circuit executes an acquisition process and an inference process. The acquisition process acquires a predetermined target image. The inference process inputs the predetermined target image acquired in the acquisition process to the trained model stored in the storage device, and acquires an inference result regarding the object appearing in the predetermined target image.
 An inference method according to one aspect of the present disclosure uses a trained model that outputs an inference result regarding an object in response to the input of a target image showing the object. The trained model is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The inference method includes an acquisition process and an inference process. The acquisition process acquires a predetermined target image. The inference process inputs the predetermined target image acquired in the acquisition process to the trained model and acquires an inference result regarding the object appearing in the predetermined target image.
 A trained model according to one aspect of the present disclosure outputs an inference result regarding an object in response to the input of a target image showing the object. The trained model is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of a plurality of components of the feature vector, and output an inference result regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change in each of the plurality of components of the feature vector with respect to a change in a predetermined feature of the object.
 A program according to one aspect of the present disclosure causes an arithmetic circuit to execute at least one of the above evaluation method, the above generation method, and the above inference method.
 An information processing system according to one aspect of the present disclosure includes an evaluation system, a generation system, and an inference system. The evaluation system generates evaluation information of a first trained model that outputs an inference result regarding an object in response to the input of a target image showing the object. The generation system generates a second trained model from the first trained model based on the evaluation information. The inference system uses the second trained model to output an inference result regarding the object in response to the input of a target image showing the object. The first trained model is configured to extract a feature vector of the object appearing in the input target image and to output an inference result regarding the object based on the extracted feature vector. The evaluation system executes a first acquisition process, a second acquisition process, and an evaluation process. The first acquisition process inputs a first target image showing a first object to the first trained model to acquire a first feature vector corresponding to the first object. The second acquisition process inputs a second target image showing a second object that differs from the first object in a predetermined feature to the first trained model to acquire a second feature vector corresponding to the second object. The evaluation process evaluates the change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector. The evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature. The generation system executes a determination process and a generation process. The determination process determines, for a target feature among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information. The generation process generates the second trained model from the first trained model by modifying the first trained model so that it outputs an inference result regarding the object based on a corrected feature vector, obtained by correcting the feature vector extracted from the input target image according to the effectiveness of each of the plurality of components determined in the determination process.
 Aspects of the present disclosure enable inference accuracy to be improved without additional learning.
FIG. 1 is a block diagram of a configuration example of an information processing system according to Embodiment 1.
FIG. 2 is a block diagram of a configuration example of an evaluation system of the information processing system in FIG. 1.
FIG. 3 is a schematic diagram of an example of a first trained model evaluated by the evaluation system of FIG. 2.
FIG. 4 is a flowchart of an example of an evaluation method executed by the evaluation system of FIG. 2.
FIG. 5 is a schematic explanatory diagram of the evaluation method of FIG. 4.
FIG. 6 is a block diagram of a configuration example of a generation system of the information processing system in FIG. 1.
FIG. 7 is a schematic diagram of an example of a second trained model generated by the generation system of FIG. 6.
FIG. 8 is a flowchart of an example of a generation method executed by the generation system of FIG. 6.
FIG. 9 is a block diagram of a configuration example of an inference system of the information processing system in FIG. 1.
FIG. 10 is a flowchart of an example of an inference method executed by the inference system of FIG. 9.
FIG. 11 is a block diagram of a configuration example of an information processing system according to Embodiment 2.
FIG. 12 is a block diagram of a configuration example of an evaluation system of the information processing system in FIG. 11.
FIG. 13 is a flowchart of an example of an evaluation method executed by the evaluation system of FIG. 12.
FIG. 14 is a schematic explanatory diagram of the evaluation method of FIG. 13.
FIG. 15 is a block diagram of a configuration example of a generation system of the information processing system of FIG. 11.
FIG. 16 is a flowchart of an example of a generation method executed by the generation system of FIG. 15.
[1. Embodiment]
[1.1 Embodiment 1]
[1.1.1 Configuration]
 FIG. 1 is a block diagram of an information processing system 1 according to the present embodiment. The information processing system 1 enables re-matching of objects. Re-matching of an object is the task of searching a large number of images for an image showing the same object as an image prepared in advance. The object is, for example, a person. In other words, the information processing system 1 makes it possible to execute the task of searching a large number of images for an image showing the same person as an image prepared in advance.
 A trained model is used for re-matching. In general, a large amount of data is required to generate a trained model. Since preparing a large amount of data is costly, publicly available data that anyone can use is sometimes employed. However, while a trained model trained on public data may work well in a general environment, its inference accuracy tends to decrease when it is used in a special environment where the input data is biased in some way.
 The information processing system 1 in FIG. 1 is used to newly generate, from a trained model prepared in advance, a trained model adapted to the environment in which re-matching is performed, and to enable re-matching with the newly generated trained model.
 The information processing system 1 in FIG. 1 includes an evaluation system 2, a generation system 3, and an inference system 4. The evaluation system 2 evaluates a trained model prepared in advance (first trained model) LM1 (see FIG. 3) that outputs an inference result regarding an object in response to the input of a target image showing the object. The generation system 3 generates a trained model (second trained model) LM2 (see FIG. 7) from the first trained model LM1 (see FIG. 3) based on evaluation information D1 produced by the evaluation system 2. The inference system 4 uses the second trained model LM2 to make inferences about an object in response to the input of a target image showing the object. In the present embodiment, the inference system 4 outputs the result of re-matching.
 In the information processing system 1 of FIG. 1, the evaluation system 2 is communicably connected to the generation system 3 via a communication network 51. The generation system 3 is communicably connected to the inference system 4 via a communication network 52.
[1.1.1.1 Evaluation system]
 FIG. 2 is a block diagram of the evaluation system 2. The evaluation system 2 evaluates the first trained model LM1, prepared in advance, which outputs an inference result regarding an object in response to the input of a target image showing the object. The evaluation system 2 includes an interface (an input/output device 21 and a communication device 22), a storage device 23, and an arithmetic circuit 24. The evaluation system 2 is realized by, for example, a single terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers) and mobile terminals (smartphones, tablet terminals, wearable terminals, etc.).
 The input/output device 21 functions as an input device for inputting information from the user and as an output device for outputting information to the user. That is, the input/output device 21 is used for inputting information to the evaluation system 2 and for outputting information from the evaluation system 2. The input/output device 21 includes one or more human-machine interfaces. Examples of human-machine interfaces include input devices such as keyboards, pointing devices (mouse, trackball, etc.), and touch pads; output devices such as displays and speakers; and input/output devices such as touch panels.
 The communication device 22 is communicably connected to an external device or system. The communication device 22 is used for communication with the generation system 3 through the communication network 51. The communication device 22 includes one or more communication interfaces. The communication device 22 is connectable to the communication network 51 and has a function of communicating through the communication network 51. The communication device 22 complies with a predetermined communication protocol, which may be selected from various well-known wired and wireless communication standards.
 The storage device 23 is used to store information used by the arithmetic circuit 24 and information generated by the arithmetic circuit 24. The storage device 23 includes one or more storages (non-transitory storage media). Each storage may be, for example, a hard disk drive, an optical drive, or a solid state drive (SSD), and may be of a built-in, external, or NAS (network-attached storage) type. Note that the evaluation system 2 may include a plurality of storage devices 23, and information may be distributed among and stored in the plurality of storage devices 23.
 The information stored in the storage device 23 includes the first trained model LM1 and the evaluation information D1. FIG. 2 shows a state in which the storage device 23 stores both the first trained model LM1 and the evaluation information D1. The first trained model LM1 and the evaluation information D1 need not always be stored in the storage device 23; it suffices that they are stored in the storage device 23 when the arithmetic circuit 24 needs them.
 The first trained model LM1 outputs an inference result regarding an object in response to the input of a target image showing the object. In the present embodiment, the inference result indicates whether the object matches a specific object. The object is a person. The first trained model LM1 is used, for example, to search a plurality of target images for a target image showing a specific object. That is, the first trained model LM1 is a model for person re-matching, the task of searching a large number of images for an image showing the same person as an image prepared in advance. The first trained model LM1 serves as the base of the second trained model LM2 generated by the generation system 3. The first trained model LM1 may be generated, for example, by an external system separate from the information processing system 1 and provided to the information processing system 1 (in particular, the evaluation system 2 and the generation system 3).
 FIG. 3 is a schematic diagram of an example of the first trained model LM1. The first trained model LM1 is obtained, for example, from a trained model generated by performing machine learning (supervised learning) on a model having a neural network structure, using a training data set whose inputs are target images showing objects and whose correct answers are inference results regarding those objects. The first trained model LM1 is configured to extract a feature vector V of the object appearing in the input target image and to output an inference result regarding the object based on the extracted feature vector V. More specifically, the first trained model LM1 in FIG. 3 includes a feature extraction unit F1 and a determination unit F2. The feature extraction unit F1 extracts, in response to the input of a target image, a feature vector (feature quantity) V of the object appearing in the input target image. The feature extraction unit F1 in FIG. 3 includes an input layer F11, a plurality of intermediate layers (hidden layers) F12 and F13, and an output layer F14. Note that FIG. 3 shows the structure of the first trained model LM1 in simplified form; the structure of an actual neural network, for example a convolutional neural network (CNN), has an appropriate number of intermediate layers, such as convolutional layers, pooling layers, activation functions, and fully connected layers, between the input layer and the output layer. The determination unit F2 outputs an inference result regarding the object based on the feature vector V extracted by the feature extraction unit F1. The determination unit F2 is, for example, a classifier, for which an algorithm such as k-nearest neighbors (KNN) or a support vector machine (SVM) can be used. As described above, in the present embodiment the inference result indicates whether the object matches a specific object. Accordingly, the determination unit F2 outputs a matching result for the object in response to the input of the feature vector V of the object from the feature extraction unit F1. The output from the determination unit F2 indicates whether or not the object appearing in the target image input to the first trained model LM1 is the specific object. For example, the determination unit F2 compares the feature vector V extracted by the feature extraction unit F1 with the feature vector of the specific object to obtain the degree of matching (similarity) between the two, and performs matching of the object appearing in the target image based on the similarity. Since the feature vector is an n-dimensional vector, the degree of matching can be evaluated by cosine similarity, Euclidean distance, or the like. As an example, the determination unit F2 outputs a result indicating that the object matches the specific object when the degree of matching is equal to or greater than a determination value.
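 For illustration, the matching step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation of the determination unit F2: the function names and the determination value of 0.8 are introduced here for illustration, and NumPy is assumed available.

```python
import numpy as np

def cosine_similarity(v: np.ndarray, w: np.ndarray) -> float:
    # Degree of matching between two n-dimensional feature vectors.
    return float(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

def is_same_object(v: np.ndarray, reference: np.ndarray,
                   determination_value: float = 0.8) -> bool:
    # The object is judged to match the specific object when the degree
    # of matching is equal to or greater than the determination value.
    return cosine_similarity(v, reference) >= determination_value
```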
 The arithmetic circuit 24 is a circuit that controls the operation of the evaluation system 2. The arithmetic circuit 24 is connected to the input/output device 21 and the communication device 22 and can access the storage device 23. The arithmetic circuit 24 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories. The one or more processors implement the functions of the arithmetic circuit 24 by executing a program (stored in the one or more memories or the storage device 23). Here, the program is recorded in advance in the storage device 23, but it may also be provided through a telecommunications line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
 The arithmetic circuit 24 evaluates the first trained model LM1. The arithmetic circuit 24 executes, for example, the evaluation method shown in FIG. 4. FIG. 4 is a flowchart of an example of the evaluation method executed by the evaluation system 2. FIG. 5 is a schematic explanatory diagram of the evaluation method of FIG. 4.
 The evaluation method of FIG. 4 includes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13.
 In the first acquisition process S11, as shown in FIG. 5, a first target image 61 showing a first object 71 is input to the first trained model LM1 (in particular, the feature extraction unit F1) to acquire a first feature vector V1 corresponding to the first object 71. The first target image 61 is a target image in which the first object 71 appears as the object.
 In the second acquisition process S12, as shown in FIG. 5, a second target image 62 showing a second object 72 that differs from the first object 71 in a predetermined feature is input to the first trained model LM1 (in particular, the feature extraction unit F1) to acquire a second feature vector V2 corresponding to the second object 72. The second target image 62 is a target image in which the second object 72 appears as the object. The second object 72 is an object that differs from the first object 71 in the predetermined feature. In FIG. 5, the predetermined feature is the aspect ratio of the person's head: the head 71a of the first object 71 and the head 72a of the second object 72 have different aspect ratios. The clothes 71b of the first object 71 and the clothes 72b of the second object 72 are similar in style and identical in color.
 The second target image 62 may be generated by transforming the first object 71 in the first target image 61 so that the predetermined feature differs. The predetermined feature may be set based on whether it affects the inference result of the first trained model LM1 regarding the object. The predetermined feature includes at least one of a feature relating to the color of the object and a feature relating to the shape of the object. Features relating to the color of the object include hue, brightness, saturation, and contrast; concrete examples include the color of the hair, the clothes, the shoes, and a partially visible inner shirt. Features relating to the shape of the object, which can be regarded as geometric features, include the aspect ratio of the object, the head-to-body proportion of the object, and the build of the object.
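 As one hedged illustration of how a second target image might be derived from the first, the following Python sketch stretches the whole image vertically with Pillow so that a geometric feature (the aspect ratio) changes. This transformation is an assumption introduced for illustration only; the description does not specify it, and changing only the head region, as in FIG. 5, would additionally require locating the head.

```python
from PIL import Image

def make_second_image(first_image: Image.Image,
                      y_scale: float = 1.15) -> Image.Image:
    # Stretch the image vertically so that the aspect ratio of the
    # depicted object (including the head) differs from the original.
    w, h = first_image.size
    return first_image.resize((w, int(h * y_scale)))
```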
 As shown in FIG. 5, the evaluation process S13 evaluates the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector V1 and the second feature vector V2, and generates evaluation information D1 indicating the result of this evaluation. The differences between the first feature vector V1 and the second feature vector V2 reflect the differences between the first object 71 and the second object 72, that is, the change in the predetermined feature. In the case of FIG. 5, the predetermined feature is the aspect ratio of the person's head, and the differences between the first feature vector V1 and the second feature vector V2 are considered to reflect the difference between the head 71a of the first object 71 and the head 72a of the second object 72. Therefore, from the comparison between the first feature vector V1 and the second feature vector V2, the change in each of the plurality of components of the feature vector (for example, (v1, v2, v3, v4, v5) in FIG. 5) with respect to the change in the predetermined feature can be evaluated.
 The evaluation process S13 will be described further. The evaluation process S13 of FIG. 4 includes a first extraction process S131 and a second extraction process S132.
 The first extraction process S131 extracts, from the plurality of components of the feature vector, the components whose values in the first feature vector are equal to or greater than a threshold. The threshold is set based on a representative value of the values that the plurality of components of the feature vector take in the first feature vector. The representative value is obtained, for example, from a histogram of those values; examples of representative values include the mean, the mode, and the median. Suppose the plurality of components of the feature vector are (v1, v2, v3, v4, v5) and their values in the first feature vector are (0.7, 0.4, 0.2, 0.8, 0.3). The median is 0.4. If the threshold is the median, the first extraction process S131 extracts the components v1, v2, and v4. The first extraction process S131 thus yields a first set of feature-vector components whose values in the first feature vector are equal to or greater than the threshold.
 The second extraction process S132 extracts, from the components extracted in the first extraction process S131, the components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. The predetermined value is set so as to extract the components that change significantly in the feature vector of the object when the predetermined feature of the object shown in the target image is changed. That is, the predetermined value serves as a criterion for judging whether a component of the feature vector of the first trained model LM1 emphasizes the changed predetermined feature. The predetermined value may, for example, be the same as the threshold of the first extraction process S131. For example, suppose the values of the plurality of components in the second feature vector are (0.1, 0.3, 0.2, 0.4, 0.2). The changes of the components v1, v2, and v4 extracted in the first extraction process S131 are then 0.6, 0.1, and 0.4, respectively. If the predetermined value equals the threshold, it is 0.4, and the second extraction process S132 extracts the components v1 and v4. The second extraction process S132 thus yields a second set of components, among those extracted in the first extraction process S131, for which the difference between the values in the first and second feature vectors is equal to or greater than the predetermined value. The second set is a subset of the first set.
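 The two extraction processes can be traced with the numerical example above. The following Python sketch is one minimal way to write them, assuming NumPy and taking the predetermined value equal to the median threshold, as in the example; it reproduces the first set {v1, v2, v4} and the second set {v1, v4}.

```python
import numpy as np

def second_set(v1: np.ndarray, v2: np.ndarray) -> np.ndarray:
    threshold = np.median(v1)               # representative value (median)
    first = v1 >= threshold                 # first extraction process S131
    changed = np.abs(v1 - v2) >= threshold  # predetermined value = threshold
    return first & changed                  # second extraction process S132

v1 = np.array([0.7, 0.4, 0.2, 0.8, 0.3])
v2 = np.array([0.1, 0.3, 0.2, 0.4, 0.2])
print(second_set(v1, v2))  # [ True False False  True False] -> components v1 and v4
```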
 The evaluation information D1 is generated by the evaluation process S13. The evaluation information D1 indicates the evaluation of the change in each of the plurality of components of the feature vector with respect to the change in the predetermined feature of the object. In particular, in the present embodiment, the evaluation information D1 indicates the second set obtained by the second extraction process S132.
 According to the evaluation system 2 described above, the evaluation information D1 is obtained for the first trained model LM1. The evaluation information D1 indicates the evaluation of the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature. In general, classification by a machine learning inference program such as a neural network is a black box, and there is no unified view on how to interpret inference results. With the evaluation system 2, however, it is possible to identify the components that change significantly in the feature vector of the object when the predetermined feature of the object shown in the target image is changed. This provides one basis for judging whether each component of the feature vector of the first trained model LM1 emphasizes the predetermined feature, so that performing inference with the components that change significantly gives the inference result one possible interpretation. For example, a component that changes greatly with tint, a color-related feature, can be interpreted as a component that emphasizes tint. Using the evaluation system 2, it can be seen to what degree the feature vector of the first trained model LM1 reacts to (that is, emphasizes) features relating to the color of the object and features relating to the shape of the object. Since these can be used to explain the inference decisions of the first trained model LM1, the evaluation system 2 can be expected to play a role in improving the explainability of the black box.
[1.1.1.2 Generation system]
 FIG. 6 is a block diagram of the generation system 3. The generation system 3 generates the second trained model LM2 from the first trained model LM1. In particular, the generation system 3 uses the evaluation information D1 generated by the evaluation system 2, which indicates the evaluation of the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature of the object. The generation system 3 generates the second trained model LM2 from the first trained model LM1 so that more accurate inference results are obtained for the predetermined feature. The generation system 3 includes an interface (an input/output device 31 and a communication device 32), a storage device 33, and an arithmetic circuit 34. The generation system 3 is realized by, for example, a single terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers) and mobile terminals (smartphones, tablet terminals, wearable terminals, etc.).
 The input/output device 31 functions as an input device for inputting information from the user and as an output device for outputting information to the user. That is, the input/output device 31 is used for inputting information to the generation system 3 and for outputting information from the generation system 3. The input/output device 31 includes one or more human-machine interfaces. Examples of human-machine interfaces include input devices such as keyboards, pointing devices (mouse, trackball, etc.), and touch pads; output devices such as displays and speakers; and input/output devices such as touch panels.
 The communication device 32 is communicably connected to an external device or system. The communication device 32 is used for communication with the evaluation system 2 through the communication network 51 and for communication with the inference system 4 through the communication network 52. The communication device 32 includes one or more communication interfaces. The communication device 32 is connectable to the communication networks 51 and 52 and has a function of communicating through them. The communication device 32 complies with a predetermined communication protocol, which may be selected from various well-known wired and wireless communication standards.
 The storage device 33 is used to store information used by the arithmetic circuit 34 and information generated by the arithmetic circuit 34. The storage device 33 includes one or more storages (non-transitory storage media). Each storage may be, for example, a hard disk drive, an optical drive, or a solid state drive (SSD), and may be of a built-in, external, or NAS type. Note that the generation system 3 may include a plurality of storage devices 33, and information may be distributed among and stored in the plurality of storage devices 33.
 The information stored in the storage device 33 includes the first trained model LM1, the evaluation information D1, and the second trained model LM2. FIG. 6 shows a state in which the storage device 33 stores all of the first trained model LM1, the evaluation information D1, and the second trained model LM2. These need not always be stored in the storage device 33; it suffices that they are stored in the storage device 33 when the arithmetic circuit 34 needs them. In the present embodiment, the evaluation information D1 is provided from the evaluation system 2 to the generation system 3.
 The second trained model LM2 outputs an inference result regarding an object in response to the input of a target image showing the object. The second trained model LM2 is generated using the first trained model LM1; in particular, it is generated from the first trained model LM1 so that more accurate inference results are obtained for the predetermined feature. FIG. 7 is a schematic diagram of the second trained model LM2. The second trained model LM2 in FIG. 7 includes the feature extraction unit F1, the determination unit F2, and a correction unit F3. The feature extraction unit F1 extracts the feature vector V of the object appearing in the input target image. The correction unit F3 is located between the feature extraction unit F1 and the determination unit F2, and corrects the feature vector V extracted by the feature extraction unit F1 based on the effectiveness of each of the plurality of components of the feature vector V. The effectiveness of each of the plurality of components of the feature vector V is set based on the change in each of the plurality of components of the feature vector V with respect to a change in the predetermined feature of the object. A component of the feature vector V that emphasizes the predetermined feature is given a high effectiveness (for example, "1"), and a component that does not emphasize the predetermined feature is given a low effectiveness (for example, "0"). The effectiveness will be described in detail later. The determination unit F2 outputs an inference result regarding the object based on the feature vector VA corrected by the correction unit F3. Thus, the second trained model LM2 differs from the first trained model LM1 in that it includes the correction unit F3.
 The arithmetic circuit 34 is a circuit that controls the operation of the generation system 3. The arithmetic circuit 34 is connected to the input/output device 31 and the communication device 32 and can access the storage device 33. The arithmetic circuit 34 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories. The one or more processors implement the functions of the arithmetic circuit 34 by executing a program (stored in the one or more memories or the storage device 33). Here, the program is recorded in advance in the storage device 33, but it may also be provided through a telecommunications line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
 The arithmetic circuit 34 generates the second trained model LM2. More specifically, the arithmetic circuit 34 generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1. The arithmetic circuit 34 executes, for example, the generation method shown in FIG. 8. FIG. 8 is a flowchart of an example of the generation method executed by the generation system 3. The generation method of FIG. 8 includes a determination process S21 and a generation process S22.
 The determination process S21 determines the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1. The effectiveness is, for example, multiplied by the corresponding component of the feature vector. The effectiveness is determined by the degree to which a component of the feature vector emphasizes the target feature, and the evaluation information D1 is used to judge whether a component of the feature vector emphasizes the target feature. In the present embodiment, the evaluation information D1 indicates the second set obtained by the second extraction process S132. For example, the effectiveness is set to "1" for the components included in the second set and to "0" for the components not included in the second set. An effectiveness of "1" means that the component is used, and an effectiveness of "0" means that the component is not used.
 The generation process S22 generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that it outputs an inference result regarding the object based on the corrected feature vector VA, obtained by correcting the feature vector V extracted from the input target image according to the effectiveness of each of the plurality of components of the feature vector V determined in the determination process S21. In the present embodiment, the generation process S22 adds, between the feature extraction unit F1 and the determination unit F2 of the first trained model LM1, the correction unit F3 that corrects the feature vector V extracted by the feature extraction unit F1, and modifies the model so that the determination unit F2 outputs an inference result regarding the object based on the feature vector VA corrected by the correction unit F3, thereby generating the second trained model LM2 from the first trained model LM1. The generation process S22 generates a trained model without additional learning. The correction unit F3 performs the correction based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process S21. For example, suppose the components v1, v2, v3, v4, and v5 of the feature vector are 0.7, 0.4, 0.2, 0.8, and 0.3, and their effectiveness values are 1, 1, 0, 1, and 0. In this case, the components v1, v2, v3, v4, and v5 of the corrected feature vector are 0.7, 0.4, 0.0, 0.8, and 0.0.
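 The effectiveness-based correction of the generation process can be sketched as follows. This is a minimal sketch assuming NumPy; the mask values follow the numerical example above.

```python
import numpy as np

def correct_feature_vector(v: np.ndarray, effectiveness: np.ndarray) -> np.ndarray:
    # Correction unit F3: each component is multiplied by its effectiveness,
    # so components with effectiveness 0 are not used by the determination unit.
    return v * effectiveness

v = np.array([0.7, 0.4, 0.2, 0.8, 0.3])
effectiveness = np.array([1.0, 1.0, 0.0, 1.0, 0.0])
print(correct_feature_vector(v, effectiveness))  # [0.7 0.4 0.  0.8 0. ]
```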
 According to the generation system 3 described above, the second trained model LM2 is obtained from the first trained model LM1 without additional learning. In the second trained model LM2, the correction unit F3 corrects the plurality of components of the feature vector V extracted by the feature extraction unit F1 based on the effectiveness of each component with respect to the target feature. By setting the effectiveness with respect to the target feature in this way, components that emphasize the target feature can be emphasized over components that do not, and an improvement in the inference accuracy of the second trained model LM2 can be expected. For example, in a usage environment with much similar clothing, such as an office with many suits or a factory with many work uniforms, the first trained model LM1 trained on public data may not exhibit sufficient performance; if the first trained model LM1 has been trained to emphasize the color of clothes, it may fail to distinguish people well in an environment with much similar clothing. In such a case, using the evaluation information D1 about features that are not similar between objects (face, build, shoes, the color of accessories, etc.), the correction unit F3 that emphasizes the components included in the second set is added to the first trained model LM1 to generate the second trained model LM2. The second trained model LM2 including such a correction unit F3 can perform inference using features unique to each object rather than features shared between objects, so its performance improves.
[1.1.1.3 Inference system]
 FIG. 9 is a block diagram of the inference system 4. The inference system 4 uses the second trained model LM2 to output an inference result regarding an object in response to the input of a target image showing the object. The inference system 4 includes an interface (an input/output device 41 and a communication device 42), a storage device 43, and an arithmetic circuit 44. The inference system 4 is realized by, for example, a single terminal device. Examples of terminal devices include personal computers (desktop computers, laptop computers) and mobile terminals (smartphones, tablet terminals, wearable terminals, etc.).
 The input/output device 41 functions as an input device for inputting information from the user and as an output device for outputting information to the user. That is, the input/output device 41 is used for inputting information to the inference system 4 and for outputting information from the inference system 4. The input/output device 41 includes one or more human-machine interfaces. Examples of human-machine interfaces include input devices such as keyboards, pointing devices (mouse, trackball, etc.), and touch pads; output devices such as displays and speakers; and input/output devices such as touch panels.
 The communication device 42 is communicably connected to an external device or system. The communication device 42 is used for communication with the generation system 3 through the communication network 52. The communication device 42 includes one or more communication interfaces. The communication device 42 is connectable to the communication network 52 and has a function of communicating through the communication network 52. The communication device 42 complies with a predetermined communication protocol, which may be selected from various well-known wired and wireless communication standards.
 The storage device 43 is used to store information used by the arithmetic circuit 44 and information generated by the arithmetic circuit 44. The storage device 43 includes one or more storages (non-transitory storage media). Each storage may be, for example, a hard disk drive, an optical drive, or a solid state drive (SSD), and may be of a built-in, external, or NAS type. Note that the inference system 4 may include a plurality of storage devices 43, and information may be distributed among and stored in the plurality of storage devices 43.
 The information stored in the storage device 43 includes the second trained model LM2. FIG. 9 shows a state in which the storage device 43 stores the second trained model LM2. The second trained model LM2 need not always be stored in the storage device 43; it suffices that it is stored in the storage device 43 when the arithmetic circuit 44 needs it. In the present embodiment, the second trained model LM2 is provided from the generation system 3 to the inference system 4.
 The arithmetic circuit 44 is a circuit that controls the operation of the inference system 4. The arithmetic circuit 44 is connected to the input/output device 41 and the communication device 42 and can access the storage device 43. The arithmetic circuit 44 may be realized by, for example, a computer system including one or more processors (microprocessors) and one or more memories. The one or more processors implement the functions of the arithmetic circuit 44 by executing a program (stored in the one or more memories or the storage device 43). Here, the program is recorded in advance in the storage device 43, but it may also be provided through a telecommunications line such as the Internet or recorded on a non-transitory recording medium such as a memory card.
 The arithmetic circuit 44 performs inference using the second trained model LM2. The arithmetic circuit 44 executes, for example, the inference method shown in FIG. 10. FIG. 10 is a flowchart of an example of the inference method executed by the inference system 4. The inference method of FIG. 10 includes an acquisition process S31 and an inference process S32.
 The acquisition process S31 acquires a predetermined target image. The predetermined target image is a target image in which an object to be inferred by the inference system 4 appears. The acquisition process S31 acquires the predetermined target image through, for example, the input/output device 41. For instance, the acquisition process S31 may present a screen for inputting the predetermined target image via the input/output device 41, and the user can then input the predetermined target image in accordance with instructions on the screen. Inputting the predetermined target image includes not only inputting the predetermined target image to the inference system 4 from an external device, but also designating, from among images stored in the inference system 4, an image to be used as the predetermined target image.
 The inference process S32 inputs the predetermined target image acquired in the acquisition process S31 to the second trained model LM2 stored in the storage device 43, and acquires the result of inference regarding the object appearing in the predetermined target image. In this embodiment, the result of the inference process S32 indicates whether the object appearing in the predetermined target image acquired in the acquisition process S31 matches a predetermined object.
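 The two-step flow of FIG. 10 can be illustrated with a minimal sketch. The function names, the use of PIL for image loading, and the boolean `infer` interface of LM2 are assumptions for illustration, not part of the disclosure:

import numpy as np
from PIL import Image

def acquisition_process_s31(path: str) -> np.ndarray:
    # S31: acquire the predetermined target image (here loaded from a file
    # path designated by the user, standing in for input/output device 41).
    return np.asarray(Image.open(path).convert("RGB"))

def inference_process_s32(lm2, image: np.ndarray) -> bool:
    # S32: input the target image to the second trained model LM2 and obtain
    # the inference result, i.e. whether the pictured object matches the
    # predetermined object (`infer` is an assumed interface of LM2).
    return lm2.infer(image)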
 According to the inference system 4 described above, inference is performed using the second trained model LM2. In the second trained model LM2, the correction unit F3 corrects the plurality of components of the feature vector extracted by the feature extraction unit F1, based on the effectiveness of each of the plurality of components with respect to the feature of interest. By setting the effectiveness with respect to the feature of interest in this way, components that emphasize the feature of interest can be weighted more heavily than components that do not, so an improvement in the inference accuracy of the second trained model LM2 can be expected.
 [1.1.2 Effects, etc.]
 The evaluation system 2 described above includes a storage device 23 that stores a trained model LM1 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, and an arithmetic circuit 24 that evaluates the trained model LM1. The trained model LM1 is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The arithmetic circuit 24 executes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13. The first acquisition process S11 inputs a first target image in which a first object appears to the trained model LM1 and acquires a first feature vector corresponding to the first object. The second acquisition process S12 inputs a second target image in which a second object differing from the first object in a predetermined feature appears to the trained model LM1 and acquires a second feature vector corresponding to the second object. The evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector.
 According to the evaluation system 2, for the trained model LM1, an evaluation of the change of each of the plurality of components of the feature vector with respect to a change in the predetermined feature is obtained. For a person, for example, the predetermined feature may be the head-to-body ratio, body shape, hair color, clothing color, shoe color, inner-shirt color, and the like. It is therefore possible to identify which components of the feature vector are effective for the predetermined feature. Inference using the components of the feature vector that are effective for the predetermined feature becomes possible, and an improvement in inference accuracy can be expected. Furthermore, it is only necessary to use the components of the feature vector that are effective for the predetermined feature; there is no need to perform additional training that makes the trained model itself emphasize the predetermined feature. The evaluation system 2 therefore enables inference accuracy to be improved without additional training.
 In the evaluation system 2, the evaluation process S13 includes a first extraction process S131, which extracts, from the plurality of components of the feature vector, components whose values in the first feature vector are equal to or greater than a threshold, and a second extraction process S132, which extracts, from the components extracted in the first extraction process S131, components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. This configuration makes it possible to improve the accuracy of evaluating the change of each of the plurality of components of the feature vector.
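 A minimal sketch of the two extraction stages, assuming NumPy arrays for the feature vectors and, as one possible reading of the threshold rule described next, the mean of the first feature vector as its representative value:

import numpy as np

def first_extraction_s131(v1: np.ndarray) -> np.ndarray:
    # S131: indices of components whose value in the first feature vector V1
    # is at or above a threshold. The mean of V1 is used here as the
    # representative value; this is an assumption, not the only choice.
    threshold = v1.mean()
    return np.flatnonzero(v1 >= threshold)

def second_extraction_s132(v1: np.ndarray, v2: np.ndarray,
                           idx: np.ndarray, delta: float) -> np.ndarray:
    # S132: among the components kept by S131, indices whose values differ
    # between V1 and V2 by at least the predetermined value `delta`.
    return idx[np.abs(v1[idx] - v2[idx]) >= delta]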
 In the evaluation system 2, the threshold is set based on a representative value of the values of the plurality of components in the first feature vector. This configuration makes it possible to improve the accuracy of evaluating the change of each of the plurality of components of the feature vector.
 In the evaluation system 2, the predetermined feature includes at least one of a feature relating to the color of the object and a feature relating to the shape of the object. This configuration enables an improvement in inference accuracy.
 In the evaluation system 2, the features relating to the color of the object include hue, brightness, saturation, and contrast. The features relating to the shape of the object include the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object. This configuration enables an improvement in inference accuracy.
 In the evaluation system 2, the result of inference indicates whether the object appearing in the target image matches a specific object. This configuration enables an improvement in the accuracy of inferring whether the object appearing in the target image matches the specific object.
 The evaluation system 2 can be said to execute the following method (evaluation method). That is, the evaluation method evaluates a trained model LM1 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears. The trained model LM1 is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The evaluation method includes a first acquisition process S11, a second acquisition process S12, and an evaluation process S13. The first acquisition process S11 inputs a first target image in which a first object appears to the trained model LM1 and acquires a first feature vector corresponding to the first object. The second acquisition process S12 inputs a second target image in which a second object differing from the first object in a predetermined feature appears to the trained model LM1 and acquires a second feature vector corresponding to the second object. The evaluation process S13 evaluates the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector. This configuration enables inference accuracy to be improved without additional training.
 The evaluation system 2 is realized using the arithmetic circuit 24. That is, the method (evaluation method) executed by the evaluation system 2 can be realized by the arithmetic circuit 24 executing a program. This program is a computer program for causing the arithmetic circuit 24 to execute the evaluation method described above. This configuration enables inference accuracy to be improved without additional training.
 The generation system 3 described above includes a storage device 33 that stores a first trained model LM1 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, together with evaluation information D1 for the first trained model LM1, and an arithmetic circuit 34 that generates a second trained model LM2 from the first trained model LM1 based on the evaluation information D1. The first trained model LM1 is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The evaluation information D1 indicates, for each of one or more predetermined features of the object, an evaluation of the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature. The arithmetic circuit 34 executes a determination process S21 and a generation process S22. The determination process S21 determines, for a feature of interest among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1. The generation process S22 generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that it outputs the result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components determined in the determination process S21.
 The generation system 3 generates the second trained model LM2 from the first trained model LM1 by adding, to the first trained model LM1, a process that corrects the feature vector based on the effectiveness of each of its plurality of components. As a result, the second trained model LM2 can perform inference using the components of the feature vector that are effective for the predetermined feature, and an improvement in inference accuracy can be expected. Furthermore, generating the second trained model LM2 only requires adding the above process to the first trained model LM1; there is no need to perform additional training that makes the second trained model LM2 itself emphasize the predetermined feature. The generation system 3 therefore enables inference accuracy to be improved without additional training.
 The generation system 3 can be said to execute the following method (generation method). That is, the generation method generates a second trained model LM2 from a first trained model LM1 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, based on evaluation information D1 for the first trained model LM1. The first trained model LM1 is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The evaluation information D1 indicates, for each of one or more predetermined features of the object, an evaluation of the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature. The generation method includes a determination process S21 and a generation process S22. The determination process S21 determines, for a feature of interest among the predetermined features, the effectiveness of the plurality of components of the feature vector based on the evaluation information. The generation process S22 generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that it outputs the result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components determined in the determination process S21. This configuration enables inference accuracy to be improved without additional training.
 The generation system 3 is realized using the arithmetic circuit 34. That is, the method (generation method) executed by the generation system 3 can be realized by the arithmetic circuit 34 executing a program. This program is a computer program for causing the arithmetic circuit 34 to execute the generation method described above. This configuration enables inference accuracy to be improved without additional training.
 The inference system 4 described above includes a storage device 43 that stores a trained model LM2 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, and an arithmetic circuit 44. The trained model LM2 is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output the result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The arithmetic circuit 44 executes an acquisition process S31 and an inference process S32. The acquisition process S31 acquires a predetermined target image. The inference process S32 inputs the predetermined target image acquired in the acquisition process S31 to the trained model LM2 stored in the storage device 43 and acquires the result of inference regarding the object appearing in the predetermined target image.
 The trained model LM2 used by the inference system 4 includes a process that corrects the feature vector based on the effectiveness of each of the plurality of components of the feature vector. As a result, the trained model LM2 can perform inference using the components of the feature vector that are effective for the predetermined feature, and an improvement in inference accuracy can be expected. Furthermore, the effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components with respect to a change in the predetermined feature of the object. There is therefore no need to perform additional training that makes the trained model itself emphasize the predetermined feature. The inference system 4 therefore enables inference accuracy to be improved without additional training.
 The inference system 4 can be said to execute the following method (inference method). That is, the inference method uses a trained model LM2 that outputs the result of inference regarding an object in response to the input of a target image in which the object appears. The trained model LM2 is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output the result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change of the plurality of components with respect to a change in a predetermined feature of the object. The inference method includes an acquisition process S31 and an inference process S32. The acquisition process S31 acquires a predetermined target image. The inference process S32 inputs the predetermined target image acquired in the acquisition process S31 to the trained model LM2 and acquires the result of inference regarding the object appearing in the predetermined target image. This configuration enables inference accuracy to be improved without additional training.
 The inference system 4 is realized using the arithmetic circuit 44. That is, the method (inference method) executed by the inference system 4 can be realized by the arithmetic circuit 44 executing a program. This program is a computer program for causing the arithmetic circuit 44 to execute the inference method described above. This configuration enables inference accuracy to be improved without additional training.
 The trained model LM2 described above outputs the result of inference regarding an object in response to the input of a target image in which the object appears. The trained model LM2 is configured to extract a feature vector of the object appearing in the input target image, correct the extracted feature vector based on the effectiveness of each of the plurality of components of the feature vector, and output the result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on the change of each of the plurality of components with respect to a change in a predetermined feature of the object. This configuration enables inference accuracy to be improved without additional training.
 [1.2. Embodiment 2]
 [1.2.1 Configuration]
 FIG. 11 is a block diagram of a configuration example of an information processing system 1A according to Embodiment 2. Like the information processing system 1 of FIG. 1, the information processing system 1A of FIG. 11 makes it possible to execute re-matching of an object. The information processing system 1A of FIG. 11 is used to newly generate, from a trained model prepared in advance, a trained model adapted to the environment in which re-matching is performed, and to enable re-matching using the newly generated trained model.
 The information processing system 1A of FIG. 11 includes an evaluation system 2A, a generation system 3A, and an inference system 4.
 FIG. 12 is a block diagram of the evaluation system 2A. The evaluation system 2A evaluates a first trained model LM1, prepared in advance, that outputs the result of inference regarding an object in response to the input of a target image in which the object appears. The evaluation system 2A includes interfaces (an input/output device 21 and a communication device 22), a storage device 23A, and an arithmetic circuit 24A.
 The information stored in the storage device 23A includes the first trained model LM1, a database DB1, and evaluation information D1A. FIG. 12 shows a state in which the storage device 23A stores all of the first trained model LM1, the database DB1, and the evaluation information D1A. The first trained model LM1, the database DB1, and the evaluation information D1A need not always be stored in the storage device 23A; it suffices that they are stored in the storage device 23A when required by the arithmetic circuit 24A.
 The database DB1 contains data used for evaluating the first trained model LM1. The database DB1 includes a plurality of first target images and a plurality of second target images. For one first object, there can be a plurality of second objects each differing from the first object in one of a plurality of mutually different predetermined features. In this embodiment, a plurality of first target images, each showing one of a plurality of mutually different first objects, are registered in the database DB1. Also registered in the database DB1, for each of the plurality of mutually different predetermined features, are a plurality of second target images showing a plurality of second objects each differing from one of the plurality of first objects in that predetermined feature. Note that the number of images registered in the database DB1 is small compared with, for example, the number of images required to generate a reuse model by additional training of the first trained model LM1.
 The arithmetic circuit 24A evaluates the first trained model LM1. The arithmetic circuit 24A executes, for example, the evaluation method shown in FIG. 13. FIG. 13 is a flowchart of an example of the evaluation method executed by the evaluation system 2A.
 The evaluation method of FIG. 13 includes a first acquisition process S11A, a second acquisition process S12A, and an evaluation process S13A. FIG. 14 is a schematic illustration of the evaluation method of FIG. 13.
 As shown in FIG. 14, the first acquisition process S11A inputs a first target image 61 in which a first object 71 appears to the first trained model LM1 (specifically, the feature extraction unit F1) and acquires a first feature vector V1 corresponding to the first object 71. The first target image 61 is acquired from, for example, the database DB1. In this embodiment, a plurality of first target images 61, each showing one of a plurality of mutually different first objects 71, are registered in the database DB1. The first acquisition process S11A inputs the plurality of first target images 61 to the first trained model LM1 and acquires a plurality of first feature vectors V1 respectively corresponding to the plurality of first objects 71. The first acquisition process S11A thus yields a plurality of first feature vectors V1 respectively corresponding to the plurality of mutually different first objects 71.
 As shown in FIG. 14, the second acquisition process S12A inputs a second target image 62, in which a second object 72 differing from the first object 71 in a predetermined feature appears, to the first trained model LM1 (specifically, the feature extraction unit F1) and acquires a second feature vector V2 corresponding to the second object 72. In FIG. 14, as an example, the predetermined feature is the aspect ratio of a person's head. The head 71a of the first object 71 and the head 72a of the second object 72 differ in aspect ratio, while the clothes 71b of the first object 71 and the clothes 72b of the second object 72 are similar in style and identical in color. The second target image 62 is acquired from, for example, the database DB1. In this embodiment, registered in the database DB1, for each of a plurality of mutually different predetermined features, are a plurality of second target images 62 showing a plurality of second objects 72 each differing from one of the plurality of first objects 71 in that predetermined feature. The second acquisition process S12A inputs a second target image 62 to the first trained model LM1 and acquires a second feature vector V2 for each of the plurality of mutually different predetermined features. The second acquisition process S12A inputs the plurality of second target images 62 to the first trained model LM1 and acquires a plurality of second feature vectors V2 corresponding to the plurality of second objects 72. The second acquisition process S12A thus yields, for each first feature vector V1 of a first object 71, the second feature vectors V2 of a plurality of second objects 72 that each differ from the first object 71 in one of the plurality of predetermined features. A sketch of the two acquisition processes follows.
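 A minimal sketch of the acquisition processes S11A and S12A, assuming the database DB1 is represented as in-memory lists of image arrays and `f1` is the feature extraction unit F1 of LM1 exposed as a callable (both assumptions for illustration):

def first_acquisition_s11a(f1, first_images):
    # S11A: one first feature vector V1 per first-object image in DB1.
    return [f1(img) for img in first_images]

def second_acquisition_s12a(f1, second_images_by_feature):
    # S12A: for each predetermined feature, the second feature vectors V2 of
    # the second objects that differ from the first objects in that feature.
    return {feature: [f1(img) for img in imgs]
            for feature, imgs in second_images_by_feature.items()}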
 As shown in FIG. 14, the evaluation process S13A evaluates the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison between the first feature vector V1 and the second feature vector V2. The evaluation process S13A generates evaluation information D1A indicating the result of this evaluation. The evaluation process S13A of FIG. 13 includes a first extraction process S131A, a second extraction process S132A, and an arithmetic process S133A.
 The first extraction process S131A extracts, from the plurality of components of the feature vector, components whose values in the first feature vector are equal to or greater than a threshold. In this embodiment, the first acquisition process S11A yields a plurality of first feature vectors respectively corresponding to a plurality of mutually different first objects. The first extraction process S131A therefore extracts, for each of the plurality of first target images, the components of the feature vector whose values in the corresponding first feature vector are equal to or greater than the threshold. The threshold is set based on a representative value of the values of the plurality of components in the first feature vector. The first extraction process S131A thus yields a first set of feature-vector components whose values in the first feature vector are equal to or greater than the threshold.
 The second extraction process S132A extracts, from the components extracted in the first extraction process S131A, components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. In this embodiment, the second acquisition process S12A yields, for each first feature vector of a first object, the second feature vectors of a plurality of second objects that each differ from the first object in one of the plurality of predetermined features. Therefore, for each second feature vector of the plurality of second objects, the components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value are extracted from the components extracted in the first extraction process S131A. The second extraction process S132A thus yields a second set of components, among those extracted in the first extraction process S131A, for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than the predetermined value. The second set is a subset of the first set.
 For each of the plurality of components of the feature vector, the arithmetic process S133A obtains, as the reaction rate to the change in the predetermined feature, the ratio of the number of times the component was extracted in the second extraction process S132A to the number of times it was extracted in the first extraction process S131A. The number of times extracted in the first extraction process S131A is the number of times the component is included in the first set, and the number of times extracted in the second extraction process S132A is the number of times it is included in the second set. For example, suppose that a component v1 of the feature vector was extracted 100 times in the first extraction process S131A and 10 times in the second extraction process S132A. In this case, the reaction rate of the component v1 to the change in the predetermined feature is 10/100 = 0.1. The arithmetic process S133A thus obtains, for each component of the feature vector, the reaction rate for each of the plurality of predetermined features. This makes it easy to grasp which component of the feature vector reacts strongly to which predetermined feature.
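 A minimal sketch of the loop over S131A, S132A, and S133A, assuming each first feature vector V1 is paired with one second feature vector V2 per predetermined feature, the mean of V1 as the representative value for the threshold, and a default difference value of 0.5 (the pairing scheme and parameter values are assumptions, not prescribed by the disclosure):

import numpy as np

def reaction_rates(pairs_by_feature: dict, dim: int, delta: float = 0.5) -> dict:
    # pairs_by_feature maps a predetermined feature name (e.g. "hue") to a
    # list of (V1, V2) pairs in which only that feature differs.
    rates = {}
    for feature, pairs in pairs_by_feature.items():
        first_counts = np.zeros(dim)   # times each component enters the first set
        second_counts = np.zeros(dim)  # times each component enters the second set
        for v1, v2 in pairs:
            threshold = v1.mean()                        # S131A: representative value
            first = v1 >= threshold                      # S131A: first set
            second = first & (np.abs(v1 - v2) >= delta)  # S132A: second set (subset)
            first_counts += first
            second_counts += second
        # S133A: reaction rate = (times in second set) / (times in first set)
        with np.errstate(divide="ignore", invalid="ignore"):
            rates[feature] = np.where(first_counts > 0,
                                      second_counts / first_counts, 0.0)
    return rates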
 The evaluation information D1A is generated by the evaluation process S13A. The evaluation information D1A indicates, for each of one or more predetermined features of the object, an evaluation of the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature. In particular, in this embodiment, the evaluation information D1A indicates, for each of the plurality of components of the feature vector, the reaction rate to the change in each predetermined feature. Table 1 below is an example of the evaluation information D1A. In Table 1, the predetermined features are hue, brightness, contrast, aspect ratio, and head-to-body ratio.
 [Table 1: reaction rate of each feature-vector component to changes in hue, brightness, contrast, aspect ratio, and head-to-body ratio; table values not reproduced in this text]
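 Concretely, the evaluation information D1A can be thought of as a table of reaction rates indexed by feature and component. The sketch below shows one hypothetical representation; all numeric values are invented for illustration and do not reproduce Table 1:

# Hypothetical shape of evaluation information D1A: for each predetermined
# feature, the reaction rate of every feature-vector component (values invented).
evaluation_info_d1a = {
    "hue":          [0.10, 0.72, 0.05, 0.31],
    "brightness":   [0.08, 0.65, 0.12, 0.27],
    "contrast":     [0.02, 0.11, 0.58, 0.09],
    "aspect_ratio": [0.44, 0.03, 0.07, 0.61],
    "head_to_body": [0.39, 0.06, 0.02, 0.55],
}
# Reading example: component 1 reacts strongly to hue (0.72) but barely to
# aspect ratio (0.03), so it likely encodes color rather than shape.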
 FIG. 15 is a block diagram of the generation system 3A. The generation system 3A generates a second trained model LM2 from the first trained model LM1. In particular, the generation system 3A uses the evaluation information D1A generated by the evaluation system 2A. The evaluation information D1A indicates, for each of one or more predetermined features of the object, an evaluation of the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature. The generation system 3A generates the second trained model LM2 from the first trained model LM1 so that more accurate inference results are obtained for the predetermined feature. The generation system 3A includes interfaces (an input/output device 31 and a communication device 32), a storage device 33A, and an arithmetic circuit 34A.
 The information stored in the storage device 33A includes the first trained model LM1, the evaluation information D1A, and the second trained model LM2. The first trained model LM1, the evaluation information D1A, and the second trained model LM2 need not always be stored in the storage device 33A; it suffices that they are stored in the storage device 33A when required by the arithmetic circuit 34A.
 The arithmetic circuit 34A generates the second trained model LM2. More specifically, the arithmetic circuit 34A generates the second trained model LM2 from the first trained model LM1 based on the evaluation information D1A. The arithmetic circuit 34A executes, for example, the generation method shown in FIG. 16. FIG. 16 is a flowchart of an example of the generation method executed by the generation system 3A. The generation method of FIG. 16 includes a determination process S21A and a generation process S22A.
 The determination process S21A determines, for a feature of interest among the one or more predetermined features, the effectiveness of each of the plurality of components of the feature vector based on the evaluation information D1A.
 The feature of interest is selected from the one or more predetermined features based on whether it affects the result of inference of the second trained model LM2 in the environment in which the second trained model LM2 is used. For example, if the second trained model LM2 is used in an office or a factory, the objects appearing in the target images input to the second trained model LM2 are highly likely to be people wearing identical or similar clothes. In such a case, a feature such as the color of the object's clothes, for example the green of work clothes or the black of a suit, is common to many objects and is unlikely to affect the result of inference of the second trained model LM2. On the other hand, features such as the color of the shoes, the color of the inner shirt visible at the neck, the texture of the face, or the accessories worn are, compared with the color of the clothes, specific to the individual object and are likely to affect the result of inference of the second trained model LM2. By setting the effectiveness with respect to such object-specific features, components that emphasize the object-specific features can be weighted more heavily than components that do not, and an improvement in the inference accuracy of the second trained model LM2 can be expected. The feature of interest may be decided by a person, for example by visual inspection, or may be decided automatically from the plurality of predetermined features.
 The effectiveness is, for example, multiplied by the corresponding component of the feature vector. The effectiveness is determined by the degree to which the component of the feature vector emphasizes the feature of interest. The evaluation information D1A is used to judge whether a component of the feature vector emphasizes the feature of interest. In this embodiment, the evaluation information D1A indicates, for each of the plurality of components of the feature vector, the reaction rate to a change in the feature of interest. For example, whether a component emphasizes the predetermined feature is decided by whether its reaction rate is equal to or greater than a reference value. The effectiveness is set to "1" for components whose reaction rate is equal to or greater than the reference value, and to "0" for components whose reaction rate is less than the reference value. An effectiveness of "1" means that the component is used, and an effectiveness of "0" means that the component is not used. The reference value may be a fixed value, but it may also be set in consideration of the performance of the second trained model LM2. Since the reaction rate takes a value between 0 and 1, varying the reference value from 0 to 1 changes the effectiveness of each of the plurality of components of the feature vector. The effectiveness of each of the plurality of components of the feature vector can therefore be determined by the reference value at which the performance of the second trained model LM2 is best.
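 A minimal sketch of the determination process S21A under these rules; `evaluate_performance` is a hypothetical scoring function for a candidate second trained model, and the 21-point grid of reference values is an assumption:

import numpy as np

def determine_effectiveness(rates: np.ndarray, reference: float) -> np.ndarray:
    # S21A: effectiveness 1 for components whose reaction rate to the feature
    # of interest is at or above the reference value, 0 otherwise.
    return (rates >= reference).astype(float)

def best_effectiveness(rates: np.ndarray, evaluate_performance) -> np.ndarray:
    # Sweep the reference value over [0, 1] and keep the effectiveness vector
    # that gives the best LM2 performance (evaluate_performance is assumed to
    # map an effectiveness vector to a scalar score).
    candidates = (determine_effectiveness(rates, r) for r in np.linspace(0, 1, 21))
    return max(candidates, key=evaluate_performance)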
 The generation process S22A generates the second trained model LM2 from the first trained model LM1 by modifying the first trained model LM1 so that it outputs the result of inference regarding the object based on a corrected feature vector VA obtained by correcting the feature vector V extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector V determined in the determination process S21A. In this embodiment, the generation process S22A generates the second trained model LM2 from the first trained model LM1 by adding, between the feature extraction unit F1 and the determination unit F2 of the first trained model LM1, a correction unit F3 that corrects the feature vector extracted by the feature extraction unit F1, and by modifying the determination unit F2 so that it outputs the result of inference regarding the object based on the feature vector corrected by the correction unit F3. The generation process S22A generates the trained model without additional training.
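 A minimal sketch of the generation process S22A, assuming LM1 exposes its feature extraction unit F1 and determination unit F2 as callables (the class name and interfaces are assumptions for illustration; the `infer` method matches the interface assumed in the inference sketch above):

import numpy as np

class SecondTrainedModel:
    # LM2 = F1 -> F3 -> F2, assembled from LM1 without any additional training.
    def __init__(self, f1, f2, effectiveness: np.ndarray):
        self.f1 = f1                        # feature extraction unit F1 of LM1
        self.f2 = f2                        # determination unit F2 of LM1
        self.effectiveness = effectiveness  # from determination process S21A

    def correct_f3(self, v: np.ndarray) -> np.ndarray:
        # Correction unit F3: weight each component of the feature vector V
        # by its effectiveness to obtain the corrected feature vector VA.
        return v * self.effectiveness

    def infer(self, image: np.ndarray):
        va = self.correct_f3(self.f1(image))  # V -> VA
        return self.f2(va)                    # inference from the corrected vector

def generation_process_s22a(lm1_f1, lm1_f2, effectiveness):
    # S22A: build LM2 by inserting F3 between F1 and F2 of LM1.
    return SecondTrainedModel(lm1_f1, lm1_f2, effectiveness)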
 In the generation system 3A, features in which objects are not similar to one another (face, body shape, shoes, the color of accessories, etc.) can be selected as the features of interest, and a correction unit F3 that emphasizes the components with high reaction rates to the features of interest can be added to the first trained model LM1 to generate the second trained model LM2. The second trained model LM2 including such a correction unit F3 can perform inference using object-specific features that are not similar between objects, so its performance improves.
 [1.2.2 Effects, etc.]
 In the evaluation system 2A described above, the first acquisition process S11A inputs a plurality of first target images, each showing one of a plurality of mutually different first objects, to the first trained model LM1 and acquires a plurality of first feature vectors respectively corresponding to the plurality of first objects. The second acquisition process S12A inputs a plurality of second target images, showing a plurality of second objects each differing from one of the plurality of first objects in a predetermined feature, to the first trained model LM1 and acquires a plurality of second feature vectors corresponding to the plurality of second objects. The evaluation process S13A includes a first extraction process S131A, a second extraction process S132A, and an arithmetic process S133A. The first extraction process S131A extracts, for each of the plurality of first target images, the components of the feature vector whose values in the first feature vector are equal to or greater than a threshold. The second extraction process S132A extracts, from the components extracted in the first extraction process S131A, components for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. For each of the plurality of components of the feature vector, the arithmetic process S133A obtains, as the reaction rate to the change in the predetermined feature, the ratio of the number of times the component was extracted in the second extraction process to the number of times it was extracted in the first extraction process. This configuration makes it possible to obtain an evaluation of the change of each of the plurality of components of the feature vector.
 In the evaluation system 2A, the second acquisition process S12A inputs a second target image to the first trained model LM1 and acquires a second feature vector for each of a plurality of mutually different predetermined features. The evaluation process S13A evaluates, for each of the plurality of predetermined features, the change of each of the plurality of components of the feature vector with respect to the change in that predetermined feature. This configuration makes it possible to obtain, for the plurality of predetermined features, an evaluation of the change of each of the plurality of components of the feature vector.
 In the evaluation system 2A, the threshold is set based on a representative value of the values of the plurality of components in the first feature vector. This configuration makes it possible to improve the accuracy of evaluating the change of each of the plurality of components of the feature vector.
 [2. Modifications]
 Embodiments of the present disclosure are not limited to the above embodiments. The above embodiments can be modified in various ways according to the design and the like, as long as the object of the present disclosure can be achieved. Modifications of the above embodiments are listed below. The modifications described below can be applied in appropriate combination.
 In one modification, the information processing system 1 may include at least one of the evaluation system 2, the generation system 3, and the inference system 4. The program may be a program for causing an arithmetic circuit to execute at least one of the evaluation method, the generation method, and the inference method. The same applies to the information processing system 1A.
 In one modification, the result of inference is not particularly limited. The result of inference may be the result of classifying the object appearing in the target image.
 In one modification, the second acquisition process S12 may input a second target image to the first trained model LM1 and acquire a second feature vector for each of a plurality of mutually different predetermined features. That is, a plurality of second target images differing from one another in the predetermined feature may be set for a single first target image.
 In one modification, it is not essential in the information processing system 1 that the evaluation system 2, the generation system 3, and the inference system 4 be realized by different computer systems. At least two of the evaluation system 2, the generation system 3, and the inference system 4 may be realized by a single computer system. The same applies to the information processing system 1A.
 In one modification, the evaluation system 2 (2A), the generation system 3 (3A), and the inference system 4 need not each include both an input/output device 21, 31, 41 and a communication device 22, 32, 42. This also applies to the evaluation system 2A and the generation system 3A.
 In one modification, each of the evaluation system 2, the generation system 3, and the inference system 4 may be realized by a plurality of computer systems. That is, it is not essential that the plurality of functions (components) of each of the evaluation system 2, the generation system 3, and the inference system 4 be integrated in a single housing; the components of each of the evaluation system 2, the generation system 3, and the inference system 4 may be distributed over a plurality of housings. Furthermore, at least some of the functions of each of the evaluation system 2, the generation system 3, and the inference system 4, for example some of the functions of the arithmetic circuits 24, 34, and 44, may be realized by a cloud (cloud computing) or the like. The same applies to the evaluation system 2A and the generation system 3A.
 [3. Aspects]
 As is clear from the above embodiments and modifications, the present disclosure includes the following aspects. In the following, reference numerals are given in parentheses only to clarify the correspondence with the embodiments.
 A first aspect is an evaluation system (2; 2A) including a storage device (23; 23A) that stores a trained model (LM1) that outputs the result of inference regarding an object in response to the input of a target image in which the object appears, and an arithmetic circuit (24; 24A) that evaluates the trained model (LM1). The trained model (LM1) is configured to extract a feature vector of the object appearing in the input target image and to output the result of inference regarding the object based on the extracted feature vector. The arithmetic circuit (24; 24A) executes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A). The first acquisition process (S11; S11A) inputs a first target image in which a first object appears to the trained model (LM1) and acquires a first feature vector corresponding to the first object. The second acquisition process (S12; S12A) inputs a second target image in which a second object differing from the first object in a predetermined feature appears to the trained model (LM1) and acquires a second feature vector corresponding to the second object. The evaluation process (S13; S13A) evaluates the change of each of the plurality of components of the feature vector with respect to the change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector. This aspect enables inference accuracy to be improved without additional training.
 A second aspect is an evaluation system (2) based on the first aspect. In the second aspect, the evaluation process (S13) includes a first extraction process (S131) and a second extraction process (S132). The first extraction process (S131) extracts, from the plurality of components of the feature vector, a component whose value in the first feature vector is equal to or greater than a threshold. The second extraction process (S132) extracts, from the components extracted in the first extraction process (S131), a component for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. This aspect makes it possible to improve the accuracy of evaluating the change in each of the plurality of components of the feature vector.
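 A minimal sketch of the two-stage extraction, written under the assumption that the threshold is derived from a representative value (here, the mean) of the first feature vector, anticipating the fifth aspect below; the function name and the constant `diff_min` are illustrative.

```python
import numpy as np

def extract_responsive_components(v1, v2, diff_min=0.1):
    """Two-stage extraction over a first feature vector v1 and a second
    feature vector v2; diff_min plays the role of the predetermined value."""
    threshold = v1.mean()  # representative-value threshold (an assumption)
    # First extraction process (S131): components whose value in the first
    # feature vector is at or above the threshold.
    active = np.flatnonzero(v1 >= threshold)
    # Second extraction process (S132): of those, components whose value
    # changed by at least diff_min between the two feature vectors.
    responded = active[np.abs(v1[active] - v2[active]) >= diff_min]
    return active, responded
```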
 A third aspect is an evaluation system (2; 2A) based on the first aspect. In the third aspect, the second acquisition process (S12; S12A) acquires the second feature vector by inputting the second target image to the trained model (LM1) for each of a plurality of mutually different predetermined features. The evaluation process (S13; S13A) evaluates, for each of the plurality of predetermined features, a change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature. This aspect makes it possible to obtain, for each of the plurality of predetermined features, an evaluation of the change in each of the plurality of components of the feature vector.
 A fourth aspect is an evaluation system (2A) based on the third aspect. In the fourth aspect, the first acquisition process (S11A) acquires a plurality of first feature vectors respectively corresponding to a plurality of mutually different first objects by inputting, to the trained model (LM1), a plurality of first target images in which the plurality of first objects respectively appear. The second acquisition process (S12A) acquires a plurality of second feature vectors corresponding to a plurality of second objects by inputting, to the trained model (LM1), a plurality of second target images in which the plurality of second objects, each differing in the predetermined feature from a corresponding one of the plurality of first objects, appear. The evaluation process (S13A) executes a first extraction process (S131A), a second extraction process (S132A), and a calculation process (S133A). The first extraction process (S131A) extracts, for each of the plurality of first target images, a component whose value in the first feature vector is equal to or greater than a threshold from the plurality of components of the feature vector. The second extraction process (S132A) extracts, from the components extracted in the first extraction process (S131A), a component for which the difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value. The calculation process (S133A) determines, for each of the plurality of components of the feature vector, the ratio of the number of times the component is extracted in the second extraction process to the number of times it is extracted in the first extraction process, as a response rate to the change in the predetermined feature. This aspect makes it possible to obtain an evaluation of the change in each of the plurality of components of the feature vector.
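 Continuing in the same vein, the response rate of the fourth aspect could be tallied over many first/second image pairs as sketched below; the pair representation and the mean-based threshold are, again, assumptions.

```python
import numpy as np

def response_rates(pairs, dim, diff_min=0.1):
    """pairs: iterable of (v1, v2) feature-vector pairs, one pair per
    first/second target image pair; dim: number of feature components.
    Returns, per component, the ratio of second-extraction counts (S132A)
    to first-extraction counts (S131A), i.e. the response rate computed
    by the calculation process (S133A)."""
    count1 = np.zeros(dim)  # times each component passed the first extraction
    count2 = np.zeros(dim)  # times it also passed the second extraction
    for v1, v2 in pairs:
        threshold = v1.mean()  # representative-value threshold (fifth aspect)
        active = v1 >= threshold
        responded = active & (np.abs(v1 - v2) >= diff_min)
        count1 += active
        count2 += responded
    return np.divide(count2, count1, out=np.zeros(dim), where=count1 > 0)
```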
 A fifth aspect is an evaluation system (2; 2A) based on the second or fourth aspect. In the fifth aspect, the threshold is set based on a representative value of the values, in the first feature vector, of the plurality of components of the feature vector. According to this aspect, the accuracy of evaluating the change in each of the plurality of components of the feature vector can be improved.
 A sixth aspect is an evaluation system (2; 2A) based on any one of the first to fifth aspects. In the sixth aspect, the predetermined feature includes at least one of a feature relating to the color of the object and a feature relating to the shape of the object. According to this aspect, the inference accuracy can be improved.
 A seventh aspect is an evaluation system (2; 2A) based on the sixth aspect. In the seventh aspect, the feature relating to the color of the object includes hue, brightness, saturation, and contrast. The feature relating to the shape of the object includes the aspect ratio of the object, the head-to-body ratio of the object, and the body shape of the object. According to this aspect, the inference accuracy can be improved.
 An eighth aspect is an evaluation system (2; 2A) based on any one of the first to seventh aspects. In the eighth aspect, the result of the inference indicates whether the object appearing in the target image matches a specific object. According to this aspect, the accuracy of inferring whether the object appearing in the target image matches a specific object can be improved.
 A ninth aspect is an evaluation method for evaluating a trained model (LM1) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears. The trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector. The evaluation method includes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A). The first acquisition process (S11; S11A) inputs, to the trained model (LM1), a first target image in which a first object appears, and acquires a first feature vector corresponding to the first object. The second acquisition process (S12; S12A) inputs, to the trained model (LM1), a second target image in which a second object differing from the first object in a predetermined feature appears, and acquires a second feature vector corresponding to the second object. The evaluation process (S13; S13A) evaluates a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector. This aspect enables the inference accuracy to be improved without additional learning.
 A tenth aspect is a generation system (3; 3A) including: a storage device (33; 33A) that stores a first trained model (LM1) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears, and evaluation information (D1; D1A) of the first trained model (LM1); and an arithmetic circuit (34; 34A) that generates a second trained model (LM2) from the first trained model (LM1) based on the evaluation information (D1; D1A). The first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector. The evaluation information (D1; D1A) indicates, for each of one or more predetermined features of the object, an evaluation of a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature. The arithmetic circuit (34; 34A) executes a determination process (S21; S21A) and a generation process (S22; S22A). The determination process (S21; S21A) determines, for a target feature among the one or more predetermined features, an effectiveness of each of the plurality of components of the feature vector based on the evaluation information (D1; D1A). The generation process (S22; S22A) generates the second trained model (LM2) from the first trained model (LM1) by modifying the first trained model (LM1) so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process (S21; S21A). This aspect enables the inference accuracy to be improved without additional learning.
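 One simple way the generation process could realize the correction is sketched below, under the assumption that effectiveness is a binary per-component weight that zeroes out components reacting strongly to a nuisance feature; the cutoff `rate_max` and the wrapper name are illustrative, not part of the disclosure.

```python
import numpy as np

def make_corrected_extractor(extract_features, response_rate, rate_max=0.5):
    """Wrap the first trained model's feature extractor so that its output
    is corrected by per-component effectiveness (generation process, S22)."""
    # Determination process (S21): components whose response rate to the
    # target feature is high are judged ineffective and weighted to zero.
    effectiveness = (response_rate < rate_max).astype(float)
    def corrected_extract(image):
        v = extract_features(image)
        return v * effectiveness  # corrected feature vector
    return corrected_extract
```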
 An eleventh aspect is a generation method for generating a second trained model (LM2) from a first trained model (LM1) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears, based on evaluation information (D1; D1A) of the first trained model (LM1). The first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector. The evaluation information (D1; D1A) indicates, for each of one or more predetermined features of the object, an evaluation of a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature. The generation method includes a determination process (S21; S21A) and a generation process (S22; S22A). The determination process (S21; S21A) determines, for a target feature among the one or more predetermined features, the effectiveness of the plurality of components of the feature vector based on the evaluation information. The generation process (S22; S22A) generates the second trained model (LM2) from the first trained model (LM1) by modifying the first trained model (LM1) so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process (S21; S21A). This aspect enables the inference accuracy to be improved without additional learning.
 A twelfth aspect is an inference system (4) including: a storage device (43) that stores a trained model (LM2) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; and an arithmetic circuit (44). The trained model (LM2) is configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on a change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The arithmetic circuit (44) executes an acquisition process (S31) of acquiring a predetermined target image, and an inference process (S32) of inputting the predetermined target image acquired in the acquisition process (S31) to the trained model (LM2) stored in the storage device (43) to acquire a result of inference regarding an object appearing in the predetermined target image. This aspect enables the inference accuracy to be improved without additional learning.
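 The inference process might then, for example, match corrected feature vectors by cosine similarity against a stored vector of the specific object; the matching rule and the threshold `sim_min` are assumptions for illustration.

```python
import numpy as np

def infer_match(corrected_extract, target_image, reference_vector, sim_min=0.8):
    """Inference process (S32): decide whether the object in target_image
    matches the specific object represented by reference_vector."""
    v = corrected_extract(target_image)  # acquisition (S31) plus correction
    sim = float(np.dot(v, reference_vector)) / (
        np.linalg.norm(v) * np.linalg.norm(reference_vector) + 1e-12)
    return sim >= sim_min
```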
 A thirteenth aspect is an inference method using a trained model (LM2) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears. The trained model (LM2) is configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on a change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. The inference method includes an acquisition process (S31) of acquiring a predetermined target image, and an inference process (S32) of inputting the predetermined target image acquired in the acquisition process (S31) to the trained model (LM2) to acquire a result of inference regarding an object appearing in the predetermined target image. This aspect enables the inference accuracy to be improved without additional learning.
 A fourteenth aspect is a trained model (LM2) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears. The trained model (LM2) is configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector. The effectiveness of each of the plurality of components of the feature vector is set based on a change in each of the plurality of components of the feature vector with respect to a change in a predetermined feature of the object. This aspect enables the inference accuracy to be improved without additional learning.
 A fifteenth aspect is a program for causing an arithmetic circuit (24, 24A, 34, 34A, 44) to execute at least one of the evaluation method according to the ninth aspect, the generation method according to the eleventh aspect, and the inference method according to the thirteenth aspect. This aspect enables the inference accuracy to be improved without additional learning.
 A sixteenth aspect is an information processing system (1; 1A) including an evaluation system (2; 2A), a generation system (3; 3A), and an inference system (4). The evaluation system (2; 2A) generates evaluation information (D1; D1A) of a first trained model (LM1) that outputs a result of inference regarding an object in response to an input of a target image in which the object appears. The generation system (3; 3A) generates a second trained model (LM2) from the first trained model (LM1) based on the evaluation information (D1; D1A). The inference system (4) uses the second trained model (LM2) to output a result of inference regarding the object in response to an input of a target image in which the object appears. The first trained model (LM1) is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector. The evaluation system (2; 2A) executes a first acquisition process (S11; S11A), a second acquisition process (S12; S12A), and an evaluation process (S13; S13A). The first acquisition process (S11; S11A) inputs, to the first trained model (LM1), a first target image in which a first object appears, and acquires a first feature vector corresponding to the first object. The second acquisition process (S12; S12A) inputs, to the first trained model (LM1), a second target image in which a second object differing from the first object in a predetermined feature appears, and acquires a second feature vector corresponding to the second object. The evaluation process (S13; S13A) evaluates a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature based on a comparison between the first feature vector and the second feature vector, and generates the evaluation information (D1; D1A). The evaluation information (D1; D1A) indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature. The generation system (3; 3A) executes a determination process (S21; S21A) and a generation process (S22; S22A). The determination process (S21; S21A) determines, for a target feature among the one or more predetermined features, an effectiveness of each of the plurality of components of the feature vector based on the evaluation information (D1; D1A). The generation process (S22; S22A) generates the second trained model (LM2) from the first trained model (LM1) by modifying the first trained model (LM1) so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process. This aspect enables the inference accuracy to be improved without additional learning.
 [4. Terms]
 In the present disclosure, terms related to machine learning are defined and used as follows.
 "Trained model" refers to an "inference program" into which "learned parameters" have been incorporated.
 "Learned parameters" refers to parameters (coefficients) obtained as a result of learning using a learning data set. Learned parameters are generated by inputting a learning data set into a learning program and mechanically adjusting the parameters for a given purpose. Although learned parameters are adjusted to suit the purpose of learning, on their own they are merely parameters (information such as numerical values); they function as a trained model only when incorporated into an inference program. In the case of deep learning, for example, the principal learned parameters include those used to weight the links between nodes.
 "Inference program" refers to a program that, by applying the incorporated learned parameters, can output a certain result in response to an input. For example, it is a program that defines a series of computation procedures for applying learned parameters, obtained as a result of learning, to an image given as input, and for outputting a result (such as an authentication or a judgment) for that image.
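 As a toy illustration of how these terms relate, the following is a minimal sketch assuming a trivially small linear model; the parameter values and names are illustrative only and are unrelated to the model of this disclosure.

```python
import numpy as np

# "Learned parameters": plain numbers produced by a learning program.
learned_parameters = {"w": np.array([0.2, 0.8]), "b": 0.1}

# "Inference program": applies the incorporated parameters to an input.
def inference_program(x, params=learned_parameters):
    return float(np.dot(params["w"], x) + params["b"])

# Together, the program with its parameters constitutes a "trained model".
```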
 "Learning data set", also called a training data set, refers to secondary processed data generated from raw data in order to facilitate analysis by the intended learning method, by preprocessing such as removing missing values and outliers, by adding separate data such as label information (ground-truth data), or by a combination of such conversion and processing operations. A learning data set may also include data that has been "augmented" by applying certain transformations to the raw data.
 "Raw data" refers to data primarily acquired by users, vendors, other business operators, research institutions, or the like, which has been converted and processed so that it can be read into a database.
 "Learning program" refers to a program that executes an algorithm for finding certain rules in a learning data set and generating a model that expresses those rules. Specifically, it corresponds to a program that defines the procedures to be executed by a computer in order to realize learning by the adopted learning method.
 "Additional learning" means generating new learned parameters by applying a different learning data set to an existing trained model and performing further learning.
 "Reused model" means an inference program into which learned parameters newly generated by additional learning have been incorporated.
 The present disclosure relates to evaluation systems, evaluation methods, generation systems, generation methods, inference systems, inference methods, trained models, and programs. Specifically, the present disclosure is applicable to: an evaluation system and an evaluation method for evaluating a previously prepared trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; a generation system and a generation method for generating a new trained model from such a trained model; an inference system and an inference method that use a trained model to output a result of inference regarding an object in response to an input of a target image in which the object appears; a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; and a program for the evaluation method, the generation method, and the inference method.
 1, 1A Information processing system
 2, 2A Evaluation system
 23, 23A Storage device
 24, 24A Arithmetic circuit
 3, 3A Generation system
 33, 33A Storage device
 34, 34A Arithmetic circuit
 4 Inference system
 43 Storage device
 44 Arithmetic circuit
 61 First target image (target image)
 62 Second target image (target image)
 71 First object (object)
 72 Second object (object)
 LM1 Trained model (first trained model)
 LM2 Trained model (second trained model)
 D1, D1A Evaluation information
 S11, S11A First acquisition process
 S12, S12A Second acquisition process
 S13, S13A Evaluation process
 S131, S131A First extraction process
 S132, S132A Second extraction process
 S133A Calculation process
 S21, S21A Determination process
 S22 Generation process
 S31 Acquisition process
 S32 Inference process

Claims (16)

  1. An evaluation system comprising:
     a storage device that stores a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; and
     an arithmetic circuit that evaluates the trained model,
     wherein the trained model is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector, and
     the arithmetic circuit executes:
     a first acquisition process of inputting, to the trained model, a first target image in which a first object appears to acquire a first feature vector corresponding to the first object;
     a second acquisition process of inputting, to the trained model, a second target image in which a second object differing from the first object in a predetermined feature appears to acquire a second feature vector corresponding to the second object; and
     an evaluation process of evaluating a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector.
  2. The evaluation system according to claim 1, wherein the evaluation process includes:
     a first extraction process of extracting, from the plurality of components of the feature vector, a component whose value in the first feature vector is equal to or greater than a threshold; and
     a second extraction process of extracting, from the components extracted in the first extraction process, a component for which a difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value.
  3. The evaluation system according to claim 1, wherein
     the second acquisition process acquires the second feature vector by inputting the second target image to the trained model for each of a plurality of mutually different predetermined features, and
     the evaluation process evaluates, for each of the plurality of predetermined features, a change in each of the plurality of components of the feature vector with respect to a change in that predetermined feature.
  4. The evaluation system according to claim 3, wherein
     the first acquisition process acquires a plurality of first feature vectors respectively corresponding to a plurality of mutually different first objects by inputting, to the trained model, a plurality of first target images in which the plurality of first objects respectively appear,
     the second acquisition process acquires a plurality of second feature vectors corresponding to a plurality of second objects by inputting, to the trained model, a plurality of second target images in which the plurality of second objects, each differing in the predetermined feature from a corresponding one of the plurality of first objects, appear, and
     the evaluation process includes:
     a first extraction process of extracting, for each of the plurality of first target images, a component whose value in the first feature vector is equal to or greater than a threshold from the plurality of components of the feature vector;
     a second extraction process of extracting, from the components extracted in the first extraction process, a component for which a difference between the value in the first feature vector and the value in the second feature vector is equal to or greater than a predetermined value; and
     a calculation process of determining, for each of the plurality of components of the feature vector, a ratio of the number of times the component is extracted in the second extraction process to the number of times the component is extracted in the first extraction process, as a response rate to the change in the predetermined feature.
  5. The evaluation system according to claim 2 or 4, wherein the threshold is set based on a representative value of the values, in the first feature vector, of the plurality of components of the feature vector.
  6. The evaluation system according to any one of claims 1 to 5, wherein the predetermined feature includes at least one of a feature relating to a color of the object and a feature relating to a shape of the object.
  7. The evaluation system according to claim 6, wherein
     the feature relating to the color of the object includes hue, brightness, saturation, and contrast, and
     the feature relating to the shape of the object includes an aspect ratio of the object, a head-to-body ratio of the object, and a body shape of the object.
  8. The evaluation system according to any one of claims 1 to 7, wherein the result of the inference indicates whether the object appearing in the target image matches a specific object.
  9. An evaluation method for evaluating a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears,
     the trained model being configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector,
     the evaluation method comprising:
     a first acquisition process of inputting, to the trained model, a first target image in which a first object appears to acquire a first feature vector corresponding to the first object;
     a second acquisition process of inputting, to the trained model, a second target image in which a second object differing from the first object in a predetermined feature appears to acquire a second feature vector corresponding to the second object; and
     an evaluation process of evaluating a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, based on a comparison between the first feature vector and the second feature vector.
  10. A generation system comprising:
     a storage device that stores a first trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears, and evaluation information of the first trained model; and
     an arithmetic circuit that generates a second trained model from the first trained model based on the evaluation information,
     wherein the first trained model is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector,
     the evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature, and
     the arithmetic circuit executes:
     a determination process of determining, for a target feature among the one or more predetermined features, an effectiveness of each of the plurality of components of the feature vector based on the evaluation information; and
     a generation process of generating the second trained model from the first trained model by modifying the first trained model so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process.
  11. A generation method for generating a second trained model from a first trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears, based on evaluation information of the first trained model,
     the first trained model being configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector,
     the evaluation information indicating, for each of one or more predetermined features of the object, an evaluation of a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature,
     the generation method comprising:
     a determination process of determining, for a target feature among the one or more predetermined features, an effectiveness of the plurality of components of the feature vector based on the evaluation information; and
     a generation process of generating the second trained model from the first trained model by modifying the first trained model so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process.
  12. An inference system comprising:
     a storage device that stores a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears; and
     an arithmetic circuit,
     wherein the trained model is configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector,
     the effectiveness of each of the plurality of components of the feature vector is set based on a change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object, and
     the arithmetic circuit executes:
     an acquisition process of acquiring a predetermined target image; and
     an inference process of inputting the predetermined target image acquired in the acquisition process to the trained model stored in the storage device to acquire a result of inference regarding an object appearing in the predetermined target image.
  13. An inference method using a trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears,
     the trained model being configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector,
     the effectiveness of each of the plurality of components of the feature vector being set based on a change in the plurality of components of the feature vector with respect to a change in a predetermined feature of the object,
     the inference method comprising:
     an acquisition process of acquiring a predetermined target image; and
     an inference process of inputting the predetermined target image acquired in the acquisition process to the trained model to acquire a result of inference regarding an object appearing in the predetermined target image.
  14. A trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears,
     the trained model being configured to extract a feature vector of an object appearing in an input target image, correct the extracted feature vector based on an effectiveness of each of a plurality of components of the feature vector, and output a result of inference regarding the object based on the corrected feature vector,
     wherein the effectiveness of each of the plurality of components of the feature vector is set based on a change in each of the plurality of components of the feature vector with respect to a change in a predetermined feature of the object.
  15. A program for causing an arithmetic circuit to execute at least one of the evaluation method according to claim 9, the generation method according to claim 11, and the inference method according to claim 13.
  16. An information processing system comprising:
     an evaluation system that generates evaluation information of a first trained model that outputs a result of inference regarding an object in response to an input of a target image in which the object appears;
     a generation system that generates a second trained model from the first trained model based on the evaluation information; and
     an inference system that uses the second trained model to output a result of inference regarding the object in response to an input of a target image in which the object appears,
     wherein the first trained model is configured to extract a feature vector of an object appearing in an input target image and to output a result of inference regarding the object based on the extracted feature vector,
     the evaluation system executes: a first acquisition process of inputting, to the first trained model, a first target image in which a first object appears to acquire a first feature vector corresponding to the first object; a second acquisition process of inputting, to the first trained model, a second target image in which a second object differing from the first object in a predetermined feature appears to acquire a second feature vector corresponding to the second object; and an evaluation process of evaluating a change in each of a plurality of components of the feature vector with respect to a change in the predetermined feature based on a comparison between the first feature vector and the second feature vector, to generate the evaluation information,
     the evaluation information indicates, for each of one or more predetermined features of the object, an evaluation of the change in each of the plurality of components of the feature vector with respect to a change in the predetermined feature, and
     the generation system executes: a determination process of determining, for a target feature among the one or more predetermined features, an effectiveness of each of the plurality of components of the feature vector based on the evaluation information; and a generation process of generating the second trained model from the first trained model by modifying the first trained model so as to output a result of inference regarding the object based on a corrected feature vector obtained by correcting the feature vector extracted from the input target image based on the effectiveness of each of the plurality of components of the feature vector determined in the determination process.
PCT/JP2022/018907 2021-07-09 2022-04-26 Assessment system, assessment method, generation system, generation method, inference system, inference method, trained model, program, and information processing system WO2023281904A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-114126 2021-07-09
JP2021114126 2021-07-09

Publications (1)

Publication Number Publication Date
WO2023281904A1 true WO2023281904A1 (en) 2023-01-12

Family

ID=84801537

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/018907 WO2023281904A1 (en) 2021-07-09 2022-04-26 Assessment system, assessment method, generation system, generation method, inference system, inference method, trained model, program, and information processing system

Country Status (1)

Country Link
WO (1) WO2023281904A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020525908A (en) * 2017-09-27 2020-08-27 シェンチェン センスタイム テクノロジー カンパニー リミテッドShenzhen Sensetime Technology Co.,Ltd Image search method, device, device and readable storage medium
JP2021060692A (en) * 2019-10-03 2021-04-15 株式会社東芝 Inference result evaluation system, inference result evaluation device, and method thereof

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22837325

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE