WO2021053815A1 - Learning device, learning method, inference device, inference method, and recording medium
- Publication number
- WO2021053815A1 (PCT/JP2019/037007)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image data
- case
- feature vector
- unit
- inference
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/771—Feature selection, e.g. selecting representative features from a multi-dimensional feature space
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Definitions
- the present invention relates to a technique for recognizing an object included in an image.
- Patent Document 1 and Non-Patent Document 1 describe an object recognition technique for learning and identifying using a neural network.
- Non-Patent Document 1 also describes setting a predetermined threshold for the identification score and, when the score is below that threshold, rejecting the identification result on the grounds that no object of a registered category was detected.
- However, the above method only rejects identification targets of unregistered categories and cannot identify them.
- In addition, identification performance drops significantly outside the domain (environment) of the images used for learning.
- One object of the present invention is to handle images acquired in various environments and to output recognition results even for identification targets of unregistered categories.
- The learning device includes: a metric space learning unit that, using attributed image data to which attribute information is added, learns, for each combination of different attributes, a metric space including feature vectors extracted from the attributed image data; and a case storage unit that calculates feature vectors from case image data and stores them as cases associated with the metric space.
- The learning method: using attributed image data to which attribute information is added, learns, for each combination of different attributes, a metric space including feature vectors extracted from the attributed image data; and calculates feature vectors from case image data and stores them as cases associated with the metric space.
- The recording medium records a program that causes a computer to execute a process of: learning, for each combination of different attributes, a metric space including feature vectors extracted from attributed image data to which attribute information is added; and calculating feature vectors from case image data and storing them as cases associated with the metric space.
- The inference device includes: a case storage unit that stores feature vectors of case image data as cases in association with a plurality of metric spaces learned for each combination of different attributes; a metric space selection unit that evaluates the plurality of metric spaces using feature vectors of image data for selection and selects one metric space; an identification unit that identifies inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space; and a result output unit that outputs the identification result of the identification unit.
- The inference method: acquires a plurality of metric spaces from a case storage unit that stores feature vectors of case image data as cases in association with metric spaces learned for each combination of different attributes; evaluates the plurality of metric spaces using feature vectors of image data for selection and selects one metric space; and identifies inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space, and outputs the identification result.
- The recording medium records a program that causes a computer to execute a process of: acquiring a plurality of metric spaces from a case storage unit that stores feature vectors of case image data as cases in association with metric spaces learned for each combination of different attributes; evaluating the plurality of metric spaces using feature vectors of image data for selection and selecting one metric space; and identifying inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space, and outputting the identification result.
- FIG. 1 shows a method of creating a case dictionary for a recognition target including a new class.
- the metric space is learned using the image data to which the attribute information and the like are added.
- the image data of the person to which the attribute information is given is acquired by using the public image data set of various people.
- The "attribute information" is a person attribute appearing in the image data; examples include the person's age, gender, height, and incidental items (such as belongings and worn items).
- Image data with various attributes are acquired for the recognition targets "police officer", "pedestrian", and "firefighter".
- FIG. 1 shows a metric space 10 learned based on a certain person attribute.
- The metric space 10 is a space defined by feature vectors (metrics) extracted from image data, and has the property that similar image data are located close together and dissimilar image data are located far apart.
- For example, a public image data set of persons having a certain person attribute (for example, wearing a hat) is used, feature vectors are calculated for those images, and a metric space is learned based on the obtained feature vectors.
- Learning the metric space actually means preparing a discriminative model using a neural network or the like and training that model so that the feature vectors it generates for each input image have the above properties. The metric space obtained by learning is defined by the parameters of the trained discriminative model.
- Next, a feature vector is generated from the image data of each existing class and embedded in the metric space 10 as a case.
- In the metric space 10, similar image data are located close to each other. Therefore, as shown in the figure, the image data of the existing class "police officer" are located close to each other in the metric space 10, as indicated by the mark 11.
- Likewise, the image data of the existing class "pedestrian" are located close to each other in the metric space 10, as indicated by the mark 12.
- The "police officer" cases indicated by the mark 11 and the "pedestrian" cases indicated by the mark 12 are located apart from each other in the metric space 10. In this way, the image data of the existing classes are embedded in the metric space 10 as cases.
- "Embedding as a case" actually means storing the feature vector extracted from the image in association with the metric space 10.
- For the new class, cases are embedded in the metric space 10 in the same way.
- Specifically, feature vectors are extracted from the image data of the new class "firefighter" and embedded in the metric space 10 as cases.
- As a result, the image data of the new class "firefighter" are located close to each other in the metric space 10, as indicated by the mark 13, and apart from the other classes "police officer" and "pedestrian".
- Thus, in the metric space 10, cases of the same class are located close to each other, and cases of different classes are located apart from each other.
- Once cases are embedded in the metric space 10 in this way, the class of image data can be identified by referring to these cases. For example, as shown in FIG. 1, when the image data 15 of a certain person is input, the feature vector of the image data 15 is extracted and its position in the metric space 10 is calculated. In the example of FIG. 1, since the feature vector of the image data 15 falls in the region where the cases of the class "firefighter" are gathered, the class of the image data can be recognized as "firefighter". In this way, even when a new class is added as a recognition target, the new class can be recognized by embedding the cases of the existing classes and the new class in the metric space and creating a case dictionary.
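- The case-based identification described above can be sketched as a nearest-neighbor lookup. This is only an illustration under stated assumptions: the 2-D feature vectors, the Euclidean distance, and the function name `nearest_case` are hypothetical and do not come from the patent, where the features would be produced by the trained discriminative model.

```python
import math


def nearest_case(query, cases):
    """Return (label, distance) of the stored case closest to `query`."""
    best_label, best_dist = None, math.inf
    for label, vector in cases:
        dist = math.dist(query, vector)  # Euclidean distance (an assumption)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical 2-D feature vectors for the existing classes.
cases = [
    ("police officer", (0.9, 0.1)),
    ("police officer", (1.0, 0.2)),
    ("pedestrian", (0.1, 0.9)),
    ("pedestrian", (0.2, 1.0)),
]

# Adding a new class only appends cases; the metric space is not relearned.
cases += [("firefighter", (0.9, 0.9)), ("firefighter", (1.0, 1.0))]

# Feature vector of an input image (corresponding to image data 15 in FIG. 1).
label, dist = nearest_case((0.95, 0.85), cases)
print(label)
```

- In this sketch the new class becomes recognizable as soon as its cases are appended to the list, which mirrors the role of the case dictionary.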
- Although FIG. 1 illustrates one metric space learned for a certain person attribute, in practice a metric space 10 is learned for each of a plurality of combinations of different person attributes.
- A case dictionary is created by embedding cases in each learned metric space 10, so that cases for a plurality of metric spaces are registered in the case dictionary.
- FIG. 2 is a diagram illustrating a method of selecting an optimum metric space.
- The case dictionary contains cases for multiple metric spaces that correspond to different combinations of person attributes.
- Here, it is assumed that the case dictionary stores cases for each of a metric space 10a for the attributes "incidental items" and "age", a metric space 10b for the attributes "incidental items" and "gender", a metric space 10c for the attributes "incidental items" and "height", and a metric space 10d for the attributes "height", "age", and "gender".
- These metric spaces 10a to 10d are evaluated using a plurality of cases of the existing classes.
- As evaluation data, evaluation data of the existing domain (source domain) and a small amount of data from the target domain are used for the existing class "police officer", and evaluation data of the existing domain is used for the existing class "pedestrian".
- The evaluation data is recognized by referring to the cases of each of the metric spaces 10a to 10d, and the result is compared with teacher labels prepared in advance to calculate the degree of agreement.
- The metric space with the highest degree of agreement is selected as the optimum metric space 10x.
- Selecting the metric space in this way can improve the recognition accuracy in the target domain.
- The image data of the target domain is then recognized using the discriminative model that defines the selected metric space.
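- The selection procedure above can be sketched as follows. The sketch assumes that the evaluation data has already been converted to feature vectors in each candidate metric space and that recognition is done by nearest-neighbor lookup. The names `agreement` and `select_metric_space` and all numeric values are hypothetical.

```python
import math


def nearest_label(query, cases):
    """Label of the case whose feature vector is closest to `query`."""
    return min(cases, key=lambda case: math.dist(query, case[1]))[0]


def agreement(cases, evaluation):
    """Fraction of evaluation samples whose recognized label matches the teacher label."""
    hits = sum(nearest_label(vec, cases) == label for label, vec in evaluation)
    return hits / len(evaluation)


def select_metric_space(case_dictionary, evaluation_by_space):
    """Return the name of the metric space with the highest agreement, plus all scores."""
    scores = {
        name: agreement(cases, evaluation_by_space[name])
        for name, cases in case_dictionary.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical cases: space 10a separates the classes well, space 10b does not.
case_dictionary = {
    "10a": [("police officer", (0.0, 0.0)), ("pedestrian", (1.0, 1.0))],
    "10b": [("police officer", (0.0, 0.0)), ("pedestrian", (0.2, 0.2))],
}
# The same evaluation images, embedded in each metric space (hypothetical values).
evaluation_by_space = {
    "10a": [("police officer", (0.1, 0.0)), ("pedestrian", (0.9, 1.0))],
    "10b": [("police officer", (0.15, 0.15)), ("pedestrian", (0.3, 0.3))],
}

best, scores = select_metric_space(case_dictionary, evaluation_by_space)
print(best, scores)
```

- Because the agreement is computed only from labeled evaluation samples, a small amount of labeled target-domain data is enough to influence which metric space is chosen.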
- FIG. 3 is a block diagram showing a hardware configuration of the object recognition device according to the first embodiment.
- the object recognition device 100 includes an interface 102, a processor 103, a memory 104, a recording medium 105, a database (DB) 106, and a display unit 107.
- Interface 102 inputs / outputs data to / from an external device. Specifically, image data used for learning and inference of the object recognition device 100 is input through the interface 102, and the recognition result by the object recognition device 100 is output to the external device through the interface 102.
- the processor 103 is a computer such as a CPU (Central Processing Unit) or a CPU and a GPU (Graphics Processing Unit), and controls the entire object recognition device 100 by executing a program prepared in advance. Specifically, the processor 103 executes the learning process and the inference process described later.
- the memory 104 is composed of a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
- the memory 104 stores a model for object recognition used by the object recognition device 100.
- the memory 104 stores various programs executed by the processor 103.
- the memory 104 is also used as a working memory during execution of various processes by the processor 103.
- The recording medium 105 is a non-volatile, non-transitory recording medium such as a disk-shaped recording medium or a semiconductor memory, and is configured to be removable from the object recognition device 100.
- the recording medium 105 records various programs executed by the processor 103. When the object recognition device 100 executes various processes, the program recorded on the recording medium 105 is loaded into the memory 104 and executed by the processor 103.
- Database 106 stores image data input from the outside. Specifically, image data or the like used for learning of the object recognition device 100 is stored. In addition, the database 106 stores a case dictionary created by the learning process.
- the display unit 107 is, for example, a liquid crystal display device, and displays the recognition result by the object recognition device 100, additional information related thereto, and the like.
- the object recognition device 100 may be provided with input devices such as a keyboard and a mouse for the user to give instructions and inputs.
- FIG. 4 is a block diagram showing a functional configuration of the object recognition device 100A for learning.
- The object recognition device 100A includes a label selection unit 111, a metric space learning unit 112, an image perturbation unit 113, a metric calculation unit 114, a feature perturbation unit 115, and a case embedding unit 116.
- Additional information 121, a teacher label 122, and image data 123 are input to the object recognition device 100A as data for metric learning.
- the "data for metric learning” is data for learning the metric space.
- the image data 123 is learning image data necessary for learning the metric space, and for example, the above-mentioned public image data set can be used.
- the teacher label 122 is a teacher label associated with the image data 123, and is, for example, person attribute information or class information.
- the attribute information includes age, gender, height, accessories, clothes, etc.
- the class information includes personal ID, occupation (police officer, firefighter), and the like.
- The additional information 121 is supplementary information that is added when the image data 123 and the teacher label 122 are registered, to help interpret them.
- Examples of the additional information 121 include information such as the shooting time and the depression angle of the camera used for shooting, environmental information (temperature, latitude / longitude, indoor / outdoor), and the like. As will be described later, the image data 123 for metric learning and the teacher label 122 are also used for case registration as needed.
- the teacher label 124, the image data 125, and the additional information 126 are input to the object recognition device 100A as the data for registering the case.
- Data for case registration is data for creating a case dictionary.
- the image data 125 is learning image data necessary for registering a case, and image data is prepared for each class to be identified.
- the teacher label 124 is a teacher label associated with the image data 125, and is, for example, class information.
- The additional information 126 is supplementary information that is added when the image data 125 and the teacher label 124 are registered, to help interpret them. Examples of the additional information 126 include information such as the shooting time and the depression angle of the camera used for shooting, environmental information (temperature, latitude / longitude, indoor / outdoor), and the like.
- The label selection unit 111 selects teacher labels indicating attributes or the like from the teacher label 122 when the metric space is learned.
- The label selection unit 111 may select a plurality of teacher labels at random, or may use information entropy or the like to select a plurality of teacher labels so that the selected labels carry complementary information.
- the label selection unit 111 outputs a set of selected combinations of teacher labels to the metric space learning unit 112.
- the label selection unit 111 is an example of the attribute determination unit of the present invention.
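- One possible reading of the entropy-based label selection is sketched below. The patent does not specify the criterion, so this sketch assumes that "complementary" attributes are those whose combination has the highest joint entropy over the training samples. The function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def joint_entropy(samples, attributes):
    """Shannon entropy (in bits) of the joint distribution of the given attributes."""
    counts = Counter(tuple(sample[a] for a in attributes) for sample in samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_attribute_combination(samples, size):
    """Pick the attribute combination of the given size with the highest joint entropy."""
    names = sorted(samples[0])
    return max(combinations(names, size), key=lambda combo: joint_entropy(samples, combo))


# Hypothetical teacher labels: "hat" is fully redundant with "gender" here.
samples = [
    {"age": "young", "gender": "f", "hat": "yes"},
    {"age": "young", "gender": "m", "hat": "no"},
    {"age": "old", "gender": "f", "hat": "yes"},
    {"age": "old", "gender": "m", "hat": "no"},
]

selected = select_attribute_combination(samples, size=2)
print(selected)
```

- A redundant pair such as "gender" and "hat" scores lower than a pair that splits the samples in independent ways, so it is not selected under this assumed criterion.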
- The metric space learning unit 112 learns the metric space based on the image data 123 for metric learning and the teacher labels selected by the label selection unit 111. Specifically, the metric space learning unit 112 learns a metric space in which each class of the teacher labels selected by the label selection unit 111 can be best identified. That is, as shown in FIG. 1, the metric space learning unit 112 learns the metric space so that samples of the same class gather close to each other and samples of different classes are located apart from each other. In practice, in a discriminative model that extracts features from image data by convolution and then identifies them, the feature vector obtained at the stage immediately before the final identification may be used as the metric.
- For example, a feature vector obtained in a fully connected layer of a CNN (Convolutional Neural Network) model such as VGG may be used.
- the metric space learned in this way is output to the metric calculation unit 114 and the case embedding unit 116.
- the parameters of the learned discriminative model are output as the metric space.
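- The idea of using the output of the stage immediately before the final identification as the metric can be illustrated with a toy network. This is a minimal sketch, not the patent's model: the class `TinyClassifier`, its two fully connected layers, and the fixed weights are hypothetical stand-ins for a trained CNN such as VGG.

```python
def linear(weights, bias, vector):
    """Fully connected layer: weights @ vector + bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def relu(vector):
    return [max(0.0, x) for x in vector]


class TinyClassifier:
    """Toy discriminative model; its parameters play the role of the metric space."""

    def __init__(self, w1, b1, w2, b2):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def features(self, x):
        """Feature vector from the layer just before the final identification."""
        return relu(linear(self.w1, self.b1, x))

    def scores(self, x):
        """Class scores from the final identification layer."""
        return linear(self.w2, self.b2, self.features(x))


# Hypothetical fixed parameters standing in for learned ones.
model = TinyClassifier(
    w1=[[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], b1=[0.0, 0.0, 0.0],
    w2=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], b2=[0.0, 0.0],
)

feature_vector = model.features([2.0, 3.0])  # used as the metric
class_scores = model.scores([2.0, 3.0])      # not needed for case lookup
print(feature_vector, class_scores)
```

- Only `features` is needed when cases are embedded or looked up, which is why outputting the model parameters is sufficient to hand over the metric space.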
- Image data 123 and additional information 121 for metric learning, and image data 125 and additional information 126 for case registration are input to the image perturbation unit 113.
- the image data 123 for metric learning input to the image perturbation unit 113 is used for case registration.
- the image perturbation unit 113 perturbs the image data 123 for metric learning and the image data 125 for case registration.
- The image perturbation unit 113 applies adversarial perturbations to the original image through geometric deformation, image compression, addition of blur and noise, changes in brightness and saturation, and the like. If the range of a perturbation parameter can be estimated from the additional information, the image perturbation unit 113 may perturb the image only within that range.
- For example, the image perturbation unit 113 may perform the geometric deformation only within the range of the estimated parameter.
- Image perturbation can substantially increase the amount of image data used for learning.
- The perturbed image data is output to the metric calculation unit 114.
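- Bounded image perturbation can be sketched as below. This is a simplified illustration: it applies random brightness shifts and Gaussian noise to a grayscale image held as nested lists, whereas the text describes adversarial perturbations and further operations such as geometric deformation and compression. The function names and the parameter limits are hypothetical.

```python
import random


def perturb_image(image, rng, max_brightness_shift=20.0, noise_std=2.0):
    """Return a copy of a grayscale image with a bounded brightness shift and noise."""
    shift = rng.uniform(-max_brightness_shift, max_brightness_shift)
    perturbed = []
    for row in image:
        new_row = []
        for pixel in row:
            value = pixel + shift + rng.gauss(0.0, noise_std)
            new_row.append(min(255, max(0, round(value))))  # clamp to valid range
        perturbed.append(new_row)
    return perturbed


def augment(image, count, seed=0, **limits):
    """Generate `count` perturbed variants, staying within the given parameter limits."""
    rng = random.Random(seed)
    return [perturb_image(image, rng, **limits) for _ in range(count)]


# Hypothetical 2x2 grayscale image; the limits could come from additional information.
image = [[100, 120], [140, 160]]
variants = augment(image, count=5, max_brightness_shift=10.0, noise_std=1.0)
print(len(variants))
```

- Passing the limits as arguments reflects the idea that the perturbation range can be narrowed when it is known from the additional information.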
- The metric calculation unit 114 receives the learned metric space from the metric space learning unit 112 and the perturbed image data from the image perturbation unit 113.
- The metric calculation unit 114 calculates a feature vector corresponding to the metric from the perturbed image data. That is, the metric calculation unit 114 treats each perturbed image data item as a case and calculates the position of each case in the metric space learned by the metric space learning unit 112. As a result, the image data 125 for case registration is arranged in the metric space as shown in FIG.
- Specifically, the metric calculation unit 114 extracts a feature vector from each perturbed image data item using the discriminative model representing the metric space learned by the metric space learning unit 112. The feature vector extracted from each perturbed image data item is output to the feature perturbation unit 115.
- The feature perturbation unit 115 perturbs the feature vector of each image data item obtained by the metric calculation unit 114. That is, from the feature vector of each image data item obtained by the metric calculation unit 114, the feature perturbation unit 115 newly generates, as a case, the feature vector located at the farthest distance in the metric space within a certain range of changes on the image. As a result, a plurality of cases can be added around the cases arranged in the metric space by the metric calculation unit 114, and the region of each class in the metric space can be expanded.
- the feature perturbation unit 115 outputs the feature vector generated by the perturbation and the feature vector before perturbation, that is, the feature vector input from the metric calculation unit 114 to the case embedding unit 116.
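- The feature perturbation step can be sketched as choosing, among feature vectors reachable by allowed image changes, the one farthest from the original. The sketch assumes that a finite set of candidate feature vectors (features of images perturbed within the permitted range) is already available. How the patent actually searches for the farthest point is not stated, and the function names are hypothetical.

```python
import math


def farthest_feature(original, candidates):
    """Candidate feature vector with the largest distance from `original`."""
    return max(candidates, key=lambda feature: math.dist(original, feature))


def perturb_feature(original, candidates):
    """Return the feature vectors before and after perturbation, as passed on for embedding."""
    return [original, farthest_feature(original, candidates)]


# Hypothetical features of one image and of its variants within the allowed range.
original = (0.5, 0.5)
candidates = [(0.52, 0.5), (0.5, 0.6), (0.45, 0.48)]

cases_to_embed = perturb_feature(original, candidates)
print(cases_to_embed)
```

- Keeping both the original and the farthest vector widens the region covered by each class while still staying inside the range of plausible image changes.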
- The case embedding unit 116 embeds the feature vectors input from the feature perturbation unit 115, that is, the feature vectors before and after the feature perturbation, in the metric space as cases. Specifically, the case embedding unit 116 associates the feature vectors input from the feature perturbation unit 115 with the metric space as cases and registers them in the case dictionary 127. At that time, the case embedding unit 116 also registers the teacher labels 122 and 124 and the additional information 121 and 126 in association with each case. Further, the case embedding unit 116 may register representative image data as the image data corresponding to a case embedded in the metric space.
- In this way, a case dictionary 127 is created in which cases for the corresponding metric space are registered for each combination of labels (attributes).
- The case dictionary 127 stores information defining a plurality of metric spaces and the cases embedded in each metric space.
- Here, the "information defining the metric space" is actually the parameters of the learned discriminative model, and the "cases embedded in each metric space" are feature vectors in that metric space.
- the case dictionary 127 is an example of the case storage unit of the present invention.
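- The contents of the case dictionary described above can be sketched as a small data structure. The class and field names (`Case`, `MetricSpaceEntry`, `CaseDictionary`) and the stored values are hypothetical; the patent only states what information is held, not how it is laid out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Case:
    feature: tuple                      # feature vector in the metric space
    teacher_label: str                  # e.g. class information
    additional_info: dict = field(default_factory=dict)
    representative_image: Optional[str] = None  # optional image reference


@dataclass
class MetricSpaceEntry:
    attributes: tuple                   # attribute combination the space was learned for
    model_parameters: dict              # parameters of the learned discriminative model
    cases: list = field(default_factory=list)


class CaseDictionary:
    """Holds several metric spaces and the cases embedded in each one."""

    def __init__(self):
        self._entries = {}

    def add_metric_space(self, attributes, model_parameters):
        key = tuple(sorted(attributes))
        self._entries[key] = MetricSpaceEntry(key, model_parameters)
        return key

    def register_case(self, key, case):
        self._entries[key].cases.append(case)

    def cases_for(self, key):
        return list(self._entries[key].cases)

    def metric_spaces(self):
        return list(self._entries)


dictionary = CaseDictionary()
key = dictionary.add_metric_space(("gender", "age"), {"layer1": [0.1, 0.2]})
dictionary.register_case(
    key, Case((0.9, 0.9), "firefighter", {"indoor": False}, "img_001.png")
)
print(dictionary.metric_spaces())
```

- Keying the entries by attribute combination makes it straightforward to add another metric space later without touching the cases already registered.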
- FIG. 5 is a flowchart of learning processing by the object recognition device 100A for learning. This process is performed by the processor 103 shown in FIG. 3 executing a program prepared in advance.
- the label selection unit 111 selects a teacher label including attributes and classes (step S11).
- the metric space learning unit 112 learns the metric space for the combination of labels selected in step S11 using the image data 123 for metric learning and the teacher label 122 (step S12).
- The image perturbation unit 113 perturbs the image data 125 for case registration and outputs the perturbed image data to the metric calculation unit 114.
- the metric calculation unit 114 calculates the feature vector of the image data after perturbation (step S14), and the feature perturbation unit 115 perturbs the calculated feature vector (step S15). In this way, a plurality of feature vectors can be obtained from the image data for registration by the perturbation of the image and the perturbation of the features.
- the case embedding unit 116 creates a case dictionary 127 by storing the obtained feature vector as a case in association with the metric space (step S16). In this way, the learning process ends. As a result, cases are registered in the case dictionary 127 for the metric space for one combination of attributes.
- By changing the labels selected by the label selection unit 111, the object recognition device 100A likewise learns a metric space for another combination of attributes, embeds cases in it, and registers them in the case dictionary 127. In this way, as illustrated in FIG. 2, cases arranged in the metric spaces corresponding to a plurality of attribute combinations are registered in the case dictionary 127.
- FIG. 6 is a block diagram showing a functional configuration of the object recognition device 100B for inference.
- The object recognition device 100B includes an image perturbation unit 131, a metric calculation unit 132, a feature perturbation unit 133, a metric space selection unit 134, an image perturbation unit 135, a metric calculation unit 136, a feature perturbation unit 137, an identification unit 138, and a result output unit 139.
- The object recognition device 100B uses image data 141 for dictionary selection, a teacher label 142 for dictionary selection, additional information 143 for dictionary selection, image data 145 for inference, and the case dictionary 127.
- the case dictionary 127 is created by the above-mentioned learning process.
- The image data 141 for dictionary selection is image data used to select, from the case dictionary 127 prepared in advance for a plurality of metric spaces, the case dictionary 127 corresponding to the optimum metric space; its basic properties are the same as those of the image data 123 for metric space learning described above.
- The teacher label 142 for dictionary selection is a teacher label associated with the image data 141 for dictionary selection, and its basic properties are the same as those of the teacher label 122 for metric space learning.
- the additional information 143 for dictionary selection is additional information associated with the image data 141 for dictionary selection, and its basic properties are the same as those of the additional information 121 for metric space learning.
- the image data for inference is the image data to be recognized by the object recognition device 100B.
- the image perturbation units 131 and 135 are the same as the image perturbation unit 113 in the functional configuration for learning shown in FIG. 4, and the metric calculation units 132 and 136 are the same as the metric calculation unit 114 in the functional configuration for learning.
- the feature perturbation units 133 and 137 are similar to the feature perturbation unit 115 in the functional configuration for learning.
- The image perturbation unit 131, the metric calculation unit 132, the feature perturbation unit 133, and the metric space selection unit 134 use the image data 141 for dictionary selection, the teacher label 142, and the additional information 143 to select the optimum metric space from the plurality of metric spaces stored in the case dictionary 127. Specifically, the image perturbation unit 131 perturbs the image data 141 for dictionary selection.
- The metric calculation unit 132 acquires one metric space from the plurality of metric spaces stored in the case dictionary 127 and calculates the feature vectors of the perturbed image data in that metric space.
- The feature perturbation unit 133 perturbs the feature vectors calculated by the metric calculation unit 132 and generates the perturbed feature vectors. In this way, a plurality of feature vectors are calculated from the image data 141 for dictionary selection. This process increases the amount of data used to select the optimum metric space.
- The image perturbation unit 131, the metric calculation unit 132, and the feature perturbation unit 133 perform the same processing for the other metric spaces and calculate the feature vectors in those metric spaces. In this way, a plurality of feature vectors are calculated for the plurality of metric spaces stored in the case dictionary 127 based on the image data 141 for dictionary selection.
- The metric space selection unit 134 selects the optimum metric space using the feature vectors calculated from the image data 141 for dictionary selection and the corresponding teacher label 142 and additional information 143.
- Specifically, the metric space selection unit 134 evaluates performance for each metric space by applying a technique such as nearest-neighbor recognition to the teacher labels, the feature vectors of the image data 141 for dictionary selection in that metric space, and the feature vectors of the cases embedded in that metric space and stored in the case dictionary 127. That is, as shown in FIG. 2, the metric space selection unit 134 evaluates the performance of the plurality of metric spaces using the image data of the existing classes, and selects the metric space with the highest performance.
- The metric space selection unit 134 may also use the additional information 143 to narrow down the candidate metric spaces in advance and then select the optimum metric space through the above performance evaluation.
- Alternatively, the above performance evaluation and the selection using the additional information may be performed at the same time.
- The metric space selected in this way is the metric space that enables the most accurate recognition of the attributes of the image data 141 for dictionary selection.
- The metric space selection unit 134 outputs the selected metric space to the metric calculation unit 136 and the identification unit 138.
- Inference on the image data 145 for inference is then performed using the selected metric space.
- The image perturbation unit 135 perturbs the image data 145 for inference and outputs the perturbed image data to the metric calculation unit 136.
- The metric calculation unit 136 calculates the feature vectors of the perturbed image data in the metric space selected by the metric space selection unit 134. Further, the feature perturbation unit 137 perturbs the feature vectors calculated by the metric calculation unit 136 and outputs the resulting plurality of feature vectors to the identification unit 138.
- The identification unit 138 performs nearest-neighbor recognition between the plurality of feature vectors obtained from the image data 145 for inference and the many cases, with their teacher labels, stored in the case dictionary 127 for the metric space selected by the metric space selection unit 134, and identifies the class of the image data 145 for inference. The identification result is supplied to the result output unit 139.
- The result output unit 139 outputs, in addition to the class identification result from the identification unit 138, the image corresponding to the nearby case selected by the identification unit 138, the teacher label associated with that case, and the additional information. Specifically, the result output unit 139 displays this information on the display unit 107 or the like shown in FIG. As a result, even if the recognition target included in the inference image data 145 is of a new class, the user can see not only the identified class but also the image, teacher label, additional information, and the like associated with cases close to the recognition target, and can therefore intuitively judge the validity of the recognition result.
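- Identification from several perturbed feature vectors, together with the supporting cases shown to the user, can be sketched as follows. The majority vote over per-vector nearest neighbors is an assumption made for this sketch; the text only says that nearest-neighbor recognition is performed. The function name `identify`, the dictionary keys, and the sample values are hypothetical.

```python
import math
from collections import Counter


def identify(query_features, cases):
    """Vote over the nearest case of each query vector; return class and supporting cases."""
    neighbours = [
        min(cases, key=lambda case: math.dist(query, case["feature"]))
        for query in query_features
    ]
    votes = Counter(case["label"] for case in neighbours)
    predicted = votes.most_common(1)[0][0]
    evidence = [case for case in neighbours if case["label"] == predicted]
    return predicted, evidence


# Hypothetical cases of the selected metric space, with labels and additional information.
cases = [
    {"label": "police officer", "feature": (0.9, 0.1), "info": {"indoor": False}},
    {"label": "pedestrian", "feature": (0.1, 0.9), "info": {"indoor": False}},
    {"label": "firefighter", "feature": (0.9, 0.9), "info": {"indoor": True}},
]
# Feature vectors obtained from one inference image after image and feature perturbation.
query_features = [(0.85, 0.8), (0.95, 0.9), (0.9, 0.45)]

predicted, evidence = identify(query_features, cases)
print(predicted, [case["info"] for case in evidence])
```

- Returning the supporting cases alongside the class is what lets the label and additional information of nearby cases be displayed, so the user can check whether the result is plausible.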
- FIG. 7 is a flowchart of inference processing by the object recognition device for inference. This process is performed by the processor 103 shown in FIG. 3 executing a program prepared in advance.
- the image perturbation unit 131 perturbs the image data 141 for dictionary selection (step S21), and the metric calculation unit 132 calculates the feature vector of the perturbed image data for a plurality of metric spaces (step S22).
- the feature perturbation unit 133 perturbs the obtained feature vector to generate a plurality of feature vectors (step S23).
- the metric space selection unit 134 evaluates the performance using the plurality of feature vectors and the cases embedded in each metric space in the case dictionary 127, and selects the optimum metric space (step S24).
- the image data 145 for inference is then identified.
- the image perturbation unit 135 perturbs the image data 145 for inference (step S25), and the metric calculation unit 136 calculates the feature vector of the image data after perturbation for the metric space selected in step S24 (step S26).
- the feature perturbation unit 137 perturbs the obtained feature vector to generate a plurality of feature vectors (step S27), and the identification unit 138 identifies the class by nearest neighbor recognition against the cases in the selected metric space (step S28).
- the result output unit 139 outputs the class identification result together with the image data of the case used for the identification, the teacher label, the additional information, and the like (step S29). In this way, the inference process ends.
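The flow of steps S24 and S28 can be sketched roughly as follows. This is only an illustrative sketch in Python with NumPy, not the implementation described in the publication: the function names, the dictionary layout of the case dictionary, the use of Euclidean distance, and the majority vote over the perturbed feature vectors are all assumptions made for the example.

```python
import numpy as np

def nearest_neighbor_labels(queries, case_vectors, case_labels):
    """Return the teacher label of the nearest stored case for each query vector."""
    # Pairwise Euclidean distances, shape (num_queries, num_cases).
    dists = np.linalg.norm(queries[:, None, :] - case_vectors[None, :, :], axis=2)
    return case_labels[np.argmin(dists, axis=1)]

def select_metric_space(case_dictionary, selection_features, selection_labels):
    """Step S24 (assumed form): keep the metric space whose nearest-neighbor
    predictions agree best with the teacher labels of the selection data."""
    best_name, best_acc = None, -1.0
    for name, (case_vectors, case_labels) in case_dictionary.items():
        preds = nearest_neighbor_labels(selection_features[name], case_vectors, case_labels)
        acc = float(np.mean(preds == selection_labels))
        if acc > best_acc:
            best_name, best_acc = name, acc
    return best_name, best_acc

def identify(perturbed_features, case_vectors, case_labels):
    """Step S28 (assumed form): majority vote over the nearest-neighbor labels
    of the perturbed feature vectors of one inference image."""
    votes = nearest_neighbor_labels(perturbed_features, case_vectors, case_labels)
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```

Here `case_dictionary` maps a hypothetical metric space name to a pair of arrays (case feature vectors, teacher labels), and `selection_features` holds the selection data embedded in each metric space. Returning the index of the nearest case as well would allow the case image and additional information to be shown alongside the class, as step S29 describes.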
- the metric space selection unit 134 evaluates a plurality of metric spaces using the image data of the existing class as evaluation data, and selects the optimum metric space.
- the metric space selection unit 134 may also use image data of a new class as evaluation data. In this case, a correct label (correct class) may not have been prepared for the image data of the new class. Even so, if the multiple cases of the new class form a cluster in the metric space at a position away from the cases of the existing classes, the metric space can be evaluated as having appropriate performance.
- the one in which the set of cases of the target new class is gathered in a narrower region of the metric space, and is farther from the sets of cases other than the new class, may be selected as the case dictionary with the best characteristics. More specifically, for example, for each case in the new class, the ratio of the average value A of the distances between that case and the other cases in the new class to the average value B of the distances between that case and the cases in the existing classes is computed, and the one with a small ratio is selected.
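As a rough illustration of the A/B criterion, the sketch below computes the ratio for each new-class case and averages it per metric space. The averaging of the per-case ratios, the Euclidean distance, and the function name `compactness_ratio` are assumptions for the example; the text only states that a small ratio is preferred.

```python
import numpy as np

def compactness_ratio(new_cases, existing_cases):
    """Mean over new-class cases of A / B, where
    A = mean distance from a case to the other new-class cases and
    B = mean distance from that case to the existing-class cases.
    Requires at least two new-class cases."""
    n = len(new_cases)
    intra = np.linalg.norm(new_cases[:, None, :] - new_cases[None, :, :], axis=2)
    inter = np.linalg.norm(new_cases[:, None, :] - existing_cases[None, :, :], axis=2)
    a = intra.sum(axis=1) / (n - 1)  # the zero self-distance drops out of the sum
    b = inter.mean(axis=1)
    return float(np.mean(a / b))

# Two hypothetical metric spaces holding the same new-class and existing-class cases.
ratios = {
    "space_1": compactness_ratio(np.array([[0.0, 0.0], [0.0, 1.0]]),
                                 np.array([[10.0, 0.0], [10.0, 1.0]])),
    "space_2": compactness_ratio(np.array([[0.0, 0.0], [0.0, 4.0]]),
                                 np.array([[1.0, 0.0], [1.0, 4.0]])),
}
best_space = min(ratios, key=ratios.get)  # the smaller ratio is preferred
```

In `space_1` the new-class cases sit close together and far from the existing cases, so its ratio is smaller and it is the one selected.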
- the metric space is learned using the person attribute data (accessories, age, etc.) and the person class data (police officer, firefighter, etc.). Alternatively, the metric spaces may first be learned using only the person attribute data; each metric space obtained is then used as an initial value and re-learned (fine-tuned) using the person class data, after which the performance is evaluated and the optimum metric space is selected.
- the metric space is learned based on the person attribute data and the person class data.
- the weight in the neural network may be shared by both the person attribute identification task and the person class identification task.
- weights may be set for the loss function of the person attribute identification task and the loss function of the person class identification task during learning. For example, the contribution (coefficient) of one of the two loss functions is increased in the first half of the optimization, and that contribution (coefficient) is reduced in the second half of the optimization.
- because the person attribute data can also be reused, this is effective when the amount of person class data is small.
- a public image data set or the like contains a large amount of person attribute data, but often has a small amount of person class data. Therefore, first, the weight of the person attribute identification task for the loss function is increased to start learning, and then the weight of the person class identification task for the loss function is increased to perform learning specialized for each person class. As a result, even in a situation where there is a large amount of person attribute data and there is little person class data, it is possible to effectively utilize the person class data and learn the metric space.
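The weighting of the two loss functions could be scheduled, for instance, as in the sketch below. The linear schedule and the names `task_weights` and `combined_loss` are assumptions chosen for illustration; the text only says that the attribute task is weighted more heavily at first and the class task more heavily later.

```python
def task_weights(step, total_steps):
    """Return (attribute weight, class weight) for a training step.
    Assumed linear schedule: the attribute task dominates early and
    the class task dominates late."""
    progress = step / max(total_steps - 1, 1)
    return 1.0 - progress, progress

def combined_loss(attr_loss, class_loss, step, total_steps):
    """Weighted sum of the two task losses for the shared network."""
    w_attr, w_class = task_weights(step, total_steps)
    return w_attr * attr_loss + w_class * class_loss
```

With `total_steps = 11`, step 0 uses only the attribute loss, step 10 uses only the class loss, and step 5 weights both equally. A step function that switches the weights halfway through would match the "first half / second half" wording equally well.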
- the image data is perturbed by the image perturbation unit, but the following method may be used as the image perturbation method.
- In a first method, images of a plurality of persons are each decomposed into partial regions such as body parts (head, torso, hands, feet, etc.), and these parts are pasted together to generate an image of a person.
- Image processing such as α-blending is applied to the boundaries between body parts.
- In a second method, the joint positions of the body of the person included in the image data are first detected by key point detection.
- Geometric transformations such as affine transformation, Helmert transformation, homography transformation, and B-spline interpolation are then used to normalize the positions of the key points and generate an image in which the joint positions are aligned. Then, the positions of the key points are shifted slightly, for example by adding noise, to apply the perturbation.
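The second method can be sketched on key point coordinates alone, as below. The example assumes that key points are already available as an (N, 2) array, uses a least-squares affine fit as the normalizing transformation (one of the options listed), and adds Gaussian noise as the perturbation; warping the actual pixels is omitted. All function names are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform (3x2 parameter matrix) mapping src to dst."""
    design = np.hstack([src, np.ones((src.shape[0], 1))])  # homogeneous coordinates
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return params

def apply_affine(params, points):
    """Apply the fitted affine transform to an (N, 2) array of points."""
    return np.hstack([points, np.ones((points.shape[0], 1))]) @ params

def perturb_keypoints(keypoints, sigma, rng):
    """Shift each key point slightly by adding Gaussian noise of scale sigma."""
    return keypoints + rng.normal(0.0, sigma, size=keypoints.shape)

# Normalize detected key points onto a template pose, then jitter them.
template = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
detected = template * 2.0 + np.array([3.0, 4.0])
aligned = apply_affine(fit_affine(detected, template), detected)
jittered = perturb_keypoints(aligned, sigma=0.01, rng=np.random.default_rng(0))
```

The jittered key points would then drive a second warp of the image, so that each perturbed copy shows the same person in a slightly different pose.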
- the feature perturbation unit may generate minutely perturbed cases using adversarial example generation. Specifically, when minute noise is added to the input image, the case whose distance from the group of cases in the same class as the target case is the largest is adopted. That is, a case obtained by applying minute noise to the input image is adopted if it is far from the existing cases in the metric space, and is not adopted if it is close to them.
- the image and the feature vector are perturbed in the learning of the metric space and in the selection of the metric space, but when a sufficient amount of image data can be prepared, the image and the feature vector do not have to be perturbed.
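The selection rule can be illustrated with the sketch below. Note that it substitutes a simple random search over bounded noise for a true gradient-based adversarial method, and it measures the distance to the case group as the distance to its nearest member; both choices, along with the names `feature_fn` and `farthest_perturbed_case`, are assumptions for the example.

```python
import numpy as np

def farthest_perturbed_case(feature_fn, image, same_class_cases, epsilon, num_candidates, rng):
    """Try several small noise patterns and keep the feature vector that lies
    farthest from the existing same-class cases (random search, not gradients)."""
    best_vec, best_dist = None, -np.inf
    for _ in range(num_candidates):
        noise = rng.uniform(-epsilon, epsilon, size=image.shape)
        vec = feature_fn(image + noise)
        # Distance to the case group, taken here as the distance to its nearest member.
        dist = float(np.min(np.linalg.norm(same_class_cases - vec, axis=1)))
        if dist > best_dist:
            best_vec, best_dist = vec, dist
    return best_vec, best_dist
```

`feature_fn` stands in for the metric calculation unit; any callable that maps an image array to a feature vector can be passed.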
- FIG. 8A shows the configuration of the learning device 50 according to the second embodiment.
- the learning device 50 includes a metric space learning unit 51 and a case storage unit 52.
- the metric space learning unit 51 learns a metric space including a feature vector extracted from the attributed image data for each combination of different attributes using the attributed image data to which the attribute information is added.
- the case storage unit 52 calculates a feature vector from the case image data and stores it as a case associated with the metric space. In this way, the metric space is learned for each combination of different attributes, and the case is stored in association with it.
- FIG. 8B shows the configuration of the inference device according to the second embodiment.
- the inference device 60 includes a case storage unit 61, a metric space selection unit 62, an identification unit 63, and a result output unit 64.
- the case storage unit 61 stores the feature vector of the case image data as a case in association with a plurality of metric spaces learned for each combination of different attributes.
- the metric space selection unit 62 evaluates a plurality of metric spaces using the feature vector of the image data for selection, and selects one metric space.
- the identification unit 63 identifies the inference image data based on the feature vector extracted from the inference image data and the case associated with one metric space. Then, the result output unit 64 outputs the identification result by the identification unit 63. In this way, the inference image data can be identified by using the case stored in the case storage unit 61.
- a metric space learning unit that learns a metric space including a feature vector extracted from the attributed image data for each combination of different attributes using attributed image data to which attribute information is added.
- a case storage unit that calculates a feature vector from case image data and stores it as a case associated with the metric space.
- A learning device comprising the metric space learning unit and the case storage unit above.
- Appendix 2 The learning device according to Appendix 1, further comprising an attribute determination unit that determines a combination of different attributes.
- a first image perturbation unit for perturbing the case image data is provided.
- the learning device according to Appendix 1 or 2, wherein the case storage unit stores a feature vector calculated from image data for a case after perturbation as a case.
- Appendix 5 The learning device according to any one of Appendix 1 to 4, wherein the case storage unit stores a teacher label and additional information of the case image data in association with the case.
- a recording medium that records a program that causes a computer to execute a process of calculating a feature vector from case image data and storing it as a case associated with the metric space.
- a case storage unit that stores feature vectors of case image data as cases by associating them with a plurality of metric spaces learned for each combination of different attributes.
- a metric space selection unit that evaluates the plurality of metric spaces using the feature vector of the image data for selection and selects one metric space.
- An identification unit that identifies the inference image data based on the feature vector extracted from the inference image data and the case associated with the one metric space.
- a result output unit that outputs the identification result from the identification unit. An inference device comprising the units above.
- the inference device according to Appendix 8, wherein the metric space selection unit identifies the selection image data of an existing class using each of the plurality of metric spaces, and determines, as the one metric space, the metric space with the highest degree of agreement with the teacher labels of the selection image data of the existing class.
- Appendix 10 The inference device according to Appendix 8 or 9, wherein the identification unit uses, as the identification result, the class of the case that, among the cases stored in the case storage unit, is closest to the feature vector of the inference image data in the one metric space.
- a second image perturbation unit that perturbs the inference image data is provided.
- the inference device according to any one of Appendices 8 to 11, wherein the identification unit identifies the inference image data by using a feature vector of the inference image data after perturbation.
- a second feature perturbation unit that perturbs the feature vector of the inference image data is provided.
- the inference device according to any one of Appendix 8 to 11, wherein the identification unit identifies image data for inference by using a feature vector after perturbation.
- a plurality of metric spaces are acquired from the case storage unit that stores the feature vector of the case image data as a case in association with the metric space learned for each combination of different attributes.
- the plurality of metric spaces are evaluated using the feature vector of the image data for selection, and one metric space is selected.
- An inference method that identifies the inference image data based on the feature vector extracted from the inference image data and the case associated with the one metric space, and outputs the identification result.
- a plurality of metric spaces are acquired from the case storage unit that stores the feature vector of the case image data as a case in association with the metric space learned for each combination of different attributes.
- the plurality of metric spaces are evaluated using the feature vector of the image data for selection, and one metric space is selected.
- A recording medium that records a program causing a computer to execute a process of identifying the inference image data based on the feature vector extracted from the inference image data and the case associated with the one metric space, and outputting the identification result.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
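A metric space learning unit that, using attributed image data to which attribute information has been added, learns, for each combination of different attributes, a metric space including feature vectors extracted from the attributed image data; and
a case storage unit that calculates a feature vector from case image data and stores it as a case associated with the metric space.
Using attributed image data to which attribute information has been added, a metric space including feature vectors extracted from the attributed image data is learned for each combination of different attributes, and
a feature vector is calculated from case image data and stored as a case associated with the metric space.
A program is recorded that causes a computer to execute a process of learning, using attributed image data to which attribute information has been added, a metric space including feature vectors extracted from the attributed image data for each combination of different attributes, and
calculating a feature vector from case image data and storing it as a case associated with the metric space.
A case storage unit that stores feature vectors of case image data as cases in association with a plurality of metric spaces learned for each combination of different attributes;
a metric space selection unit that evaluates the plurality of metric spaces using feature vectors of selection image data and selects one metric space;
an identification unit that identifies inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space; and
a result output unit that outputs the identification result from the identification unit.
A plurality of metric spaces are acquired from a case storage unit that stores feature vectors of case image data as cases in association with metric spaces learned for each combination of different attributes,
the plurality of metric spaces are evaluated using feature vectors of selection image data and one metric space is selected, and
inference image data is identified based on a feature vector extracted from the inference image data and the cases associated with the one metric space, and the identification result is output.
A program is recorded that causes a computer to execute a process of acquiring a plurality of metric spaces from a case storage unit that stores feature vectors of case image data as cases in association with metric spaces learned for each combination of different attributes,
evaluating the plurality of metric spaces using feature vectors of selection image data and selecting one metric space, and
identifying inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space, and outputting the identification result.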
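[Basic principle]
First, the basic principle of the object recognition method of the embodiment is described. In the present embodiment, when it becomes necessary to recognize a new class (hereinafter, "new class") in addition to the classes that have been recognition targets until then (hereinafter, "existing classes"), case data in which cases corresponding to the new class are registered (hereinafter also called a "case dictionary") is created, and targets of the new class are recognized by referring to the case dictionary. For recognition targets of the existing classes as well, a plurality of metric spaces are prepared and recognition is performed using the optimum metric space, in order to prevent a drop in recognition accuracy in a new environment.
FIG. 1 shows a method of creating a case dictionary for recognition targets including a new class. Assume that "police officer" and "pedestrian" exist as existing classes and that "firefighter" is to be recognized as a new class. First, metric spaces are learned using image data to which attribute information and the like have been added. Specifically, image data of persons with attribute information is acquired using, for example, public image data sets of various persons. Here, "attribute information" means the attributes of the person appearing in the image data, for example the person's age, gender, height, and accessories (belongings, items worn, and the like). In the example of FIG. 1, image data with various attributes is acquired for the recognition targets "police officer", "pedestrian", and "firefighter".
When object recognition is performed using the created case dictionary, the metric space best suited to the environment (domain) at that time is selected, and object recognition is performed using that metric space. FIG. 2 is a diagram explaining a method of selecting the optimum metric space. As described above, the case dictionary contains cases for a plurality of metric spaces corresponding to different combinations of person attributes. Assume that, as shown in FIG. 2, the case dictionary stores cases for each of a metric space 10a for the attributes "accessories" and "age", a metric space 10b for the attributes "accessories" and "gender", a metric space 10c for the attributes "accessories" and "height", and a metric space 10d for the attributes "height", "age", and "gender".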
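Next, a first embodiment of the present invention is described.
(Hardware configuration)
FIG. 3 is a block diagram showing the hardware configuration of the object recognition device according to the first embodiment. As illustrated, the object recognition device 100 includes an interface 102, a processor 103, a memory 104, a recording medium 105, a database (DB) 106, and a display unit 107.
Next, the functional configuration of the object recognition device 100 for learning is described. FIG. 4 is a block diagram showing the functional configuration of the object recognition device 100A for learning. As illustrated, the object recognition device 100A includes a label selection unit 111, a metric space learning unit 112, an image perturbation unit 113, a metric calculation unit 114, a feature perturbation unit 115, and a case embedding unit 116.
Next, the flow of the above learning process is described. FIG. 5 is a flowchart of the learning process by the object recognition device 100A for learning. This process is carried out by the processor 103 shown in FIG. 3 executing a program prepared in advance.
Next, the functional configuration of the object recognition device 100 for inference is described. FIG. 6 is a block diagram showing the functional configuration of the object recognition device 100B for inference. As illustrated, the object recognition device 100B includes an image perturbation unit 131, a metric calculation unit 132, a feature perturbation unit 133, a metric space selection unit 134, an image perturbation unit 135, a metric calculation unit 136, a feature perturbation unit 137, an identification unit 138, and a result output unit 139.
Next, the inference process by the object recognition device 100B for inference is described. FIG. 7 is a flowchart of the inference process by the object recognition device for inference. This process is carried out by the processor 103 shown in FIG. 3 executing a program prepared in advance.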
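(1) In the above inference process, the metric space selection unit 134 evaluates the plurality of metric spaces using image data of existing classes as evaluation data and selects the optimum metric space. In addition, the metric space selection unit 134 may use image data of a new class as evaluation data. In this case, a correct label (correct class) may not have been prepared for the image data of the new class. Even so, if the multiple cases of the new class form a cluster in the metric space at a position away from the cases of the existing classes, the metric space can be evaluated as having appropriate performance. Accordingly, the one in which the set of cases of the target new class is gathered in a narrower region of the metric space, and is farther from the sets other than the new class, may be selected as the case dictionary with the best characteristics. More specifically, for example, for each case of the new class, the ratio of the average value A of the distances between that case and the other cases of the new class to the average value B of the distances between that case and the cases of the existing classes is computed, and the one with a small ratio is selected.
Next, a second embodiment of the present invention is described. FIG. 8(A) shows the configuration of the learning device 50 according to the second embodiment. The learning device 50 includes a metric space learning unit 51 and a case storage unit 52. The metric space learning unit 51 uses attributed image data to which attribute information has been added to learn, for each combination of different attributes, a metric space including feature vectors extracted from the attributed image data. The case storage unit 52 calculates a feature vector from case image data and stores it as a case associated with the metric space. In this way, a metric space is learned for each combination of different attributes, and cases are stored in association with it.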
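A learning device comprising:
a metric space learning unit that, using attributed image data to which attribute information has been added, learns, for each combination of different attributes, a metric space including feature vectors extracted from the attributed image data; and
a case storage unit that calculates a feature vector from case image data and stores it as a case associated with the metric space.
The learning device according to Appendix 1, further comprising an attribute determination unit that determines the combinations of different attributes.
The learning device according to Appendix 1 or 2, further comprising a first image perturbation unit that perturbs the case image data,
wherein the case storage unit stores, as a case, a feature vector calculated from the perturbed case image data.
The learning device according to any one of Appendices 1 to 3, further comprising a first feature perturbation unit that perturbs the feature vector calculated for the case image data,
wherein the case storage unit stores the perturbed feature vector as a case.
The learning device according to any one of Appendices 1 to 4, wherein the case storage unit stores the teacher label and additional information of the case image data in association with the case.
A learning method comprising: learning, using attributed image data to which attribute information has been added, a metric space including feature vectors extracted from the attributed image data for each combination of different attributes; and
calculating a feature vector from case image data and storing it as a case associated with the metric space.
A recording medium recording a program that causes a computer to execute a process of: learning, using attributed image data to which attribute information has been added, a metric space including feature vectors extracted from the attributed image data for each combination of different attributes; and
calculating a feature vector from case image data and storing it as a case associated with the metric space.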
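An inference device comprising:
a case storage unit that stores feature vectors of case image data as cases in association with a plurality of metric spaces learned for each combination of different attributes;
a metric space selection unit that evaluates the plurality of metric spaces using feature vectors of selection image data and selects one metric space;
an identification unit that identifies inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space; and
a result output unit that outputs the identification result from the identification unit.
The inference device according to Appendix 8, wherein the metric space selection unit identifies selection image data of an existing class using each of the plurality of metric spaces, and determines, as the one metric space, the metric space with the highest degree of agreement with the teacher labels of the selection image data of the existing class.
The inference device according to Appendix 8 or 9, wherein the identification unit uses, as the identification result, the class of the case that, among the cases stored in the case storage unit, is closest to the feature vector of the inference image data in the one metric space.
The inference device according to Appendix 10, wherein the result output unit outputs, in addition to the identification result, the teacher label, additional information, and image data of the closest case as an inference result.
The inference device according to any one of Appendices 8 to 11, further comprising a second image perturbation unit that perturbs the inference image data,
wherein the identification unit identifies the inference image data using a feature vector of the perturbed inference image data.
The inference device according to any one of Appendices 8 to 11, further comprising a second feature perturbation unit that perturbs the feature vector of the inference image data,
wherein the identification unit identifies the inference image data using the perturbed feature vector.
An inference method comprising: acquiring a plurality of metric spaces from a case storage unit that stores feature vectors of case image data as cases in association with metric spaces learned for each combination of different attributes;
evaluating the plurality of metric spaces using feature vectors of selection image data and selecting one metric space; and
identifying inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space, and outputting the identification result.
A recording medium recording a program that causes a computer to execute a process of: acquiring a plurality of metric spaces from a case storage unit that stores feature vectors of case image data as cases in association with metric spaces learned for each combination of different attributes;
evaluating the plurality of metric spaces using feature vectors of selection image data and selecting one metric space; and
identifying inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space, and outputting the identification result.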
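100 Object recognition device
103 Processor
111 Label selection unit
112 Metric space learning unit
113, 131, 135 Image perturbation unit
114, 132, 136 Metric calculation unit
115, 133, 137 Feature perturbation unit
116 Case embedding unit
127 Case dictionary
170 Terminal device
138 Identification unit
139 Result output unit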
Claims (15)
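- A learning device comprising:
a metric space learning unit that, using attributed image data to which attribute information has been added, learns, for each combination of different attributes, a metric space including feature vectors extracted from the attributed image data; and
a case storage unit that calculates a feature vector from case image data and stores it as a case associated with the metric space.
- The learning device according to claim 1, further comprising an attribute determination unit that determines the combinations of different attributes.
- The learning device according to claim 1 or 2, further comprising a first image perturbation unit that perturbs the case image data,
wherein the case storage unit stores, as a case, a feature vector calculated from the perturbed case image data.
- The learning device according to any one of claims 1 to 3, further comprising a first feature perturbation unit that perturbs the feature vector calculated for the case image data,
wherein the case storage unit stores the perturbed feature vector as a case.
- The learning device according to any one of claims 1 to 4, wherein the case storage unit stores the teacher label and additional information of the case image data in association with the case.
- A learning method comprising: learning, using attributed image data to which attribute information has been added, a metric space including feature vectors extracted from the attributed image data for each combination of different attributes; and
calculating a feature vector from case image data and storing it as a case associated with the metric space.
- A recording medium recording a program that causes a computer to execute a process of: learning, using attributed image data to which attribute information has been added, a metric space including feature vectors extracted from the attributed image data for each combination of different attributes; and
calculating a feature vector from case image data and storing it as a case associated with the metric space.
- An inference device comprising:
a case storage unit that stores feature vectors of case image data as cases in association with a plurality of metric spaces learned for each combination of different attributes;
a metric space selection unit that evaluates the plurality of metric spaces using feature vectors of selection image data and selects one metric space;
an identification unit that identifies inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space; and
a result output unit that outputs the identification result from the identification unit.
- The inference device according to claim 8, wherein the metric space selection unit identifies selection image data of an existing class using each of the plurality of metric spaces, and determines, as the one metric space, the metric space with the highest degree of agreement with the teacher labels of the selection image data of the existing class.
- The inference device according to claim 8 or 9, wherein the identification unit uses, as the identification result, the class of the case that, among the cases stored in the case storage unit, is closest to the feature vector of the inference image data in the one metric space.
- The inference device according to claim 10, wherein the result output unit outputs, in addition to the identification result, the teacher label, additional information, and image data of the closest case as an inference result.
- The inference device according to any one of claims 8 to 11, further comprising a second image perturbation unit that perturbs the inference image data,
wherein the identification unit identifies the inference image data using a feature vector of the perturbed inference image data.
- The inference device according to any one of claims 8 to 11, further comprising a second feature perturbation unit that perturbs the feature vector of the inference image data,
wherein the identification unit identifies the inference image data using the perturbed feature vector.
- An inference method comprising: acquiring a plurality of metric spaces from a case storage unit that stores feature vectors of case image data as cases in association with metric spaces learned for each combination of different attributes;
evaluating the plurality of metric spaces using feature vectors of selection image data and selecting one metric space; and
identifying inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space, and outputting the identification result.
- A recording medium recording a program that causes a computer to execute a process of: acquiring a plurality of metric spaces from a case storage unit that stores feature vectors of case image data as cases in association with metric spaces learned for each combination of different attributes;
evaluating the plurality of metric spaces using feature vectors of selection image data and selecting one metric space; and
identifying inference image data based on a feature vector extracted from the inference image data and the cases associated with the one metric space, and outputting the identification result.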
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021546155A JP7338690B2 (ja) | 2019-09-20 | 2019-09-20 | 学習装置、学習方法、推論装置、推論方法、及び、プログラム |
US17/640,926 US12260330B2 (en) | 2019-09-20 | 2019-09-20 | Learning apparatus, learning method, inference apparatus, inference method, and recording medium |
PCT/JP2019/037007 WO2021053815A1 (ja) | 2019-09-20 | 2019-09-20 | 学習装置、学習方法、推論装置、推論方法、及び、記録媒体 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2019/037007 WO2021053815A1 (ja) | 2019-09-20 | 2019-09-20 | 学習装置、学習方法、推論装置、推論方法、及び、記録媒体 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021053815A1 true WO2021053815A1 (ja) | 2021-03-25 |
Family
ID=74884422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2019/037007 WO2021053815A1 (ja) | 2019-09-20 | 2019-09-20 | 学習装置、学習方法、推論装置、推論方法、及び、記録媒体 |
Country Status (3)
Country | Link |
---|---|
US (1) | US12260330B2 (ja) |
JP (1) | JP7338690B2 (ja) |
WO (1) | WO2021053815A1 (ja) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12217485B2 (en) * | 2019-10-24 | 2025-02-04 | Nec Corporation | Object recognition device, method, and computer-readable medium |
EP4016377B1 (en) * | 2020-12-21 | 2024-10-09 | Axis AB | A device and a method for associating object detections between frames using a neural network |
JP2022158647A (ja) * | 2021-04-02 | 2022-10-17 | 日立造船株式会社 | 情報処理装置、判定方法、および判定プログラム |
CA3193358A1 (en) * | 2022-08-02 | 2023-01-05 | Mitsubishi Electric Corporation | Inference device, inference method, and non-transitory computer-readable medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4858612B2 (ja) | 2007-04-09 | 2012-01-18 | 日本電気株式会社 | 物体認識システム、物体認識方法および物体認識用プログラム |
JP5378909B2 (ja) * | 2009-08-12 | 2013-12-25 | Kddi株式会社 | サポートベクトルマシンの再学習方法 |
US11238362B2 (en) * | 2016-01-15 | 2022-02-01 | Adobe Inc. | Modeling semantic concepts in an embedding space as distributions |
CN106803063B (zh) * | 2016-12-21 | 2019-06-28 | 华中科技大学 | 一种行人重识别的度量学习方法 |
US12217485B2 (en) * | 2019-10-24 | 2025-02-04 | Nec Corporation | Object recognition device, method, and computer-readable medium |
-
2019
- 2019-09-20 JP JP2021546155A patent/JP7338690B2/ja active Active
- 2019-09-20 US US17/640,926 patent/US12260330B2/en active Active
- 2019-09-20 WO PCT/JP2019/037007 patent/WO2021053815A1/ja active IP Right Grant
Non-Patent Citations (2)
Title |
---|
LAMPERT, C. H. ET AL.: "Learning to detect unseen object classes by between-class attribute transfer", PROCEEDINGS OF THE 2009 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 25 June 2009 (2009-06-25), pages 951 - 958, XP055652082, ISBN: 978-1-4244-3991-1, DOI: 10.1109/CVPR.2009.5206594 * |
MATSUKAWA, T. ET AL.: "Person re-identification using CNN features learned from combination of attributes", PROCEEDINGS OF THE 2016 23RD INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR, 8 December 2016 (2016-12-08), pages 2428 - 2433, XP033085950, ISBN: 978-1-5090-4847-2, DOI: 10.1109/ICPR.2016.7900000 * |
Also Published As
Publication number | Publication date |
---|---|
US12260330B2 (en) | 2025-03-25 |
US20220335291A1 (en) | 2022-10-20 |
JP7338690B2 (ja) | 2023-09-05 |
JPWO2021053815A1 (ja) | 2021-03-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113362329B (zh) | 病灶检测模型的训练方法及识别图像中的病灶的方法 | |
US20250022092A1 (en) | Training neural networks for vehicle re-identification | |
CN109522942B (zh) | 一种图像分类方法、装置、终端设备和存储介质 | |
WO2021053815A1 (ja) | 学習装置、学習方法、推論装置、推論方法、及び、記録媒体 | |
KR20200075114A (ko) | 이미지와 텍스트간 유사도 매칭 시스템 및 방법 | |
WO2021079451A1 (ja) | 学習装置、学習方法、推論装置、推論方法、及び、記録媒体 | |
CN103714148B (zh) | 基于稀疏编码分类的sar图像检索方法 | |
CN113343863B (zh) | 融合表征网络模型训练方法、指纹表征方法及其设备 | |
Barman et al. | Shape: A novel graph theoretic algorithm for making consensus-based decisions in person re-identification systems | |
CN103150546A (zh) | 视频人脸识别方法和装置 | |
CN111626098B (zh) | 模型的参数值更新方法、装置、设备及介质 | |
CN108280481A (zh) | 一种基于残差网络的联合目标分类和三维姿态估计方法 | |
CN119295740A (zh) | 模型训练方法、红外弱小目标检测方法、装置及电子设备 | |
Dong et al. | Generative and contrastive combined support sample synthesis model for few-/zero-shot surface defect recognition | |
CN112102304B (zh) | 图像处理方法、装置、计算机设备和计算机可读存储介质 | |
KR20220083541A (ko) | 유사도 기반 객체 추적 방법 및 장치 | |
Kapoor et al. | Multi-sensor based object tracking using enhanced particle swarm optimized multi-cue granular fusion | |
Sekhar et al. | Automated face recognition using deep learning technique and center symmetric multivariant local binary pattern | |
Dewi et al. | Deep Learning for Advanced Similar Musical Instrument Detection and Recognition | |
CN114445775B (zh) | 训练方法、行人重识别方法、介质及电子设备 | |
CN114662581B (zh) | 对抗样本生成方法以及模型评估方法 | |
Wang et al. | Dynamic human object recognition by combining color and depth information with a clothing image histogram | |
CN119206836B (zh) | 跨年龄人脸识别方法 | |
US20250104262A1 (en) | Acquiring head dimensions using common devices | |
Chang et al. | Object recognition and tracking with maximum likelihood bidirectional associative memory networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19945644 Country of ref document: EP Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2021546155 Country of ref document: JP Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19945644 Country of ref document: EP Kind code of ref document: A1 |
| WWG | Wipo information: grant in national office | Ref document number: 17640926 Country of ref document: US |