WO2020218314A1 - System for Generating Feature Vectors - Google Patents
System for Generating Feature Vectors
- Publication number
- WO2020218314A1 (PCT/JP2020/017270)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature vector
- class
- sample
- distance
- anchor
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/907—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/907—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/908—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Definitions
- The present invention relates to a system that generates feature vectors.
- Neural networks can be trained in various ways (see, for example, Patent Document 1).
- One common method is to input an image into a neural network and compute the activations of its output neurons. If the activations are incorrect, the neural network is updated according to the error.
- A typical use of such a neural network is classification. A trained neural network can correctly classify the image samples provided during training, but it does not easily extend to classifying image data that was not included in the training data.
- Such an image feature vector can be used to train a model such as an SVM (Support Vector Machine).
- However, the feature vectors of the second-to-last layer or earlier layers are not trained to express any particular semantic relationship.
- For example, the feature vectors of "cat" and "car" can be similar (small Euclidean distance), while the feature vectors of "cat" and "dog" can be dissimilar (large Euclidean distance).
- Triplet Loss fixes an anchor class A and selects an image sample A from this anchor class.
- It then selects another image sample P from the same anchor class A and an image sample N from a different class N.
- Sample P is a positive sample for sample A, and sample N is a negative sample for sample A.
- The neural network computes and outputs the image feature vectors W(A), W(P) and W(N), and it is trained so that its outputs satisfy the conditions of Triplet Loss.
- Triplet Loss ensures that the feature vectors of the two samples of anchor class A are the same (within the margin) and that the feature vector of negative class N differs from the feature vector of anchor class A.
- However, Triplet Loss does not consider similarity of meaning. Therefore, the feature vectors of "cat" and "dog" may be completely dissimilar, while the feature vectors of "cat" and "car" may be very similar.
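To make this limitation concrete, here is a minimal sketch of the commonly used hinge form of Triplet Loss, max(0, D(A,P) - D(A,N) + margin), with Euclidean distances. Some formulations use squared distances instead. The function name and the margin value 0.2 are illustrative assumptions. The loss depends only on the three distances and never on which class N is, so it has no way to express that "dog" should lie closer to "cat" than "car" does.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def triplet_loss(w_a: Vector, w_p: Vector, w_n: Vector, margin: float = 0.2) -> float:
    """Hinge-form triplet loss for one (anchor, positive, negative) triple.

    The loss is zero once the anchor-negative distance exceeds the
    anchor-positive distance by at least `margin`. The identity of the
    negative class plays no role, only its distance does.
    """
    d_ap = math.dist(w_a, w_p)  # Euclidean distance anchor-positive
    d_an = math.dist(w_a, w_n)  # Euclidean distance anchor-negative
    return max(0.0, d_ap - d_an + margin)
```

For example, `triplet_loss([0, 0], [0.1, 0], [1, 0])` is 0.0 because the negative is already far enough away. Moving the negative to `[0.2, 0]` yields a positive loss of about 0.1.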
- A system according to one aspect of the present disclosure includes one or more storage devices and one or more processors. The storage devices store a database containing a plurality of samples and a machine learning model that outputs feature vectors of input samples. The processors operate according to programs stored in the one or more storage devices.
- The one or more processors obtain from the database an anchor sample belonging to a first class. They obtain from the database a positive sample that belongs to the first class and differs from the anchor sample. They also obtain a negative sample belonging to a second class different from the first class.
- Using the machine learning model, the processors generate a feature vector of the anchor sample, a feature vector of the positive sample, and a feature vector of the negative sample. They train the machine learning model so that these feature vectors satisfy a predefined condition.
- The condition has two parts. First, the distance between the feature vector of the anchor sample and the feature vector of the positive sample is smaller than the distance between the feature vector of the anchor sample and the feature vector of the negative sample. Second, the range that the anchor-to-negative distance must satisfy is defined based on the semantic distance between the first class and the second class in a predefined semantic space.
- According to this aspect, a feature vector that reflects the meaning of a sample can be generated.
- FIG. 1 schematically shows the logical configuration of the machine learning system. FIG. 2 is a flowchart outlining the operation of the system in Embodiment 1.
- FIG. 3 shows a configuration example of a computer system constituting the machine learning system.
- FIG. 4 shows an example of the training image data stored in the training image database.
- FIG. 5 shows an example of the classes.
- FIG. 6 shows the input data and output data of the feature vector generation model during training.
- FIG. 7 shows an example flowchart of the training of the feature vector generation model by the model training unit.
- FIG. 8 schematically shows an example of the classification result obtained with the trained feature vector generation model.
- FIG. 9 shows a more specific example of the new class in Embodiment 1.
- FIG. 10 schematically shows an example of the classification result obtained with the trained feature vector generation model.
- FIG. 11 shows an example of using the feature vectors generated by the system.
- FIG. 12 shows another example of using the feature vectors generated by the system in Embodiment 1.
- The logical configuration of the machine learning system is shown schematically.
- The software configuration in the memory is shown.
- The operation of the feature vector generation model is shown schematically.
- The operation of the image generation model is shown schematically. A further figure illustrates how the model training unit trains the feature vector generation model and the image generation model in Embodiment 3.
- FIG. 1 schematically shows the logical configuration of the machine learning system of the present embodiment.
- The system trains a machine learning model (the feature vector generation model) that outputs a feature vector for an input image while taking the meaning of the input image into account.
- An example of generating a feature vector from an input image is described below. However, the features of the present embodiment can also be applied to a system that generates feature vectors from input samples other than images.
- Training uses Triplet Loss, a loss function used in machine learning.
- In this way, the class (meaning) of the input image can be reflected in the relationships between the feature vectors output by the feature vector generation model.
- The system 1 includes a preprocessing unit 11, a feature vector generation model 13, an operation unit 14, and a model training unit 15.
- The system 1 further includes a training image database 21, a semantic database 22, and an operation image database 23.
- FIG. 2 is a flowchart outlining the operation of the system 1.
- The operation of the system 1 consists of a training (learning) phase and an operation phase.
- In the training phase, the system 1 trains the feature vector generation model 13 (S10).
- The feature vector generation model 13 outputs a feature vector for an input image.
- The training of the feature vector generation model 13 uses Triplet Loss and takes the meaning (class) of the input image into account.
- In the operation phase, the system 1 generates the feature vector of a target image using the trained feature vector generation model 13 (S20).
- The operation unit 14 executes a process based on the feature vector generated by the feature vector generation model 13 (S30). For example, the operation unit 14 can detect a danger from the relationships between the feature vectors of a monitoring image and warn an operator. It can also use feature vectors to search for an image that matches an input text.
- The training image database 21 stores the training image data used in the training (learning) phase of the feature vector generation model 13.
- The training image database 21 is an example of a sample database that stores samples. As described later, it stores image samples in association with classes, holding a plurality of image samples for each of a plurality of classes.
- The semantic database 22 defines the semantic similarity between classes. Semantic similarity can be expressed as a distance in a semantic space (semantic distance). There are several possible ways to calculate the semantic distance.
- One example of the semantic database 22 is a dictionary in which words (classes) are connected in a graph. The graph structure can be used to define the distance between given words represented as nodes.
- An example of such a dictionary is WordNet.
- The operation image database 23 stores the target image data that is input to the feature vector generation model 13 in the operation phase.
- Alternatively, the target image may be acquired in real time as it is captured by a camera, without using the operation image database 23.
- The preprocessing unit 11 preprocesses image data before it is input to the feature vector generation model 13. For example, the preprocessing unit 11 extracts a region of interest (ROI) from an image acquired from the operation image database 23.
- The model training unit 15 trains the feature vector generation model 13 and updates its parameters.
- The feature vector generation model 13 is a model trained (updated) by machine learning.
- The feature vector generation model 13 can have any configuration capable of generating a feature vector from an image, for example a CNN (Convolutional Neural Network). As described later, the feature vector generation model 13 generates a feature vector that represents the class of an input image.
- FIG. 3 shows a configuration example of a computer system constituting the machine learning system 1.
- The computer system includes a training server 100, a user terminal 150, and an operation device 160, which can communicate via a network.
- The training server 100 trains the feature vector generation model 13 in the training (learning) phase.
- The operation device 160 executes specific processes using the feature vector generation model 13 trained by the training server 100.
- The user terminal 150 is a terminal through which a user accesses the training server 100 or the operation device 160.
- The training server 100 includes a processor 110, a memory 120, an auxiliary storage device 130, and a network (NW) interface 145.
- These components are connected to each other by a bus.
- The memory 120, the auxiliary storage device 130, or a combination thereof is a storage device that includes a non-transitory storage medium.
- The network interface 145 is an interface for connecting to a network.
- The memory 120 consists of, for example, semiconductor memory and is mainly used to hold programs and data.
- In addition to an operating system (not shown), the programs stored in the memory 120 include a preprocessing program 121, a feature vector generation model program 123, and a model training program 125.
- The processor 110 executes various processes according to the programs stored in the memory 120.
- By operating according to these programs, the processor 110 realizes various functional units.
- The processor 110 operates as the preprocessing unit 11, the feature vector generation model 13, and the model training unit 15 according to the respective programs.
- The auxiliary storage device 130 stores the training image database 21 and the semantic database 22.
- The auxiliary storage device 130 consists of a large-capacity storage device such as a hard disk drive or a solid state drive and is used to hold programs and data over long periods.
- The programs and data stored in the auxiliary storage device 130 are loaded into the memory 120 at startup or when needed. The processor 110 then executes the programs to perform the various processes of the training server 100. Accordingly, processing described below as executed by a functional unit is processing by the program, the processor, the computer, or the computer system.
- The operation device 160 can have a computer configuration similar to that of the training server 100, for example.
- The operation device 160 includes a processor 161, a memory 162, an auxiliary storage device 163, and a network (NW) interface 165.
- These components are connected to each other by a bus.
- In addition to an operating system (not shown), the programs stored in the memory 162 include an operation program 124.
- The memory 162 may store the feature vector generation model program 123, after it has been trained in the training server 100 and transmitted from it.
- The auxiliary storage device 163 stores the operation image database 23.
- The operation program 124 uses the feature vector generation model program 123 stored in the training server 100 or the operation device 160 to generate feature vectors of the images stored in the operation image database 23. It then uses those feature vectors to execute a predetermined process.
- The processor 161 operates as the operation unit 14 according to the operation program 124.
- When executed, the trained feature vector generation model program 123 functions as the feature vector generation model 13.
- The user terminal 150 has, for example, a general computer configuration and includes an input device and a display device (output device).
- The input device is a hardware device with which the user inputs instructions, information, and the like.
- The display device is a hardware device that displays various images for input and output.
- The training server 100, the user terminal 150, the operation device 160, and any combination of them are each computer systems that include one or more processors and one or more storage devices.
- The user terminal 150 may be omitted. In that case, the input device and the display device may be connected directly to the training server 100 or the operation device 160 rather than via the network.
- The functions of the training server 100 or the operation device 160 may be distributed across a plurality of computers that communicate via a network. The system may also include a plurality of user terminals 150.
- FIG. 4 shows an example of training image data stored in the training image database 21.
- The training image database 21 associates each image with its class.
- The training image database 21 stores image data of a plurality of classes, with a plurality of images for each class.
- FIG. 4 shows, as an example, a class A image group 212A, a class B image group 212B, and a class C image group 212C. Classes A, B and C are different classes and have different meanings.
- FIG. 5 shows an example of the information held in the semantic database 22.
- The semantic database 22 defines the relationships between classes.
- The semantic database 22 stores a graph showing the relationships between the classes.
- The nodes of the graph correspond to classes, and the links between nodes represent the relationships between classes.
- The distance between two classes can be defined, for example, by the number of links on the path between them.
- The semantic database 22 may define the relationships between classes differently from the example shown in FIG. 5.
- FIG. 5 shows KING Q100, MAN Q200, QUEEN Q300, WOMAN Q400, MONARCH Q500, and HUMAN Q600 as example classes.
- Each class is represented by a vector (consisting of one or more elements).
- The links (arrows) between classes indicate a direct connection (DIRECT) between classes and the relationship between an upper class and a lower class.
- The start point of a link arrow is the lower class, and the end point is the upper class.
- An upper class includes its lower classes.
- Lower classes are indirectly connected through a shared upper class.
- The relationship Q120 between KING Q100 and MAN Q200 is a direct relationship, with MAN Q200 as the upper class and KING Q100 as the lower class.
- The relationship Q340 between QUEEN Q300 and WOMAN Q400 is a direct relationship, with WOMAN Q400 as the upper class and QUEEN Q300 as the lower class.
- The relationship Q460 between WOMAN Q400 and HUMAN Q600 is a direct relationship, with HUMAN Q600 as the upper class and WOMAN Q400 as the lower class.
- The relationship Q260 between MAN Q200 and HUMAN Q600 is a direct relationship, with HUMAN Q600 as the upper class and MAN Q200 as the lower class.
- The relationship Q150 between KING Q100 and MONARCH Q500 is a direct relationship, with MONARCH Q500 as the upper class and KING Q100 as the lower class.
- The relationship Q350 between QUEEN Q300 and MONARCH Q500 is a direct relationship, with MONARCH Q500 as the upper class and QUEEN Q300 as the lower class.
- KING Q100 and QUEEN Q300 have an indirect relationship Q130 via MONARCH Q500.
- MAN Q200 and WOMAN Q400 have an indirect relationship Q240 via HUMAN Q600.
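The link-count distance between classes can be sketched as a breadth-first search over the FIG. 5 graph. This sketch assumes that links may be traversed in either direction (lower to upper and upper to lower) and that each link counts as 1. The names LINKS, build_graph, and semantic_distance are hypothetical. Under these assumptions, KING and QUEEN are 2 links apart via MONARCH, and MAN and WOMAN are 2 links apart via HUMAN.

```python
from collections import deque

# FIG. 5 links as (lower class, upper class) pairs.
LINKS = [
    ("KING", "MAN"), ("QUEEN", "WOMAN"), ("WOMAN", "HUMAN"),
    ("MAN", "HUMAN"), ("KING", "MONARCH"), ("QUEEN", "MONARCH"),
]


def build_graph(links):
    """Undirected adjacency sets, so paths may go up or down the hierarchy."""
    graph = {}
    for lower, upper in links:
        graph.setdefault(lower, set()).add(upper)
        graph.setdefault(upper, set()).add(lower)
    return graph


def semantic_distance(graph, a, b):
    """Number of links on the shortest path between classes a and b (BFS)."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph[node]:
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"no path between {a} and {b}")
```

Because BFS visits nodes in order of hop count, the first time it reaches `b` is along a shortest path.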
- FIG. 6 shows the input data and output data of the feature vector generation model 13 during training.
- In this example, the feature vector generation model 13 is a neural network.
- The feature vector generation model 13 takes an anchor image 213A, a positive image 213P, and a negative image 213N as input. From them it generates a feature vector W(A) 215A, a feature vector W(P) 215P, and a feature vector W(N) 215N, respectively.
- The anchor image 213A, the positive image 213P, and the negative image 213N are samples selected from the training image database 21, and they are three different images.
- The anchor image 213A and the positive image 213P belong to the same class. The negative image 213N belongs to a different class from theirs.
- FIG. 7 shows an example flowchart of the training of the feature vector generation model 13 by the model training unit 15.
- The model training unit 15 trains the feature vector generation model 13 with a plurality of sets, each consisting of an anchor image, a positive image, and a negative image.
- FIG. 7 shows the flow for updating the feature vector generation model 13 with one such set.
- The model training unit 15 executes the process shown in FIG. 7 for each of the sets.
- The model training unit 15 selects an anchor class from the training image database 21 and then selects an anchor image 213A belonging to the anchor class (S101).
- The model training unit 15 selects a positive image 213P belonging to the anchor class from the training image database 21 (S102).
- The positive image 213P is a different image from the anchor image 213A.
- The model training unit 15 selects a negative class different from the anchor class from the training image database 21 and then selects a negative image 213N belonging to the negative class (S103).
- The model training unit 15 inputs the anchor image 213A, the positive image 213P, and the negative image 213N into the feature vector generation model 13 in turn. This generates the feature vector W(A) 215A, the feature vector W(P) 215P, and the feature vector W(N) 215N (S104).
- The model training unit 15 determines the distances between the generated feature vectors (S105). Specifically, it determines three distances:
  - D(W(A), W(P)), between W(A) 215A and W(P) 215P;
  - D(W(A), W(N)), between W(A) 215A and W(N) 215N;
  - D(W(P), W(N)), between W(P) 215P and W(N) 215N.
  For example, the distance between two feature vectors is the L2 norm of their difference in Euclidean space. Other spaces or distance calculation methods may be used.
- The model training unit 15 determines two semantic distances based on the image classes (S106): the semantic distance (similarity) S(A, P) between the anchor image 213A and the positive image 213P, and the semantic distance S(A, N) between the anchor image 213A and the negative image 213N. In this example, it refers to the semantic database 22 to determine S(A, P) and S(A, N).
- The semantic database 22 defines the relationships between classes by means of a graph structure.
- The model training unit 15 can determine the distance between classes by, for example, the number of links on the path between them. In this example, the semantic distance S(A, P) is zero, because the anchor image and the positive image belong to the same class.
- The model training unit 15 may also determine inter-class distances in other ways, using semantic databases with different structures. The distance between two classes represents how similar their meanings are.
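Steps S101 to S103 and S105 can be sketched as follows. The sketch assumes a hypothetical in-memory database that maps each class name to a list of sample IDs, and it assumes Euclidean (L2) distance. The function names are illustrative and are not part of the described system.

```python
import math
import random


def sample_triplet(db, rng):
    """S101-S103: pick anchor, positive, and negative samples.

    db maps class name -> list of sample IDs. An anchor class needs at
    least two samples so that the positive can differ from the anchor.
    """
    anchor_classes = [c for c, items in db.items() if len(items) >= 2]
    anchor_class = rng.choice(anchor_classes)
    # Two distinct samples from the same (anchor) class.
    anchor, positive = rng.sample(db[anchor_class], 2)
    # Negative sample from any other class.
    negative_class = rng.choice([c for c in db if c != anchor_class])
    negative = rng.choice(db[negative_class])
    return anchor_class, anchor, positive, negative_class, negative


def pairwise_distances(w_a, w_p, w_n):
    """S105: Euclidean distances between the three feature vectors."""
    return {
        "D(A,P)": math.dist(w_a, w_p),
        "D(A,N)": math.dist(w_a, w_n),
        "D(P,N)": math.dist(w_p, w_n),
    }


db = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1"]}
print(sample_triplet(db, random.Random(0)))
```

Class "C" in the toy database has only one sample. It can therefore serve only as a negative class, never as an anchor class.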
- The model training unit 15 determines whether the distances between feature vectors and the semantic distances between classes satisfy predetermined conditions (S107).
- One such condition is that the distance between the feature vectors of the anchor image and the positive image is smaller than the distance between the feature vectors of the anchor image and the negative image.
- In addition, the range that the distance between the feature vectors of the anchor image and the negative image must satisfy is defined based on the semantic distance between the classes in the predefined semantic space.
- A more specific example of the predetermined conditions is given by conditions (1) to (4), described below.
- T is the margin, a preset positive threshold.
- K and L, the proportionality constants (scaling factors) of the linear functions, are preset positive constants with K > L.
- The distance D(W(A), W(P)) between the anchor feature vector W(A) 215A and the positive feature vector W(P) 215P is guaranteed to be below the margin (threshold) T. The margin T is usually a small fixed value greater than zero.
- The distance D(W(A), W(N)) between the anchor feature vector W(A) 215A and the negative feature vector W(N) 215N is guaranteed to be greater than the margin (threshold) T.
- Consequently, the distance D(W(A), W(P)) is smaller than the distance D(W(A), W(N)).
- Conditions (1) and (2) are required by Triplet Loss.
- Conditions (3) and (4) add the semantic distance S to the Triplet Loss conditions.
- Condition (3) defines the maximum value of the distance D(W(A), W(N)) between the anchor feature vector W(A) 215A and the negative feature vector W(N) 215N. This maximum is based on the semantic distance between the classes, with K as the scaling factor.
- Condition (4) defines the minimum value of the distance D(W(A), W(N)) between the anchor feature vector W(A) 215A and the negative feature vector W(N) 215N.
- In condition (4), L is the scaling factor, which is smaller than K. If the semantic distance between the classes is positive and condition (4) is met, condition (2) is always met as well, so condition (2) can be omitted.
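The published formulas for conditions (1) to (4) are not reproduced in this text. The following set is one form consistent with the prose: a margin T, scaling factors K > L > 0, a linear dependence of the anchor-negative distance on S(A, N), and the remark that condition (4) makes condition (2) redundant whenever S(A, N) > 0. It is an assumed reconstruction, not the patent's verbatim equations.

```latex
\begin{aligned}
&(1)\quad D\bigl(W(A), W(P)\bigr) < T \\
&(2)\quad D\bigl(W(A), W(N)\bigr) > T \\
&(3)\quad D\bigl(W(A), W(N)\bigr) < K \cdot S(A, N) + T \\
&(4)\quad D\bigl(W(A), W(N)\bigr) > L \cdot S(A, N) + T
\end{aligned}
```

Under this form, S(A, N) > 0 and L > 0 give L · S(A, N) + T > T, so (4) implies (2). K > L keeps the interval between the bounds of (4) and (3) non-empty.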
- If all the conditions are satisfied, the model training unit 15 ends the processing of the input image set.
- If any condition is not satisfied, the model training unit 15 updates the feature vector generation model 13 based on the unsatisfied conditions (S108). These conditions make it possible to generate appropriate feature vectors.
- The model training unit 15 updates the parameters of the feature vector generation model 13 based on the losses given by loss functions determined from each of conditions (1) to (4).
- By updating the feature vector generation model 13 repeatedly, the model training unit 15 can optimize it so that conditions (1) to (4) are satisfied. Under these conditions, feature vectors that reflect the class (meaning) of an image can be generated appropriately.
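Step S108 can be sketched with one hinge penalty per condition. Each penalty is zero when its condition holds and grows with the size of the violation. The inequalities in the docstring are an assumed reconstruction, since the published formulas are not reproduced in this text. The default values of T, K, and L are illustrative only. In an actual trainer these terms would be computed on framework tensors, so that gradients flow back into the feature vector generation model 13. Plain floats are used here only to show the shape of the loss.

```python
def condition_losses(d_ap, d_an, s_an, T=0.1, K=1.0, L=0.5):
    """Hinge penalty for each condition (zero when that condition is met).

    Assumed (hypothetical) forms, using the document's symbols:
      (1) D(A,P) < T
      (2) D(A,N) > T
      (3) D(A,N) < K * S(A,N) + T
      (4) D(A,N) > L * S(A,N) + T
    """
    return {
        1: max(0.0, d_ap - T),
        2: max(0.0, T - d_an),
        3: max(0.0, d_an - (K * s_an + T)),
        4: max(0.0, (L * s_an + T) - d_an),
    }


def total_loss(d_ap, d_an, s_an, **params):
    """Sum of the per-condition penalties; S108 would minimise this."""
    return sum(condition_losses(d_ap, d_an, s_an, **params).values())
```

For a negative two links away (S(A, N) = 2), an anchor-negative distance of 1.0 violates only the assumed condition (4), by about 0.1. A distance of 1.5 satisfies all four conditions.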
- With the trained feature vector generation model 13, the distance D(W(A), W(N)) between the anchor feature vector W(A) and the negative feature vector W(N) is expressed by a linear function of the semantic distance S(A, N). The distance D(W(A), W(P)) between the anchor feature vector W(A) and the positive feature vector W(P) is expressed by a constant.
- The model training unit 15 may use other conditions to update the feature vector generation model 13; for example, condition (4) may be omitted. Conditions (3) and (4) allow feature vectors of different classes to be arranged in the vector space in positions that better reflect their classes. To place feature vectors according to class, the model training unit 15 may also use a function other than a linear function of the semantic distance.
- Let i_ca_s, i_cb_t, and i_cd_u be any samples of classes ca, cb, and cd, respectively.
- Then the magnitude relationship between the semantic distances S(i_ca_s, i_cb_t) and S(i_ca_s, i_cd_u) coincides with the magnitude relationship between the feature vector distances D(i_ca_s, i_cb_t) and D(i_ca_s, i_cd_u).
- For example, if S(i_ca_s, i_cb_t) > S(i_ca_s, i_cd_u), then D(i_ca_s, i_cb_t) > D(i_ca_s, i_cd_u) holds. That is, class pairs are ordered the same way by the semantic distance between their samples as by the feature vector distance between those samples.
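The ordering property above can be checked directly on a trained model's outputs. Fix a reference sample, then confirm that every pair of other classes is ordered the same way by semantic distance S and by feature vector distance D. The function name and the dict-based inputs are hypothetical. Pairs tied in S are skipped, because the text constrains only strict magnitude relationships.

```python
from itertools import combinations


def orderings_agree(semantic, feature):
    """True if S and D order every pair of classes the same way.

    semantic, feature: dicts mapping class name -> distance from one fixed
    reference sample (e.g. i_ca_s). Pairs with equal S are not checked.
    """
    for c1, c2 in combinations(semantic, 2):
        s_diff = semantic[c1] - semantic[c2]
        f_diff = feature[c1] - feature[c2]
        # Same strict ordering means the differences share a sign.
        if s_diff != 0 and s_diff * f_diff <= 0:
            return False
    return True
```

For instance, with S = {"cb": 1, "cd": 3}, feature distances {"cb": 0.4, "cd": 1.2} agree with the semantic ordering. Distances {"cb": 1.3, "cd": 1.2} do not.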
- In the examples above, the semantic distance is determined based on the path between classes.
- Class attributes may also be used to determine the semantic distance between classes.
- An attribute carries more detailed classification information about a class. For example, instead of simply classifying a person as a "person", detailed information can be assigned, such as "a male person wearing a black suit and brown shoes and carrying a blue bag". Each item of this information, namely "male", "wearing black clothes", "wearing a suit", "wearing brown shoes", "having a bag", and "blue bag", is an attribute.
- Thus, this example can be divided into a plurality of attributes.
- The distance (similarity) between such samples can be determined, for example, from the number of matching attributes. Suppose each attribute is encoded as "1" if it matches the first sample example and "0" if it differs. The first sample example (male) is then represented by the vector (1, 1, 1, 1, 1, 1). From the same point of view, the other sample example (female) is represented by the vector (0, 1, 0, 0, 1, 1).
- Three attributes have the same value and three have different values. Since three attributes differ, the difference (distance) between the two sample examples is 3. The minimum difference between two samples is 0, when all attributes are the same. The maximum is 6, when all six attributes differ.
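The attribute comparison above amounts to a Hamming distance over binary attribute vectors. Here is a minimal sketch using the two example vectors. The attribute order is assumed to be male, black clothes, suit, brown shoes, bag, blue bag, and the function name is hypothetical.

```python
def attribute_distance(x, y):
    """Number of attributes whose values differ (Hamming distance)."""
    if len(x) != len(y):
        raise ValueError("attribute vectors must have the same length")
    return sum(a != b for a, b in zip(x, y))


# Attribute order: male, black clothes, suit, brown shoes, bag, blue bag.
male = (1, 1, 1, 1, 1, 1)
female = (0, 1, 0, 0, 1, 1)
print(attribute_distance(male, female))  # 3
```

The result ranges from 0 (all six attributes equal) to 6 (all differ), matching the bounds stated in the text.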
- FIG. 8 schematically shows an example of the classification result by the trained feature vector generation model 13.
- The feature vectors generated by the feature vector generation model 13 include the feature vector groups 301, 302, and 303 and a single feature vector 304.
- The feature vector groups 301, 302, and 303 each correspond to a different class of the training data.
- The feature vector 304 is the feature vector of a sample of a new class that does not belong to any of these classes.
- Feature vectors of the same class generated by the feature vector generation model 13 lie close to each other in the space. Feature vectors of similar classes also lie close to each other.
- The feature vector 304 has a particular positional relationship with the feature vector groups 301, 302, and 303, which belong to known classes.
- From this positional relationship, the operation unit 14 can estimate the new class of the feature vector 304.
- FIG. 9 shows a more specific example of the new class. Feature vectors were calculated for the new sample and analyzed as shown in FIG. 9.
- The feature vector generation model 13 generates the feature vector 304 from a new image sample 314.
- The feature vector 304 of the new image sample 314 is close to the feature vectors 311 and 313 of "KING" and "YOUNG" but far from the feature vector 312 of "OLD".
- By referring to a semantic database that defines relationships between classes, such as WordNet, the operation unit 14 can correctly classify the image sample 314 as a "prince".
- The above description applies not only to new classes not included in the training data, but also to classes that have only a few samples in the training data.
- FIG. 10 schematically shows an example of the classification result by the trained feature vector generation model 13.
- The feature vectors generated by the feature vector generation model 13 include the feature vector groups 331, 332, and 333.
- The feature vector groups 331, 332, and 333 each correspond to a different class of the training data. Feature vectors of the same class are placed close to each other in the vector space. Therefore, a class can be identified, for example, by the position of the center of gravity (centroid) of its feature vector group (cluster).
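The following sketches the centroid idea, which also supports estimating where a new feature vector falls relative to known classes. It computes each class's center of gravity and ranks classes by Euclidean distance to a query vector. The function names are hypothetical, and Euclidean distance is assumed to match the distance used in training.

```python
import math


def centroids(groups):
    """Center of gravity of each class's feature vectors.

    groups: dict mapping class label -> non-empty list of equal-length vectors.
    """
    result = {}
    for label, vectors in groups.items():
        dim = len(vectors[0])
        result[label] = tuple(
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        )
    return result


def rank_classes(vector, class_centroids):
    """Class labels sorted from nearest to farthest centroid."""
    return sorted(class_centroids, key=lambda c: math.dist(vector, class_centroids[c]))


def nearest_class(vector, class_centroids):
    return rank_classes(vector, class_centroids)[0]
```

`rank_classes` returns the full ordering rather than only the winner. This matters for new-class estimation, where the relationships to several known classes are combined.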
- A class can also be defined by a combination of the categories of each of a plurality of sensor data items.
- Classes can likewise be defined for video.
- FIG. 11 shows an example of using the feature vector generated by the system of this embodiment.
- video from the surveillance camera is analyzed, and a warning is issued when a danger is detected.
- the preprocessing unit 11 selects ROIs 401 and 402 from the frame 400 of video captured by the surveillance camera, and the feature vector generation model 13 generates a feature vector for each of ROIs 401 and 402.
- ROI 401 is an image of a welding site.
- ROI 402 is an image of a gasoline tank.
- the image of the welding site has a semantic relationship with fire.
- the feature vector generated by the feature vector generation model 13 from the ROI 401 is located close to the feature vector of the fire class.
- the feature vector generated by the feature vector generation model 13 from the image 402 of the gasoline tank is located close to the feature vector of the combustible material class. From the combination of the feature vectors of ROI 401 and 402, the operation unit 14 determines that the surveillance image represents a dangerous situation, and notifies the operator of the dangerous situation.
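The FIG. 11 decision (a fire-like ROI together with a combustible-material ROI signals danger) can be sketched as nearest-prototype labelling of each ROI followed by a rule check. The prototype vectors, the `DANGEROUS_COMBINATIONS` rule table, and the function names are all hypothetical illustrations.

```python
import math

def nearest_class(vec, prototypes):
    """Class whose prototype feature vector is closest to vec."""
    return min(prototypes, key=lambda name: math.dist(vec, prototypes[name]))

# Hypothetical rule table: class combinations that together indicate danger.
DANGEROUS_COMBINATIONS = [frozenset({"fire", "combustible material"})]

def is_dangerous(roi_vectors, prototypes):
    """True if the ROIs' nearest classes jointly cover a dangerous combination."""
    found = {nearest_class(v, prototypes) for v in roi_vectors}
    return any(combo <= found for combo in DANGEROUS_COMBINATIONS)

prototypes = {
    "fire": (1.0, 0.0),
    "combustible material": (0.0, 1.0),
    "water": (-1.0, -1.0),
}
roi_401 = (0.9, 0.1)  # welding site: close to "fire"
roi_402 = (0.1, 0.8)  # gasoline tank: close to "combustible material"
print(is_dangerous([roi_401, roi_402], prototypes))  # True
print(is_dangerous([roi_401], prototypes))           # False
```

Neither ROI alone triggers the warning; only the combination does, which mirrors the point of the example.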
- FIG. 12 shows another example using the feature vector generated by the system of this embodiment.
- an appropriate illustration 413 is selected from the image database 412 for the document 411 created by a person.
- the image database 412 stores new images that have not been used for training the feature vector generation model 13.
- the operation unit 14 analyzes the document 411 created by a person and determines that, for example, an illustration of a "back view of the remote controller" is necessary.
- the feature vector generation model 13 is trained to generate feature vectors representing the meanings of "remote controller" and "back view", respectively.
- the operation unit 14 uses the feature vector generation model 13 to generate the feature vector of the image in the image database 412, and selects the image 413 having the feature vector of the “back view of the remote controller”.
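One way to picture the FIG. 12 retrieval is to combine the concept vectors for "remote controller" and "back view" into a query and pick the database image with the highest cosine similarity. Composing the query by vector addition, and the toy vectors and file names, are assumptions for illustration; the patent does not state how the two meanings are combined.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def select_image(concept_vectors, image_db):
    """ASSUMED composition: sum the concept vectors, then return the best-matching image."""
    query = [sum(xs) for xs in zip(*concept_vectors)]
    return max(image_db, key=lambda name: cosine(query, image_db[name]))

remote_controller = [1.0, 0.0, 0.0]
back_view = [0.0, 1.0, 0.0]
image_db = {  # hypothetical feature vectors of images not used in training
    "remote_front.png": [1.0, 0.0, 1.0],
    "remote_back.png": [1.0, 1.0, 0.0],
    "chair_back.png": [0.0, 1.0, 1.0],
}
print(select_image([remote_controller, back_view], image_db))  # remote_back.png
```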
- the first embodiment makes it possible to generate feature vectors that lie close to each other in the vector space for semantically similar images.
- the feature vector generation model 13 is trained so that arithmetic operations on feature vectors can express relationships between specific classes in the feature vector space. This expands the range of processing that can be performed with the feature vectors.
- Such semantic relationships can be discovered by analyzing the graphs of the classes as shown in FIG.
- Such graphs may be provided, for example, by WordNet.
- anchor class Q100 is selected.
- the anchor class Q100 is directly connected to the class Q200.
- Class Q200 is a higher class than class Q100.
- the anchor class Q100 and the class Q300 are indirectly connected.
- Class Q300 is directly connected to class Q400.
- Class Q400 is a higher class than class Q300.
- Class Q400 is indirectly connected to class Q200.
- the above condition (5) is included in the training conditions of the feature vector generation model 13 by the model training unit 15.
- the model training unit 15 updates the feature vector generation model 13 according to the error.
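Condition (5) itself is not reproduced in this section. Purely as an assumption, the sketch below reads it as an analogy-style offset constraint, W(Q100) - W(Q200) = W(Q300) - W(Q400), reflecting that Q200 is the higher class of Q100 and Q400 is the higher class of Q300, and measures the violation as a squared error that training would reduce. The function name and the squared-error form are hypothetical.

```python
def analogy_error(w_q100, w_q200, w_q300, w_q400):
    """Squared error of the ASSUMED condition W(Q100) - W(Q200) = W(Q300) - W(Q400)."""
    return sum(
        ((a - b) - (c - d)) ** 2
        for a, b, c, d in zip(w_q100, w_q200, w_q300, w_q400)
    )

print(analogy_error([2.0, 3.0], [1.0, 1.0], [3.0, 3.0], [2.0, 1.0]))  # 0.0
print(analogy_error([2.0, 3.0], [1.0, 1.0], [3.0, 4.0], [2.0, 1.0]))  # 1.0
```

In practice this error would be computed on the model's outputs inside a training framework so that gradients can update the feature vector generation model 13.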
- the relationship between the three classes can be expressed as an equation over their feature vectors.
- Class Q100 is directly connected to class Q200 and class Q500.
- Classes Q200 and Q500 are higher classes of class Q100.
- Class Q100 lies at the center between class Q200 and class Q500 in the feature vector space.
- the above condition (6) is included in the training conditions of the feature vector generation model 13 by the model training unit 15.
- the model training unit 15 updates the feature vector generation model 13 according to the error.
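Condition (6) is not reproduced here either. Reading "Q100 lies at the center between Q200 and Q500" as the midpoint constraint W(Q100) = (W(Q200) + W(Q500)) / 2 is an assumption; the sketch measures its violation as a squared distance, and the function name is hypothetical.

```python
def midpoint_error(w_q100, w_q200, w_q500):
    """Squared distance between W(Q100) and the midpoint of W(Q200) and W(Q500)."""
    return sum((a - (b + c) / 2) ** 2 for a, b, c in zip(w_q100, w_q200, w_q500))

print(midpoint_error([1.0, 2.0], [0.0, 0.0], [2.0, 4.0]))  # 0.0
print(midpoint_error([1.0, 3.0], [0.0, 0.0], [2.0, 4.0]))  # 1.0
```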
- the calculation operation showing the relationship between the feature vectors of the present embodiment can be applied to, for example, a moving image.
- <Embodiment 3> in this embodiment, a visual feature vector is generated as part of the feature vector of the input image, in addition to the semantic feature vector described in the first and second embodiments.
- the combination of the semantic feature vector and the visual feature vector constitutes the feature vector output from the input image.
- FIG. 13 schematically shows the logical configuration of the machine learning system of this embodiment.
- the machine learning system includes an image generation model 16 in addition to the configuration of the first embodiment shown in FIG.
- FIG. 14 shows the software configuration in the memory 120.
- the memory 120 stores the image generation model program 126.
- the processor 110 operates as the image generation model 16 according to the image generation model program 126.
- FIG. 15 schematically shows the operation of the feature vector generation model 13.
- the feature vector generation model 13 generates a feature vector including a semantic vector and a visual vector from the input image.
- the feature vector generation model 13 generates feature vectors 511 to 514 from the input images 501 to 504, respectively.
- the above condition (5) is satisfied between the semantic feature vectors W.
- the feature vector consists of a semantic feature vector W and a visual feature vector V of the input image.
- feature vectors generated from two different input images of class KING have the same semantic feature vector W, indicating KING, and different visual feature vectors V.
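The split of one feature vector into a semantic part W and a visual part V can be sketched as a small data type. Storing the combination as the concatenation [W, V] is an assumption (the text only says the two are combined), and the class name and toy values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class FeatureVector:
    semantic: Tuple[float, ...]  # W: encodes the class (e.g. KING)
    visual: Tuple[float, ...]    # V: encodes the appearance of this particular image

    def concatenated(self) -> Tuple[float, ...]:
        """ASSUMED layout of the full feature vector: [W, V]."""
        return self.semantic + self.visual

    @classmethod
    def split(cls, vec, semantic_dim):
        """Recover W and V from a concatenated vector, given the size of W."""
        return cls(tuple(vec[:semantic_dim]), tuple(vec[semantic_dim:]))

king_a = FeatureVector((1.0, 0.0), (0.3, -0.2, 0.5))
king_b = FeatureVector((1.0, 0.0), (-0.7, 0.1, 0.0))
print(king_a.semantic == king_b.semantic)  # True: same class
print(king_a.visual == king_b.visual)      # False: different appearance
print(FeatureVector.split(king_a.concatenated(), 2) == king_a)  # True
```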
- FIG. 16 schematically shows the operation of the image generation model 16.
- the image generation model 16 generates an image from the input vector.
- the image generation model 16 is, for example, a neural network.
- the image generation model 16 receives the feature vector generated by the feature vector generation model 13 as input and generates a corresponding image.
- the feature vectors 601A and 601B generated by the feature vector generation model 13 are input to the image generation model 16.
- the image 611A is generated from the input feature vector 601A.
- the image 611B is generated from the input feature vector 601B.
- the feature vectors 601A and 601B both have the KING semantic feature vector W but different visual feature vectors V.
- the generated images 611A and 611B are images of different KINGs.
- the visual feature vector V can be used to manipulate the appearance of the generated image sample.
- the semantic feature vector W and the visual feature vector V can be used, for example, to generate images that share the features of the same class but differ in appearance.
- the visual feature vector V holds the visual information necessary to generate image samples of the class indicated by the semantic feature vector W with different appearances.
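Generating several appearances of the same class amounts to holding W fixed and varying V. In the sketch below, `linear_generator` is a hypothetical toy stand-in for the image generation model 16 (the patent describes a neural network, not a linear map), and sampling V from a standard normal distribution is likewise an assumption.

```python
import random

def linear_generator(w, v, weights):
    """Hypothetical stand-in for image generation model 16: each output pixel
    value is a fixed linear combination of the concatenated vector [W, V]."""
    z = list(w) + list(v)
    return [sum(a * x for a, x in zip(row, z)) for row in weights]

def appearance_variations(w, visual_dim, weights, n, seed=0):
    """Keep the semantic vector W fixed and sample n random visual vectors V."""
    rng = random.Random(seed)
    images = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(visual_dim)]
        images.append(linear_generator(w, v, weights))
    return images

# Toy weights: 3 output "pixels" from a 2-D W plus a 2-D V.
weights = [
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.5],
    [0.5, 0.5, 0.5, 0.5],
]
king_w = [1.0, 0.0]
images = appearance_variations(king_w, visual_dim=2, weights=weights, n=3)
for img in images:
    print([round(p, 3) for p in img])
```

All three outputs share the KING semantic vector, so any difference between them comes only from V.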
- FIG. 17 is a diagram for explaining training by the model training unit 15 of the feature vector generation model 13 and the image generation model 16.
- the feature vector generated by the feature vector generation model 13 from the input image includes a semantic feature vector W and a visual feature vector V.
- the image generation model 16 generates an image from the feature vector generated by the feature vector generation model 13.
- for the semantic feature vector W, the model training unit 15 can train the feature vector generation model 13 under the conditions described in the first and second embodiments.
- for the visual feature vector V, the model training unit 15 trains the feature vector generation model 13 based on the result of comparing the image input to the feature vector generation model 13 with the image generated by the image generation model 16.
- the model training unit 15 trains the image generation model 16 based on the comparison result of the images.
- the model training unit 15 inputs the feature vector 511 generated by the feature vector generation model 13 from the image 501 into the image generation model 16.
- the image generation model 16 generates an image 651 from the input feature vector 511.
- the model training unit 15 compares the image 651 and the image 501, and trains the feature vector generation model 13 and the image generation model 16 based on the comparison result.
- comparing the two images yields a measure of their similarity or difference, which is used to update the models.
- the model training unit 15 compares the color information of each pixel between the two images 501 and 651: it calculates the distance (error) between the color information of each pixel in the generated image sample 651 and that of the corresponding original pixel in the input image sample 501, and updates the two models 13 and 16 based on this calculation.
- the result of comparing the two images can be used to update the feature vector generation model 13 so that it generates better visual feature vectors, which in turn allow more accurate image samples to be generated.
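The per-pixel color comparison between the input image 501 and the generated image 651 can be sketched as a mean squared RGB distance. The choice of squared Euclidean distance averaged over pixels, the flat list-of-tuples image format, and the function name are assumptions; the patent only says a distance (error) between pixel colors is calculated.

```python
def pixel_color_error(original, generated):
    """Mean squared RGB distance between corresponding pixels of two images.

    Images are flat lists of (R, G, B) tuples of equal length."""
    if len(original) != len(generated):
        raise ValueError("images must have the same number of pixels")
    total = 0.0
    for (r1, g1, b1), (r2, g2, b2) in zip(original, generated):
        total += (r1 - r2) ** 2 + (g1 - g2) ** 2 + (b1 - b2) ** 2
    return total / len(original)

image_501 = [(255, 0, 0), (0, 0, 0)]
image_651 = [(255, 0, 0), (0, 0, 10)]
print(pixel_color_error(image_501, image_651))  # 50.0
```

Updating models 13 and 16 from this error would require gradients, which a deep learning framework would normally supply.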
- the feature vector generation model 13 may be composed of a network that generates a semantic feature vector W and a network that generates a visual feature vector V.
- when the parameters of the feature vector generation model 13 are updated, the two networks are updated individually.
- the present invention is not limited to the above-described embodiment, and includes various modifications.
- the above-described embodiment has been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one that includes all of the described configurations.
- it is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment and it is also possible to add the configuration of another embodiment to the configuration of one embodiment.
- each of the above configurations, functions, processing units, etc. may be realized by hardware, for example, by designing a part or all of them with an integrated circuit. Further, each of the above configurations, functions, and the like may be realized by software by the processor interpreting and executing a program that realizes each function. Information such as programs, tables, and files that realize each function can be placed in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card or an SD card.
- control lines and information lines indicate what is considered necessary for explanation, and not all control lines and information lines are necessarily shown on the product. In practice, it can be considered that almost all configurations are interconnected.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2019-085609 | 2019-04-26 | | |
| JP2019085609A JP7262290B2 (ja) | 2019-04-26 | 2019-04-26 | System for generating feature vectors |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020218314A1 (ja) | 2020-10-29 |
Family ID: 72942556
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2020/017270 (WO2020218314A1 (ja), Ceased) | System for generating feature vectors | 2019-04-26 | 2020-04-21 |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7262290B2 |
| WO (1) | WO2020218314A1 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
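| CN112949384A (zh) * | 2021-01-23 | 2021-06-11 | 西北工业大学 | Remote sensing image scene classification method based on adversarial feature extraction |
| CN113408299A (zh) * | 2021-06-30 | 2021-09-17 | 北京百度网讯科技有限公司 | Training method, apparatus, device, and storage medium for a semantic representation model |
| CN114399666A (zh) * | 2022-01-18 | 2022-04-26 | 深圳市东汇精密机电有限公司 | Cell grading system based on few-shot recognition |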
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2024079820A1 (ja) * | 2022-10-12 | 2024-04-18 | 日本電気株式会社 | Learning device, learning method, program, and classification device |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170228641A1 (en) * | 2016-02-04 | 2017-08-10 | Nec Laboratories America, Inc. | Distance metric learning with n-pair loss |
| US20190065957A1 (en) * | 2017-08-30 | 2019-02-28 | Google Inc. | Distance Metric Learning Using Proxies |
- 2019-04-26 JP JP2019085609A, patent JP7262290B2 (ja): Active
- 2020-04-21 WO PCT/JP2020/017270, patent WO2020218314A1 (ja): Ceased
Non-Patent Citations (1)
| Title |
|---|
| NI, JIAZHI ET AL.: "Fine-grained Patient Similarity Measuring using Deep Metric Learning", Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM), ACM (Association for Computing Machinery), ACM Digital Library, 6 November 2017 (2017-11-06), pages 1189-1198, XP055758902, retrieved from the Internet: <URL:https://dl.acm.org/doi/10.1145/3132847.3133022> [retrieved on 2020-06-22] * |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112949384A (zh) * | 2021-01-23 | 2021-06-11 | 西北工业大学 | Remote sensing image scene classification method based on adversarial feature extraction |
| CN112949384B (zh) * | 2021-01-23 | 2024-03-08 | 西北工业大学 | Remote sensing image scene classification method based on adversarial feature extraction |
| CN113408299A (zh) * | 2021-06-30 | 2021-09-17 | 北京百度网讯科技有限公司 | Training method, apparatus, device, and storage medium for a semantic representation model |
| CN113408299B (zh) * | 2021-06-30 | 2022-03-25 | 北京百度网讯科技有限公司 | Training method, apparatus, device, and storage medium for a semantic representation model |
| US12591744B2 (en) | 2021-06-30 | 2026-03-31 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for training semantic representation model, device and storage medium |
| CN114399666A (zh) * | 2022-01-18 | 2022-04-26 | 深圳市东汇精密机电有限公司 | Cell grading system based on few-shot recognition |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2020181486A (ja) | 2020-11-05 |
| JP7262290B2 (ja) | 2023-04-21 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11741361B2 (en) | | Machine learning-based network model building method and apparatus |
| WO2020218314A1 (ja) | | System for generating feature vectors |
| US9911032B2 (en) | | Tracking hand/body pose |
| US9864933B1 (en) | | Artificially intelligent systems, devices, and methods for learning and/or using visual surrounding for autonomous object operation |
| US20240036832A1 (en) | | Program predictor |
| CN113326726B (zh) | | Behavior recognition method, behavior recognition device, and computer-readable recording medium |
| US11468290B2 (en) | | Information processing apparatus, information processing method, and non-transitory computer-readable storage medium |
| JP2018200685A (ja) | | Formation of a dataset for fully supervised learning |
| US10102449B1 (en) | | Devices, systems, and methods for use in automation |
| Oliveira et al. | | 3D object perception and perceptual learning in the RACE project |
| CN118691568A (zh) | | Image processing method, computer-readable storage medium, and computer terminal |
| US12293555B2 (en) | | Method and device of inputting annotation of object boundary information |
| CN113822144B (zh) | | Target detection method and apparatus, computer device, and storage medium |
| US20210209473A1 (en) | | Generalized Activations Function for Machine Learning |
| CN110059528A (zh) | | Inter-object relationship recognition device, learning model, recognition method, and computer-readable medium |
| CN119866493A (zh) | | Automated query selectivity prediction using query graphs |
| Devin et al. | | Plan arithmetic: Compositional plan vectors for multi-task control |
| US20240281648A1 (en) | | Performing semantic matching in a data fabric using enriched metadata |
| Ng et al. | | Syntable: A synthetic data generation pipeline for unseen object amodal instance segmentation of cluttered tabletop scenes |
| KR102943072B1 (ko) | | Motion similarity evaluation device and motion similarity evaluation method |
| He et al. | | Facial landmark localization by part-aware deep convolutional network |
| US20240403601A1 (en) | | Method for inductive knowledge graph embedding using relation graphs and system thereof |
| US20240169269A1 (en) | | Deploying simplified machine learning models to resource-constrained edge devices |
| Yuan et al. | | Exploring the reliability of foundation model-based frontier selection in zero-shot object goal navigation |
| WO2024174723A1 (zh) | | Model training method and apparatus, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20793975; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 20793975; Country of ref document: EP; Kind code of ref document: A1 |