WO2023107753A1 - Mutually exclusive loss and pseudo-negative sampling for multi-label learning - Google Patents
Mutually exclusive loss and pseudo-negative sampling for multi-label learning
- Publication number
- WO2023107753A1 (PCT/US2022/054178, US2022054178W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- class
- image
- label classification
- label
- classification
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Definitions
- the disclosed technology generally relates to artificial intelligence comprising a model trained in image processing.
- the disclosed technology includes methods and systems for image classification implementing pseudo-negative sampling for multi-label learning and classification of such images.
- a standard classification system may estimate a classification for an image but have difficulty in determining other information about the image when multiple classifications would be applicable.
- Some systems rely on human annotators to help classify these images, yet human annotators find it difficult to determine all applicable labels for each image when more than one classification applies.
- Detection may be intrinsically difficult in some situations, such as finding one or more small object instances in a high-resolution image.
- FIG. 1 illustrates a multi-label modeling system in accordance with some examples of the disclosure.
- FIG. 2 is an illustrative image with multiple labels and probabilities in accordance with some examples of the disclosure.
- FIG. 3 is an illustrative image with multiple labels and probabilities in accordance with some examples of the disclosure.
- FIG. 4 is an illustrative image with multiple labels and probabilities in accordance with some examples of the disclosure.
- FIG. 5 is an example computing component that may be used to implement various features of embodiments described in the present disclosure.
- FIG. 6 depicts a block diagram of an example computer system in which various of the embodiments described herein may be implemented.
- Examples of the disclosed technology describe multi-label modeling systems and methods that can constrain the effects of false negatives while determining one or more labels of the data for an improved data classification process. These systems are configured to receive a digital image, determine or receive a multi-class single-label classification of the image, determine a multi-class multi-label classification of the image using the single-label classification by initiating a pseudo-negative sampling process and/or determining a mutually exclusive loss; and perform an action based on the multi-class multi-label classification of the image.
- the disclosed technology can improve on single-label classification systems that are used in various technical fields, including natural language processing, audio classification, information retrieval, and computer vision.
- Single-label classification systems can train deep convolutional neural networks with multiple output predictions, where each prediction corresponds with a class of interest.
- In standard single-label classification, binary cross-entropy or SoftMax cross-entropy losses can be used to train the network.
- the set of available labels is often incomplete at training time.
- Traditional systems can assume the missing labels are negative, which can introduce a non-negligible number of false negatives. These false negatives can significantly reduce the classification accuracy.
- the systems and methods can reduce the determination of false negatives typically identified in traditional systems using the multi-label process described herein, which provides for more accurate classification predictions and fewer errors overall.
- FIG. 1 illustrates a multi-label modeling system in accordance with some examples of the disclosure.
- multi-label modeling system 100 is configured to determine one or more labels of a dataset using processor 104 and store the labels and data in memory 105.
- Processor 104 may comprise a general-purpose or special-purpose processing engine such as, for example, a microprocessor, controller, or other control logic. Processor 104 may be connected to a bus, although any communication medium can be used to facilitate interaction with other components of multi-label modeling system 100 or to communicate externally.
- Memory 105 may comprise random-access memory (RAM) or other dynamic memory for storing information and instructions to be executed by processor 104. Memory 105 might also be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 104. Memory 105 may also comprise a read only memory (“ROM”) or other static storage device coupled to a bus for storing static information and instructions for processor 104.
- Machine readable media 106 may comprise one or more interfaces, circuits, and modules for implementing the functionality discussed herein.
- Machine readable media 106 may carry one or more sequences of one or more instructions to processor 104 for execution.
- Such instructions embodied on machine readable media 106 may enable multi-label modeling system 100 to perform features or functions of the disclosed technology as discussed herein.
- the interfaces, circuits, and modules of machine readable media 106 may comprise, for example, data processing module 108, multi-class single-label classification module 110, multi-class multi-label classification module 112, pseudo-negative and mutual exclusive loss engine 114, and user interface simulation engine 116.
- Data received or generated from these modules and engines may be stored in one or more data stores 120, including training dataset data store 122 and multi-label dataset data store 124.
- Data processing module 108 is configured to receive one or more digital images from a user device. Alternatively, the images may be received via an online photo album or content sharing platform. The images may be provided to a multi-class single-label classification module, like multi-class single-label classification module 110, to assign a first label to the image. The image and single-label classification may be stored in training dataset data store 122.
- Data processing module 108 is also configured to receive one or more images each with a multi-class single-label classification, although this implementation is not required.
- a third party system may generate a single-label classification using various classification methods, including Logistic Regression, Naive Bayes, Stochastic Gradient Descent, K-Nearest Neighbors, Decision Tree, Random Forest, or Support Vector Machine.
- multi-class single-label classification module 110 may identify the single-label received with the image rather than assigning one.
- Multi-class single-label classification module 110 is configured to determine a single label for an image.
- the label may be associated with multiple classes, where each class corresponds with a probability that the image is associated with that particular class.
- the classification may include two or more classes (e.g., dog or car, even when both objects are present in the image), and multi-class single-label classification module 110 may classify each image into only one of those classes.
- Multi-class single-label classification includes the assumption that each image is assigned to one and only one label (e.g., the image can be either a dog or a car, but not both at the same time).
- multi-class single-label classification module 110 may first define an input x ∈ X and a target y ∈ Y, where X and Y compose a dataset D (e.g., stored in training dataset data store 122). X is an image set and Y = {0, 1, ∅}^K, where ∅ is an annotation of "unknown" (i.e., an unobserved label) and K is the number of categories. The index sets of observed positive, observed negative, and unobserved labels may then be defined as S^P = {i | y_i = 1}, S^N = {i | y_i = 0}, and S^∅ = {i | y_i = ∅}.
- the single-label classification process may assume a negative value for the unobserved labels, where the unknown labels are regarded as negative. The closer a value is to "1", the more accurate the corresponding label is for the image. As such, the values ranging from "0" to "1" may represent the ground truth of the accuracy of the classification of the training data set against the classification labels.
- Multi-class single-label classification module 110 may let y^AN correspond with the ground truth or the target for training or validating a machine learning model with a labeled dataset (e.g., the single-label classification for each image). During inference, the model may predict a label, which can be compared with the ground truth label, if it is available.
- the ground truth is defined as:
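- The formula itself is not reproduced in this text. A plausible reconstruction, consistent with the surrounding description that unknown labels are regarded as negative, is the following (an assumption, not the disclosure's exact notation):

```latex
y^{AN}_{i} =
\begin{cases}
1 & \text{if } i \in S^{P} \\
0 & \text{otherwise, i.e., } i \in S^{N} \cup S^{\varnothing}
\end{cases}
```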
- multi-class single-label classification module 110 may set the value for one class to "1" and the rest of the classes to "0." In single-label classification, this may determine a single class for the image.
- the loss function may be minimized using the following formula, which incorporates the set of ground truths from above:
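- As a hedged reconstruction rather than the patent's exact formula, one common form of this loss is the binary cross-entropy computed against the assumed-negative target y^AN:

```latex
\mathcal{L}_{AN}\!\left(f_{n}, y^{AN}_{n}\right) =
-\frac{1}{K}\sum_{i=1}^{K}\Big[
\mathbb{1}\!\left[y^{AN}_{n,i}=1\right]\log f_{n,i}
+\mathbb{1}\!\left[y^{AN}_{n,i}=0\right]\log\!\left(1-f_{n,i}\right)\Big]
```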
- Multi-class multi-label classification module 112 may incorporate the single-label classification into the process of determining the multi-class multi-label classification. For example, using the modified target y^AN, the engine, which may be based on a convolutional neural network, may determine the loss using either or both of a pseudo-negative loss function and a mutually exclusive loss function (by pseudo-negative and mutual exclusive loss engine 114).
- the layers of the CNN may consist of the convolution layer, the rectified linear units (ReLU) layer, the pooling layer, the fully connected layer, and the loss layer.
- the convolution layer may use a set of multidimensional filters to extract picture attributes from the input images. Each time, the multidimensional filters may process the pictures in blocks as opposed to individually processing each pixel. This method of data processing is not only capable of preserving the continuity information of the pixels, but also of capturing the pictures' inherent characteristics.
- the ReLU layer may include a nonlinear activation function.
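- As an illustrative sketch only (the disclosure does not specify an architecture, and the layer sizes and number of classes K below are assumptions), a minimal network with the layer types described above might look like the following, with one output prediction per class:

```python
import torch
import torch.nn as nn

K = 80  # assumed number of classes; not specified by the disclosure

# Minimal CNN with the layer types described above: convolution, ReLU,
# pooling, and a fully connected layer producing K per-class logits.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution layer
    nn.ReLU(),                                   # ReLU activation layer
    nn.MaxPool2d(2),                             # pooling layer
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                     # global pooling before the classifier
    nn.Flatten(),
    nn.Linear(32, K),                            # fully connected layer -> per-class logits
)

logits = model(torch.randn(4, 3, 224, 224))      # a batch of four RGB images
f = torch.sigmoid(logits)                        # per-class prediction values f_n in [0, 1]
```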
- Pseudo-negative and mutual exclusive loss engine 114 is configured to treat only a small, randomly sampled subset of the unknown labels as negatives (pseudo-negatives), rather than assuming every unknown label is negative, which reduces the number of false negatives introduced into the classification of the image in relation to each of the available class labels.
- the labels chosen as pseudo-negatives may be a subset of the class labels chosen at random. For example, each time (x_n, y_n) occurs in a batch, r of the unknown labels are chosen uniformly at random and treated as negative values. This step is repeated each time the pair (x_n, y_n) appears in a batch.
- pseudo-negative and mutual exclusive loss engine 114 may initiate a randomly chosen pseudo-negative sampling of the dataset.
- the sampling may randomly select items from the unknown labels to be set as pseudo-negative values.
- the selection of a random sampling of items relies on the assumption that a randomly chosen unknown label is more likely to be a true negative than a true positive.
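- A minimal sketch of this sampling step, assuming NumPy label vectors in which an unobserved label is marked with -1 (the marker value and the function name are illustrative assumptions, not the disclosure's own notation):

```python
import numpy as np

UNKNOWN = -1  # assumed marker for an unobserved label

def pseudo_negative_targets(y, r, rng):
    """Each time (x_n, y_n) appears in a batch, choose r unknown labels
    uniformly at random and treat them as negatives for this batch only."""
    y_pn = y.astype(float)
    unknown_idx = np.flatnonzero(y == UNKNOWN)
    if unknown_idx.size:
        chosen = rng.choice(unknown_idx, size=min(r, unknown_idx.size), replace=False)
        y_pn[chosen] = 0.0  # sampled pseudo-negatives
    return y_pn

rng = np.random.default_rng(0)
y = np.array([1, UNKNOWN, UNKNOWN, UNKNOWN, 0])  # one observed positive, one observed negative
print(pseudo_negative_targets(y, r=2, rng=rng))  # unchosen unknowns stay marked -1
```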
- the loss function may comprise a multi-step process, including determining loss from a pseudo-negative loss function and a mutually exclusive loss function using pseudo-negative and mutual exclusive loss engine 114.
- Pseudo-negative and mutual exclusive loss engine 114 can determine the pseudo-negative loss value (LPN) using the following formula:
- the value fn corresponds with the prediction value of the multi-class multi-label classification that is used in determining this LPN loss value.
- LPN can effectively reduce the impact of false negatives by down-weighting terms in the loss corresponding to negative labels, thereby significantly improving the training process.
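- The exact LPN formula is not reproduced in this text. As a hedged sketch consistent with the down-weighting behavior described above (the weight gamma, the -1 "unknown" marker, and the function name are assumptions), a weighted binary cross-entropy over observed positives and sampled pseudo-negatives could look like:

```python
import numpy as np

def pseudo_negative_loss(f, y_pn, gamma=0.1):
    """Binary cross-entropy in which negative terms (observed negatives and
    sampled pseudo-negatives) are down-weighted by gamma; labels still marked
    unknown (-1) are excluded from the sum."""
    eps = 1e-7
    f = np.clip(f, eps, 1.0 - eps)
    pos = (y_pn == 1)
    neg = (y_pn == 0)
    loss_pos = -np.log(f[pos]).sum()
    loss_neg = -gamma * np.log(1.0 - f[neg]).sum()
    count = max(pos.sum() + neg.sum(), 1)
    return (loss_pos + loss_neg) / count

f = np.array([0.9, 0.4, 0.2, 0.7, 0.1])   # predictions f_n for K = 5 classes
y_pn = np.array([1, 0, -1, -1, 0])         # targets after pseudo-negative sampling
print(pseudo_negative_loss(f, y_pn))
```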
- Pseudo-negative and mutual exclusive loss engine 114 is also configured to implement a mutually exclusive loss function.
- the multi-class single-label of an image (e.g., determined by multi-class single-label classification module 110) may be considered prior knowledge of the likelihood of the classification for a particular label, and other labels within the same sub-classification may be removed from the set of potential labels in a multi-label classification (e.g., determined by multi-class multi-label classification module 112).
- the selection of labels in the sub-class may be mutually exclusive and determined through a mutual exclusive loss function described herein.
- a label dataset may include one or more parent classes 210 (e.g., an “animal” label), with various classification groups 220 (e.g., dog label 220A, cat label 220B, and through gorilla label 220N).
- Various sub-groups may be defined within the classification groups 220, including for example, a first class 230A (bulldog), a second class 230B (golden retriever), and up to “N” number of classes 230N (e.g., husky, pug, etc.) such that each of the items in the dataset belong to the same animal classification group 220 (e.g., dog).
- when the single-label identifies the first class 230A (bulldog), the other classes may be excluded from the determination, including the second class 230B (golden retriever) and other classes 230N (e.g., husky, pug, etc.). Based on the prior knowledge of the single-label, the other classes in the subclass may be excluded to determine a true negative for the particular class (e.g., at 100% confidence).
- Pseudo-negative and mutual exclusive loss engine 114 may be configured to perform multiple iterations of sub-group classifications as labels are defined for each group.
- the lowest-level sub-group classification 230 (e.g., the set of bulldog, golden retriever, husky, etc.) may contain the labels that are mutually exclusive to each other.
- the set of labels may each correspond with an array of confidence values that associate each of the available labels with the image using a zero-to-one value.
- the array values may determine a single true value and set the remaining values in the sub-class as false, e.g., [0.0, 1.0, 0.0].
- the values corresponding with the parent class “animal” may be mutually exclusive, but the values corresponding with other parent classes, like “time of day,” “location,” or “sport” may be set to true or false values.
- the input image may include a picture of a dog in a park during the day.
- the array of confidence values may include positions labeled by the type of dog (e.g., bulldog, golden retriever, husky, etc.), the time of day, and the like, with the matching positions holding values closer to "1.0" and all other positions holding values closer to "0.0", e.g., [0.0, 1.0, 1.0, ..., 0.0].
- the mutual exclusive loss function may use the following formula, where an indicator is assigned 1 if the i-th class is in the same sub-class group as any other positive label:
- the value fn corresponds with the prediction value of the multi-class multi-label classification. When the value fn is a small value for a mutually exclusive class, the corresponding loss term is also small.
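- The disclosure's exact mutually exclusive loss is not reproduced in this text. The sketch below, under stated assumptions (the group encoding, the indicator construction, and the -log(1 - f) penalty are illustrative), pushes predictions toward zero for classes that share a mutually exclusive sub-class group with a known positive label:

```python
import numpy as np

def mutually_exclusive_loss(f, y, groups):
    """For each class i whose sub-class group already contains a positive
    label, treat class i as a true negative and penalize a high prediction.
    `groups` maps each class index to its mutually exclusive group id."""
    eps = 1e-7
    f = np.clip(f, eps, 1.0 - eps)
    positive_groups = {groups[i] for i, label in enumerate(y) if label == 1}
    m = np.array([1.0 if (groups[i] in positive_groups and y[i] != 1) else 0.0
                  for i in range(len(y))])
    # -log(1 - f_i) only for the classes flagged as mutually exclusive
    return float((m * -np.log(1.0 - f)).sum() / max(m.sum(), 1.0))

# classes: [bulldog, golden retriever, husky, daytime, nighttime]
groups = [0, 0, 0, 1, 1]                      # dog breeds vs. time of day
y      = [1, -1, -1, -1, -1]                  # single positive label: bulldog
f      = np.array([0.8, 0.6, 0.3, 0.7, 0.2])  # model predictions
print(mutually_exclusive_loss(f, y, groups))
```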
- the calculation of the final aggregated loss may be defined as:
- 1/|D| is the fraction applied to the total loss value when the pseudo-negative labels and the mutually exclusive labels are considered and aggregated.
- |D| refers to the number of examples in the single-label dataset, so dividing the summed loss values by |D| generates an average of the total dataset loss.
- the total loss value may be a combination of the pseudo-negative loss and the mutually exclusive loss, which is used to determine the multi-class multi-label classification.
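- A hedged sketch of this aggregation, reusing the pseudo_negative_targets, pseudo_negative_loss, and mutually_exclusive_loss functions sketched above (the predict callable and all parameter names are assumptions):

```python
import numpy as np

def total_loss(dataset, predict, groups, r, gamma, rng):
    """Average of (pseudo-negative loss + mutually exclusive loss) over |D|."""
    losses = []
    for x, y in dataset:
        f = predict(x)                                  # per-class predictions f_n
        y_pn = pseudo_negative_targets(np.asarray(y), r, rng)
        l_pn = pseudo_negative_loss(f, y_pn, gamma)
        l_me = mutually_exclusive_loss(f, y, groups)
        losses.append(l_pn + l_me)
    return sum(losses) / len(dataset)                   # the 1/|D| factor
```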
- User interface simulation engine 116 is configured to perform an action associated with the determined multi-class labels.
- the probabilities and multi-label classification may be provided to a display of a user device or stored to multi-label dataset data store 124.
- the labels may be compared to a search query received from a user interface.
- the search terms may be compared to the multiple labels associated with each image. Any image that matches the search terms may be returned as search results to the query.
- the action may correspond with a search query.
- the user device may submit a search query to a search engine associated with a data collection of images to identify one or more images that correspond with the multi-class label, where each image may correspond with more than one label.
- the search query may include a term that matches one or more multi-class labels.
- the search results may return multiple images that include the multi-class label among a set of labels that are matched to the image.
- the labels correspond with a probability
- the images that correspond with the top number of probabilities for each label may be returned (e.g., top 10).
- the probabilities that exceed a threshold value may be returned as the search results (e.g., each of the images with a multi-class label probability exceeding 98%).
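- As an illustrative sketch (the in-memory index, the function name, and the threshold value are assumptions about how such a query could be served), matching a query term against multi-label results and filtering by a probability threshold and a top-N cutoff might look like:

```python
def search_images(index, query_label, threshold=0.98, top_n=10):
    """`index` maps image_id -> {label: probability}. Return the image ids whose
    probability for the query label exceeds the threshold, best match first,
    truncated to the top N results."""
    hits = [(probs[query_label], image_id)
            for image_id, probs in index.items()
            if probs.get(query_label, 0.0) > threshold]
    hits.sort(reverse=True)
    return [image_id for _, image_id in hits[:top_n]]

index = {
    "img_310": {"people": 0.9052, "sports field": 0.8566, "volleyball": 0.8877},
    "img_410": {"airplane": 0.9912, "airport": 0.9942, "sky": 0.8225},
}
print(search_images(index, "airport"))  # -> ['img_410']
```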
- the action may correspond with a social media platform.
- a post to the social media platform may include an image (e.g., meme or photograph) and the image may be provided to multi-label modeling system 100 to determine the multi-class labels for the image.
- when the multi-class labels correspond with a restricted category, the action may correspond with restricting access to the image on the social media platform and sending the image to a second review process for further analysis.
- when the multi-class labels match the interests of a user, the action may correspond with sharing the image with that user (e.g., in their news feed or as an advertisement).
- the multi-class labels may be associated with interests of the user or added to the profile of the user.
- FIG. 3 is an illustrative image with multiple labels and probabilities in accordance with some examples of the disclosure.
- multi-label modeling system 100 may receive image 310 and process the image (via data processing module 108 and multi-class single-label classification module 110). The processing of the image may identify a multi-class single-label that accompanies the transmission of image 310 or multi-label modeling system 100 may determine the single-label internally. Using the single-label classification, multi-label modeling system 100 may determine a multi-label classification for the image (via multi-class multi-label classification module 112), which incorporates one or more loss functions (via pseudo-negative and mutual exclusive loss engine 114). The output of multi-class multi-label classification module 112 may include multiple labels to help classify image 310 as well as the probabilities that each label corresponds with the image.
- image 310 corresponds with a nighttime volleyball game.
- Multi-label modeling system 100 may analyze and label image 310 to generate multiple label classifications 320.
- the labels may include people, night queue, sports field, and volleyball, with corresponding probabilities of 0.9052, 0.9262, 0.8566, and 0.8877, respectively.
- FIG. 4 is an illustrative image with multiple labels and probabilities in accordance with some examples of the disclosure.
- multi-label modeling system 100 may receive image 410 and process the image (via data processing module 108 and multi-class single-label classification module 110). The processing of the image may identify a multi-class single-label that accompanies the transmission of image 410 or multi-label modeling system 100 may determine the single-label internally. Using the single-label classification, multi-label modeling system 100 may determine a multi-label classification for the image (via multi-class multi-label classification module 112), which incorporates one or more loss functions (via pseudo-negative and mutual exclusive loss engine 114). The output of multi-class multi-label classification module 112 may include multiple labels to help classify image 410 as well as the probabilities that each label corresponds with the image.
- image 410 corresponds with an airplane on an airport runway.
- Multi-label modeling system 100 may analyze and label image 410 to generate multiple label classifications 420.
- the labels may include meadow, airplane, airport, and sky, with corresponding probabilities of 0.9074, 0.9912, 0.9942, and 0.8225, respectively.
- FIG. 5 illustrates an example computing component that may be used to implement the multi-class multi-label classification system in accordance with various embodiments.
- computing component 500 may be, for example, a server computer, a controller, or any other similar computing component capable of processing data.
- the computing component 500 includes a hardware processor 502 and a machine-readable storage medium 504.
- Hardware processor 502 may be one or more central processing units (CPUs), semiconductor-based microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 504. Hardware processor 502 may fetch, decode, and execute instructions, such as instructions 506-516, to control processes or operations for implementing the dynamically modular and customizable computing systems. As an alternative or in addition to retrieving and executing instructions, hardware processor 502 may include one or more electronic circuits that include electronic components for performing the functionality of one or more instructions, such as a field programmable gate array (FPGA), application specific integrated circuit (ASIC), or other electronic circuits.
- a machine-readable storage medium such as machine-readable storage medium 504, may be any electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions.
- machine-readable storage medium 504 may be, for example, Random Access Memory (RAM), non-volatile RAM (NVRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage device, an optical disc, and the like.
- machine-readable storage medium 504 may be a non-transitory storage medium, where the term "non-transitory" does not encompass transitory propagating signals.
- machine-readable storage medium 504 may be encoded with executable instructions, for example, instructions 506-516.
- Hardware processor 502 may execute instruction 506 to receive an image.
- the system may receive one or more digital images from a user device.
- the image may be received via an online photo album or content sharing platform.
- Hardware processor 502 may execute instruction 508 to determine or receive a single-label for the image.
- the received digital image may correspond with a first label for the image that is determined by a multi-class single-label classification system (either internal or external to computing component 500, including multi-class single-label classification module 110 of FIG. 1).
- Hardware processor 502 may execute instruction 510 to determine a multi-class multi-label classification. For example, the image and single-label classification may be provided as input to determine the multi-class multi-label classification.
- the multi-class multi-label classification may be associated with instructions 512 and 514 in order to determine the loss using either or both of a pseudo-negative loss function and a mutually exclusive loss function.
- Hardware processor 502 may execute instruction 512 to initiate a pseudo-negative sampling. For example, only a small, randomly sampled subset of the unknown labels may be treated as negatives (pseudo-negatives), rather than assuming all unknown labels are negative, which reduces the number of false negatives introduced into the classification of the image in relation to each of the available class labels.
- the labels chosen as false negatives may be a subset of the class labels chosen at random.
- the pseudo-negative loss value may be determined using the following formula, as described herein:
- Hardware processor 502 may execute instruction 514 to determine a mutually exclusive loss.
- the pseudo-negative loss function can reduce the impact of false negatives by down-weighting terms in the loss corresponding to negative labels.
- Multiple iterations of sub-group classifications may be initiated as labels are defined for each group, with the lowest level sub-group classification including labels that are mutually exclusive to each other.
- the mutual exclusive loss function may use the following formula, as described herein:
- a set of labels may be associated with the processed image.
- the set of labels may each correspond with an array of confidence values that associate each of the available labels with the image using a zero-to-one value.
- Hardware processor 502 may execute instruction 516 to perform an action based on the determined multi-class multi-label classification. Various actions are possible without diverting from the scope of the disclosure.
- the action may comprise displaying probabilities and multi-label classification.
- the labels may be compared to a search query received from a user interface.
- the search terms may be compared to the multiple labels associated with each image. Any image that matches the search terms may be returned as search results to the query.
- the action may correspond with a search query.
- the user device may submit a search query to a search engine associated with a data collection of images to identify one or more images that correspond with the multi-class label, where each image may correspond with more than one label.
- the search query may include a term that matches one or more multi-class labels.
- the search results may return multiple images that include the multi-class label among a set of labels that are matched to the image.
- the labels correspond with a probability
- the images that correspond with the top number of probabilities for each label may be returned (e.g., top 10).
- the probabilities that exceed a threshold value may be returned as the search results (e.g., each of the images with a multi-class label probability exceeding 98%).
- the action may correspond with a social media platform.
- a post to the social media platform may include an image (e.g., meme or photograph), label, or classification.
- when the multi-class labels correspond with a restricted category (e.g., nudity or other categories that violate terms of service for the social media platform), the action may correspond with restricting access to the image on the social media platform and sending the image to a second review process for further analysis.
- when the multi-class labels match the interests of a user, the action may correspond with sharing the image with that user (e.g., in their news feed or as an advertisement).
- the multi-class labels may be associated with interests of the user or added to the profile of the user.
- FIG. 6 depicts a block diagram of an example computer system 600 in which various of the embodiments described herein may be implemented.
- the computer system 600 includes a bus 602 or other communication mechanism for communicating information, and one or more hardware processors 604 coupled with bus 602 for processing information.
- Hardware processor(s) 604 may be, for example, one or more general purpose microprocessors.
- the computer system 600 also includes a main memory 606, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 602 for storing information and instructions to be executed by processor 604.
- Main memory 606 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 604.
- Such instructions when stored in storage media accessible to processor 604, render computer system 600 into a special-purpose machine that is customized to perform the operations specified in the instructions.
- the computer system 600 further includes a read only memory (ROM) 608 or other static storage device coupled to bus 602 for storing static information and instructions for processor 604.
- a storage device 610 such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 602 for storing information and instructions.
- the computer system 600 may be coupled via bus 602 to a display 612, such as a liquid crystal display (LCD) (or touch screen), for displaying information to a computer user.
- An input device 614 is coupled to bus 602 for communicating information and command selections to processor 604.
- Another type of user input device is cursor control 616, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 604 and for controlling cursor movement on display 612.
- the same direction information and command selections as cursor control may be implemented via receiving touches on a touch screen without a cursor.
- the computing system 600 may include a user interface module to implement a GUI that may be stored in a mass storage device as executable software codes that are executed by the computing device(s).
- This and other modules may include, by way of example, components, such as software components, object- oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables.
- the word “component,” “engine,” “system,” “database,” “data store,” and the like, as used herein, can refer to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Java, C or C++.
- a software component may be compiled and linked into an executable program, installed in a dynamic link library, or may be written in an interpreted programming language such as, for example, BASIC, Perl, or Python. It will be appreciated that software components may be callable from other components or from themselves, and/or may be invoked in response to detected events or interrupts.
- Software components configured for execution on computing devices may be provided on a computer readable medium, such as a compact disc, digital video disc, flash drive, magnetic disc, or any other tangible medium, or as a digital download (and may be originally stored in a compressed or installable format that requires installation, decompression or decryption prior to execution).
- Such software code may be stored, partially or fully, on a memory device of the executing computing device, for execution by the computing device.
- Software instructions may be embedded in firmware, such as an EPROM.
- hardware components may be comprised of connected logic units, such as gates and flip-flops, and/or may be comprised of programmable units, such as programmable gate arrays or processors.
- the computer system 600 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 600 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 600 in response to processor(s) 604 executing one or more sequences of one or more instructions contained in main memory 606. Such instructions may be read into main memory 606 from another storage medium, such as storage device 610. Execution of the sequences of instructions contained in main memory 606 causes processor(s) 604 to perform the process steps described herein. In alternative embodiments, hardwired circuitry may be used in place of or in combination with software instructions.
- non-transitory media refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media.
- Non-volatile media includes, for example, optical or magnetic disks, such as storage device 610.
- Volatile media includes dynamic memory, such as main memory 606.
- non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
- Non-transitory media is distinct from but may be used in conjunction with transmission media.
- Transmission media participates in transferring information between non-transitory media.
- transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 602.
- transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- the computer system 600 also includes a communication interface 618 coupled to bus 602.
- Communication interface 618 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks.
- communication interface 618 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
- communication interface 618 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN).
- Wireless links may also be implemented.
- communication interface 618 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- a network link typically provides data communication through one or more networks to other data devices.
- a network link may provide a connection through local network to a host computer or to data equipment operated by an Internet Service Provider (ISP).
- the ISP in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet.”
- Local network and Internet both use electrical, electromagnetic or optical signals that carry digital data streams.
- the signals through the various networks and the signals on network link and through communication interface 618, which carry the digital data to and from computer system 600, are example forms of transmission media.
- the computer system 600 can send messages and receive data, including program code, through the network(s), network link and communication interface 618.
- a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 618.
- the received code may be executed by processor 604 as it is received, and/or stored in storage device 610, or other non-volatile storage for later execution.
- Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code components executed by one or more computer systems or computer processors comprising computer hardware.
- the one or more computer systems or computer processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS).
- the processes and algorithms may be implemented partially or wholly in application-specific circuitry.
- the various features and processes described above may be used independently of one another, or may be combined in various ways. Different combinations and subcombinations are intended to fall within the scope of this disclosure, and certain method or process blocks may be omitted in some implementations.
- a circuit might be implemented utilizing any form of hardware, software, or a combination thereof.
- processors, controllers, ASICs, PLAs, PALs, CPLDs, FPGAs, logical components, software routines or other mechanisms might be implemented to make up a circuit.
- the various circuits described herein might be implemented as discrete circuits or the functions and features described can be shared in part or in total among one or more circuits. Even though various features or elements of functionality may be individually described or claimed as separate circuits, these features and functionality can be shared among one or more common circuits, and such description shall not require or imply that separate circuits are required to implement such features or functionality.
- a circuit is implemented in whole or in part using software, such software can be implemented to operate with a computing or processing system capable of carrying out the functionality described with respect thereto, such as computer system 600.
- the term “or” may be construed in either an inclusive or exclusive sense. Moreover, the description of resources, operations, or structures in the singular shall not be read to exclude the plural. Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps.
Abstract
The invention relates to systems and methods for implementing a multi-class multi-label classification system that constrains the effects of false negatives while determining one or more labels of the data for an improved data classification process. These systems are configured to receive a digital image, determine or receive a multi-class single-label classification of the image, determine a multi-class multi-label classification of the image using the single-label classification by initiating a pseudo-negative sampling process and/or determining a mutually exclusive loss, and perform an action based on the multi-class multi-label classification of the image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2022/054178 WO2023107753A1 (fr) | 2022-12-28 | 2022-12-28 | Mutually exclusive loss and pseudo-negative sampling for multi-label learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2022/054178 WO2023107753A1 (fr) | 2022-12-28 | 2022-12-28 | Mutually exclusive loss and pseudo-negative sampling for multi-label learning |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023107753A1 (fr) | 2023-06-15 |
Family
ID=86731152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/054178 WO2023107753A1 (fr) | 2022-12-28 | 2022-12-28 | Mutually exclusive loss and pseudo-negative sampling for multi-label learning |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023107753A1 (fr) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014172777A1 (fr) * | 2013-04-22 | 2014-10-30 | Fans Entertainment Inc. | Système et procédé d'identification personnelle d'individus dans des images |
US20200320769A1 (en) * | 2016-05-25 | 2020-10-08 | Metail Limited | Method and system for predicting garment attributes using deep learning |
US20200356799A1 (en) * | 2019-05-06 | 2020-11-12 | Rovi Guides, Inc. | Systems and methods for determining whether to modify content |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116883765A (zh) * | 2023-09-07 | 2023-10-13 | 腾讯科技(深圳)有限公司 | 图像分类方法、装置、电子设备及存储介质 |
CN116883765B (zh) * | 2023-09-07 | 2024-01-09 | 腾讯科技(深圳)有限公司 | 图像分类方法、装置、电子设备及存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22905214; Country of ref document: EP; Kind code of ref document: A1 |