WO2017063126A1 - Method and system for clustering objects labeled with attributes


Info

Publication number
WO2017063126A1
Authority
WO
WIPO (PCT)
Prior art keywords
attributes
cluster
objects
splitting
label
Prior art date
Application number
PCT/CN2015/091771
Other languages
English (en)
French (fr)
Inventor
Xiaogang Wang
Wanli OUYANG
Hongyang Li
Xingyu ZENG
Original Assignee
Xiaogang Wang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaogang Wang
Priority to PCT/CN2015/091771 (WO2017063126A1)
Priority to CN201580084335.9A (CN108351971B)
Publication of WO2017063126A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/231 - Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns

Definitions

  • the present application relates to a method and a system for clustering objects labeled with attributes in a feature learning system of an object detection device.
  • the SUN attribute database is for scene recognition. Other datasets describe the attributes of objects from different aspects. There are also many datasets that provide attributes per sample.
  • Deep learning has been shown to be effective on large-scale object detection and recognition. It is found that features learned from large-scale classification data can be applied to many other vision tasks. However, the use of attributes in improving feature learning for object detection has not been investigated in the prior art.
  • Based on the ImageNet object detection dataset, the present application annotates the rotation, viewpoint, object part location, part occlusion, part existence, common attributes, and class-specific attributes. The present application then proposes to use this dataset to train deep representations and extensively evaluates how these attributes are useful for the general object detection task. In order to make better use of the attribute annotations, a deep learning scheme is proposed by modeling the relationship of attributes and hierarchically clustering them into semantically meaningful mixture types.
  • during each splitting, one of the semantic factors is independently selected to split the cluster.
  • a feature learning method comprising:
  • training image features based on the artificially obtained object class label, the obtained object cluster label, the predicted object class label, and the predicted cluster label.
  • a system for clustering objects labeled with attributes comprising:
  • an obtaining unit configured for obtaining attributes for a plurality of objects
  • a summarizing unit electronically communicated with the obtaining unit and configured for summarizing the obtained attributes into a plurality of semantic factors
  • a splitting unit electronically communicated with the summarizing unit and configured for splitting the objects into more than one cluster;
  • the splitting unit is further configured to split at least one of the clusters one or more times;
  • the splitting unit comprises a selector, which is configured for independently selecting one of the semantic factors to split the cluster during each splitting.
  • a feature learning system comprising
  • a classing unit configured for dividing objects labeled with attributes into one or more object classes to obtain an object class label for each object
  • an attribute clustering unit configured for independently clustering objects in each of the object classes into different clusters using the system of claim 11, so as to obtain an object cluster label for each object
  • a predictive unit configured for predicting, for a given image, a predicted object class and a predicted cluster label
  • an obtaining unit configured for obtaining an artificially obtained object class label of the given image
  • a training unit configured for training image features based on the artificially obtained object class label, the obtained object cluster label, the predicted object class label, and the predicted cluster label.
  • a system for clustering objects labeled with attributes comprising:
  • a processor configured to execute the instructions to,
  • during each splitting, one of the semantic factors is independently selected to split the cluster.
  • Fig. 1 shows examples of object detection.
  • Fig. 2 shows the overall pipeline of the feature learning system in some embodiments.
  • Fig. 3 shows attribute annotation samples for lion, otter, and car. Best viewed in color. Rotation is quantized into 8 directions (a). Viewpoint is a 6-dimensional vector (b), where front means the main flat side. The prototypes for orientation and viewpoint are defined (c), and then each bounding box is annotated (d). Outdoor/indoor, interaction with person, tight shot, and see inside are common attributes for all classes. Female for lion, floating on water for otter, and old-fashioned for car are class-specific attributes for single or small groups of classes.
  • Fig. 4 is Algorithm 1 showing the factor guided hierarchical clustering.
  • Fig. 5 shows factor guided hierarchical clustering for the object class bus. For splitting samples into clusters, viewpoint is used first, then part existence, and then rotation.
  • Fig. 6 shows the predictive unit at the training stage.
  • Fig. 7 shows the predictive unit at the testing stage.
  • Fig. 8 shows the training unit.
  • Fig. 9 shows investigation on different approaches in using attributes on ILSVRC2014 val2.
  • Fig. 10 shows investigation on using multiple attribute mixture sets on ILSVRC2014 val2.
  • Fig. 11 shows visualizing feature maps that are most correlated to the object classes otter and person.
  • the feature maps learned with attributes are better at handling the appearance variation and at distinguishing objects from background. Best viewed in color.
  • Fig. 12 shows examples with high prediction scores on attribute mixture types. The images are cropped so the attributes can be seen better. Best viewed in color.
  • Fig. 13 is an illustration of an example computing device, all arranged in accordance with at least some embodiments of the present disclosure.
  • the disclosure relates to object detection, the aim of which is to automatically detect objects, such as person, dog, and chair, in a given image.
  • Object representations are vital for object recognition and detection. There has been a remarkable evolution in representations for objects, scenes, and humans, and much of this progress was sparked by the creation of datasets. In the disclosures of the present application, a large-scale object attribute dataset is constructed. The motivation is twofold.
  • this database provides labels that facilitate analysis on the appearance variation of images.
  • the intra-class variation is one of the most important factors that influence the accuracy in object detection and recognition.
  • Objects of the same class are very different in appearance due to the variation in rotation, viewpoint, part deformation, part existence, background complexity, interaction with other objects, and other factors.
  • existing methods approximate the viewpoint change and part existence by using aspect ratio.
  • images of the same aspect ratio can be very different in appearance because of the factors mentioned above.
  • a direct way of revealing the factors that influence appearance variation is to explicitly annotate them. Therefore, the ImageNet object detection data, which have been most widely used in generic object detection nowadays, are annotated with these attributes.
  • Attributes are correlated, e.g. rotation is related to part location, and should be modeled jointly. Samples are clustered into attribute groups, which leads to different attribute mixture types. The deep model is trained to predict the attribute mixture types.
  • a hierarchical cluster tree is constructed by selecting a single attribute factor for division at each time. From the top to the bottom of the hierarchical tree, it is easy to rank the importance of the attribute factors that cause variation.
  • an attribute label system which investigates the attributes that facilitate analysis on the appearance variation of images. It is well-known that the intra-class variation is one of the most important factors that influence the accuracy in object detection and recognition. Attributes provided are factors that influence appearance variation.
  • a feature learning system which allows us to learn feature representation from large-scale attribute dataset. The learned feature is then used for detecting objects.
  • the attributes of objects provide information for reasoning about the appearance variation of objects.
  • the attributes of objects include rotation, locations of object parts (e.g., wheels for a car), existence of object parts in the image, and other attributes (e.g., wings open for a bird).
  • Feature representations learned from attributes are better at describing objects.
  • the system 2000 may comprise an attribute clustering unit 201, a first predictive unit 202, a training unit 203, and a second predictive unit 204, as shown in Fig. 2.
  • The attribute clustering unit 201
  • the attribute clustering unit 201 is a unit configured for practicing the attribute clustering method of the disclosure.
  • the algorithm may comprise:
  • a cluster, denoted by C, is chosen from the cluster set V to be split, and then one of the semantic attribute groups is chosen for splitting C into several clusters.
  • this splitting step is repeated until all clusters C satisfy one of the two conditions: 1) the number of samples in C is smaller than a threshold T, or 2) the cluster C has been divided more than D times (a code sketch of this loop follows the list).
  • the final cluster set V is used as the cluster label of training samples.
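A minimal Python sketch of this divisive loop, assuming a split_by_factor helper that partitions a cluster on one semantic factor (e.g., by the graph-based agglomerative clustering described later); the names, defaults, and queue-based traversal are illustrative, not the patent's exact implementation.

```python
import math
from collections import deque

FACTORS = ["rotation", "viewpoint", "common", "class_specific",
           "part_location", "part_existence"]  # the six semantic factors

def entropy(split):
    """Entropy of the cluster sizes; larger means a more uniform split."""
    total = sum(len(c) for c in split)
    probs = [len(c) / total for c in split if len(c) > 0]
    return -sum(p * math.log(p) for p in probs)

def factor_guided_clustering(samples, split_by_factor, T=50, D=3):
    """Divisive clustering guided by attribute factors.

    samples: list of annotated objects of one class.
    split_by_factor(cluster, factor) -> list of sub-clusters.
    T: clusters smaller than T are not split further.
    D: maximum number of times a cluster may be divided.
    """
    V = []                         # final cluster set
    queue = deque([(samples, 0)])  # (cluster, times divided so far)
    while queue:
        cluster, depth = queue.popleft()
        if len(cluster) < T or depth >= D:   # stop conditions 1) and 2)
            V.append(cluster)
            continue
        # one candidate split per factor; keep the one with maximum entropy
        candidates = [split_by_factor(cluster, f) for f in FACTORS]
        best = max(candidates, key=entropy)
        if len(best) <= 1:                   # no factor can split this cluster
            V.append(cluster)
            continue
        for sub in best:
            queue.append((sub, depth + 1))
    return V  # the index of each cluster in V serves as the cluster label
```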
  • the objects are divided into more than one object class, and the obtaining, the summarizing, and the splitting are performed separately for each object class.
  • the attributes are summarized into one or more of the following semantic factors:
  • viewpoint attributes which are the out-of-plane rotation of an object
  • class-specific attributes specifically used for a single class or a small group of classes
  • a depth, as defined by times of splitting used for obtaining a cluster from the set of all objects, of any one cluster is no more than a maximum depth.
  • a size, as defined by the number of objects in a cluster, of any one cluster is no less than a minimum size.
  • each of the semantic factors is selected such that the splitting has the best uniformness.
  • each of the semantic factors may be selected by calculating an evaluation score for the candidate split it produces and choosing the factor whose candidate split has the maximum score;
  • the evaluation score of the ith candidate split S_i = (C_{1,i}, ..., C_{N,i}) is the entropy E(S_i) = -Σ_{k=1..N} p_{k,i} log p_{k,i}, with p_{k,i} = |C_{k,i}| / Σ_{k'=1..N} |C_{k',i}|;
  • |C_{k,i}| denotes the number of elements in C_{k,i}, N denotes the number of clusters in one candidate split, and k and k' are running indexes used in the summations;
  • each splitting is achieved by agglomerative clustering on a directed graph, affinity propagation, spectral clustering, or normalized cut.
  • each splitting is achieved by agglomerative clustering on a directed graph, wherein
  • the directed graph is constructed using K-nearest neighbors, wherein each object is a node, and the directed edge from the nth node to the mth node is used for measuring the similarity between the mth sample and the nth sample;
  • the closeness measure of clusters is defined via the indegree and outdegree on the graph.
  • the splitting is achieved by agglomerative clustering on a directed graph, wherein
  • the directed graph is constructed using K-nearest neighbors (K-NN), wherein each sample is a node, and the directed edge from the nth node to the mth node is used for measuring the similarity between the mth sample and the nth sample as follows: w(n→m) = exp(-||f_i^n - f_i^m||^2 / σ^2) if m is among the K nearest neighbors of n, and 0 otherwise;
  • σ^2 is the mean Euclidean distance over all samples, and f_i^n and f_i^m are the ith attribute factors for the nth and mth samples, respectively;
  • the closeness measure of clusters is defined via the indegree and outdegree on the graph.
  • the system of the disclosure further comprises a dividing unit for dividing the objects into more than one object class, and the obtaining, the summarizing, and the splitting are performed separately for each object class.
  • the attributes are summarized into one or more of the following semantic factors:
  • viewpoint attributes which are the out-of-plane rotation of an object
  • class-specific attributes specifically used for a single class or a small group of classes
  • the selector comprises
  • an obtaining unit configured for obtaining a candidate split using each of the semantic factors
  • a calculating unit configured for calculating an evaluation score for each candidate split
  • a comparing unit configured for comparing the evaluation scores to find the maximum evaluation score and the corresponding semantic factor using which the candidate split having the maximum evaluation score is obtained.
  • the selector comprises
  • a calculating unit for calculating the evaluation score E(S_i) of the ith candidate split S_i = (C_{1,i}, ..., C_{N,i}) by E(S_i) = -Σ_{k=1..N} p_{k,i} log p_{k,i}, with p_{k,i} = |C_{k,i}| / Σ_{k'=1..N} |C_{k',i}|;
  • |C_{k,i}| denotes the number of elements in C_{k,i}, N denotes the number of clusters in one candidate split, and k and k' are running indexes used in the summations;
  • the following procedure is used to train deep models:
  • the M-class (for example, 1000-class) ImageNet classification and localization dataset is used for pretraining the model, because it is found to be effective for object detection.
  • the overall loss used for training is L = L_o + Σ_j b_j L_{a,j} (1), where L_o is the hinge loss for classifying objects as one of the 200 classes or background;
  • w_{o,c} is the classifier for object class c;
  • h_n is the feature from the deep model for the nth sample;
  • Σ_j b_j L_{a,j} is the loss for attribute estimation;
  • b_j is the pre-defined weight for the attribute loss L_{a,j};
  • when the label y_{j,n} is continuous, e.g. part location, the square loss is used in (1) for its prediction for the jth attribute loss and nth sample;
  • when the label y_{j,n} is discrete, e.g. part existence or attribute mixture type, the cross-entropy loss is used.
  • when the attribute losses are removed (e.g. all weights b_j are set to zero), the deep model degenerates to the normal object detection framework without attributes.
  • with the attribute losses, the deep model not only needs to distinguish the object classes from background for the loss L_o but also needs to predict the labels from attributes for the loss Σ_j b_j L_{a,j}. Samples without attribute labels are constrained to have no loss L_{a,j}, so that they will not influence the attribute learning.
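A sketch of this combined loss in PyTorch, assuming a multi-class margin (hinge) loss over the 200 classes plus background and one prediction head per attribute factor; function and tensor names are illustrative, and the masking of samples without attribute labels follows the constraint just described.

```python
import torch
import torch.nn.functional as F

def combined_loss(obj_scores, obj_labels,
                  attr_preds, attr_labels, attr_masks,
                  attr_is_discrete, b):
    """L = L_o + sum_j b_j * L_{a,j}, following the description above.

    obj_scores: (batch, 201) scores for 200 object classes + background.
    attr_preds/attr_labels/attr_masks: one entry per attribute factor j;
    attr_masks[j] is 1 where the sample carries a label for factor j, so
    samples without attribute labels contribute no loss L_{a,j}.
    """
    loss = F.multi_margin_loss(obj_scores, obj_labels)   # L_o, hinge loss
    for j, (pred, label, mask) in enumerate(
            zip(attr_preds, attr_labels, attr_masks)):
        if attr_is_discrete[j]:
            # discrete y_{j,n}: part existence, attribute mixture type
            per_sample = F.cross_entropy(pred, label, reduction="none")
        else:
            # continuous y_{j,n}, e.g. part location: square loss
            per_sample = ((pred - label) ** 2).sum(dim=-1)
        loss = loss + b[j] * (per_sample * mask).sum() / mask.sum().clamp(min=1)
    return loss
```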
  • the training samples of an object class are divided into many attribute mixture types using the attributes. Then the deep model is used for predicting the attribute mixture type of training samples using the cross-entropy loss.
  • f_rot denotes rotation;
  • f_view denotes viewpoint;
  • f_com denotes common attributes;
  • f_spec denotes class-specific attributes;
  • f_loc denotes object part location and occlusion;
  • f_ext denotes object part existence.
  • a hierarchical clustering tree is built for the samples of an object class.
  • the algorithm is summarized in Figure 4.
  • the clustering is done in a divisive way. Initially there is only one cluster that contains all samples. Splits are performed recursively as one moves down the hierarchical tree. At each stage, a cluster C is chosen to be split, and then one of the 6 semantic attribute factors is chosen for splitting C into several clusters. Then another cluster is selected for further splitting, until no cluster satisfies the requirements on depth and on sample size in a cluster.
  • the clustering result obtained for the class bus is shown in Fig. 5.
  • clustering is done separately for each class so that different classes can choose different semantic attribute factors.
  • the selected sample set C is split into several clusters such that samples in the same cluster are more similar to each other than to those in other clusters (lines 5-7 of Algorithm 1 as shown in Fig. 4).
  • the clustering approach used for splitting C into N clusters constructs a directed graph using K-nearest neighbors (K-NN).
  • each sample is a node
  • the directed edge from the nth node to the mth node is used for measuring the similarity between the mth sample and the nth sample as follows: w(n→m) = exp(-||f_i^n - f_i^m||^2 / σ^2) if m is among the K nearest neighbors of n, and 0 otherwise;
  • σ^2 is the mean Euclidean distance over all samples, and f_i^n and f_i^m are the ith attribute factors for the nth and mth samples, respectively.
  • the closeness measure of clusters is defined via the indegree and outdegree on the graph. In some embodiments, this approach is preferred. In other embodiments, affinity propagation, spectral clustering, or normalized cut, which also perform well on many benchmark image datasets, can be employed.
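A numpy sketch of this directed K-NN graph construction, assuming the ith attribute factors are stacked row-wise in a matrix; knn_digraph is an illustrative name, and the agglomerative merging via indegree/outdegree is only indicated in a trailing comment.

```python
import numpy as np

def knn_digraph(F_i, K=10):
    """Directed K-NN graph for the ith attribute factor.

    F_i: (n_samples, dim) array whose nth row is f_i^n.
    Returns W with W[n, m] = exp(-||f_i^n - f_i^m||^2 / sigma^2)
    when m is one of the K nearest neighbors of n, and 0 otherwise.
    """
    diff = F_i[:, None, :] - F_i[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)          # pairwise squared distances
    sigma2 = d2.mean()                     # "mean Euclidean distance" of all samples
    n = F_i.shape[0]
    W = np.zeros((n, n))
    for u in range(n):
        nbrs = np.argsort(d2[u])[1:K + 1]  # skip u itself at index 0
        W[u, nbrs] = np.exp(-d2[u, nbrs] / sigma2)
    return W

# The closeness of two clusters A and B can then be measured from the
# indegree and outdegree of their nodes on W (graph-degree-style linkage),
# and the closest pair merged repeatedly until N clusters remain.
```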
  • the candidate split with the maximum evaluation score E(S_i) among the six candidate splits {S_1, ..., S_6} is selected for splitting C (lines 8-9 of Algorithm 1 as shown in Fig. 4).
  • E(S_i) is the entropy of the split: E(S_i) = -Σ_{k=1..N} p_{k,i} log p_{k,i}, with p_{k,i} = |C_{k,i}| / Σ_{k'=1..N} |C_{k',i}|;
  • |C_{k,i}| denotes the number of elements in C_{k,i}.
  • E (S i ) measures the quality of the candidate split.
  • the reason of dividing samples into clusters is to group samples that are similar in appearance.
  • the candidate splits are obtained for small within-cluster dissimilarity.
  • the uniformness of the clusters is important but is not considered by that criterion alone.
  • the ImageNet classification dataset has almost the same number of samples (for example, 1300 samples for 90% of classes) in each class for training.
  • the number of training samples per class is constrained to be not larger than 1000 for training the deep model on the ImageNet detection dataset.
  • the entropy is used in our algorithm for measuring the uniformness of cluster size. The larger the entropy, the more uniform the cluster size, and thus the better the captured variation in attributes.
  • candidate group S_1 has C split into clusters containing 30%, 35%, and 35% of the samples;
  • candidate group S_2 has C split into clusters containing 90%, 9%, and 1% of the samples.
  • Candidate group S_2 is considered worse than S_1:
  • S_2 has 90% of the samples within a single cluster and does not capture the main factor in variation.
  • the cluster with 1% of the samples in S_2 has too few samples to be learned well, while the cluster with 90% of the samples will dominate the feature learning. Therefore, S_1 is the better choice and will be chosen by our approach in this case.
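The preference for S_1 can be checked numerically with the entropy score above; split_entropy is an illustrative helper applied to the hypothetical percentages of this example, not a function from the patent.

```python
import math

def split_entropy(fractions):
    """Entropy of cluster-size fractions (the uniformness score above)."""
    return -sum(p * math.log(p) for p in fractions if p > 0)

print(split_entropy([0.30, 0.35, 0.35]))  # ~1.10: near-uniform, S_1 preferred
print(split_entropy([0.90, 0.09, 0.01]))  # ~0.36: one cluster dominates in S_2
```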
  • when the samples cannot be divided by an attribute factor, the returned cluster number will be one, which has the minimum entropy. Therefore, such attribute factors will not be selected for clustering.
  • the cluster C for splitting is constrained to have more than M samples and tree depth less than D.
  • D, M, and N are used for controlling the number of samples within a cluster; if the number of samples within a cluster is too small, the cluster is hard to train well.
  • a training datum is a triplet (I, y, a), where I denotes the input image, y denotes the object class label of the image, and a denotes the attributes of the image.
  • the attributes of objects are used as features for clustering objects of the same class into several clusters.
  • bus is clustered into three clusters: 1) bus with horizontal view and all parts existing in the image; 2) bus with horizontal view and only the frontal half part existing in the image; 3) bus with tilted view and all parts existing in the image.
  • the output is the cluster label for each training sample.
  • the first predictive unit 202 predicts the object class label and the cluster label.
  • the label of the image obtained from the attribute clustering unit and the label of the image predicted by the first predictive unit 202 are used by the training unit 203 for training.
  • the parameters trained from the training unit 203 are used for extracting features from a given image.
  • the second predictive unit 204 is configured to use the extracted features to predict the class label of the given image.
  • the attribute clustering system 2000 may be provided and configured to cluster an object class into several clusters with the guidance of attributes.
  • the proposed system 2000 is input with the attribute labels of training images.
  • the attributes may be summarized into different groups.
  • one group of attributes is rotation. It corresponds to in-plane rotation of an object, as shown in Fig. 3 (a) . Rotation is discretized into a plurality of (for example, 8) directions.
  • one group of attributes is viewpoint. It corresponds to out-of-plane rotation of an object, as shown in Fig. 3 (b) .
  • Viewpoint can be multi-valued. For example, one can see both the front and the left side of a car.
  • the reference object orientation is chosen such that, in most cases, the objects undergo no rotation, are in frontal view, and have most of their parts not self-occluded.
  • the viewpoint has semantic meaning on whether a person or animal is facing the camera.
  • one group of attributes includes common attributes, which are shared across all the object classes. Examples of these attributes are: 1) indoor or outdoor, which is a scene-level contextual attribute; 2) complex or simple background, which is a background attribute; 3) tight shot, in which the camera is very close to the object and leads to perspective view change (in this case, usually most object parts do not exist); 4) internal shot, which is true for images captured in a car and false for images captured outside a car; 5) almost all parts occluded, in which more than 70% of an object is hidden in the bounding box; 6) interaction with person, which is an important context for objects like crutch, stretcher, horse, harmonica, and bow.
  • one group of attributes includes class-specific attributes, which are attributes specifically used for a single class or a small group of classes. Attributes are chosen that result in large appearance variation, for example, the binary attributes “long ear” and “fluffy” for dog, “mouth open” for hippopotamus, “switched on with content on screen” for monitor, “wings open” for dragonfly and bird, “with lots of books” for bookshelf, and “floating on the water” for whale.
  • Fig. 3 shows some class specific attributes. There are 314 class-specific attributes defined in total. Common attributes and class-specific attributes provide rich semantic information for describing objects.
  • one group of attributes includes object part location and occlusion.
  • different object classes have different parts. For example, for lions and otters as shown in Fig. 3, the parts are the mouth, neck, hip, and four legs. For cars as shown in Fig. 3, the parts are the four wheels and the four corners of the car roof. Variation in part location corresponds to deformation of object parts. It is found on 6 animal classes that part location supervision is helpful. Part location is not only useful in disentangling the factors that influence appearance variation, but also facilitates further applications like action recognition, animation, and content-based video and image retrieval. Object parts may be occluded, which results in distortion of the visual cues of an object. Therefore, the occlusions of object parts are annotated and represented by gray circles in Fig. 3.
  • one group of attributes is object part existence.
  • an object's parts may not be in the bounding box because of occlusion or tight shot.
  • a lion image with only the head is labeled as lion, and a lion image with the full body is also labeled as lion;
  • however, these two images have large appearance variation.
  • the appearance mixtures like half body and full body for persons correspond to different object part existence.
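Putting the attribute groups together, one annotated bounding box might be encoded as below; only the 8 quantized rotation directions and the 6-dimensional, multi-valued viewpoint follow the text, while the field names, part count, and coordinate convention are illustrative assumptions rather than the dataset's actual schema.

```python
# Hypothetical encoding of one annotated bounding box (cf. Fig. 3).
annotation = {
    "rotation": 2,                      # one of 8 quantized in-plane directions
    "viewpoint": [1, 0, 1, 0, 0, 0],    # 6-dim, multi-valued: front + left visible
    "common": {"outdoor": 1, "tight_shot": 0, "interaction_with_person": 0},
    "class_specific": {"female": 1},    # e.g. for the class lion
    "part_location": [[0.40, 0.30], [0.55, 0.62]],  # normalized part coordinates
    "part_occluded": [0, 1],            # gray circles in Fig. 3
    "part_existence": [1, 1, 0, 0],     # which parts fall inside the box
}
```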
  • the attribute clustering system outputs the cluster label of training samples.
  • the predictive unit at the training stage is input with training images, and it outputs the predicted cluster label and object class label.
  • at step S601, the input image is cropped by a bounding box and warped into the predefined size required by the convolutional neural network; at step S602, given the input image cropped by the bounding box, features are extracted from the convolutional neural network; at step S603, the features are used for predicting the cluster label and the object class label.
  • the predictive unit at the training stage is shown in Fig. 6.
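A minimal sketch of steps S601-S603 as a two-head network, assuming a PyTorch-style model; the backbone, layer sizes, and head dimensions are placeholders, not the patent's actual architecture.

```python
import torch
import torch.nn as nn

class PredictiveUnit(nn.Module):
    """Shared CNN features with two heads: object class and cluster label."""

    def __init__(self, n_classes=201, n_clusters=100, feat_dim=4096):
        super().__init__()
        self.backbone = nn.Sequential(           # stand-in convolutional net
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.class_head = nn.Linear(feat_dim, n_classes)     # object class label
        self.cluster_head = nn.Linear(feat_dim, n_clusters)  # cluster label

    def forward(self, x):
        # S601: x is a batch of images cropped by bounding boxes and warped
        # to the fixed input size; S602: extract features; S603: predict.
        h = self.backbone(x)
        return self.class_head(h), self.cluster_head(h)
```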
  • the predictive unit at the testing stage is input with testing images, and it outputs the predicted cluster label and object class label.
  • input image is cropped by a bounding box and warped into the predefined size required by the convolutional neural network.
  • the features are extracted from a convolutional neural network.
  • the features are used for predicting the cluster label and object class label.
  • alternatively, the unit at the testing stage does not predict the cluster label.
  • the training unit is input with the image, the ground-truth cluster label and object class label, and the predicted cluster label and object class label. It outputs learned parameters of the convolutional neural network, which become the finally trained parameters used by the predictive unit. As shown in Fig. 8, the training steps of the predictive unit proceed as sketched below.
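One training step might look like the following sketch, reusing PredictiveUnit from the sketch above; the hinge loss for object classes and the cross-entropy loss for cluster labels follow the loss description earlier, while the optimizer and learning rate are assumptions.

```python
import torch
import torch.nn.functional as F

model = PredictiveUnit()                 # from the sketch above
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, class_labels, cluster_labels):
    """One update: predict both labels, compare to ground truth, update CNN."""
    class_scores, cluster_scores = model(images)
    loss = (F.multi_margin_loss(class_scores, class_labels)     # hinge, classes
            + F.cross_entropy(cluster_scores, cluster_labels))  # cluster label
    opt.zero_grad()
    loss.backward()
    opt.step()   # the finally trained parameters are reused by the predictive unit
    return loss.item()
```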
  • attributes are useful in discriminating intra-class variation and improving feature learning.
  • the deep representation learned with attributes as supervision improves object detection accuracy on large-scale object detection datasets.
  • Different ways of using attributes are investigated through extensive experiments. It is found that it is more effective to learn feature representations by predicting attribute mixture types than predicting attributes directly.
  • the present application proposes the factor guided hierarchical clustering that constructs semantically meaningful attribute mixture types.
  • the attributes are grouped into several attribute factors.
  • the attribute factor that best represents the appearance variation is selected for dividing the samples into clusters.
  • the importance of attributes in representing variation can be ranked.
  • Fig. 13 is a block diagram illustrating an example computing device 900 in accordance with various implementations of the present disclosure.
  • computing device 900 typically includes one or more processors 910 and system memory 920.
  • a memory bus 930 can be used for communicating between the processor 910 and the system memory 920.
  • system memory 920 can be of any type including but not limited to volatile memory (such as RAM) , non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof.
  • System memory 920 typically includes an operating system 921, one or more applications 922, and program data 924.
  • Application 922 may include instructions 923 that are arranged to perform the functions as described herein including the actions described with respect to the flow charts shown in Figs 2, 4, and 6-8.
  • Program Data 924 may include electro-remediation (ER) data 925, such as voltages, voltage pulsing schemes etc. that may be useful for implementing instructions 923.
  • application 922 can be arranged to operate with program data 924 on an operating system 921 such that implementations of the present disclosure, as described herein, may be provided. This described basic configuration is illustrated in Fig. 13 by those components within dashed line 901.
  • Computing device 900 can have additional features or functionality, and additional interfaces to facilitate communications between the basic configuration 901 and any required devices and interfaces.
  • a bus/interface controller 940 can be used to facilitate communications between the basic configuration 901 and one or more data storage devices 950 via a storage interface bus 941.
  • the data storage devices 950 can be removable storage devices 951, non-removable storage devices 952, or a combination thereof. Examples of removable storage and nonremovable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD) , optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD) , and tape drives to name a few.
  • Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 900. Any such computer storage media can be part of device 900.
  • Computing device 900 can also include an interface bus 942 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, and communication interfaces) to the basic configuration 901 via the bus/interface controller 940.
  • Example output interfaces 960 include a graphics processing unit 961 and an audio processing unit 962, which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 963.
  • Example peripheral interfaces 970 include a serial interface controller 971 or a parallel interface controller 972, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 973.
  • An example communication interface 980 includes a network controller 981, which can be arranged to facilitate communications with one or more other computing devices 990 over a network communication via one or more communication ports 982.
  • a network communication connection is one example of communication media.
  • Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • a "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF) , infrared (IR) and other wireless media.
  • the term computer readable media as used herein can include both storage media and communication media.
  • Computing device 900 can be implemented as a portion of a small-form-factor portable (or mobile) electronic device such as a cell phone, a smart phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions.
  • Computing device 900 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations or implemented in a workstation or a server configuration.
  • Examples of a signal bearing medium include, but are not limited to, the following: a recordable-type medium such as a flexible disk, a hard disk drive (HDD), a Compact Disc (CD), a Digital Versatile Disk (DVD), a digital tape, a computer memory, etc.; and a transmission-type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • a typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and non-volatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities) .
  • a typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
  • any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being “operably couplable”, to each other to achieve the desired functionality.
  • operably couplable include but are not limited to physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
  • the term “optimize” may include maximization and/or minimization.
  • the term “minimization” and/or the like as used herein may include a global minimum, a local minimum, an approximate global minimum, and/or an approximate local minimum.
  • the term “maximization” and/or the like as used herein may include a global maximum, a local maximum, an approximate global maximum, and/or an approximate local maximum.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
PCT/CN2015/091771 2015-10-12 2015-10-12 Method and system for clustering objects labeled with attributes WO2017063126A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2015/091771 WO2017063126A1 (en) 2015-10-12 2015-10-12 Method and system for clustering objects labeled with attributes
CN201580084335.9A CN108351971B (zh) 2015-10-12 2015-10-12 Method and system for clustering objects labeled with attributes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/091771 WO2017063126A1 (en) 2015-10-12 2015-10-12 Method and system for clustering objects labeled with attributes

Publications (1)

Publication Number Publication Date
WO2017063126A1 (en) 2017-04-20

Family

ID=58517741

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/091771 WO2017063126A1 (en) 2015-10-12 2015-10-12 Method and system for clustering objects labeled with attributes

Country Status (2)

Country Link
CN (1) CN108351971B (zh)
WO (1) WO2017063126A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177954B (zh) * 2021-04-28 2022-07-26 Central South University Image processing method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8391618B1 (en) * 2008-09-19 2013-03-05 Adobe Systems Incorporated Semantic image classification and search
CN103186538A (zh) * 2011-12-27 2013-07-03 Alibaba Group Holding Ltd. Image classification method and apparatus, and image retrieval method and apparatus
US20150206315A1 (en) * 2014-01-21 2015-07-23 Adobe Systems Incorporated Labeling Objects in Image Scenes
US9141883B1 (en) * 2015-05-11 2015-09-22 StradVision, Inc. Method, hard negative proposer, and classifier for supporting to collect hard negative images using a similarity map

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8229957B2 (en) * 2005-04-22 2012-07-24 Google, Inc. Categorizing objects, such as documents and/or clusters, with respect to a taxonomy and data structures derived from such categorization
US7716251B2 (en) * 2003-12-02 2010-05-11 International Business Machines Corporation Systems and method for indexing, searching and retrieving semantic objects
US7774288B2 (en) * 2006-05-16 2010-08-10 Sony Corporation Clustering and classification of multimedia data
CN102314519B (zh) * 2011-10-11 2012-12-19 China National Software & Service Co., Ltd. Information search method based on an ontology model of public security domain knowledge
CN103810266B (zh) * 2014-01-27 2017-04-05 The 10th Research Institute of China Electronics Technology Group Corporation Semantic network target recognition and verification method
CN104156438A (zh) * 2014-08-12 2014-11-19 Dezhou University Method for selecting unlabeled samples based on confidence and clustering
CN104516975B (zh) * 2014-12-29 2019-03-22 Institute of Electronics, Chinese Academy of Sciences Automatic association method for multivariate data


Also Published As

Publication number Publication date
CN108351971A (zh) 2018-07-31
CN108351971B (zh) 2022-04-22


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15906011

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15906011

Country of ref document: EP

Kind code of ref document: A1