WO2017124336A1 - Method and system for adapting deep model for object representation from source domain to target domain - Google Patents
- Publication number
- WO2017124336A1 (PCT/CN2016/071501)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- objects
- fine
- criterions
- deep model
- target domain
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
Definitions
- the criterions may contain information indicating which objects should not be inferred to have a same group label.
- the group label may indicate the property, name, classification and the like of the objects. For example, if the system is used for face recognition in a movie, the group label may be the name of the role. If the system is used for object detection in the photo, the group label may be the classification of the object, such as “chair” , “table” and the like.
- the system 100 runs to carry out its functions in an iterative way.
- the units 101-104 may be implemented as an iterative feedback loop.
- the feature extraction unit 101 extracts the features from the input images.
- the inference unit 102 infers group labels for the objects according to the extracted features.
- the criterions discovery unit 103 discovers criterions from the inferred group labels.
- the training unit 104 fine-tunes the deep model according to the discovered criterions. Then the next iteration is performed. This iterative feedback loop ends when the desired performance is achieved or the predetermined running time is reached.
- the deep model is fine-tuned several times and becomes more suitable for the target domain.
- In the starting iteration, the features for objects are extracted from the input images for the target domain by the deep model for the source domain; in the iterations following the starting iteration, the features for objects are extracted from the input images by the deep model fine-tuned in the previous iteration of the iterative feedback loop.
- the deep model fine-tuned in the last iteration is outputted
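The iterative feedback loop of units 101-104 can be sketched as follows; the four callables and their signatures are illustrative assumptions, not taken from the disclosure:

```python
def adapt_model(source_model, target_images, target_priors,
                extract_features, infer_group_labels,
                discover_criterions, fine_tune, num_iterations=3):
    """Sketch of the iterative feedback loop over units 101-104.

    The four callables stand in for the feature extraction, inference,
    criterions discovery and training units; their exact signatures
    are illustrative assumptions.
    """
    model = source_model
    for _ in range(num_iterations):
        # Unit 101: extract features with the current deep model
        # (the source model in the starting iteration, the model
        # fine-tuned in the previous iteration afterwards).
        features = extract_features(model, target_images)
        # Unit 102: infer a group label for each object.
        labels = infer_group_labels(features)
        # Unit 103: discover criterions -- pairs that share a label
        # but should not, according to distances or domain priors.
        criterions = discover_criterions(features, labels, target_priors)
        # Unit 104: fine-tune the model against the criterions.
        model = fine_tune(model, criterions)
    # The model fine-tuned in the last iteration is the output.
    return model
```

Injecting the four units as callables keeps the sketch runnable without committing to any particular deep-learning framework.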
- the feature extraction unit 101 may be configured with a deep convolutional network (DCN) that consists of successive convolutional filter banks. That is, the deep convolutional network is used as the deep model.
- the DCN may be initialized by training on a large source domain for image classification/recognition (e.g., large-scale image classification dataset IMAGENET, or large scale face dataset) , or received from other unit, or inputted by user.
- a large source domain for image classification/recognition e.g., large-scale image classification dataset IMAGENET, or large scale face dataset
- the pre-trained DCN may be a DCN used in DeepID2+.
- the input may be, for example, a 55×47 RGB face image.
- the DCN has a plurality of, for example four, successive convolution layers followed by one fully connected layer.
- Each convolution layer contains learnable filters and is followed by a 2×2 max-pooling layer and Rectified Linear Units (ReLUs) as the activation function. In this embodiment, the number of feature maps generated by each convolution layer will be 128, and the dimension of the face representation generated by the final fully connected layer will be 512.
- the DCN is pre-trained on CelebFace (as an example), with around 290,000 face images from 12,000 identities. The training process is conducted by back-propagation using both the identification and verification loss functions. It should be appreciated that other databases with different numbers of training face images may be applicable.
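As an illustration of the sizes involved, the spatial extent of the feature maps can be traced through the four convolution-plus-pooling stages; the 3×3 valid convolutions assumed below are illustrative, since the kernel sizes are not stated here:

```python
def stage_output_size(h, w, kernel=3, pool=2):
    """Spatial size after one valid convolution followed by max-pooling.

    The 3x3 kernel and floor-division pooling are illustrative
    assumptions; the text does not specify them.
    """
    h, w = h - kernel + 1, w - kernel + 1   # valid convolution
    return h // pool, w // pool             # 2x2 max-pooling

h, w = 55, 47                               # example RGB face input
for layer in range(1, 5):                   # four convolution layers
    h, w = stage_output_size(h, w)
    print(f"after conv{layer}: {h}x{w}, 128 feature maps")
```

Under these assumptions the maps shrink to 26×22, 12×10, 5×4 and finally 1×1 before the fully connected layer produces the 512-dimensional representation.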
- Fig. 2 shows the steps used for the inference unit according to some embodiments of the present application.
- the extracted features are fed into the inference unit 102, which then finds an appropriate group label distribution for the objects in the input images according to the extracted features, i.e., it infers the group label for each object according to its features.
- the process of inference may be implemented by the following steps.
- a judgment score for each of the candidate group label distributions for the objects is computed according to the features of the objects, wherein the higher the similarity between the features of the objects having the same group label is, the higher the judgment score is; i.e., the judgment score represents the degree of appropriateness of the corresponding distribution.
- the judgment scores of the different distributions are compared with each other, and the candidate group label distribution having the highest judgment score is determined.
- group labels for objects are inferred based on the determined distribution.
- the judgment score may be a value of a function that contains variables related to the features of the objects, the relation of the features or the like.
- the group label of each feature in X is denoted by a corresponding label in Y; the labels may be inferred by maximizing a function p (X, Y) :
- v (·, ·) is a pre-computed function that encodes the relation between any pair of features, where a positive relation (i.e., v (·, ·) > 0) means that the features are likely from the same character; otherwise, they belong to different characters.
- the computation of v is a combination of the similarity between appearances of a pair of features (i.e., the similarity between features of a pair of objects) ; and the pairwise spatial and temporal criterions of the features, which may be obtained from input images.
- the group label distribution that gives Eqn. (1) its highest value may be considered the most appropriate distribution and may be determined as the resulting group label distribution; the group labels for the objects can then be inferred.
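Steps S201 to S203 can be sketched as an exhaustive search over candidate group label distributions; the similarity-sum score below is a simplified stand-in for p (X, Y) (it ignores the spatial and temporal terms of v), and all names are illustrative:

```python
from itertools import product

def judgment_score(features, labels, similarity):
    """Sum of pairwise similarities over objects sharing a group label.

    A simplified stand-in for p(X, Y): the higher the similarity among
    same-label objects, the higher the score.
    """
    score = 0.0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if labels[i] == labels[j]:
                score += similarity(features[i], features[j])
    return score

def infer_labels(features, num_groups, similarity):
    """Steps S202/S203: pick the candidate distribution with the highest score."""
    best = max(product(range(num_groups), repeat=len(features)),
               key=lambda labels: judgment_score(features, labels, similarity))
    return list(best)
```

Exhaustive enumeration is exponential in the number of objects, so a practical system would maximize p (X, Y) with approximate inference instead; the sketch only illustrates the "score each candidate distribution and keep the best" logic.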
- Fig. 3 shows the steps used for the criterions discovery unit 103 according to some embodiments of the present application.
- the resulting group labels for objects as well as the input images are fed into the criterions discovery unit 103.
- In the criterions discovery unit 103, the following steps are performed.
- the degrees of difference between objects that are inferred to have the same group label are computed.
- the object pairs having a degree of difference larger than a threshold are chosen as the criterions.
- the object pairs that are inferred with the same group label but should have different group labels according to the target domain prior are chosen as the criterions.
- These criterions will be used in the training unit 104 to fine-tune the DCN of the feature extraction unit 101.
- In some embodiments, step S302 may be omitted; in some embodiments, step S303 may be omitted.
- the degrees of difference between objects that are inferred to have the same group label may be obtained by calculating the distance between the features of each pair of objects in the feature space, for example, by calculating the L2-distance between the features of two objects. Then the top 20% (or another percentage) of object pairs with the largest degree of difference (for example, L2-distance) are chosen as the criterions; that is, the object pairs having a degree of difference larger than a threshold are chosen as the criterions. For example, in the scenario where the 20% of object pairs with the largest degree of difference (for example, L2-distance) are chosen as the criterions, the threshold is the shortest L2-distance in the top 20% of all L2-distances.
- a large L2-distance means that the two objects likely belong to different group labels, so inferring the same label for two objects having a large L2-distance is likely an error; the DCN used to extract the features should be corrected, and the information that “these two objects belong to different labels” will be used as a criterion in the correction process. So, at step S302, the object pairs having a degree of difference larger than a threshold are chosen as the criterions.
- the whole similarity degree of all objects having the same group label may be calculated first, for example, as the trace of the covariance matrix, i.e., trace (Σ_l), wherein Σ_l denotes the covariance matrix of the Gaussian of the l-th group label; the lower the whole similarity degree is, the larger trace (Σ_l) is. Then only the objects with a group label whose trace (Σ_l) is larger than a threshold are considered when calculating the degree of difference between objects that are inferred to have the same group label.
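The distance-based selection of steps S301 and S302 can be sketched as follows (plain Euclidean distance, a fixed 20% fraction, and no trace-based pre-filtering of groups; all are simplifications of the description above):

```python
import math

def discover_distance_criterions(features, labels, top_fraction=0.2):
    """Choose same-label pairs whose L2-distance is in the top fraction.

    Such pairs were likely grouped together in error, so they become
    criterions ("these two objects belong to different labels") for
    fine-tuning. Feature vectors are plain tuples here for illustration.
    """
    def l2(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    same_label_pairs = [(i, j)
                        for i in range(len(features))
                        for j in range(i + 1, len(features))
                        if labels[i] == labels[j]]
    # Rank pairs by distance, largest first.
    same_label_pairs.sort(key=lambda p: l2(features[p[0]], features[p[1]]),
                          reverse=True)
    # Keep the top fraction; the implied threshold is the smallest
    # distance among the kept pairs.
    keep = max(1, int(top_fraction * len(same_label_pairs)))
    return same_label_pairs[:keep]
```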
- the target domain prior comprises information on the objects in the input images or relationship between objects in the input images.
- the target domain prior can be the context extracted from the subtitle that helps to identify the character’s face.
- Other similar priors can be in a pairwise form: faces appearing in the same frame of a video/movie are unlikely to belong to the same person (negative pair), while any two faces in the same location in neighboring frames are more likely to belong to the same person (positive pair).
- At step S303, object pairs that are inferred to have the same group label but should have different group labels according to the target domain prior are chosen as the criterions.
- the criterions may contain the information on which pairs of objects that are assigned the same group label are actually not the same object.
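The pairwise priors can be derived mechanically from per-frame detections; the sketch below treats "same location" as an identical bounding box, a deliberate simplification (a real system would more likely use box overlap):

```python
def pairwise_priors(detections):
    """Derive positive/negative pairs from (frame_index, bounding_box) records.

    Treating "same location in neighboring frames" as an identical
    bounding box is an illustrative simplification; a practical system
    would use box overlap (IoU) instead.
    """
    positive, negative = [], []
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            frame_i, box_i = detections[i]
            frame_j, box_j = detections[j]
            if frame_i == frame_j:
                # Two faces in one frame are unlikely the same person.
                negative.append((i, j))
            elif abs(frame_i - frame_j) == 1 and box_i == box_j:
                # Same location in neighboring frames: likely the same person.
                positive.append((i, j))
    return positive, negative
```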
- Fig. 4 shows the steps used for the training unit 104 according to some embodiments of the present application.
- the original DCN or DCN used in the previous iteration is fine-tuned according to the discovered criterions.
- the parameters of the DCN are adjusted in order to make the extracted features more consistent with the criterions.
- At step S401, a fine-tuning score for each of the candidate parameter adjustments is computed according to the discovered criterions; at step S402, the candidate parameter adjustment having the highest fine-tuning score is determined as the resulting parameter adjustment of the deep model; and at step S403, the deep model is fine-tuned with the determined parameter adjustment, and the fine-tuned deep model for the target domain is outputted.
- the fine-tuning score may be inversely proportional to a value of a function that contains variables related to the features of the objects, the relation of features or the like.
- the function may be a contrastive loss function that encourages features of the objects of the same group label to be close and those of different group labels to be far away from each other.
- the formulation of the contrastive loss may be:
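The formulation itself is not reproduced in this text. A common margin-based contrastive loss of the kind described (an assumed form consistent with the surrounding description, with f_i and f_j the features of a pair of objects, y_ij = 1 for a same-label pair and y_ij = 0 otherwise, and m a margin) is:

```latex
E_c = \frac{1}{2} \sum_{(i,j)} \Big[\, y_{ij}\, \lVert f_i - f_j \rVert_2^2
      + (1 - y_{ij})\, \max\!\big(0,\; m - \lVert f_i - f_j \rVert_2\big)^2 \Big]
```

Under this form, pairs chosen as criterions enter with y_ij = 0, so minimizing E_c pushes their features apart.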
- the features extracted by the DCN with different parameter adjustments are different, so different values of E_c are obtained; the more consistent the features are with the criterions, the smaller the value of E_c is. By minimizing E_c, the most appropriate parameter adjustment may be obtained; that is, the parameter adjustment that makes E_c smallest is the most appropriate one.
- the candidate parameter adjustments may be included in a parameter adjustment set.
- the process of minimizing E_c may be an iterative process
- the candidate parameter adjustment may be obtained by modifying the parameter adjustment in the previous iteration
- the deep model may be fine-tuned with the determined parameter adjustment.
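Steps S401 to S403 can be sketched as follows; the dictionary-based parameter representation and the injected scoring function are illustrative assumptions, with the score standing in for consistency with the discovered criterions (e.g., the negative of E_c):

```python
def choose_and_apply_adjustment(model_params, candidate_adjustments,
                                fine_tuning_score):
    """Apply the candidate parameter adjustment with the highest score.

    `fine_tuning_score` stands in for evaluating an adjustment against
    the discovered criterions (step S401); representing parameters as
    a dict and applying an adjustment by merging are illustrative
    assumptions, not from the disclosure.
    """
    # Step S402: pick the highest-scoring adjustment.
    best = max(candidate_adjustments, key=fine_tuning_score)
    # Step S403: fine-tune the model with the chosen adjustment.
    return {**model_params, **best}
```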
- the triplet loss or other loss functions may also be used, which learn an embedding in which the distances between positive pairs are smaller than those between negative pairs.
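As a minimal sketch of this alternative, a hinge-style triplet loss on plain Python feature vectors (an illustrative formulation; the text does not fix the exact variant) could be:

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on 1-D feature vectors (illustrative).

    Encourages the anchor-positive distance to be smaller than the
    anchor-negative distance by at least `margin`.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```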
- the present application may be embodied as a system, a method or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment or an embodiment combining software and hardware aspects, which may all generally be referred to herein as a “unit”, “circuit”, “module” or “system”.
- the present invention may take the form of an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware.
- the system may comprise a memory that stores executable components and a processor, electrically coupled to the memory to execute the executable components to perform operations of the system, as discussed in reference to Figs. 1-4.
- the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
Abstract
A method for adapting a deep model for object representation from a source domain to a target domain, comprises: extracting, by the deep model for the source domain, features for objects from input images for the target domain; inferring group labels for objects according to the extracted features; discovering criterions based on target domain priors derived from the input images and the inferred group labels, wherein the criterions contain information indicating which objects should not be inferred to have a same group label; and fine-tuning the deep model for the source domain according to the discovered criterions, wherein the fine-tuned deep model is outputted as a deep model for the target domain. A system for adapting a deep model for object representation from a source domain to a target domain is also disclosed.
Description
The disclosures relate to a method and a system for adapting a deep model for object representation from a source domain to a target domain.
Deep learning approaches have achieved substantial advances for object (e.g., face, dogs, basketball) recognition. However, contemporary deep models, for example, deep convolution networks, usually overfit to the training data distributions, and thus will not be directly generalisable to other unseen target domains. In addition, the annotated data in the unseen target domain is usually not sufficient for training a new deep model. These problems limit deep learning in applications such as object tracking, retrieval, and clustering in unseen images/videos. One example is face clustering in movies, i.e., grouping detected faces into different subsets according to different characters. Clustering faces in movies is extremely challenging since characters’ appearance may vary drastically under different scenes as the story progresses. In addition, the various cinematic styles in different movies make it difficult to learn a universal face representation for all movies. Conventional techniques that assume fixed handcrafted features for clustering are infeasible for this problem: handcrafted features are susceptible to large appearance, illumination, and viewpoint variations, and thus cannot cope with the drastic appearance changes in movies.
Deep learning approaches have achieved substantial advances for object representation learning. These methods could arguably provide a more robust representation for object recognition. However, contemporary deep models for object recognition are trained with web images or photos from albums. These models overfit to the training data distributions and thus will not be directly generalisable to applications in a different target domain.
Therefore, it is desired to provide a method for adapting a deep model from the source domain to the target domain automatically.
Summary
The following presents a simplified summary of the disclosure in order to provide a basic understanding of some aspects of the disclosure. This summary is not an extensive overview of the disclosure. It is intended to neither identify key or critical elements of the disclosure nor delineate any scope of particular embodiments of the disclosure, or any scope of the claims. Its sole purpose is to present some concepts of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.
In an aspect, disclosed is a method for adapting a deep model for object representation from a source domain to a target domain, comprising: extracting, by a deep model for the source domain, features for objects from input images for the target domain; inferring group labels for objects according to the extracted features; discovering criterions based on target domain priors derived from the input images and the inferred group labels, wherein the criterions contain information indicating which objects should not be inferred to have a same group label; and fine-tuning the deep model for the source domain according to the discovered criterions, wherein the fine-tuned deep model is outputted as a deep model for the target domain.
In one embodiment of the present application, the extracting, the inferring, the discovering, and the fine-tuning are implemented in an iterative feedback loop that is performed a predetermined number of times, wherein in the starting iteration of the iterative feedback loop, the features for objects are extracted from the input images for the target domain by the deep model for the source domain, and in the iterations following the starting iteration, the features for objects are extracted from the input images for the target domain by the deep model fine-tuned in the previous iteration of the iterative feedback loop.
In one embodiment of the present application, the inferring comprises: computing, according to the extracted features of the objects, a judgment score for each of candidate group label distributions for the objects; determining a candidate group label distribution having the highest judgment score; and inferring, based on the determined distribution, group labels for objects, wherein the higher the similarity between the features of the objects having the same group label is, the higher the judgment score is.
In one embodiment of the present application, the target domain prior comprises information on the objects in the input images or relationship between objects in the input images.
In one embodiment of the present application, the discovering comprises: computing degrees of difference between objects that are inferred to have the same group label; and choosing pairs of objects having a degree of difference larger than a threshold as the criterions.
In one embodiment of the present application, the discovering comprises: choosing, as the criterions, pairs of objects that are inferred to have the same group label but should have different group labels according to the target domain prior.
In one embodiment of the present application, the fine-tuning comprises: computing a fine-tuning score for each of candidate parameter adjustments according to the discovered criterions; determining the candidate parameter adjustment having the highest fine-tuning score; and fine-tuning the deep model with the determined parameter adjustment, wherein the fine-tuning score indicates the similarity between the objects having a same group label, and the higher the similarity is, the higher the fine-tuning score is.
In an aspect, disclosed is a system for adapting a deep model for object representation from a source domain to a target domain, comprising: a feature extraction unit configured to receive the deep model for the source domain and use the deep model to extract features for objects from input images for the target domain; an inference unit configured to infer group labels for objects according to the extracted features; a criterions discovery unit configured to discover criterions based on target domain priors derived from the input images and the inferred group labels, wherein the criterions contain information indicating which objects should not be inferred to have a same group label; and a training unit configured to fine-tune the deep model for the source domain according to the discovered criterions, wherein the fine-tuned deep model is outputted as the deep model for the target domain.
In an aspect, disclosed is a system for adapting a deep model for object representation from a source domain to a target domain, comprising: a memory that stores executable components; and a processor electrically coupled to the memory to execute the executable components for: extracting, by a deep model for the source domain, features for objects from input images for the target domain; inferring group labels for objects according to the extracted features; discovering criterions based on target domain priors derived from the input images and the inferred group labels, wherein the criterions contain information indicating which objects should not be inferred to have a same group label; and fine-tuning the deep model for the source domain according to the discovered criterions, wherein the fine-tuned deep model is outputted as the deep model for the target domain.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present application are described below with reference to the attached drawings. The drawings are illustrative
and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 shows the overall pipeline of the system for adapting a deep model for object representation from a source domain to a target domain according to some embodiments of the present application;
Fig. 2 shows the steps used for the inference unit according to some embodiments of the present application;
Fig. 3 shows the steps used for the criterions discovery unit according to some embodiments of the present application; and
Fig. 4 shows the steps used for the training unit according to some embodiments of the present application.
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be appreciated by one skilled in the art that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may be practiced without some or all of these specific details. In other instances, well-known process operations have not been described in detail in order not to unnecessarily obscure the present application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a” , “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising” , when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Fig. 1 shows the overall pipeline of the system for adapting a deep model for object representation from a source domain to a target domain according to some embodiments of the present application. In some embodiments, the deep model may be a deep convolutional network (DCN). The system 100 for adapting a deep model for object representation from a source domain to a target domain comprises a feature extraction unit 101, an inference unit 102, a criterions discovery unit 103 and a training unit 104. The feature extraction unit 101 is configured to extract features for objects from input images for the target domain by a deep model for the source domain; the inference unit 102 is configured to infer group labels for the objects according to the extracted features; the criterions discovery unit 103 is configured to discover criterions based on target domain priors derived from the input images and the inferred group labels; and the training unit 104 is configured to fine-tune the deep model for the source domain according to the discovered criterions and to output the fine-tuned deep model as the deep model for the target domain.
In some embodiments of the present application, the criterions may contain information indicating which objects should not be inferred to have the same group label. The group label may indicate the property, name, classification and the like of the objects. For example, if the system is used for face recognition in a movie, the group label may be the name of the role. If the system is used for object detection in a photo, the group label may be the classification of the object, such as “chair”, “table” and the like.
In some embodiments, the system 100 carries out its functions in an iterative way. In other words, the units 101-104 may be implemented as an iterative feedback loop. Specifically, in each iteration, the feature extraction unit 101 extracts the features from the input images. After that, the inference unit 102 infers group labels for the objects based on the extracted features. Then the criterions discovery unit 103 discovers criterions from the inferred group labels. With the discovered criterions, the training unit 104 fine-tunes the deep model, and the next iteration is performed. This iterative feedback loop ends when the desired performance is achieved or a predetermined running time is reached. In this way, the deep model is fine-tuned several times and becomes more suitable for the target domain. In the starting iteration of the iterative feedback loop, the features for the objects are extracted from the input images for the target domain by the deep model for the source domain; in iterations following the starting iteration, the features are extracted by the deep model fine-tuned in the previous iteration of the iterative feedback loop. At the end of the iterative feedback loop, the deep model fine-tuned in the last iteration is outputted.
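The iterative feedback loop of units 101-104 can be sketched in code. The following is a minimal, illustrative sketch only: the four functions and the linear "model" are hypothetical stand-ins for the units of Fig. 1, not the implementation disclosed here.

```python
import numpy as np

def extract_features(model, images):
    # Unit 101 stand-in: a linear map plays the role of the deep model.
    return images @ model["W"]

def infer_labels(features, n_groups=2):
    # Unit 102 stand-in: assign each object to the nearest of n_groups
    # fixed anchors in feature space (toy inference only).
    anchors = features[:n_groups]
    d = np.linalg.norm(features[:, None] - anchors[None], axis=-1)
    return d.argmin(axis=1)

def discover_criterions(features, labels):
    # Unit 103 stand-in: pairs sharing an inferred label but lying far
    # apart in feature space become criterions.
    pairs = []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if labels[i] == labels[j] and \
                    np.linalg.norm(features[i] - features[j]) > 1.0:
                pairs.append((i, j))
    return pairs

def fine_tune(model, criterions, lr=0.01):
    # Unit 104 stand-in: shrink the weights a little per violated pair.
    model["W"] = model["W"] * (1.0 - lr * len(criterions))
    return model

def adapt(model, images, n_iters=3):
    # The iterative feedback loop: extract -> infer -> discover -> fine-tune.
    for _ in range(n_iters):
        feats = extract_features(model, images)
        labels = infer_labels(feats)
        crits = discover_criterions(feats, labels)
        model = fine_tune(model, crits)
    return model  # outputted as the deep model for the target domain
```

In a real system the fine-tuning step would back-propagate a loss through the DCN; the multiplicative update above merely keeps the loop runnable.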
In some embodiments, the feature extraction unit 101 may be configured with a deep convolutional network (DCN) that consists of successive convolutional filter banks; that is, a deep convolutional network is used as the deep model. The DCN may be initialized by training on a large source domain for image classification/recognition (e.g., the large-scale image classification dataset IMAGENET, or a large-scale face dataset), received from another unit, or input by a user. For example, when the system 100 is used for face recognition, the pre-trained DCN may be the DCN used in DeepID2+. Specifically, the input may be, for example, a 55×47 RGB face image. The DCN has a plurality of, for example four, successive convolution layers followed by one fully connected layer. Each convolution layer contains learnable filters and is followed by a 2×2 max-pooling layer and Rectified Linear Units (ReLUs) as the activation function. In this embodiment, the number of feature maps generated by each convolution layer is 128, and the dimension of the face representation generated by the final fully connected layer is 512. The DCN is pre-trained on CelebFace (as an example), with around 290,000 face images from 12,000 identities. The training process is conducted by back-propagation using both the identification and verification loss functions. It should be appreciated that other databases with different numbers of training face images may be applicable.
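As a rough illustration of how the stated sizes fit together, the sketch below tracks the spatial dimensions of the 55×47 input through four conv + 2×2 max-pool stages. The 3×3 filter size and "valid" (no-padding) convolution are assumptions for illustration only; the disclosure fixes just the input size, the layer count, the 128 feature maps per layer, and the 512-dimensional fully connected output.

```python
def dcn_output_shape(h=55, w=47, n_layers=4, k=3, channels=128):
    """Track spatial dims through conv(k x k, valid) + 2x2 max-pool layers.

    Filter size k and 'valid' padding are illustrative assumptions.
    """
    for _ in range(n_layers):
        h, w = h - k + 1, w - k + 1   # valid convolution shrinks by k-1
        h, w = h // 2, w // 2         # 2x2 max-pooling halves (floor)
    return h, w, channels

h, w, c = dcn_output_shape()
flat = h * w * c   # fed to the 512-d fully connected layer
```

Under these assumptions the final feature map is 1×1×128, which would be flattened and projected by the final fully connected layer to the 512-dimensional face representation.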
Fig. 2 shows the steps used for the inference unit according to some embodiments of the present application. In these embodiments, the extracted features are fed into the inference unit 102, and the inference unit 102 operates to find an appropriate group label distribution for the objects in the input images according to the extracted features, i.e., it infers the group label for each object according to the features thereof. The process of inference may be implemented by the following steps.
At step S201, a judgment score for each of the candidate group label distributions for the objects is computed according to the features of the objects, wherein the higher the similarity between the features of the objects having the same group label, the higher the judgment score; i.e., the judgment score represents the degree of appropriateness of the corresponding distribution. At step S202, the judgment scores of the different distributions are compared with each other, and the candidate group label distribution having the highest judgment score is determined. At step S203, group labels for the objects are inferred based on the determined distribution.
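Steps S201-S203 can be sketched with a toy judgment score. The score below (negative mean intra-group distance) is an illustrative stand-in for the probabilistic score of Eqn. (1), and the exhaustive enumeration of candidate distributions is feasible only at toy scale.

```python
import itertools
import numpy as np

def judgment_score(features, labels):
    # S201: higher when features sharing a label are similar; here the
    # negative mean distance to each group's centroid (a toy stand-in).
    score = 0.0
    for l in set(labels):
        group = features[[i for i, y in enumerate(labels) if y == l]]
        if len(group) > 1:
            score -= np.mean(np.linalg.norm(group - group.mean(axis=0), axis=1))
    return score

def infer_group_labels(features, n_labels=2):
    # S202: compare the scores of all candidate distributions (note the
    # exponential cost; real inference would optimize instead).
    candidates = itertools.product(range(n_labels), repeat=len(features))
    # S203: the best-scoring distribution yields the inferred labels.
    return max(candidates, key=lambda y: judgment_score(features, list(y)))
```

On two tight clusters this recovers the expected grouping (up to label permutation).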
In a specific example, the judgment score may be the value of a function that contains variables related to the features of the objects, the relations between the features, or the like. For extracted features $X = \{x_{ij}\}$, where $x_{ij}$ denotes the feature of the $j$-th object of the $i$-th cluster and the clusters may be predetermined, the group label of each $x_{ij}$ in $X$ is denoted as $y_{ij}$, and the labels $Y = \{y_{ij}\}$ may be inferred by maximizing a function $p(X, Y)$:

$$p(X, Y) \propto \prod_{i,j} p(x_{ij} \mid y_{ij}) \prod_{(i',j') \in \mathcal{N}_{ij}} p(y_{ij}, y_{i'j'}), \qquad (1)$$

where $\mathcal{N}_{ij}$ signifies the set of input images that are the neighbors of $x_{ij}$ in the feature space, and $p(y_{ij} = l, y_{i'j'} = l')$ represents the probability of distributing the group labels $l$ and $l'$ to $x_{ij}$ and $x_{i'j'}$, respectively, i.e., to the objects to which $x_{ij}$ and $x_{i'j'}$ correspond. A Gaussian distribution may be employed to model the first term in Eqn. (1):

$$p(x_{ij} \mid y_{ij} = l) = \mathcal{N}(x_{ij};\, \mu_l, \Sigma_l), \qquad (2)$$

where $\mu_l$ and $\Sigma_l$ denote the mean and covariance matrix of the Gaussian of the $l$-th character, which are obtained and updated in the learning process. The second term in Eqn. (1) is defined as

$$p(y_{ij} = l, y_{i'j'} = l') \propto \exp\big\{\alpha\, \upsilon(x_{ij}, x_{i'j'})\, \big(\mathbb{1}(l = l') - \mathbb{1}(l \neq l')\big)\big\}, \qquad (3)$$

wherein $\mathbb{1}(\cdot)$ is the indicator function and $\alpha$ is a trade-off coefficient between Eqn. (2) and Eqn. (3). Furthermore, $\upsilon(\cdot,\cdot)$ is a pre-computed function that encodes the relation between any pair of features $x_{ij}$ and $x_{i'j'}$, where a positive relation (i.e., $\upsilon(\cdot,\cdot) > 0$) means that the features are likely from the same character; otherwise, they belong to different characters. Specifically, the computation of $\upsilon$ combines the similarity between the appearances of a pair of features (i.e., the similarity between the features of a pair of objects) with the pairwise spatial and temporal criterions of the features, which may be obtained from the input images. For instance, when the system is used for face representation learning and clustering in movies, face images at the same location in two successive frames belong to the same character, while face images appearing in the same frame belong to different characters. In general, Eqn. (3) encourages face images with a positive relation to be assigned the same character. For example, if $\upsilon(x_{ij}, x_{i'j'}) > 0$ and $l = l'$, then $p(y_{ij} = l, y_{i'j'} = l')$ is large; however, if $\upsilon(x_{ij}, x_{i'j'}) > 0$ but $l \neq l'$, then $p(y_{ij} = l, y_{i'j'} = l')$ is small, indicating that the group label distribution violates the pairwise criterions. The group label distribution $Y$ making Eqn. (1) attain its highest value may be considered the most appropriate distribution and determined as the resulting distribution, from which the group labels for the objects are inferred.
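The scoring of a labeling under Eqns. (1)-(3) can be sketched in log form. This is a simplifying sketch only: the diagonal covariances, the explicit neighbor list, and the precomputed υ values are assumptions made so the example is self-contained.

```python
import numpy as np

def log_p(features, labels, mus, sigmas, neighbors, upsilon, alpha=1.0):
    """Log of Eqn. (1) for a flat list of features (illustrative sketch).

    mus, sigmas: per-label Gaussian mean and *diagonal* variance vectors.
    neighbors:   index pairs (a, b) that are in each other's neighbor set.
    upsilon:     precomputed relation v(x_a, x_b) for each such pair.
    """
    s = 0.0
    for x, y in zip(features, labels):            # first term, Eqn. (2)
        d = x - mus[y]
        s += -0.5 * np.sum(d * d / sigmas[y]) \
             - 0.5 * np.sum(np.log(2 * np.pi * sigmas[y]))
    for (a, b), v in zip(neighbors, upsilon):     # second term, Eqn. (3)
        same = 1.0 if labels[a] == labels[b] else -1.0
        s += alpha * v * same                     # rewards agreement when v > 0
    return s
```

A labeling that agrees with a positive relation υ > 0 scores higher than one that splits the related pair, matching the behavior described for Eqn. (3).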
Fig. 3 shows the steps used for the criterions discovery unit 103 according to some embodiments of the present application. After inferring the group labels, the resulting group labels for objects as well as the input images are fed into the criterions discovery unit 103. In the criterions discovery unit 103, the following steps are performed. At step S301, the degrees of difference between objects that are inferred to have the same group label are computed. At step S302, the object pairs having a degree of difference larger than a threshold are chosen as the criterions. And at step S303, the object pairs that are inferred with the same group label but should have different group labels according to the target domain prior are chosen as the criterions. These criterions will be used in the training unit 104 to fine-tune the DCN of the feature extraction unit 101. In some embodiments, step S302 may be omitted; in some embodiments, step S303 may be omitted.
In some embodiments, the degrees of difference between objects that are inferred to have the same group label may be obtained by calculating the distance between the features of each pair of objects in the feature space, for example, the L2-distance between the features of two objects. Then the top 20% (or another percentage) of object pairs with the largest degree of difference (for example, the largest L2-distance) are chosen as the criterions; that is, the object pairs having a degree of difference larger than a threshold are chosen as the criterions. For example, in the scenario where the 20% of object pairs with the largest L2-distance are chosen as the criterions, the threshold is the shortest L2-distance among the top 20% of all L2-distances. A large L2-distance means that the two objects likely belong to different group labels, so an inference that assigns the same group label to two objects with a large L2-distance is likely erroneous; the DCN used to extract the features should be corrected, and the information that "these two objects belong to different labels" is used as a criterion in the correction process. Accordingly, at step S302, the object pairs having a degree of difference larger than a threshold are chosen as the criterions.
In some embodiments, before calculating the degrees of difference between objects that are inferred to have the same group label, an overall similarity degree of all objects having the same group label may first be calculated, for example, as the trace of the covariance matrix, trace(Σl), where Σl denotes the covariance matrix of the Gaussian of the l-th group label; the lower the overall similarity degree, the larger trace(Σl) is. Then only the objects whose group label has trace(Σl) larger than a threshold are considered when calculating the degrees of difference between objects that are inferred to have the same group label.
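Steps S301-S302, including the optional trace(Σl) pre-filter above, can be sketched as follows. The 20% fraction and the use of the empirical covariance trace are taken from the text; treating the threshold as "keep the top fraction of pairwise L2-distances" is the interpretation given above.

```python
import numpy as np
from itertools import combinations

def discover_distance_criterions(features, labels, top_frac=0.2,
                                 trace_thresh=None):
    """S301/S302 sketch: within each inferred group, take the top
    `top_frac` most-distant feature pairs as criterions.

    If `trace_thresh` is given, only groups whose empirical covariance
    trace exceeds it (i.e., groups with low overall similarity) are
    examined, as in the pre-filtering variant described above.
    """
    pairs, dists = [], []
    for l in set(labels):
        idx = [i for i, y in enumerate(labels) if y == l]
        if len(idx) < 2:
            continue
        group = features[idx]
        if trace_thresh is not None and \
                np.trace(np.cov(group.T)) <= trace_thresh:
            continue
        for a, b in combinations(idx, 2):
            pairs.append((a, b))
            dists.append(np.linalg.norm(features[a] - features[b]))  # L2
    if not pairs:
        return []
    k = max(1, int(round(top_frac * len(pairs))))
    order = np.argsort(dists)[::-1]          # largest distances first
    return [pairs[i] for i in order[:k]]
```

Each returned pair carries the information "these two objects belong to different labels" for the training unit 104.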
In some embodiments, the target domain prior comprises information on the objects in the input images or on the relationships between objects in the input images. For example, when the system is used for face tracking or clustering in a movie, the target domain prior can be the context extracted from the subtitles that helps to identify a character's face. Other similar priors can be in a pairwise form: faces appearing in the same frame of a video/movie are unlikely to belong to the same person (negative pair), while any two faces in the same location in neighboring frames are more likely to belong to the same person (positive pair). If a pair of objects are inferred to have the same group label, but it is known from the target domain prior that these two objects should not have the same group label, the label inference for these two objects is likely erroneous; the DCN used to extract the features should be corrected, and the information that "these two objects belong to different labels" is used as a criterion in the correction process. Accordingly, at step S303, object pairs that are inferred to have the same group label but should have different group labels according to the target domain prior are chosen as the criterions.
In some embodiments, the criterions may contain information on which pairs of objects, although distributed the same group label, are actually not the same object.
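Step S303 can be sketched using one of the pairwise priors named in the text: two faces appearing in the same frame are unlikely to be the same person. The `frame_of` input, mapping each object to its frame index, is an assumed representation of the target domain prior.

```python
def discover_prior_criterions(labels, frame_of):
    """S303 sketch: same-frame pairs that were nevertheless inferred to
    share a group label become criterions (negative pairs).

    labels:   inferred group label per object.
    frame_of: frame index per object (assumed prior from the video).
    """
    criterions = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if frame_of[i] == frame_of[j] and labels[i] == labels[j]:
                criterions.append((i, j))  # should have different labels
    return criterions
```

A positive-pair prior (same location in neighboring frames) could be handled symmetrically, flagging pairs inferred to have *different* labels.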
Fig. 4 shows the steps used for the training unit 104 according to some embodiments of the present application. In the training unit, the original DCN, or the DCN used in the previous iteration, is fine-tuned according to the discovered criterions. The parameters of the DCN are adjusted in order to make the extracted features more consistent with the criterions. At step S401, a fine-tuning score for each of the candidate parameter adjustments is computed according to the discovered criterions; at step S402, the candidate parameter adjustment having the highest fine-tuning score is determined as the resulting parameter adjustment of the deep model; and at step S403, the deep model is fine-tuned with the determined parameter adjustment, and the fine-tuned deep model for the target domain is outputted.
In some embodiments, the fine-tuning score may be inversely proportional to the value of a function that contains variables related to the features of the objects, the relations between the features, or the like. For example, for the criterions obtained from the criterions discovery unit 103, the function may be a contrastive loss that encourages the features of objects with the same group label to be close and those with different group labels to be far away from each other. The contrastive loss may be formulated as

$$E_c = \begin{cases} \tfrac{1}{2}\, \lVert x_i - x_j \rVert_2^2, & C(I_i, I_j) = 1, \\[2pt] \tfrac{1}{2}\, \max\!\big(0,\ \tau - \lVert x_i - x_j \rVert_2\big)^2, & C(I_i, I_j) = -1, \end{cases}$$

where $E_c$ is the loss, $I_i$ and $I_j$ are the objects $i$ and $j$, $x_i$ and $x_j$ denote the corresponding features, and $\tau$ is the margin between different identities. $C(I_i, I_j) = 1$ means that the objects $I_i$ and $I_j$ have the same group label, while $C(I_i, I_j) = -1$ means that they have different group labels. When the system is used for face recognition, $I_i$ and $I_j$ may be the face images $i$ and $j$, $x$ may denote the corresponding feature, and $\tau$ may be the margin between different identities; $C(I_i, I_j) = 1$ may mean that the face images $I_i$ and $I_j$ belong to the same person, while $C(I_i, I_j) = -1$ may mean that they belong to different persons. The features extracted by the DCN under different parameter adjustments differ, yielding different values of $E_c$: the more consistent the features are with the criterions, the smaller $E_c$ is. By minimizing $E_c$, the most appropriate parameter adjustment may be obtained; that is, the parameter adjustment making $E_c$ smallest is the most appropriate one. In some embodiments, the candidate parameter adjustments may be included in a parameter adjustment set. In some embodiments, the process of minimizing $E_c$ may be iterative: each candidate parameter adjustment is obtained by modifying the adjustment from the previous iteration, and the process ends when the value of $E_c$ converges. After minimizing $E_c$, the deep model may be fine-tuned with the determined parameter adjustment.
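A numerical sketch of a contrastive loss of this kind (a common formulation, assumed here since the patent's own rendering of the equation is incomplete): same-label pairs are penalized by their squared distance, and different-label pairs only when they fall inside the margin τ.

```python
import numpy as np

def contrastive_loss(x_i, x_j, c, tau=1.0):
    """Contrastive loss E_c for one pair of features.

    c = 1  : same group label, pull the features together.
    c = -1 : different group labels, push them beyond margin tau.
    The exact functional form is an assumed standard formulation.
    """
    d = np.linalg.norm(x_i - x_j)
    if c == 1:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, tau - d) ** 2
```

Identical same-label features incur zero loss; identical different-label features incur the maximal margin penalty, which is exactly the situation the discovered criterions are meant to correct.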
In some embodiments, the triplet loss or other loss functions may also be used, which learn an embedding in which the distances between the positive pairs are smaller than those between the negative pairs.
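For completeness, the triplet alternative can be sketched in its standard form; the margin value is an illustrative assumption, as the patent does not specify one.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss sketch: the anchor-positive distance should
    be smaller than the anchor-negative distance by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(0.0, d_pos - d_neg + margin)
```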
As will be appreciated by one skilled in the art, the present application may be embodied as a system, a method or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “unit”, “circuit”, “module” or “system”. Much of the inventive functionality and many of the inventive principles, when implemented, are best supported with or in integrated circuits (ICs), such as a digital signal processor and software therefor or application specific ICs. It is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein, will be readily capable of generating ICs with minimal experimentation. Therefore, in the interest of brevity and minimization of any risk of obscuring the principles and concepts according to the present application, further discussion of such software and ICs, if any, will be limited to the essentials with respect to the principles and concepts used by the preferred embodiments. For example, the system may comprise a memory that stores executable components and a processor, electrically coupled to the memory, that executes the executable components to perform the operations of the system, as discussed with reference to Figs. 1-4. Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
Although the preferred examples of the present application have been described, those skilled in the art can make variations or modifications to these examples upon learning of the basic inventive concept. The appended claims are intended to be construed as covering the preferred examples and all variations or modifications falling within the scope of the present application.
Obviously, those skilled in the art can make variations or modifications to the present application without departing from the spirit and scope of the present application. As such, if these variations or modifications belong to the scope of the claims and equivalent techniques, they also fall within the scope of the present application.
Claims (21)
- A method for adapting a deep model for object representation from a source domain to a target domain, comprising: extracting, by the deep model for the source domain, features for objects from input images for the target domain; inferring group labels for objects according to the extracted features; discovering criterions based on target domain priors derived from the input images and the inferred group labels, wherein the criterions contain information indicating which objects should not be inferred to have a same group label; and fine-tuning the deep model for the source domain according to the discovered criterions, wherein the fine-tuned deep model is outputted as a deep model for the target domain.
- The method of claim 1, wherein the extracting, the inferring, the discovering, and the fine-tuning are implemented in an iterative feedback loop that is performed for predetermined times, wherein in a starting iteration of the iterative feedback loop, the features for objects are extracted from input images for the target domain by the deep model for the source domain, and in iterations following the starting iteration, the features for objects are extracted from input images for the target domain by the fine-tuned deep model fine-tuned in a previous iteration of the iterative feedback loop.
- The method of claim 1 or 2, wherein the inferring comprises: computing, according to the extracted features of the objects, a judgment score for each of candidate group label distributions for the objects; determining a candidate group label distribution having a highest judgment score; and inferring, based on the determined distribution, group labels for objects, wherein the higher the similarity between the features of the objects having a same group label is, the higher the judgment score is.
- The method of claim 1 or 2, wherein the target domain prior comprises information on the objects in the input images or relationship between objects in the input images.
- The method of claim 1 or 2, wherein the discovering comprises: computing degrees of difference between objects that are inferred to have the same group label; and choosing pairs of objects, having a degree of difference larger than a threshold, as the criterions.
- The method of claim 1 or 2, wherein the discovering comprises: choosing pairs of objects from the objects, which are inferred to have the same group label but should have different group labels according to the target domain prior, as the criterions.
- The method of claim 6, wherein the fine-tuning comprises: computing a fine-tuning score for each of candidate parameter adjustments according to the discovered criterions; determining the candidate parameter adjustment having a highest fine-tuning score; and fine-tuning the deep model with the determined parameter adjustment, wherein the fine-tuning score indicates the similarity between the objects having a same group label, and the higher the similarity is, the higher the fine-tuning score is.
- A system for adapting a deep model for object representation from a source domain to a target domain, comprising: a feature extraction unit configured to receive the deep model for the source domain and use the deep model to extract features for objects from input images for the target domain; an inference unit configured to infer group labels for objects according to the extracted features; a criterions discovery unit configured to discover criterions based on target domain priors derived from the input images and the inferred group labels, wherein the criterions contain information indicating which objects should not be inferred to have a same group label; and a training unit configured to fine-tune the deep model for the source domain according to the discovered criterions, wherein the fine-tuned deep model is outputted as a deep model for the target domain.
- The system of claim 8, wherein the feature extraction unit, the inference unit, the criterions discovery unit, and the training unit are implemented in an iterative feedback loop that is performed for predetermined times, wherein in a starting iteration of the iterative feedback loop, the features for objects are extracted from input images for the target domain by the deep model for the source domain, and in iterations following the starting iteration, the features for objects are extracted from input images for the target domain by the fine-tuned deep model fine-tuned in a previous iteration of the iterative feedback loop.
- The system of claim 8 or 9, wherein the inference unit is configured for: computing, according to the extracted features of the objects, a judgment score for each of candidate group label distributions for the objects; determining a candidate group label distribution having a highest judgment score; and inferring, based on the determined distribution, group labels for objects, wherein the higher the similarity between the features of the objects having a same group label is, the higher the judgment score is.
- The system of claim 8 or 9, wherein the target domain prior comprises information on the objects in the input images or relationship between objects in the input images.
- The system of claim 8 or 9, wherein the criterions discovery unit is configured for: computing degrees of difference between objects that are inferred to have the same group label; and choosing pairs of objects, having a degree of difference larger than a threshold, as the criterions.
- The system of claim 8 or 9, wherein the criterions discovery unit is configured for: choosing pairs of objects from the objects, which are inferred to have the same group label but should have different group labels according to the target domain prior, as the criterions.
- The system of claim 13, wherein the training unit is configured for: computing a fine-tuning score for each of candidate parameter adjustments according to the discovered criterions; determining the candidate parameter adjustment having a highest fine-tuning score; and fine-tuning the deep model with the determined parameter adjustment, wherein the fine-tuning score indicates the similarity between the objects having a same group label, and the higher the similarity is, the higher the fine-tuning score is.
- A system for object representation, comprising: a memory that stores executable components; and a processor electrically coupled to the memory to execute the executable components for: extracting, by the deep model for the source domain, features for objects from input images for the target domain; inferring group labels for objects according to the extracted features; discovering criterions based on target domain priors derived from the input images and the inferred group labels, wherein the criterions contain information indicating which objects should not be inferred to have a same group label; and fine-tuning the deep model for the source domain according to the discovered criterions, wherein the fine-tuned deep model is outputted as a deep model for the target domain.
- The system of claim 15, wherein the extracting, the inferring, the discovering, and the fine-tuning are implemented in an iterative feedback loop that is performed for predetermined times, wherein in a starting iteration of the iterative feedback loop, the features for objects are extracted from input images for the target domain by the deep model for the source domain, and in iterations following the starting iteration, the features for objects are extracted from input images for the target domain by the fine-tuned deep model fine-tuned in a previous iteration of the iterative feedback loop.
- The system of claim 15 or 16, wherein the inferring comprises: computing, according to the extracted features of the objects, a judgment score for each of candidate group label distributions for the objects; determining a candidate group label distribution having a highest judgment score; and inferring, based on the determined distribution, group labels for objects, wherein the higher the similarity between the features of the objects having a same group label is, the higher the judgment score is.
- The system of claim 15 or 16, wherein the target domain prior comprises information on the objects in the input images or relationship between objects in the input images.
- The system of claim 15 or 16, wherein the discovering comprises: computing degrees of difference between objects that are inferred to have the same group label; and choosing pairs of objects, having a degree of difference larger than a threshold, as the criterions.
- The system of claim 15 or 16, wherein the discovering comprises: choosing pairs of objects from the objects, which are inferred to have the same group label but should have different group labels according to the target domain prior, as the criterions.
- The system of claim 20, wherein the fine-tuning comprises: computing a fine-tuning score for each of candidate parameter adjustments according to the discovered criterions; determining the candidate parameter adjustment having a highest fine-tuning score; and fine-tuning the deep model with the determined parameter adjustment, wherein the fine-tuning score indicates the similarity between the objects having a same group label, and the higher the similarity is, the higher the fine-tuning score is.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201680079452.0A CN108604304A (en) | 2016-01-20 | 2016-01-20 | For adapting the depth model indicated for object from source domain to the method and system of aiming field |
PCT/CN2016/071501 WO2017124336A1 (en) | 2016-01-20 | 2016-01-20 | Method and system for adapting deep model for object representation from source domain to target domain |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2016/071501 WO2017124336A1 (en) | 2016-01-20 | 2016-01-20 | Method and system for adapting deep model for object representation from source domain to target domain |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017124336A1 true WO2017124336A1 (en) | 2017-07-27 |
Family
ID=59361172
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/071501 WO2017124336A1 (en) | 2016-01-20 | 2016-01-20 | Method and system for adapting deep model for object representation from source domain to target domain |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108604304A (en) |
WO (1) | WO2017124336A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112669247A (en) * | 2020-12-09 | 2021-04-16 | 深圳先进技术研究院 | Priori guidance type network for multitask medical image synthesis |
CN113255823B (en) * | 2021-06-15 | 2021-11-05 | 中国人民解放军国防科技大学 | Unsupervised domain adaptation method and unsupervised domain adaptation device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102902966A (en) * | 2012-10-12 | 2013-01-30 | 大连理工大学 | Super-resolution face recognition method based on deep belief networks |
CN103793718A (en) * | 2013-12-11 | 2014-05-14 | 台州学院 | Deep study-based facial expression recognition method |
CN104318215A (en) * | 2014-10-27 | 2015-01-28 | 中国科学院自动化研究所 | Cross view angle face recognition method based on domain robustness convolution feature learning |
CN104616033A (en) * | 2015-02-13 | 2015-05-13 | 重庆大学 | Fault diagnosis method for rolling bearing based on deep learning and SVM (Support Vector Machine) |
CN105160866A (en) * | 2015-08-07 | 2015-12-16 | 浙江高速信息工程技术有限公司 | Traffic flow prediction method based on deep learning nerve network structure |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101582813B (en) * | 2009-06-26 | 2011-07-20 | 西安电子科技大学 | Distributed migration network learning-based intrusion detection system and method thereof |
CN101840569B (en) * | 2010-03-19 | 2011-12-07 | 西安电子科技大学 | Projection pursuit hyperspectral image segmentation method based on transfer learning |
US9231851B2 (en) * | 2011-01-31 | 2016-01-05 | Futurewei Technologies, Inc. | System and method for computing point-to-point label switched path crossing multiple domains |
US9681250B2 (en) * | 2013-05-24 | 2017-06-13 | University Of Maryland, College Park | Statistical modelling, interpolation, measurement and anthropometry based prediction of head-related transfer functions |
CN104199023B (en) * | 2014-09-15 | 2017-02-08 | 南京大学 | RFID indoor positioning system based on depth perception and operating method thereof |
- 2016-01-20 CN CN201680079452.0A patent/CN108604304A/en active Pending
- 2016-01-20 WO PCT/CN2016/071501 patent/WO2017124336A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102902966A (en) * | 2012-10-12 | 2013-01-30 | 大连理工大学 | Super-resolution face recognition method based on deep belief networks |
CN103793718A (en) * | 2013-12-11 | 2014-05-14 | 台州学院 | Deep study-based facial expression recognition method |
CN104318215A (en) * | 2014-10-27 | 2015-01-28 | 中国科学院自动化研究所 | Cross-view face recognition method based on domain-robust convolutional feature learning |
CN104616033A (en) * | 2015-02-13 | 2015-05-13 | 重庆大学 | Fault diagnosis method for rolling bearing based on deep learning and SVM (Support Vector Machine) |
CN105160866A (en) * | 2015-08-07 | 2015-12-16 | 浙江高速信息工程技术有限公司 | Traffic flow prediction method based on deep learning neural network structure |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11155809B2 (en) | 2014-06-24 | 2021-10-26 | Bio-Rad Laboratories, Inc. | Digital PCR barcoding |
CN113011568A (en) * | 2021-03-31 | 2021-06-22 | 华为技术有限公司 | Model training method, data processing method and equipment |
CN113159199A (en) * | 2021-04-27 | 2021-07-23 | 广东工业大学 | Cross-domain image classification method based on structural feature enhancement and class center matching |
CN113159199B (en) * | 2021-04-27 | 2022-12-27 | 广东工业大学 | Cross-domain image classification method based on structural feature enhancement and class center matching |
Also Published As
Publication number | Publication date |
---|---|
CN108604304A (en) | 2018-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chaudhuri et al. | Multilabel remote sensing image retrieval using a semisupervised graph-theoretic method | |
US10902243B2 (en) | Vision based target tracking that distinguishes facial feature targets | |
US9449432B2 (en) | System and method for identifying faces in unconstrained media | |
CN108140032B (en) | Apparatus and method for automatic video summarization | |
WO2017124336A1 (en) | Method and system for adapting deep model for object representation from source domain to target domain | |
US9940577B2 (en) | Finding semantic parts in images | |
CN108288051B (en) | Pedestrian re-recognition model training method and device, electronic equipment and storage medium | |
CN108268823B (en) | Target re-identification method and device | |
CN108664526B (en) | Retrieval method and device | |
CN105100894A (en) | Automatic face annotation method and system | |
US9875397B2 (en) | Method of extracting feature of input image based on example pyramid, and facial recognition apparatus | |
CN109460774B (en) | Bird identification method based on improved convolutional neural network | |
Kim et al. | Deep stereo confidence prediction for depth estimation | |
WO2019007253A1 (en) | Image recognition method, apparatus and device, and readable medium | |
US10007678B2 (en) | Image processing apparatus, image processing method, and recording medium | |
Wang et al. | Scene text detection and tracking in video with background cues | |
CN110765882B (en) | Video tag determination method, device, server and storage medium | |
Miclea et al. | Real-time semantic segmentation-based stereo reconstruction | |
Wang et al. | Aspect-ratio-preserving multi-patch image aesthetics score prediction | |
CN109635647B (en) | Multi-picture multi-face clustering method based on constraint condition | |
CN113705596A (en) | Image recognition method and device, computer equipment and storage medium | |
Gallagher et al. | Using context to recognize people in consumer images | |
Lee et al. | Property-specific aesthetic assessment with unsupervised aesthetic property discovery | |
CN110472591A (en) | Occluded pedestrian re-identification method based on deep feature reconstruction |
CN114519863A (en) | Person re-identification method, apparatus, computer device, and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16885609 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 16885609 Country of ref document: EP Kind code of ref document: A1 |