WO2018161217A1 - Transductive and/or adaptive max margin zero-shot learning method and system - Google Patents

Transductive and/or adaptive max margin zero-shot learning method and system

Info

Publication number
WO2018161217A1
WO2018161217A1 (PCT/CN2017/075764; CN2017075764W)
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
unseen
configuring
embedding
image
Prior art date
Application number
PCT/CN2017/075764
Other languages
English (en)
Inventor
Yunlong YU
Original Assignee
Nokia Technologies Oy
Nokia Technologies (Beijing) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy, Nokia Technologies (Beijing) Co., Ltd. filed Critical Nokia Technologies Oy
Priority to PCT/CN2017/075764 priority Critical patent/WO2018161217A1/fr
Priority to EP17899880.3A priority patent/EP3593284A4/fr
Priority to CN201780088157.6A priority patent/CN110431565B/zh
Publication of WO2018161217A1 publication Critical patent/WO2018161217A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/21355Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis nonlinear criteria, e.g. embedding a manifold in a Euclidean space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle

Definitions

  • the present disclosure relates to zero-shot learning, and more particularly, to an adaptive max margin and/or transductive zero-shot learning method and system.
  • a pattern recognition system can involve training and testing operations, which share the same categories.
  • Conventional pattern recognition approaches can require some categories to be predefined, and then a model to be trained for recognizing particular objects of interest in these categories.
  • Zero-shot learning (ZSL) has been proposed to recognize objects of categories for which no labeled training examples are available.
  • a general ZSL approach can apply one transformation matrix to embed visual features into the semantic space, or two transformation matrices to embed both the visual features (e.g., color, texture, motion, CNN features) and side information features (e.g., attribute, word vector features) into a latent space.
  • the connection between the visual and side information features is thereby bridged, and the category of the unseen samples can be inferred using the nearest neighbor method.
  • this approach may not adequately discover the intrinsic mapping relations between the visual instances and their corresponding semantic labels, which can lead to unsatisfactory performance.
  • the knowledge transferred from the seen data to the unseen data may lead to the domain shift problem, which refers to the situation in which the embedding function learned from the seen data will be biased when directly applied to the disjoint unseen data.
  • the domain shift problem decreases the generalization ability of ZSL on the unseen data.
  • an apparatus and method is provided to implement object detection with a model, which is refined or updated using transductive-based zero-shot learning (ZSL) .
  • the apparatus and method are configured to provide at least one embedding matrix of the ZSL model, to provide unseen data for one or more unseen instances, to update the embedding matrix according to one or more selected unseen instances (e.g., a set of unseen instances) that have higher predicted confidence when the embedding matrix is applied, and to detect an object of interest in an unseen category from a region of an image using the updated embedding matrix.
  • the unseen category can comprise an unseen object class or category for which one or more attributes are identified in the unseen data.
  • the selected ones of the unseen instances are, for example, a set of unseen instances with higher similarity scores.
  • the initial embedding matrix for a ZSL model can be learned using seen data.
  • an apparatus and method are configured to develop a model with an embedding matrix using an adaptive max margin ZSL.
  • the apparatus and method are configured to provide a compatible matrix based on seen data, to calculate an adaptive margin based on a predicted class label vector and a calculated predicted class label vector, to calculate a loss based on the adaptive margin, to update the compatible matrix according to the calculated loss, and to output the embedding matrix based on the compatible matrix and a semantic label vector matrix.
  • the adaptive max margin ZSL model and its parameters, including the embedding matrix (or matrices) can be further refined or updated using the transductive-based ZSL.
  • the apparatus and method can be further configured to receive the image which is captured from an image sensing device, and/or to initiate an alarm if the object is detected. Furthermore, the zero-shot learning model can be applied to each region of the image to detect whether the object is present in any of the regions of the image. The apparatus and method can also be configured to stitch or blend images including the image together based on the detected object of interest to produce a panoramic image (e.g., image or video) or panorama.
  • Fig. 1 illustrates a block diagram of an example system for detecting a presence or absence of an object of interest with a model which is developed or refined using adaptive max margin zero-shot learning (ZSL) or transductive-based ZSL in accordance with an embodiment of the present disclosure.
  • Fig. 2 illustrates a block diagram of an example system for detecting a presence or absence of an object of interest with a model which is developed or refined using ZSL in accordance with another embodiment of the present disclosure.
  • Fig. 3 is a flow diagram showing an example process by which a system, such as for example in Fig. 1 or 2, is configured to implement training and/or testing stages using a ZSL model, in accordance with an embodiment of the present disclosure.
  • Fig. 4 is a flow diagram showing an example process by which a system, such as for example in Fig. 1 or 2, is configured to implement a training stage for training a ZSL model using a transductive learning approach, in accordance with an embodiment of the present disclosure.
  • Figs. 5 and 6 illustrate Tables 1 and 2 containing experimental results of the transductive ZSL of the present disclosure versus ZSL with a canonical correlation analysis (CCA), using available image and video datasets.
  • Fig. 7 is a flow diagram showing an example process by which a system, such as for example in Fig. 1 or 2, is configured to implement a training stage for training a ZSL model using adaptive margin learning approach, in accordance with a further embodiment of the present disclosure.
  • Figs. 8 and 9 illustrate Tables 3 and 4 containing experimental results for the adaptive max margin ZSL of the present disclosure and ZSL with a canonical correlation analysis (CCA) , using available datasets.
  • Fig. 10 is a flow diagram showing an example detection process by which a system, such as for example in Fig. 1 or 2, is configured to detect a presence (or absence) of a feature, such as an object, using a ZSL model, in accordance with an example embodiment of the present disclosure.
  • Fig. 11 is a flow diagram showing an example process by which a system, such as for example in Fig. 1 or 2, is configured to process video and/or images, such as video or image stitching and other video or image related operations, from multiple video or image streams of separate lenses (e.g., in a multi-camera or multi-sensor system).
  • an apparatus and method are provided which employ a zero-shot learning (ZSL) model to analyze an image or region thereof, and to detect a presence (or absence) of an object(s) of interest.
  • the ZSL model and its parameters, including one or more embedding matrices, are learned (or trained) or refined using an adaptive max margin and/or transductive approach.
  • transductive as used herein, relates to the use of unseen data to learn or refine the ZSL model.
  • an apparatus and method are provided, which can be used to further refine or update one or more embedding matrices of a ZSL model.
  • the apparatus and method implement a transductive-based ZSL (also referred to as “transductive ZSL”), which can alleviate, for example, the domain shift problem in ZSL.
  • an initial embedding matrix is provided, and can, for example, be learned using seen data. Thereafter, the unseen data is used to refine or update the initial embedding matrix to improve the embedding matrix based on the unseen data. It should be understood that, in this training operation, the corresponding labels of the unseen instances are still unknown.
  • “pseudo” labels are predicted for the unseen instances using the unseen data, and their prediction confidences (e.g., confidence level, value, etc. ) are evaluated to refine or update the embedding matrix of the ZSL model. For example, a set of unseen instances in the unseen data with higher (e.g., higher, highest, most, etc. ) prediction confidence is selected and used to refine the embedding matrix.
  • This transductive use of unseen data is repeated to iteratively update and refine the embedding matrix, thereby gradually guiding the embedding model to transfer to the unseen domain.
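  • As a rough illustration only, the following Python sketch shows the shape of this refinement loop. It assumes that prediction confidence is measured by an inner-product similarity in the semantic space; the helper names (refit_fn, epsilon, n_rounds) are hypothetical, and the refitting call stands in for the objective of equation (2) described later.

```python
import numpy as np

def refine_embedding_transductively(W, X_seen, Y_seen, X_unseen, Y_candidates,
                                    refit_fn, epsilon=0.3, n_rounds=5):
    """Iteratively refine the embedding matrix W using confident pseudo-labels.

    W            : (d_a, d_v) initial visual-to-semantic embedding learned on seen data
    X_seen       : (N_s, d_v) seen visual features; Y_seen: (N_s, d_a) their label vectors
    X_unseen     : (N_u, d_v) unseen visual features
    Y_candidates : (C_u, d_a) semantic vectors of the candidate unseen classes
    refit_fn     : callable that refits W on (X, Y) pairs (e.g. a regularized regression)
    epsilon      : fraction of most confident pseudo-labeled instances kept per round
    """
    for _ in range(n_rounds):
        # Predict a pseudo-label for every unseen instance (most similar candidate class).
        scores = (X_unseen @ W.T) @ Y_candidates.T      # (N_u, C_u) similarity scores
        pseudo = scores.argmax(axis=1)
        conf = scores.max(axis=1)

        # Keep only the most confident fraction of pseudo-labeled instances.
        k = max(1, int(epsilon * len(X_unseen)))
        keep = np.argsort(-conf)[:k]

        # Refit the embedding on seen data plus the confident pseudo-labeled unseen data,
        # gradually transferring the embedding toward the unseen domain.
        X_aug = np.vstack([X_seen, X_unseen[keep]])
        Y_aug = np.vstack([Y_seen, Y_candidates[pseudo[keep]]])
        W = refit_fn(X_aug, Y_aug)
    return W
```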
  • the apparatus and method can, for example, alleviate the domain shift problem in ZSL, and can be used to refine or update an embedding matrix (or matrices) of existing ZSL models or even the other ZSL models described herein.
  • an apparatus and method are provided to implement a ZSL model with an adaptive max margin based multiclass model.
  • This learning approach is also referred to herein as adaptive max margin ZSL (AMM-ZSL) .
  • the ZSL model utilizes an adaptive max margin which is determined by the calculated label vector and the predicted label vector of each of the unseen instances during training.
  • the unseen instances with higher classification confidence are given a smaller penalty, while those with lower classification confidence are given a larger penalty.
  • the embedding matrix is thereafter updated accordingly.
  • the apparatus and method of the present disclosure can be employed in object recognition systems, such as for example a video surveillance system, autonomous or semi-autonomous vehicle, or other systems that employ a camera or other sensor.
  • the camera can capture several multi-view images of the same scenario such as 360-degree images.
  • the task of the video surveillance is to detect one or more objects of interest (e.g., pedestrians, animals, or other objects) from the multi-view images, and then provide an alert or notification (e.g., an alarm or warning) to the user.
  • a video surveillance system can potentially detect all objects of interest appearing in a scenario or environment.
  • each camera can be configured to perform object detection.
  • the operations of the video surveillance system using ZSL can involve the following.
  • Each camera of the system captures an image.
  • the ZSL model can, for example, be employed to classify the region as an object of interest if the visual feature corresponds to a category and label of an object of interest, and to classify the region as background (e.g., non-object) if the response of the ZSL does not correspond to a category and label of an object of interest.
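  • A minimal sketch of this per-region decision is shown below. It assumes a visual feature has already been extracted for the region and that a single visual-to-semantic embedding matrix and a class-attribute matrix are available; the inner-product scoring and the background threshold are illustrative assumptions, not the exact decision rule of the disclosure.

```python
import numpy as np

def classify_region(region_feature, W, class_attributes, class_labels,
                    background_threshold=0.5):
    """Classify one image region as an object of interest or as background.

    region_feature   : (d_v,) visual feature extracted from the region
    W                : (d_a, d_v) learned embedding matrix of the ZSL model
    class_attributes : (C, d_a) semantic vectors of the (unseen) target categories
    class_labels     : list of C category names
    """
    embedded = W @ region_feature                 # project the region into semantic space
    scores = class_attributes @ embedded          # similarity to each target category
    best = int(np.argmax(scores))
    if scores[best] < background_threshold:       # no category responds strongly enough
        return "background", float(scores[best])
    return class_labels[best], float(scores[best])
```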
  • the object detection process can involve a training stage to learn and/or refine a ZSL model, and a testing stage for object detection using the ZSL model.
  • the goal of the training stage is to design or configure the structure of the ZSL model, such as with an adaptive margin (e.g., an adaptive max margin, etc. ) and/or transductively using unseen data, and to learn the parameters including the embedding matrix (or matrices) of the ZSL model.
  • the ZSL model is trained and/or refined or updated to detect a presence (or absence) of a particular object (s) including unseen categories of objects of interests (e.g., particular objects or actions) .
  • the trained ZSL model, including one or more embedding matrices, is applied to an image to be tested (e.g., input image or testing image) to classify, and thus detect, a presence (or absence) of the particular object(s).
  • the goal of the testing stage is to classify each region of the image by taking the region, in particular any extracted visual feature, as the input of the trained ZSL model. The region is classified, for example, as either an object of interest or background.
  • the system can take action including generating an alert or notification (e.g., an alert signal in the form of voice or message) which can be immediately sent to the user via a network connection (e.g., the Internet) or other media, implementing control operations including, for example, autonomous or semi-autonomous vehicle control, or implementing other actions depending on the application.
  • These operations implemented in the process of object detection can be performed in each camera or camera-subsystem of the surveillance system.
  • An alert can be generated once one of the cameras in the system detects an object of interest.
  • the object detection processes may be implemented in or with each camera or each camera subsystem. Examples of a model using ZSL, and an object detection system, are described in further detail below with reference to the figures.
  • Fig. 1 illustrates a block diagram of example components or means of an example system 100 for detecting a presence (or absence) of an object of interest using a zero-shot learning (ZSL) model.
  • the system 100 includes one or more processor (s) 110, one or more sensors 120, a user interface (s) 130, a memory 140, a communication interface (s) 150, a power supply 160 and output device (s) 170.
  • the power supply 160 can include a battery power unit, which can be rechargeable, or a unit that provides connection to an external power source.
  • the sensors 120 are configured to sense or monitor activities, e.g., an object (s) , in a geographical area or an environment, such as around a vehicle, around or inside a building, and so forth.
  • the sensors 120 can include one or more image sensing device (s) or sensor (s) .
  • the sensor 120 can for example be a camera with one or more lenses (e.g., a camera, a web camera, a camera system to capture panoramic or 360 degree images, a camera with a wide lens or multiple lenses, etc. ) .
  • the image sensing device is configured to capture images or image data, which can be analyzed using the ZSL model to detect a presence (or absence) of an object of interest.
  • the captured images or image data can include image frames, video, pictures, and/or the like.
  • the sensor 120 may also comprise a millimeter wave radar, an infrared camera, Lidar (Light Detection And Ranging) sensor and/or other types of sensors.
  • the sensors 120 can also include other sensors, such as one or more Global Navigation Satellite System (GNSS) sensors, one or more compass or direction sensors, and/or one or more acceleration or motion sensors.
  • a location of the detected object (e.g., its position) can be determined based on the actual location of the device (e.g., using a GNSS sensor or the like).
  • Other position detection techniques may also be employed to sense and determine a location of any detected object of interest.
  • the user interface (s) 130 may include one or more user input devices through which a user can input information or commands to the system 100.
  • the user interface (s) 130 may include a keypad, a touch-screen display, a microphone, or other user input devices through which a user can input information or commands.
  • the output devices 170 can include a display, a speaker or other devices which are able to convey information to a user.
  • the communication interface (s) 150 can include communication circuitry (e.g., transmitter (TX) , receiver (RX) , transceiver such as a radio frequency transceiver, etc. ) for conducting line-based communications with an external device such as a USB or Ethernet cable interface, or for conducting wireless communications with an external device, such as for example through a wireless personal area network, a wireless local area network, a cellular network or wireless wide area network.
  • the communication interface(s) 150 can, for example, be used to receive a ZSL model and its parameters or updates thereof from an external computing device 180 (e.g., a server, data center, or a user's device such as a computer), and to communicate with one or more external computing devices 180 in connection with the operations described herein, such as the training stage, the testing stage, the alarm notification, vehicle control, and/or other operations as described herein.
  • the memory 140 is a data storage device that can store computer executable code or programs, which when executed by the processor 110, controls the operations of the system 100.
  • the memory 140 also can store configuration information for a ZSL model 142 and its data such as parameters (e.g., embedding matrix (or matrices) , compatible matrix, seen data, unseen data, threshold (s) , etc. ) , images 146 (e.g., training or seen images, captured images, etc. ) , and a detection algorithm 148 for implementing the various operations described herein, such as the training stage, the testing stage, the alarm notification, and other operations as described herein.
  • the processor 110 is in communication with the memory 140.
  • the processor 110 is a processing system, which can include one or more processors, such as CPU, GPU, controller, dedicated circuitry or other processing unit, which controls the operations of the system 100, including the detection operations (e.g., training stage, testing stage, alarm notification, etc. ) described herein in the present disclosure.
  • the processor 110 is configured to train the ZSL model 142 to detect a presence or absence of objects of interest (e.g., detect an object (s) or action (s) of interest, background (s) , etc. ) by configuring or learning the parameters including an embedding matrix with seen data and/or unseen data.
  • the processor 110 is also configured to test captured image (s) or regions thereof using the trained ZSL model 142 with the learned parameters in order to detect a presence (or absence) of an object in an image or region thereof.
  • the object of interest may include a person such as a pedestrian, an animal, vehicles, traffic signs, road hazards, and/or the like, or other objects or actions of interest according to the intended application.
  • the processor 110 is also configured to initiate an alarm or other notification when a presence of the object is detected, such as notifying a user by outputting the notification using the output device 170 or by transmitting the notification to an external computing device 180 (e.g., user’s device, data center, server, etc. ) via the communication interface 150.
  • the external computing device 180 can include components similar to those in the system 100, such as shown and described above with reference to Fig. 1.
  • Fig. 2 depicts a block diagram of example components or means of an example system 200 including processor (s) 210, and sensor (s) 220 in accordance with some example embodiments.
  • the system 200 may also include a radio frequency transceiver 250.
  • the system 200 may be mounted in a vehicle 20, such as a car or truck, although the system may be used without the vehicle 20 as well.
  • the environment around the vehicle 20 may include various objects of interest, including for example a bicycle or bicyclist 30, tree 32 or other objects detectable or to be detected by the system 200.
  • the system 200 may include the same or similar components and functionality, such as provided in the system 100 of Fig. 1.
  • the sensor(s) 220 may comprise one or more image sensors configured to provide image data, such as image frames, video, pictures, and/or the like.
  • the sensor 220 may comprise a camera, millimeter wave radar, an infrared camera, Lidar (Light Detection And Ranging) sensor and/or other types of sensors.
  • the sensor (s) 220 can also include other sensors, such as one or more Global Navigation Satellite System (GNSS) sensors, one or more compass or direction sensors, and/or one or more acceleration or motion sensors.
  • a location of the detected object can be determined based on the actual location of the device or system (e.g., using the GNSS sensor or the like) and stereoscopic measurement of the distance of the object in the device or system. Other position detection techniques may also be employed to sense and determine a location of any detected object of interest.
  • the processor 210 may comprise ZSL circuitry, which may represent dedicated ZSL circuitry configured to implement the zero-shot learning and other operations as described herein.
  • the ZSL circuitry may alternatively be implemented in other ways, such as using at least one memory including program code which, when executed by at least one processing device (e.g., CPU, GPU, controller, etc.), performs the zero-shot learning and other operations described herein.
  • the system 200 may have a training stage.
  • the training stage may configure the ZSL circuitry to learn to detect and/or classify one or more objects of interest.
  • the processor 210 may be trained with seen data (e.g., images and labels for objects such as people, other vehicles, road hazards, and/or the like) and unseen data (e.g., attributes for unseen or unknown categories, etc. ) .
  • the ZSL model may learn its configuration (e.g., embedding matrix (or matrices) , and/or the like) . Once trained, the configured ZSL model can be used in a test or operational stage to detect and/or classify regions (e.g., patches or portions) of an unknown, input image and thus determine whether that input image includes an object of interest or just background (i.e., not having an object of interest including a particular object or action involving such object) .
  • the system 200 may be trained to detect objects, such as people, animals, other vehicles, traffic signs, road hazards, and/or the like.
  • in an advanced driver assistance system (ADAS), when an object of interest is detected, an output such as a warning sound, haptic feedback, indication of a recognized object, or other indication may be generated to, for example, warn or notify a driver.
  • the detected objects may signal control circuitry to take additional action in the vehicle (e.g., initiate braking, acceleration/deceleration, steering and/or some other action).
  • the indication may be transmitted to other vehicles, IoT devices or cloud, mobile edge computing (MEC) platform and/or the like via radio transceiver 250.
  • Fig. 3 is a flow diagram showing an example of an overall process 300 by which a system, such as for example in Fig. 1 or 2, is configured to implement training and/or testing stages with a ZSL model.
  • the process 300 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1, and describes high level operations that can be performed in relation to a training stage and a testing stage.
  • the processor 110 is configured to provide a ZSL model with at least one initial embedding matrix during a training stage.
  • the ZSL model may be trained using an adaptive margin, such as described herein, or with other types of ZSL models.
  • the initial embedding matrix can be generated using seen data.
  • the processor 110 is configured to provide unseen data for one or more unseen instances during a training stage.
  • the processor 110 is configured to update the embedding matrix according to one or more selected unseen instances that have higher predicted confidence when the embedding matrix is applied.
  • the embedding matrix can be updated iteratively or gradually, such as for example by repeating the selection of unseen instances with higher predicted confidence, and then the update of parameters associated with the embedding matrix depending on the selection until the ZSL model converges or satisfies a threshold.
  • the processor 110 is configured to detect a presence of an object of interest (e.g., object or action) in the region of the image based on the updated embedding matrix of the ZSL model during the testing stage.
  • the processor 110 can initiate further action (s) , including triggering or sending an alarm notification, controlling vehicle operations, and/or other actions according to the application.
  • Fig. 4 illustrates a flow diagram of an example of a transductive learning process 400 by which to refine or update a ZSL model and its parameters during a training stage.
  • in the transductive ZSL model, X_S denotes the visual features of the seen images (e.g., images or videos), which are also referred to as training images, and Y_S denotes their corresponding semantic vectors (e.g., attributes, distributed text representations) in C_S classes, where N_S is the number of training data and d_v and d_a are the dimensionalities of the visual and semantic features, respectively.
  • N_U is the number of unseen data.
  • d is the dimensionality of the common space, and a threshold value in (0, 1) is specified.
  • the label vectors Y_S of the seen instances X_S are known, and can be used to train an initial embedding function, e.g., an embedding matrix (or matrices).
  • the unseen instances X_U and their candidate label vectors Y_U are also provided; however, the corresponding relations between X_U and Y_U are unknown, and are learned through the transductive ZSL approach, which will be described in greater detail hereinafter.
  • An example of the transductive ZSL model includes the following inputs and outputs.
  • the inputs include the visual features of the seen data and their labels' semantic vectors.
  • the inputs further include the visual features of the unseen data, their labels' semantic vectors, and a threshold value in (0, 1).
  • the outputs are the embedding matrix W and the category of each testing instance x_t in X_U.
  • Fig. 4 illustrates an example of a transductive learning process 400 by which a system, such as for example in Fig. 1 or 2, is configured to implement training of the transductive ZSL model.
  • the process 400 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1, and describes high level operations that are performed in relation to a training stage.
  • the processor 110 initiates the process 400 by obtaining an initial embedding function, e.g., embedding matrix, between the visual and semantic space.
  • any ZSL approach can be used which utilizes one or more embedding matrices.
  • the process 400 also obtains unseen data.
  • the processor 110 predicts the label of each unseen instance x_t in the unseen data with equation (1), as follows:
  • Y_U denotes the semantic label vector matrix for the target classes (e.g., attributes or word vectors).
  • y_u in Y_U is the semantic vector for the u-th candidate unseen class.
  • the processor 110 selects a set of “pseudo” labeled instances. Assuming there are m_j instances classified into the j-th category of Y_U, the processor 110 selects the top fraction (given by the threshold) of those m_j instances with the highest similarity scores as the confident instances (i.e., instances with higher predicted confidence), which are used to refine the embedding matrix. The set of pseudo-labeled instances and their corresponding label vector matrix are denoted as X_t and Y_t, respectively. At this stage, the final labels have not been predicted; instead, pseudo labels are predicted for use in optimizing the ZSL model, particularly the embedding matrix.
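  • This selection step can be sketched as follows, under the assumption that the similarity score of equation (1) is the inner product between the embedded instance and each candidate semantic vector (the equation itself is not reproduced in this text); the name epsilon for the selection fraction is hypothetical.

```python
import numpy as np

def select_confident_pseudo_labels(X_unseen, W, Y_candidates, epsilon=0.3):
    """Select confident pseudo-labeled unseen instances, per predicted category.

    X_unseen     : (N_u, d_v) unseen visual features
    W            : (d_a, d_v) current embedding matrix
    Y_candidates : (C_u, d_a) semantic vectors of the candidate unseen classes
    epsilon      : selection fraction in (0, 1); keep the top epsilon * m_j
                   instances of the j-th predicted category
    """
    scores = (X_unseen @ W.T) @ Y_candidates.T    # (N_u, C_u) similarity scores
    pseudo = scores.argmax(axis=1)                # pseudo label of each instance
    conf = scores.max(axis=1)                     # its prediction confidence

    keep_idx = []
    for j in range(Y_candidates.shape[0]):
        members = np.flatnonzero(pseudo == j)     # the m_j instances assigned to class j
        if members.size == 0:
            continue
        k = max(1, int(epsilon * members.size))
        keep_idx.extend(members[np.argsort(-conf[members])[:k]])

    keep_idx = np.asarray(keep_idx, dtype=int)
    X_t = X_unseen[keep_idx]                      # confident pseudo-labeled instances
    Y_t = Y_candidates[pseudo[keep_idx]]          # their pseudo label vector matrix
    return X_t, Y_t
```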
  • the processor 110 refines or updates the embedding matrix of the ZSL model according to the following objective function noted in equation (2) below:
  • in equation (2), the weight parameter is between 0 and 1, W is the embedding matrix, N is the number of training data, and d_v and d_a are the dimensionalities of X and Y.
  • the optimal matrix W can be obtained with an iterative optimization, such as follows.
  • the matrix W can be obtained in a closed form solution according to equation (3) :
  • the threshold is then increased by 0.1 for the next iteration.
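  • The closed-form solution of equation (3) is not reproduced in this text. As a stand-in illustration only, a weighted, regularized least-squares refit of W on the seen data plus the confident pseudo-labeled set could look like the following; the weighting scheme and regularizer are assumptions, not the exact update of the disclosure.

```python
import numpy as np

def refit_embedding(X_seen, Y_seen, X_t, Y_t, alpha=0.5, lam=1e-3):
    """Ridge-style refit of the embedding matrix on seen plus pseudo-labeled data.

    X_seen, Y_seen : seen visual features (N_s, d_v) and label vectors (N_s, d_a)
    X_t, Y_t       : confident pseudo-labeled unseen instances and their label vectors
    alpha          : weight in (0, 1) balancing seen data against pseudo-labeled data
    lam            : regularization strength
    Returns W of shape (d_a, d_v) mapping visual features to the semantic space.
    """
    d_v = X_seen.shape[1]
    # Weighted, regularized normal equations combining both data sources.
    A = alpha * X_seen.T @ X_seen + (1 - alpha) * X_t.T @ X_t + lam * np.eye(d_v)
    B = alpha * X_seen.T @ Y_seen + (1 - alpha) * X_t.T @ Y_t
    return np.linalg.solve(A, B).T
```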
  • the processor 110 then predicts the label of each unseen instance x_t according to equation (1).
  • process 400 is described with reference to optimization of one embedding matrix for a ZSL model, the process 400 can be applied to optimize any number of embedding matrices, e.g., two and so forth, according to the configuration or type of the ZSL model.
  • Fig. 5 illustrates experimental results for the proposed method, i.e., the transductive ZSL herein, and an existing ZSL method that uses embedding based on canonical correlation analysis (CCA) to align different semantic spaces with the low-level feature space (hereinafter “ZSL with CCA”), based on two available image benchmark datasets (e.g., AwA and CUB).
  • Fig. 6 illustrates experimental results for the proposed transductive ZSL, and ZSL with CCA based on two available video benchmark datasets (e.g., HMDB51, and UCF101) .
  • the proposed transductive ZSL of the present disclosure provides a significant performance improvement for object recognition (e.g., object or action recognition) over existing ZSL approaches such as ZSL with CCA.
  • a transductive ZSL model with one embedding matrix can be applied to learn and refine a ZSL model that utilizes multiple embedding matrices.
  • a first embedding matrix can be a common space embedding matrix
  • a second embedding matrix can be a semantic-to-common space embedding matrix.
  • the two embedding matrices can include a mapping matrix W and a mapping matrix H.
  • An example process for transductively refining a ZSL model with two matrices is similar to the above process 400 of Fig. 4 for a single matrix, and is described below in greater detail.
  • An example of the transductive ZSL model for two matrices includes the following inputs and outputs.
  • the inputs include the visual features of the seen data and their labels' semantic vectors.
  • the inputs further include the visual features of the unseen data and their labels' semantic vectors.
  • the term d is the dimensionality of the common space, and a threshold value in (0, 1) is specified.
  • the outputs are the mapping matrix W, the mapping matrix H, and the category of each testing instance x′ in the unseen data.
  • in Step 1, the initial embedding functions between the visual and semantic spaces are obtained. It should be noted that most existing ZSL methods can be used in this step.
  • for example, a ZSL approach based on canonical correlation analysis (CCA) can be used.
  • the objective function of CCA is set forth in equation (4) as follows:
  • equation (4) can be solved to obtain a visual-to-common-space embedding matrix W_0 and a semantic-to-common-space embedding matrix H_0. Both W_0 and H_0 can be used as the initial embedding functions.
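  • A sketch of this initialization using scikit-learn's CCA implementation is shown below; the choice of library, the use of the rotation matrices as W_0 and H_0, and the dimensionality d are assumptions for illustration, since only the CCA objective of equation (4) is specified.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def init_cca_embeddings(X_seen, A_seen, d=64):
    """Initial visual-to-common (W0) and semantic-to-common (H0) embeddings via CCA.

    X_seen : (N_s, d_v) visual features of the seen data
    A_seen : (N_s, d_a) semantic label vectors of the seen data
    d      : dimensionality of the common space (must not exceed min(N_s, d_v, d_a))
    """
    cca = CCA(n_components=d, max_iter=1000)
    cca.fit(X_seen, A_seen)
    W0 = cca.x_rotations_      # (d_v, d): projects centered visual features
    H0 = cca.y_rotations_      # (d_a, d): projects centered semantic vectors
    return W0, H0, cca
```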
  • in Step 2, the label y* of each unseen instance x′ can be predicted according to equation (5), as follows:
  • in Step 3, a set of pseudo-labeled instances is selected. Assuming there are m_j instances classified into the j-th category of Y′, the top fraction (given by the threshold) of those m_j instances with the highest similarity scores is selected as the confident instances, which are used to refine the embedding matrices.
  • the set of pseudo-labeled instances and their corresponding label vector matrix are denoted as X_t and Y_t, respectively.
  • in Step 4, the embedding model is then refined with the following objective function according to equation (6):
  • the optimal matrices W and H can be obtained with an iterative optimization method as follows. For example, fixing matrix H and optimizing matrix W, the following is obtained:
  • the corresponding parameter in this update is a small number, for example, 0.01.
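  • Because the closed-form updates derived from equation (6) are not reproduced in this text, the sketch below only illustrates the alternating structure of the optimization (fix H and solve for W, then fix W and solve for H); the ridge-style least-squares stand-ins and parameter names are assumptions, not the exact expressions of the disclosure.

```python
import numpy as np

def alternate_optimize(X, A, W, H, n_iters=20, lam=0.01):
    """Alternately refine W (visual-to-common) and H (semantic-to-common).

    X : (N, d_v) visual features of the seen plus confident pseudo-labeled instances
    A : (N, d_a) their corresponding semantic label vectors
    W : (d_v, d) current visual-to-common embedding
    H : (d_a, d) current semantic-to-common embedding
    lam : small regularization parameter, e.g. 0.01
    """
    for _ in range(n_iters):
        # Fix H, solve for W so that X W matches A H in the common space.
        target = A @ H
        W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ target)
        # Fix W, solve for H so that A H matches X W in the common space.
        target = X @ W
        H = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ target)
    return W, H
```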
  • in Step 8, the label of each unseen instance x′ is predicted according to equation (5).
  • an adaptive max margin ZSL in a multi-classification model is provided.
  • an adjustable (or variable) penalty is assessed on instances depending on their classification confidence. For example, a small penalty is assessed on the instances with higher classification confidence, and a more severe penalty is assessed on those instances with lower classification confidence.
  • an adaptive margin can be used to configure a ZSL model by considering the embedding differences among data through the application of different penalties in a multi-classification model.
  • in the adaptive max margin ZSL model, X denotes the visual features of the seen images (e.g., images or videos), which are also referred to as training images, and Y denotes their corresponding labels in c classes, where N is the number of training data and d_v is the dimensionality of the visual features.
  • the embedding matrix is given by W_s = VA_s, where V is the compatible matrix and A_s is the semantic label vector matrix for the source classes.
  • the objective is to learn a compatible matrix V (from which to determine an embedding matrix W s ) .
  • ξ_n is a slack variable for the n-th training instance.
  • the adaptive margin is determined by the calculated predicted label vector.
  • equation (A) can be solved, for example, by the concave-convex procedure (CCCP).
  • the adaptive max margin ZSL model can include the following inputs and outputs.
  • the inputs further include the semantic label vector matrix for the source classes, the randomly initialized matrix V^(0), and a tolerance value.
  • Fig. 7 is a flow diagram showing an example process 700 by which a system, such as for example in Fig. 1 or 2, is configured to implement a training stage for training and generating a ZSL model using adaptive margin, e.g., an adaptive max margin learning approach.
  • the adaptive margin approach can be utilized to learn or train a ZSL model and its parameters including embedding function, e.g., compatible matrix and embedding matrix.
  • the process 700 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1, and describes operations that are performed during the training stage.
  • the process 700 begins by initializing values and obtaining other data including seen data.
  • the processor 110 calculates the predicted class label vector:
  • the processor 110 calculates the calculated predicted class label vector:
  • the processor 110 calculates the adaptive margin:
  • the processor 110 calculates the loss of equation (A) according to the following equations and conditions.
  • the processor 110 updates the compatible matrix V with the stochastic gradient descent approach as follows:
  • the learning step size may be a variable or, for example, a constant value.
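  • Since the expressions for the predicted label vectors, the adaptive margin, the loss of equation (A), and its gradient are not reproduced in this text, the following sketch only illustrates the overall shape of one training step: a margin that shrinks for confidently classified instances, a hinge-style loss over competing classes, and a stochastic gradient update of the compatible matrix V. All formulas and names are stand-ins for illustration.

```python
import numpy as np

def adaptive_margin_step(V, x, y_idx, A_s, lr=0.01):
    """One stochastic gradient step on the compatible matrix V for one instance.

    V     : (d_v, d_a) compatible matrix (the embedding is derived from V and A_s)
    x     : (d_v,) visual feature of the training instance
    y_idx : index of its true class among the source classes
    A_s   : (C, d_a) semantic label vectors of the source classes (one row per class)
    """
    scores = A_s @ (V.T @ x)          # compatibility of x with every source class
    true_score = scores[y_idx]

    # Stand-in adaptive margin: if the instance is already confidently classified
    # (true-class score well above a competitor's), the required margin and hence
    # the penalty are small; otherwise the margin is larger.
    margins = 1.0 / (1.0 + np.exp(true_score - scores))
    margins[y_idx] = 0.0

    # Hinge-style loss over the most violating competing class.
    violations = margins + scores - true_score
    violations[y_idx] = 0.0
    worst = int(np.argmax(violations))
    loss = max(0.0, float(violations[worst]))

    if loss > 0.0:
        # Gradient of (score_worst - score_true) with respect to V, treating the
        # margin as fixed for this step (in the spirit of a concave-convex procedure).
        grad = np.outer(x, A_s[worst] - A_s[y_idx])
        V = V - lr * grad
    return V, loss
```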
  • the ZSL model can be used to detect objects of interest including from unseen categories.
  • x′ denotes a testing instance.
  • A_t denotes the semantic label vector matrix for the target classes (such as attributes or word vectors).
  • Z is the target class label space, which is disjoint from the source class label space Y, i.e., Y ∩ Z = ∅. Then, the predicted label z′ of x′ can be determined by:
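  • A sketch of this prediction rule, treating the columns of A_t as the semantic vectors of the target classes (the helper names are hypothetical):

```python
import numpy as np

def predict_target_class(x, V, A_t, target_labels):
    """Predict the target (unseen) class of a test instance x.

    x             : (d_v,) visual feature of the test instance
    V             : (d_v, d_a) learned compatible matrix
    A_t           : (d_a, C_t) semantic label vectors of the target classes (columns)
    target_labels : list of C_t target class names
    """
    scores = x @ V @ A_t              # (C_t,) compatibility with each target class
    return target_labels[int(np.argmax(scores))]
```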
  • Fig. 8 illustrates experimental results for the proposed method, i.e., the adaptive max margin ZSL herein, and an existing ZSL method that uses embedding based on canonical correlation analysis (CCA) to align different semantic spaces with the low-level feature space (hereinafter “ZSL with CCA”), using two available image benchmark datasets (e.g., AwA and CUB).
  • Fig. 9 illustrates experimental results for the proposed adaptive max margin ZSL, and ZSL with CCA, using two available video benchmark datasets (e.g., HMDB51, and UCF101) .
  • the proposed adaptive max margin ZSL of the present disclosure provides a significant performance improvement for object recognition (e.g., object or action recognition) over existing ZSL approaches such as ZSL with CCA.
  • Fig. 10 is a flow diagram showing an example detection process 1000 by which a system, such as for example in Fig. 1 or 2, is configured to detect a presence (or absence) of an object of interest, using a trained ZSL model that is generated, refined and/or updated using an adaptive margin (e.g., adaptive max margin) and/or a transductive approach such as described herein.
  • the sensor (s) 120 captures image (s) .
  • the images can be captured for different scenarios depending on the application for the detection process 1000.
  • the sensor (s) 120 may be positioned, installed or mounted to capture images for fixed locations (e.g., different locations in or around a building or other location) or for movable locations (e.g., locations around a moving vehicle, person or other system) .
  • a camera system such as a single or multi-lens camera or camera system to capture panoramic or 360 degree images, can be installed on a vehicle.
  • the processor 110 scans each region of an image and extracts a visual feature, such as from the captured image(s).
  • the processor 110 applies the ZSL model to each region of the image or extracted visual feature. For example, as part of the image classification, a label is predicted for the extracted visual feature in the image using the trained ZSL model and its parameters.
  • the above-noted equation (1) can be used for label prediction:
  • z′ = arg max_{z ∈ Z} (x′)^T V a_t^z, where a_t^z denotes the semantic vector of target class z (a column of A_t).
  • the processor 110 classifies the visual feature by determining if the extracted visual feature in the image corresponds to an object of interest (e.g., the vector u * , y * or z * matches a label for a particular object of interest or is background) .
  • the processor 110 does not take any particular action (e.g., does not initiate an alarm or notification) at reference 1010.
  • the process 1000 continues to capture and evaluate images. Otherwise, if an object of interest is detected, the processor 110, at reference 712, initiates an action, such as an alarm or notification reflecting a detection of an object of interest or classification of such an object, or other actions including but not limited to control of vehicle operations, and so forth.
  • an object of interest can include a pedestrian, an animal, a tree, a vehicle, a bicycle, a traffic sign, a road hazard or other pertinent objects depending on the intended application for the detection process.
  • the alarm or notification may be initiated locally at the system 100 via one of the output devices 170 or transmitted to an external computing device 180.
  • the alarm may be provided to the user in the form of a visual or audio notification or other suitable medium (e.g., vibrational, etc. ) .
  • the notification may be provided to a control system for an autonomous or semi-autonomous vehicle (e.g., an ADAS) or the like.
  • Fig. 11 is a flow diagram showing an example process 1100 by which a system, such as for example in Fig. 1 or 2, is configured to process one or more videos or images, such as video or image stitching and other video or image processing-related operations, from multiple video or image streams.
  • the multiple video or image streams can be captured from separate lenses in a multi-sensor system (e.g., a multi-camera system) , which is configured to provide spherical and stereoscopic video or image capture with optional spatial audio array.
  • the multi-sensor system can be configured to produce a virtual reality or panoramic environment, e.g., 360x180 degree video or image (e.g., full spherical) and with optional 360 degree spatial audio or other desired video/image/audio representation.
  • the virtual reality or panoramic environment can be generated by stitching/blending a video stream (or an image) with another video stream (or another image) subsequently captured/accessed that has an overlapping region with the video stream (or the image) .
  • the process 1100 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1.
  • the process 1100 as discussed below can also be implemented in the processor 210 and other components of the system 200 in Fig. 2.
  • one or more video streams are captured from one or more sensors 120.
  • the video streams can be previously recorded, live or real-time video streams.
  • the processor 110 is configured to detect a presence or absence of one or more object (s) of interest in the video streams (or the images) using the ZSL model or techniques, such as described herein.
  • location information (e.g., a position or location) is determined for the detected one or more object(s) of interest.
  • the location of the detected object can be determined by the processor 110 based on the actual location of the device or system (e.g., detected using a sensor such as a GNSS sensor, etc. ) and stereoscopic measurement of the distance of the object in the device or system.
  • the location information may comprise coordinates of the object (moveable, moving or stationary) , or one or more parts of the object, in one or more video streams or one or more images.
  • Other position detection techniques can also be applied to ascertain the position of an object (s) of interest.
  • the processor 110 (or the system 100) is configured to perform video or image processing operation (s) , such as stitching or other operations, according to detected presence or absence of object (s) of interest and their determined location.
  • the video or image processing may be performed live, in real-time, or with recorded video or image data.
  • These operations can, for example, include: (i) avoiding stitching on area(s) with specific objects (e.g., critical objects), such as a person, an animal, or a moving object, by automatically moving a stitching seam line between two video or image streams based on the object recognition and the object's location; (ii) blending by automatically (versus manually or by manual operation) moving multiple stitching areas/locations based on the object recognition and the object's location; (iii) automatically adjusting convergence depth(s) of a stitching seam/area based on the object recognition and the object's location; (iv) stitching with autotune that automatically adjusts stitching areas/locations based on the object recognition and the object's location; (v) panning with autotune that automatically pans a central subject, selected based on the object recognition and the object's location, to a central view; and/or (vi) other video or image processing operations to provide seamless or substantially seamless video or images, such as, for example, a live video or live video feed produced from stitching or blending one or more live video or image streams.
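  • The following is a simple illustration of operation (i): it picks a vertical seam column in the overlap region that avoids the horizontal extents of detected objects of interest. The data layout and helper name are assumptions for illustration.

```python
import numpy as np

def choose_seam_column(overlap_width, object_boxes, default=None):
    """Pick a vertical seam column in the overlap region that avoids detected objects.

    overlap_width : width in pixels of the overlapping region of the two streams
    object_boxes  : list of (x_min, x_max) horizontal extents, in overlap coordinates,
                    of objects of interest detected by the ZSL model
    default       : fallback column if every column is covered by an object
    """
    blocked = np.zeros(overlap_width, dtype=bool)
    for x_min, x_max in object_boxes:
        lo, hi = max(0, int(x_min)), min(overlap_width, int(x_max) + 1)
        blocked[lo:hi] = True

    free = np.flatnonzero(~blocked)
    if free.size == 0:
        return overlap_width // 2 if default is None else default
    # Prefer the middle of the widest object-free gap, far from any object edge.
    gaps = np.split(free, np.where(np.diff(free) > 1)[0] + 1)
    widest = max(gaps, key=len)
    return int(widest[len(widest) // 2])
```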
  • the processor 110 can provide or suggest one or more processing operations according to detected presence or absence of object (s) of interest and their determined location, wherein a user of the system can make one or more selections on the provided or suggested one or more processing operations.
  • the method includes facilitating at least in part receiving of a first image and a second image.
  • the first image and the second image comprise an overlapping region and are configured to be blended to generate a panoramic image.
  • the first image and the second image may be captured in a successive manner.
  • the first and second images may also be received from a memory location, where these images are already stored.
  • the method further includes facilitating receiving of location information of a movable object.
  • the location information comprises location information of the movable object in the overlapping regions of the first image and the second image.
  • the location information may include information of the points associated with the movable object.
  • the method may include identifying presence of the movable object, for example, in viewfinders of the first image and the second image.
  • the apparatus may also be configured to determine points associated with the movable object in the overlapping regions of the first image and the second image.
  • the location information may be stored in a memory internal or external to the apparatus/system. In some example embodiments, the location information may be stored as metadata of the first image and the second image.
  • the method also includes generating the panoramic image based on the first image, the second image and the location information.
  • the panoramic image may be generated based on the first image, the second image and the location information of the movable object in the overlapping regions of the first image and the second image.
  • generating the panoramic image comprises determining an image registration matrix of the first image and the second image.
  • the image registration matrix is determined by discarding the points associated with the movable object.
  • the first image and the second image may be warped together based on the image registration matrix.
  • the method comprises labelling a seam in the overlapping regions of warped images of the at least one of the first image and the second image based on the points associated with the movable object.
  • labelling the seam comprises defining the seam such that the seam does not pass through the movable object.
  • the method includes blending the first image and the second image along the seam to generate the panoramic image.
  • a second example of an image processing method is described hereinafter.
  • the method includes identifying presence of a movable object in a viewfinder of a first image, such as using the object detection processes described herein.
  • the first image is captured.
  • location information of the movable object is determined in the first image.
  • the location information may be stored along with the first image as metadata of the first image.
  • the location information may be stored in a separate file.
  • the location information comprises coordinates of the movable object in the first image.
  • a second image is captured.
  • the second image has an overlapping region with the first image.
  • the second image may be captured if there is at least a threshold percentage of overlap with the first image in a viewfinder frame of the second image.
  • the method comprises determining location information of the movable object in the overlapping region of the second image.
  • an object tracking means such as a specific algorithm implemented in the system 100 or 200 and executed by the processor 110 or 210, may be configured to track the points corresponding to the movable object in the overlapping region of the second image and/or the first image.
  • the method further comprises generating the panoramic image based on the first image, the second image and the location information of the movable object in the first image and the overlapping region of the second image. It should be noted that the movable object may or may not be present in both of the overlapping regions of the first image and the second image.
  • the generation of the panoramic image can be performed in the following manner.
  • An image registration matrix of the first image and the second image is determined.
  • the image registration matrix is determined by discarding the points associated with the movable object.
  • correspondences for the points belonging to the non-movable objects in the overlapping regions of the first and second images are considered for determining the image registration matrix.
  • the first image and the second image are warped together based on the image registration matrix.
  • the first and second images may be warped based on the image registration matrix; for example, the second image alone or in combination with the first image may be rotated/aligned in a required manifold.
  • a seam may be labelled in the overlapping regions of warped images of the at least one of the first image and the second image.
  • the seam is defined such that the seam does not pass through the movable object.
  • the first image and the second image are then blended along the seam to generate the panoramic image.
  • while the system 100 or 200 can be used to implement, among other things, operations including the training stage, the testing stage, the alarm notification and the vehicle operation, these operations may also be distributed and performed across a plurality of systems over a communication network(s).
  • the transductive ZSL approach, as described herein, can be applied to refine or update the embedding matrix (or matrices) of a ZSL model, including the Adaptive Max Margin ZSL model described herein.
  • the training and testing stages may also adopt other suitable loss functions or incorporate additional training strategies.
  • the ZSL approaches may be utilized in various applications, including but not limited to object (or action) detection/recognition in video surveillance systems, in autonomous or semi-autonomous vehicles, or in ADAS implementations.
  • the ZSL approaches can also be employed in natural scene understanding, image/video retrieval, virtual reality, or other applications involving object classification or recognition.
  • example embodiments may be implemented as a machine, process, or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware, hardware or any combination thereof.
  • Any resulting program (s) having computer-readable program code, may be embodied on one or more computer-usable media such as resident memory devices, smart cards or other removable memory devices, or transmitting devices, thereby making a computer program product or article of manufacture according to the embodiments.
  • the terms “article of manufacture” and “computer program product” as used herein are intended to encompass a computer program that exists permanently or temporarily on any computer-usable medium or in any transmitting medium which transmits such a program.
  • memory/storage devices can include, but are not limited to, disks, solid state drives, optical disks, removable memory devices such as smart cards, SIMs, WIMs, semiconductor memories such as RAM, ROM, PROMS, etc.
  • Transmitting mediums include, but are not limited to, transmissions via wireless communication networks, the Internet, intranets, telephone/modem-based network communication, hard-wired/cabled communication networks, satellite communication, and other stationary or mobile network systems/communication links.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An apparatus and method for performing object detection with a zero-shot learning model. The apparatus and method are configured to provide at least one embedding matrix of the zero-shot learning model, to provide unseen data for one or more unseen instances, to update the embedding matrix according to one or more unseen instances that have a higher predicted confidence when the embedding matrix is applied, and to detect an object of interest of an unseen category from a region of an image using the updated embedding matrix. The initial embedding matrix may be refined beforehand using an adaptive max margin approach.
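
The abstract's final sentence refers to an adaptive max margin refinement of the initial embedding matrix. The sketch below is one hedged interpretation of that step: a hinge loss over class prototypes whose margin adapts to the semantic distance between the true class and the competing class, minimised by stochastic gradient descent. The adaptive-margin choice, the SGD update, and all names (adaptive_max_margin_refine, proto_dist, and so on) are assumptions made for illustration, not the patent's objective function.

```python
import numpy as np


def adaptive_max_margin_refine(X, Y, S, W, lr=1e-3, reg=1e-4, epochs=10):
    """X: (n, d) features, Y: (n,) int labels, S: (c, k) class prototypes,
    W: (d, k) initial embedding matrix (e.g. from ridge regression)."""
    n, c = X.shape[0], S.shape[0]
    # Adaptive margins: larger margin for classes that are semantically far
    # from the true class, smaller for close ("confusable") ones.
    proto_dist = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    margins = proto_dist / (proto_dist.max() + 1e-12)

    for _ in range(epochs):
        for i in np.random.permutation(n):
            e = X[i] @ W                  # embedded sample, shape (k,)
            scores = S @ e                # score against every class prototype
            y = Y[i]
            # Hinge violations: wrong classes scoring within their adaptive margin.
            viol = np.where(scores + margins[y] > scores[y])[0]
            viol = viol[viol != y]
            grad = reg * W
            for j in viol:
                # Push the true prototype's score up and the violator's down.
                grad += np.outer(X[i], S[j] - S[y])
            W = W - lr * grad
    return W
```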
PCT/CN2017/075764 2017-03-06 2017-03-06 Procédé et système d'apprentissage sans exemple à marge maximale transductive et/ou adaptative WO2018161217A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2017/075764 WO2018161217A1 (fr) 2017-03-06 2017-03-06 Procédé et système d'apprentissage sans exemple à marge maximale transductive et/ou adaptative
EP17899880.3A EP3593284A4 (fr) 2017-03-06 2017-03-06 Procédé et système d'apprentissage sans exemple à marge maximale transductive et/ou adaptative
CN201780088157.6A CN110431565B (zh) 2017-03-06 2017-03-06 直推式和/或自适应最大边界零样本学习方法和系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/075764 WO2018161217A1 (fr) 2017-03-06 2017-03-06 Procédé et système d'apprentissage sans exemple à marge maximale transductive et/ou adaptative

Publications (1)

Publication Number Publication Date
WO2018161217A1 true WO2018161217A1 (fr) 2018-09-13

Family

ID=63447133

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/075764 WO2018161217A1 (fr) 2017-03-06 2017-03-06 Procédé et système d'apprentissage sans exemple à marge maximale transductive et/ou adaptative

Country Status (3)

Country Link
EP (1) EP3593284A4 (fr)
CN (1) CN110431565B (fr)
WO (1) WO2018161217A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763744A (zh) * 2020-06-02 2021-12-07 荷兰移动驱动器公司 停车位置提醒方法及车载装置
CN117541882B (zh) * 2024-01-05 2024-04-19 南京信息工程大学 一种基于实例的多视角视觉融合转导式零样本分类方法

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101179860B (zh) * 2007-12-05 2011-03-16 中兴通讯股份有限公司 随机接入信道的zc序列排序方法和装置
US10331976B2 (en) * 2013-06-21 2019-06-25 Xerox Corporation Label-embedding view of attribute-based recognition

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160253597A1 (en) 2015-02-27 2016-09-01 Xerox Corporation Content-aware domain adaptation for cross-domain classification
WO2016145379A1 (fr) * 2015-03-12 2016-09-15 William Marsh Rice University Compilation automatisée de description probabiliste de tâches en une spécification exécutable de réseau neuronal
CN105512679A (zh) * 2015-12-02 2016-04-20 天津大学 一种基于极限学习机的零样本分类方法
CN105701504A (zh) * 2016-01-08 2016-06-22 天津大学 用于零样本学习的多模态流形嵌入方法
CN105701514A (zh) * 2016-01-15 2016-06-22 天津大学 一种用于零样本分类的多模态典型相关分析的方法
CN105718940A (zh) * 2016-01-15 2016-06-29 天津大学 基于多组间因子分析的零样本图像分类方法
CN105740888A (zh) * 2016-01-26 2016-07-06 天津大学 一种用于零样本学习的联合嵌入模型
CN106096661A (zh) * 2016-06-24 2016-11-09 中国科学院电子学研究所苏州研究院 基于相对属性随机森林的零样本图像分类方法
CN106203472A (zh) * 2016-06-27 2016-12-07 中国矿业大学 一种基于混合属性直接预测模型的零样本图像分类方法
CN106203483A (zh) * 2016-06-29 2016-12-07 天津大学 一种基于语义相关多模态映射方法的零样本图像分类方法
CN106250925A (zh) * 2016-07-25 2016-12-21 天津大学 一种基于改进的典型相关分析的零样本视频分类方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3593284A4

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598279A (zh) * 2018-09-27 2019-04-09 天津大学 基于自编码对抗生成网络的零样本学习方法
CN109598279B (zh) * 2018-09-27 2023-04-25 天津大学 基于自编码对抗生成网络的零样本学习方法
CN109582960A (zh) * 2018-11-27 2019-04-05 上海交通大学 基于结构化关联语义嵌入的零示例学习方法
EP3751467A1 (fr) * 2019-06-14 2020-12-16 Robert Bosch GmbH Système d'apprentissage machine
CN111914872A (zh) * 2020-06-04 2020-11-10 西安理工大学 一种标记与语义自编码融合的零样本图像分类方法
CN111914872B (zh) * 2020-06-04 2024-02-02 西安理工大学 一种标记与语义自编码融合的零样本图像分类方法
CN115424096A (zh) * 2022-11-08 2022-12-02 南京信息工程大学 一种多视角零样本图像识别方法
CN115424096B (zh) * 2022-11-08 2023-01-31 南京信息工程大学 一种多视角零样本图像识别方法
CN116051909A (zh) * 2023-03-06 2023-05-02 中国科学技术大学 一种直推式零次学习的未见类图片分类方法、设备及介质
CN117893743A (zh) * 2024-03-18 2024-04-16 山东军地信息技术集团有限公司 一种基于通道加权和双对比学习的零样本目标检测方法
CN117893743B (zh) * 2024-03-18 2024-05-31 山东军地信息技术集团有限公司 一种基于通道加权和双对比学习的零样本目标检测方法

Also Published As

Publication number Publication date
CN110431565B (zh) 2023-06-20
CN110431565A (zh) 2019-11-08
EP3593284A4 (fr) 2021-03-10
EP3593284A1 (fr) 2020-01-15

Similar Documents

Publication Publication Date Title
WO2018161217A1 (fr) Procédé et système d'apprentissage sans exemple à marge maximale transductive et/ou adaptative
US12056623B2 (en) Joint processing for embedded data inference
US11308334B2 (en) Method and apparatus for integration of detected object identifiers and semantic scene graph networks for captured visual scene behavior estimation
EP3673417B1 (fr) Système et procédé d'apprentissage distributif et de distribution de poids dans un réseau neuronal
US11074470B2 (en) System and method for automatically improving gathering of data using a data gathering device
CN113366496A (zh) 用于粗略和精细对象分类的神经网络
US20200013273A1 (en) Event entity monitoring network and method
KR20190078543A (ko) 이미지 획득 장치 및 그의 제어 방법
US11592825B2 (en) Electronic device and operation method therefor
KR20180107930A (ko) 딥 러닝을 이용한 인공지능 기반 영상 감시 방법 및 시스템
KR102404791B1 (ko) 입력 영상에 포함된 객체를 인식하는 디바이스 및 방법
JP7111175B2 (ja) 物体認識システム、認識装置、物体認識方法および物体認識プログラム
US11995766B2 (en) Centralized tracking system with distributed fixed sensors
WO2016179808A1 (fr) Appareil et procédé de détection de parties du visage et de visage
US20190114799A1 (en) Image recognition system
Zhang et al. Autonomous long-range drone detection system for critical infrastructure safety
WO2022243337A2 (fr) Système de détection et de gestion d'incertitude dans des systèmes de perception, pour la détection de nouveaux objets et pour l'anticipation de situation
JP2022164640A (ja) マルチモーダル自動ラベル付けと能動的学習のためのデータセットとモデル管理のためのシステムと方法
US20210390419A1 (en) Device and Method for Training and Testing a Classifier
EP4343700A1 (fr) Architecture pour une augmentation d'intelligence artificielle distribuée
KR20210048271A (ko) 복수 객체에 대한 자동 오디오 포커싱 방법 및 장치
KR20200084428A (ko) 동영상을 제작하는 방법 및 그에 따른 장치
US11553162B2 (en) Image processing system for extending a range for image analytics
KR20210127638A (ko) 분류기를 훈련하고 분류기의 강건성을 평가하기 위한 디바이스 및 방법
KR20210127639A (ko) 분류기를 훈련하기 위한 디바이스 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17899880

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017899880

Country of ref document: EP

Effective date: 20191007