CN104537124A - Multi-view metric learning method - Google Patents

Multi-view metric learning method

Info

Publication number
CN104537124A
Authority
CN
China
Prior art keywords
learning method
frame
metric
metric learning
view
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510042581.XA
Other languages
Chinese (zh)
Other versions
CN104537124B (en)
Inventor
张驰
付彦伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUZHOU DEWO INTELLIGENT SYSTEM Co Ltd
Original Assignee
SUZHOU DEWO INTELLIGENT SYSTEM Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUZHOU DEWO INTELLIGENT SYSTEM Co Ltd
Priority to CN201510042581.XA
Publication of CN104537124A
Application granted
Publication of CN104537124B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Abstract

The invention discloses a multi-view metric learning method for multi-view video summarization. The method comprises the following steps: decomposing a video into a set of frames; learning a unified metric space; clustering in the unified metric space; and selecting specific frames to output as a summary. The method can find the metric that best separates the data while forcing the learned metric to preserve the original intrinsic information between data points.

Description

Multi-view metric learning method
Technical field
The present invention relates to a multi-view metric learning method, and in particular to a multi-view metric learning method for multi-view video summarization.
Background technology
Supervised learning builds a model from a large number of labeled training examples in order to predict the labels of unseen examples. Here a "label" is the output associated with an example: in a classification task it is the class of the example, and in a regression task it is a real-valued output. As the human capacity to collect and store data has grown, large quantities of unlabeled data can be obtained easily in many practical tasks, while labeling them often requires considerable manpower and resources. Semi-supervised learning tries to let the learner exploit large amounts of unlabeled data automatically, assisted by a small amount of labeled data. Unsupervised learning requires no manually supplied labels at all.
Multi-view learning has received a great deal of attention over the past decade, but most existing methods are devoted to semi-supervised learning.
In many real-world applications, unlabeled data are naturally represented by a large number of highly correlated views. In video processing, for example, different cameras may focus on essentially the same field of view. Similarly, in QR-code scanning, multiple fixed scanners at different positions capture the same QR-code target from multiple angles, and a hand-held scanner produces images of the target at multiple angles because of shake. In such cases one wishes to exploit the correlation to help understand and characterize the data, and to find an "optimal" metric that reflects the intrinsic structure of the input data.
Suppose that complex human motion, in a locally fixed coordinate system, is a function of time that is sampled simultaneously by multiple cameras. To uncover the structure of this underlying space, classical methods typically extract a high-dimensional feature vector from each view's video independently, under several assumptions, and then apply a dimensionality-reduction method.
Conventional video summarization methods are designed to summarize a single-view video recording. They therefore cannot fully exploit the redundancy in multi-view recordings, and they ignore the distinctive and complementary information that the different views contribute to the raw data set.
Summary of the invention
The invention provides a multi-view metric learning method for multi-view video summarization. Multi-view videos simultaneously capture different visual projections of the same spatio-temporal scene in real life. The multi-view metric learning of the invention projects all multi-view videos into a new metric space that best models the diverse real-world scene and is used to reveal the intrinsic characteristics of object motion. By preserving the maximal intrinsic characteristics across the different views, this greatly simplifies video summarization. In the learned metric space, the visual data are summarized by clustering and extracting a key frame from each cluster.
The method of the invention combines the advantages of maximum-margin clustering and the disagreement minimization criterion; it can therefore find the metric that best separates the data while forcing the learned metric to preserve the original intrinsic information between data points.
The method of the invention is particularly suitable for the unsupervised setting.
The invention provides a multi-view metric learning method for multi-view video summarization, comprising the following steps:
(1) decompose the video recording into a set of frames, denoted X^(1), ..., X^(K), where X^(k) ∈ R^(n×d_k) holds the d_k-dimensional features of the n frames of the k-th view, R denotes the real numbers, d_k is the dimension of the k-th original space, and n is the number of frames;
(2) from the information in X^(1), ..., X^(K), learn a unified metric space X ∈ R^(n×d), where d is the dimension of the mapped space;
(3) perform clustering on X and take the cluster centers as representatives, denoted F = {f_{i_1}, ..., f_{i_C}}, where F is the summary set and i_1, ..., i_C are frame indices;
(4) for each f_{i_c}, select its corresponding frame from each of the K views, and output these frames as the final summary.
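As an illustration of steps (1) to (4), the sketch below runs the pipeline with a plain k-means in place of the learned metric and simple feature concatenation in place of the learned unified space; the function names and the concatenation shortcut are illustrative assumptions, not the patented method.

```python
import numpy as np

def lloyd_kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's k-means; stands in for any off-the-shelf clusterer."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each frame to its nearest center, then recompute centers.
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers, labels

def summarize(views, n_clusters):
    """Steps (1)-(4): views[k] is the (n, d_k) feature matrix of view k.
    Concatenation stands in for the learned unified metric space X."""
    X = np.hstack(views)                        # placeholder for learned X
    centers, labels = lloyd_kmeans(X, n_clusters)
    summary = []
    for c in range(n_clusters):                 # frame nearest each center
        members = np.where(labels == c)[0]
        if len(members) == 0:
            continue
        d = np.linalg.norm(X[members] - centers[c], axis=1)
        summary.append(int(members[np.argmin(d)]))
    # Step (4): for each index, the matching frame of every view is output.
    return sorted(set(summary))
```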
In the learning step, a unified coordinate matrix X ∈ R^(n×d) is found that minimizes
R(X) = R_emp(X) + γ_1 R_struct(X) + γ_2 R_diff(X),
where R_emp(X), R_struct(X) and R_diff(X) are the empirical loss, structural loss and disagreement loss of X respectively, and γ_1, γ_2 are parameters that balance the objectives; and
the empirical loss R_emp(X) is 0;
the structural loss R_struct(X) is Σ_{i=1..c} λ_i(L~_{G_X}), where G_X is the similarity matrix induced by the metric X, L~_{G_X} is its normalized Laplacian, λ_i is the i-th smallest eigenvalue, and c is the number of predefined clusters;
the disagreement loss R_diff(X) is Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²), where tr is the trace.
According to a preferred embodiment of the invention, since R_emp(X) = 0, the minimized R(X) is
R(X) = γ_1 Σ_{i=1..c} λ_i(L~_{G_X}) + γ_2 Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²).
According to another preferred embodiment of the invention, the minimized R(X) is optimized as
min_P γ_1 tr(P^T L~_{G_X} P) + γ_2 Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²), subject to P^T P = I,
where P = [p_1, ..., p_c] and p_i is the eigenvector of L~_{G_X} corresponding to its i-th smallest eigenvalue.
According to yet another preferred embodiment of the invention, the minimized R(X) is further optimized as
min_{P,μ} γ_1 tr(P^T (Σ_k μ_k L~_{G^(k)}) P) + γ_2 Σ_k tr((Σ_j μ_j L~_{G^(j)} − L~_{G^(k)})²), subject to P^T P = I, μ_k ≥ 0 and Σ_k μ_k = 1,
where L~_{G_X} = Σ_k μ_k L~_{G^(k)}, μ = (μ_1, ..., μ_K) is the vector of weight factors, and μ_k is the weight factor of the k-th view.
Brief description of the drawings
Fig. 1 shows a flowchart of the method according to some embodiments of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are set forth below with reference to the accompanying drawings. It should be understood that these embodiments are illustrative rather than restrictive.
The aim of maximum-margin clustering is to find the clustering with the largest margin between clusters. The method of the invention optimizes a graph-theoretic measure to find the kernel matrix that allows a larger margin between clusters.
Suppose the low-level features of K different views are given as X^(1), ..., X^(K), where each X^(k) ∈ R^(n×d_k) is a coordinate matrix. The invention seeks a unified coordinate matrix X ∈ R^(n×d) that minimizes
R(X) = R_emp(X) + γ_1 R_struct(X) + γ_2 R_diff(X),
where R_emp(X), R_struct(X) and R_diff(X) are the empirical, structural and disagreement losses of X respectively, and γ_1, γ_2 are parameters that balance the objectives.
The empirical loss R_emp(X) is usually defined through label information. In supervised multiple kernel learning, for example, R_emp(X) is typically defined as the minimum hinge loss attainable under the metric defined by X. The structural loss R_struct(X) can be defined as the complexity of the classifier, or used to ensure that similar examples have similar labels. The disagreement loss R_diff(X) measures how much X differs from the individual X^(k). The disagreement minimization criterion (DMC) is incorporated through R_diff(X).
In the learning step, the invention adopts unsupervised multi-view metric learning.
First, let G^(1), ..., G^(K) ∈ R^(n×n) be the similarity matrices defined on the metric spaces X^(1), ..., X^(K) respectively, where G^(k)(i, j) = G^(k)(j, i) is the similarity of data points x_i and x_j in the k-th view, and n is the number of frames.
Write L~_G for the normalized Laplacian, where the normalized Laplacian of a similarity matrix G is defined as
L~_G = I − D^(−1/2) G D^(−1/2),
where D is the diagonal degree matrix with D(i, i) = Σ_j G(i, j), and I is the identity matrix.
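The normalized Laplacian defined above is straightforward to compute; a minimal sketch, in which giving isolated zero-degree vertices zero weight is an added assumption for robustness:

```python
import numpy as np

def normalized_laplacian(G):
    """L~_G = I - D^{-1/2} G D^{-1/2} for a symmetric similarity matrix G,
    where D is the diagonal degree matrix with D(i, i) = sum_j G(i, j)."""
    d = G.sum(axis=1)
    # Guard against zero-degree vertices (assumption: treat them as isolated).
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    return np.eye(len(G)) - d_inv_sqrt[:, None] * G * d_inv_sqrt[None, :]
```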
A good video summary should have a well-coordinated R_diff and should be invariant to metric transformations of the synchronized frames X, such as rotation, translation and scaling. To this end, the disagreement loss is
R_diff(X) = Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²), (3)
where G_X is the similarity matrix induced by the metric X. This equation is invariant to such metric transformations as rotation, translation and scaling, and coordinates the different views well. Moreover, it introduces no additional optimization variables.
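Under the reading of the disagreement loss used here, a trace of the squared difference of normalized Laplacians (equivalently, a squared Frobenius distance for symmetric matrices), it can be evaluated as:

```python
import numpy as np

def disagreement_loss(L_X, view_laplacians):
    """Sum over views of tr((L_X - L_k)^2): the squared Frobenius distance
    between the normalized Laplacian of the learned metric and that of each
    view.  This form is a hedged reconstruction of equation (3)."""
    return float(sum(np.trace((L_X - Lk) @ (L_X - Lk)) for Lk in view_laplacians))
```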
The definition of R_struct(X) is inspired by the following results from spectral graph theory.
First, the multiplicity c of the eigenvalue 0 of the normalized Laplacian equals the number of connected components of the graph.
Second, the remaining small eigenvalues bound how well the graph separates: for any pair of vertices there is an inequality relating the shortest path between them to the eigenvalues λ_i of the normalized Laplacian.
These results show that the c smallest eigenvalues of L~_{G_X} (where G_X is a transformation of the metric X) implicitly determine the quality of a c-way clustering under that metric. The structural loss is therefore defined as
R_struct(X) = Σ_{i=1..c} λ_i(L~_{G_X}).
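Following the definition above, the structural loss is the sum of the c smallest eigenvalues of the normalized Laplacian; a sketch, with the zero-degree guard as an added assumption:

```python
import numpy as np

def structural_loss(G, c):
    """Sum of the c smallest eigenvalues of the normalized Laplacian of G.
    Small values indicate that G separates well into c clusters, since the
    multiplicity of eigenvalue 0 equals the number of connected components."""
    d = G.sum(axis=1)
    d_is = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    L = np.eye(len(G)) - d_is[:, None] * G * d_is[None, :]
    return float(np.sort(np.linalg.eigvalsh(L))[:c].sum())
```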
Finally, since the unsupervised setting provides no label information, we simply set R_emp(X) = 0.
Combining the above definitions gives the formulation for unsupervised multi-view metric learning:
min_X γ_1 Σ_{i=1..c} λ_i(L~_{G_X}) + γ_2 Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²). (5)
In some alternative embodiments, because R_diff is a disagreement measure between metric spaces, canonical correlation analysis (CCA) might seem a natural choice. However, computing CCA involves optimizing a transformation matrix, which introduces additional optimization variables and makes the optimization intractable.
Simplifying the CCA measure leads to a prediction-based disagreement measure, in which f_X and f_{X^(k)} denote the predictions of classifiers learned under the metrics X and X^(k) respectively.
This definition is recommended when classification results can easily be derived from the learned metric. A problem arises, however, for clustering tasks, where the disagreement between different clusterings is difficult to compute.
In some embodiments, equation (5) is optimized further.
Let P = [p_1, ..., p_c] with P^T P = I, where p_i is the eigenvector of L~_{G_X} corresponding to the i-th smallest eigenvalue.
Once P is found, the metric space is implicitly defined: given L~_{G_X}, P is the coordinate matrix of the metric space, and the k-means algorithm can be used to cluster in this space. For the purpose of clustering it therefore suffices to compute P itself. (Note that a rescaled P has the same eigenvectors and therefore yields the same clustering result.) The optimization problem now becomes
min_P γ_1 tr(P^T L~_{G_X} P) + γ_2 Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²), subject to P^T P = I.
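The coordinate matrix described above can be sketched directly: its columns are the eigenvectors of the normalized Laplacian for the c smallest eigenvalues, and its rows are the coordinates handed to k-means. The function name is illustrative:

```python
import numpy as np

def spectral_coordinates(L, c):
    """Rows of the returned (n, c) matrix are the implicit coordinates:
    column i is the eigenvector of L for the i-th smallest eigenvalue."""
    vals, vecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    return vecs[:, :c]
```

After this step, running k-means on the rows of the returned matrix groups frames that belong to the same cluster in the learned metric.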
For efficiency, it is further assumed that L~_{G_X} = Σ_{k=1..K} μ_k L~_{G^(k)}, with μ_k ≥ 0 and Σ_k μ_k = 1.
First, with μ fixed, P can be obtained by eigendecomposition of L~_{G_X}; then, with P fixed, μ is obtained by quadratic programming as in equation (8) below, iterating until convergence. The quadratic program can be solved efficiently with the Mosek software, and since K is small in practice the cost is essentially constant.
min_μ γ_1 tr(P^T (Σ_k μ_k L~_{G^(k)}) P) + γ_2 Σ_k tr((Σ_j μ_j L~_{G^(j)} − L~_{G^(k)})²), subject to μ_k ≥ 0 and Σ_k μ_k = 1. (8)
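The alternation described here can be sketched as follows. To keep the example self-contained, the quadratic program over μ is replaced by a simpler surrogate (a linear term plus a ||μ||² regularizer) whose simplex-constrained minimizer has a closed form via Euclidean projection, rather than calling Mosek; the surrogate, `lam`, and the function names are assumptions for illustration.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {mu : mu >= 0, sum(mu) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def alternating_opt(laplacians, c, lam=1.0, iters=20):
    """Fix mu -> P from the eigendecomposition of sum_k mu_k * L_k
    (its c smallest eigenvectors); fix P -> mu by minimizing
    sum_k mu_k * tr(P^T L_k P) + lam * ||mu||^2 over the simplex,
    a closed-form surrogate for the quadratic program in the text."""
    mu = np.full(len(laplacians), 1.0 / len(laplacians))
    P = None
    for _ in range(iters):
        Lmix = sum(m * L for m, L in zip(mu, laplacians))
        vals, vecs = np.linalg.eigh(Lmix)       # ascending eigenvalues
        P = vecs[:, :c]
        a = np.array([np.trace(P.T @ L @ P) for L in laplacians])
        mu = project_simplex(-a / (2.0 * lam))  # smaller a_k -> larger mu_k
    return mu, P
```

Views whose Laplacian agrees well with the current embedding (small tr(P^T L_k P)) receive larger weights, which matches the intent of the weighted combination in the text.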
According to certain embodiments of the invention, for generating a video summary, it is assumed that each real-world event E_i corresponds to a distribution D_i centered on a small region of a "latent" semantic space. Each "instance" of event E_i is a data point x_ij sampled from D_i in the latent semantic space.
The invention processes videos of the same scene taken from different viewing angles, so the high-dimensional low-level features of each view are embedded into the same low-dimensional space. This is what motivates applying the disagreement minimization criterion in the learning.
Fig. 1 shows the process flow diagram of the method according to some embodiments of the present invention.
Referring to Fig. 1, multi-view video summarization is carried out as follows:
(1) decompose the video recording into a set of frames, denoted X^(1), ..., X^(K), where X^(k) ∈ R^(n×d_k) holds the d_k-dimensional features of the n frames of the k-th view;
(2) from the information in X^(1), ..., X^(K), learn a unified metric space X ∈ R^(n×d);
(3) perform clustering on X and take the cluster centers as representatives, denoted F = {f_{i_1}, ..., f_{i_C}};
(4) for each f_{i_c}, select the corresponding frame from each of the K views, and output these frames as the final summary.
The method according to the invention was compared experimentally with existing methods on a previously collected office data set, and was found to outperform them substantially. The experimental results are shown in the following table:
Method | Events | Precision (%) | Recall (%)
Uniform / Random | 10 / 5 | 70 / 60 | 26.9 / 11.5
ED / DM | 10 / 13 | 80 / 76.9 | 30.8 / 38.5
The invention | 20 | 100 | 76.9
where "Uniform" denotes uniform sampling of the video and "Random" denotes random sampling of frames; "ED" denotes the Euclidean distance method and "DM" denotes the diffusion metric method.
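Assuming precision and recall are computed over detected events in the usual way, the table's figures are internally consistent with a ground truth of 26 events (for the invention's row, 20 correct detections give 100% precision and 20/26 ≈ 76.9% recall). The helper below is a generic sketch, not the evaluation code of the invention:

```python
def precision_recall(detected, ground_truth):
    """Precision = fraction of detected events that are true events;
    recall = fraction of true events covered by the detections."""
    detected, ground_truth = set(detected), set(ground_truth)
    tp = len(detected & ground_truth)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```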
The invention has been described above through specific embodiments; it should be appreciated that the description is illustrative rather than restrictive. For example, features of the embodiments may be combined with one another without departing from the scope of the invention, and the embodiments may be modified to suit particular circumstances without departing from its spirit. Although specific elements and procedures are defined in the embodiments described herein, they are merely exemplary and in no way limiting; many other embodiments will be apparent to those skilled in the art from the above description. The scope of the invention should therefore be determined with reference to the claims, together with the full scope of their equivalents.

Claims (4)

1. A multi-view metric learning method for multi-view video summarization, comprising the following steps:
(1) decompose the video recording into a set of frames, denoted X^(1), ..., X^(K), where X^(k) ∈ R^(n×d_k) holds the d_k-dimensional features of the n frames of the k-th view, R denotes the real numbers, d_k is the dimension of the k-th view space, and n is the number of frames;
(2) from the information in X^(1), ..., X^(K), learn a unified metric space X ∈ R^(n×d), where d is the dimension of the mapped space;
(3) perform clustering on X and take the cluster centers as representatives, denoted F = {f_{i_1}, ..., f_{i_C}}, where F is the summary set and i_1, ..., i_C are frame indices;
(4) for each f_{i_c}, select its corresponding frame from each of the K views, and output these frames as the final summary;
wherein, in the learning step, a unified coordinate matrix X ∈ R^(n×d) is found that minimizes
R(X) = R_emp(X) + γ_1 R_struct(X) + γ_2 R_diff(X),
where R_emp(X), R_struct(X) and R_diff(X) are the empirical loss, structural loss and disagreement loss of X respectively, and γ_1, γ_2 are parameters that balance the objectives; and
the empirical loss R_emp(X) is 0;
the structural loss R_struct(X) is Σ_{i=1..c} λ_i(L~_{G_X}), where G_X is the similarity matrix induced by the metric X, L~_{G_X} is its normalized Laplacian, λ_i is the i-th smallest eigenvalue, and c is the number of predefined classes;
the disagreement loss R_diff(X) is Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²), where tr is the trace.
2. The multi-view metric learning method according to claim 1, wherein, since R_emp(X) = 0, the minimized R(X) is
R(X) = γ_1 Σ_{i=1..c} λ_i(L~_{G_X}) + γ_2 Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²).
3. The multi-view metric learning method according to claim 2, wherein the minimized R(X) is optimized as
min_P γ_1 tr(P^T L~_{G_X} P) + γ_2 Σ_{k=1..K} tr((L~_{G_X} − L~_{G^(k)})²), subject to P^T P = I,
where P = [p_1, ..., p_c] and p_i is the eigenvector of L~_{G_X} corresponding to its i-th smallest eigenvalue.
4. The multi-view metric learning method according to claim 3, wherein the minimized R(X) is further optimized as
min_{P,μ} γ_1 tr(P^T (Σ_k μ_k L~_{G^(k)}) P) + γ_2 Σ_k tr((Σ_j μ_j L~_{G^(j)} − L~_{G^(k)})²), subject to P^T P = I, μ_k ≥ 0 and Σ_k μ_k = 1,
where L~_{G_X} = Σ_k μ_k L~_{G^(k)}, μ represents the vector of weight factors, and μ_k represents the weight factor of the k-th view.
CN201510042581.XA 2015-01-28 2015-01-28 Multiple view metric learning method Active CN104537124B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510042581.XA CN104537124B (en) 2015-01-28 2015-01-28 Multiple view metric learning method


Publications (2)

Publication Number Publication Date
CN104537124A true CN104537124A (en) 2015-04-22
CN104537124B CN104537124B (en) 2018-08-07

Family

ID=52852652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510042581.XA Active CN104537124B (en) 2015-01-28 2015-01-28 Multiple view metric learning method

Country Status (1)

Country Link
CN (1) CN104537124B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011123644A (en) * 2009-12-10 2011-06-23 Nec Corp Data processing apparatus, data processing method and data processing program
CN102938070A (en) * 2012-09-11 2013-02-20 广西工学院 Behavior recognition method based on action subspace and weight behavior recognition model
US20130238664A1 (en) * 2012-03-08 2013-09-12 eBizprise Inc. Large-scale data processing system, method, and non-transitory tangible machine-readable medium thereof
CN103577841A (en) * 2013-11-11 2014-02-12 浙江大学 Human body behavior identification method adopting non-supervision multiple-view feature selection


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292341A (en) * 2017-06-20 2017-10-24 西安电子科技大学 Adaptive multi views clustering method based on paired collaboration regularization and NMF
CN107292341B (en) * 2017-06-20 2019-12-10 西安电子科技大学 self-adaptive multi-view clustering method based on pair-wise collaborative regularization and NMF
CN107563403A (en) * 2017-07-17 2018-01-09 西南交通大学 A kind of recognition methods of bullet train operating condition
CN107886109A (en) * 2017-10-13 2018-04-06 天津大学 It is a kind of based on have supervision Video segmentation video summarization method
CN107886109B (en) * 2017-10-13 2021-06-25 天津大学 Video abstraction method based on supervised video segmentation
CN110472484A (en) * 2019-07-02 2019-11-19 山东师范大学 Video key frame extracting method, system and equipment based on multiple view feature
CN110472484B (en) * 2019-07-02 2021-11-09 山东师范大学 Method, system and equipment for extracting video key frame based on multi-view characteristics

Also Published As

Publication number Publication date
CN104537124B (en) 2018-08-07


Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Multi view metric learning method

Effective date of registration: 20230721

Granted publication date: 20180807

Pledgee: Societe Generale Bank Co.,Ltd. Suzhou Branch

Pledgor: Suzhou Dewo Smart System Co.,Ltd.

Registration number: Y2023980049259
