CN101021849A - Transmedia searching method based on content correlation - Google Patents
- Publication number: CN101021849A
- Application number: CN200610053390A
- Authority: CN (China)
- Prior art keywords: subspace, image, vector, data, isomorphism
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
This invention discloses a cross-media retrieval method based on content correlation. It applies canonical correlation analysis to the content features of media data of different modalities, uses a subspace mapping algorithm to map the visual feature vectors of image data and the auditory feature vectors of audio data simultaneously into a low-dimensional isomorphic subspace, measures the correlation between data of different modalities with a general distance function, and revises the topological structure of the multi-modal data set in the subspace, effectively improving cross-media retrieval efficiency.
Description
Technical field
The present invention relates to multimedia retrieval, and in particular to a cross-media retrieval method based on content correlation.
Background art
Content-based multimedia retrieval is a research focus in computer vision and information retrieval; it performs retrieval by similarity matching on low-level visual, auditory, or geometric features. As early as 1976, McGurk showed that the human brain's cognition of external information needs to span and integrate different sensory information to form a holistic understanding. Recent research in cognitive neuropsychology has further confirmed that human cognition is cross-media in nature: information from different senses such as vision and hearing stimulates one another and jointly produces the cognitive result. There is therefore a pressing need for a cross-media retrieval method that supports different modalities and breaks the restriction of traditional content-based multimedia retrieval to single-modality data.
Content-based cross-media retrieval analyzes the low-level features of multimedia objects so that retrieval can cross from one modality to another: the user submits a query example in one modality and the system returns similar multimedia objects of other modalities. This breaks the single-modality restriction of image retrieval, audio retrieval, three-dimensional graphics retrieval, and the like. Cross-media retrieval is a new research area within content-based multimedia analysis and retrieval, and at present there is no comparatively mature cross-media retrieval algorithm or technology internationally.
In the early 1990s, content-based image retrieval (CBIR) was proposed: low-level visual features such as color, texture, and shape are extracted from images and used as the image index. The technique was later applied to video and audio retrieval, with different low-level features for different media: video retrieval may use motion-vector features, while audio retrieval uses time-domain, frequency-domain, and compressed-domain features. Early content-based multimedia retrieval was represented by prototype systems such as QBIC and VideoQ, but for lack of high-level semantic support these could not satisfy users in accuracy or efficiency. Later, methods such as learning from examples, fusion analysis, and manifold learning were used for semantic understanding of multimedia, to bridge the gap between low-level features and high-level semantics. To overcome the shortage of training samples, relevance feedback is often used to incorporate the user's perceptual prior knowledge, for example by using feedback to move the query vector toward the distribution center of the relevant retrieved objects, or by adjusting the weight of each component in the distance metric; recently some machine learning methods have also been combined with relevance feedback. These methods have narrowed the semantic gap to some extent and improved the performance of single-modality retrieval.
However, existing multimedia retrieval systems either can only search databases containing a single modality, or can handle multi-modal media data but do not support cross-media retrieval, that is, retrieving multimedia objects of other modalities from a multimedia object of one modality. The visual features of images and the auditory features of audio differ in dimensionality and express different attributes, so their similarity cannot be measured directly; the same heterogeneity and incomparability exist between media data of other modalities. The single-modality retrieval methods above therefore cannot be applied directly to cross-media retrieval, whose objects of study are mutually heterogeneous low-level feature spaces of different modalities.
Some researchers have proposed work along similar cross-media lines, for example indexing and retrieving video databases by mining multi-modal features, or analyzing the transcript text of news video together with the text of web pages to match video objects and web pages by similarity on text features. But this work targets different low-level features within media objects of a specific modality, for example the transcript text, color, and texture of a video clip, and cannot cross flexibly between media data of different modalities.
Canonical correlation analysis (CCA) is a statistical analysis technique first applied to data analysis in fields such as economics, medicine, and meteorology. It has seldom been used in multimedia data analysis and retrieval, because it analyzes the correlation between two different sets of variables, whereas traditional single-modality retrieval studies the feature space of a single modality.
Summary of the invention
The present invention overcomes the modality restriction of the existing methods above and provides a cross-media retrieval method based on content correlation.
The cross-media retrieval method based on content correlation comprises the following steps:
(1) collect objects of different modalities from a multimedia database: image data and audio data;
(2) extract the visual features of the image data and the auditory features of the audio data, and use canonical correlation analysis to obtain the canonical correlation between the visual and auditory features;
(3) use an isomorphic subspace mapping algorithm to map the visual feature vectors of the image data and the auditory feature vectors of the audio data simultaneously into a low-dimensional isomorphic subspace, giving media data of different modalities a unified representation;
(4) define a general distance function in polar-coordinate form to measure the correlation between media data of different modalities, and perform cross-media retrieval on this basis;
(5) use a relevance feedback mechanism based on incremental learning to extract prior knowledge from user interaction and revise the topological structure of the multimedia data set in the isomorphic subspace;
(6) locate media objects outside the training set accurately in the isomorphic subspace, either from the basis vectors obtained in the subspace mapping process or through the relevance feedback mechanism.
Extracting the visual features of the image data and the auditory features of the audio data, and obtaining the canonical correlation between them by canonical correlation analysis: the low-level visual features of an image form a p-dimensional image feature vector, and the low-level auditory features of an audio clip form a q-dimensional audio feature vector. Canonical correlation analysis learns the image visual features $X_{(n \times p)}$ and the audio auditory features $Y_{(n \times q)}$ simultaneously, and the correlation coefficient between the heterogeneous feature matrices $X_{(n \times p)}$ and $Y_{(n \times q)}$ is calculated as follows:
where A and B are linear transformations. By formula 2, the correlation between the feature matrices X and Y, which have many variables, is reduced to the correlation between the fewer composite variables L and M. The distribution of the values of A and B determines the form of the spatial correlation between X and Y, and the magnitudes of the values of A and B determine the importance of the corresponding variables.
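The two formulas this passage refers to can be written in the standard form used for canonical correlation analysis. The following is a sketch under assumptions rather than the patent's own typesetting: it assumes the columns of X and Y are mean-centered, treats A and B as one pair of basis vectors, and introduces the symbols $C_{xx}$, $C_{yy}$, $C_{xy}$ so that the notation agrees with the eigen-equation used in the mapping algorithm. The numbering (1) and (2) follows the way the passage refers to the formulas and is likewise an assumption.

```latex
% Assumed notation: C_xx = X^T X, C_yy = Y^T Y, C_xy = X^T Y (centered columns)
\begin{align}
\rho = r(L, M) &= \frac{A^{T} C_{xy} B}{\sqrt{A^{T} C_{xx} A}\,\sqrt{B^{T} C_{yy} B}} \tag{1} \\
L = XA, &\qquad M = YB \tag{2}
\end{align}
```

Under the unit-variance constraint $A^{T} X^{T} X A = B^{T} Y^{T} Y B = 1$ the denominator of (1) equals 1, so maximizing $\rho$ amounts to maximizing $A^{T} C_{xy} B$.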
Using the isomorphic subspace mapping algorithm to map the visual feature vectors of the image data and the auditory feature vectors of the audio data simultaneously into a low-dimensional isomorphic subspace, giving media data of different modalities a unified representation: building on canonical correlation analysis, the isomorphic subspace mapping algorithm learns an optimal low-dimensional subspace that preserves, to the greatest extent, the correlation between the original feature matrices $X_{(n \times p)}$ and $Y_{(n \times q)}$. The algorithm steps are as follows:
Input: image feature matrix $X_{(n \times p)}$, audio feature matrix $Y_{(n \times q)}$
Output: vector representations $L_{(n \times m)}$ and $M_{(n \times m)}$ of all image data and audio data in the low-dimensional subspace
Step 1: by semi-supervised learning, use K-means clustering to divide all image data and audio data in the database into different semantic classes;
Step 2: maximize the correlation coefficient $\rho = r(L, M)$ under the constraint of formula 3,
$v(L) = L^{T} L = A^{T} X^{T} X A = 1; \quad v(M) = M^{T} M = B^{T} Y^{T} Y B = 1 \qquad (3)$
The method of Lagrange multipliers yields an equation of the form $Ax = \lambda Bx$, namely $C_{xy} C_{yy}^{-1} C_{yx} A = \lambda^{2} C_{xx} A$; solving for the eigenvalues of this equation gives the solutions for the matrices A and B;
Step 3: construct the isomorphic subspace linearly, that is, use the basis vectors A and B to map the image feature vectors and the audio feature vectors to the m-dimensional coordinates $L_{(n \times m)}$ and $M_{(n \times m)}$, respectively.
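The eigen-equation in Step 2 follows from the constrained maximization by a short, standard derivation. The sketch below assumes that A and B denote one pair of basis vectors, writes $C_{xx} = X^{T}X$, $C_{yy} = Y^{T}Y$, $C_{xy} = X^{T}Y = C_{yx}^{T}$, assumes $C_{yy}$ is invertible and $\lambda \neq 0$, and introduces the multipliers $\lambda_1$ and $\lambda_2$, which the text does not name.

```latex
\begin{align*}
\mathcal{L}(A, B) &= A^{T} C_{xy} B
  - \tfrac{\lambda_1}{2}\left(A^{T} C_{xx} A - 1\right)
  - \tfrac{\lambda_2}{2}\left(B^{T} C_{yy} B - 1\right) \\
\frac{\partial \mathcal{L}}{\partial A} = 0 &\;\Rightarrow\; C_{xy} B = \lambda_1 C_{xx} A \\
\frac{\partial \mathcal{L}}{\partial B} = 0 &\;\Rightarrow\; C_{yx} A = \lambda_2 C_{yy} B \\
A^{T} C_{xy} B = \lambda_1,\quad B^{T} C_{yx} A = \lambda_2 &\;\Rightarrow\; \lambda_1 = \lambda_2 = \lambda = \rho \\
B = \tfrac{1}{\lambda}\, C_{yy}^{-1} C_{yx} A &\;\Rightarrow\; C_{xy} C_{yy}^{-1} C_{yx} A = \lambda^{2} C_{xx} A
\end{align*}
```

In the usual CCA setting, the eigenvectors belonging to the m largest eigenvalues supply the columns of A, and the matching columns of B follow from the last substitution.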
Defining the general distance function in polar-coordinate form to measure the correlation between media data of different modalities, and performing cross-media retrieval on this basis: in the m-dimensional subspace, the feature vector of image and audio data is defined in polar form as $x_i' = (x_{i1}', \ldots, x_{ik}', \ldots, x_{im}')$, where $x_{ik}' = a + bi$ $(a, b \in \mathbb{R})$. The similarity between image and image, between audio and audio, and between image and audio data is calculated with the general distance function as follows:
During retrieval the user supplies a query example image through the user interface. If the example is in the training database, its m-dimensional coordinates in the subspace are looked up from the subspace mapping result, the general distance function is used to compute its distance to the other audio and image data, and the k images and k audio clips nearest to the query image are returned to the user as the query result. Likewise, if the query example is an audio clip, similar audio and image objects are retrieved by the same steps.
The relevance feedback mechanism based on incremental learning, used to extract prior knowledge from user interaction and revise the topological structure of the multimedia data set in the isomorphic subspace: during relevance feedback the system learns the perceptual prior knowledge provided by the user. Let Ω denote the image training set and A the audio training set, and define a "correction factor" $\gamma(i, j) = Pos(a_i, b_j)$ $(a_i \in \Omega, b_j \in A)$, used to revise the similarity between media objects of different modalities: $Crodis(i, j) = CCAdis(i, j) + \gamma(i, j)$. The correction factor is initialized to zero;
When the user submits an image query example R, $CCAdis(i, j)$ is used to compute the k-nearest-neighbor image set $C_1$ of R in the subspace, and $Crodis(i, j)$ is used to compute the k-nearest-neighbor audio set $C_2$ of R in the subspace; the result returned by cross-media retrieval is $C_1$ and $C_2$.
During user interaction, the user marks positive examples P and negative examples N in the query result through relevance feedback. For each $p_i \in P$, let
and find, according to CCAdis, the k-nearest neighbors $T = \{t_1, \ldots, t_j, \ldots, t_k\}$ of $p_i$ in the audio database A, sorted by ascending distance; then revise the γ value of each element of the set T in turn as an arithmetic progression:
For each $n_i \in N$, let
and find, according to CCAdis, the k-nearest neighbors $H = \{h_1, \ldots, h_j, \ldots, h_k\}$ of $n_i$ in the audio database A, sorted by ascending distance; then revise the γ value of each element of the set H in turn as an arithmetic progression:
Likewise, when the user submits an audio object, the correction factor $\gamma(i, j)$ is updated in the same way, and the next round of retrieval ranks the returned results by the new similarity.
Locating media objects outside the training set accurately in the isomorphic subspace, either from the basis vectors obtained in the subspace mapping process or through the relevance feedback mechanism: when the query example submitted by the user does not belong to the training data set, a feature extraction program extracts the visual feature vector V of the example image, and the new media object is mapped into the isomorphic subspace in one of the following two ways:
(1) if the semantic information expressed by the new media object is known, the vector V is mapped by linear transformation into the m-dimensional isomorphic subspace, using the subspace basis vectors obtained in training as described in claim 3, and the general distances to the other multimedia objects in the training set are computed;
(2) if the semantics expressed by the new media object are unknown, content-based single-modality retrieval is used to return images similar to the query example, the user marks the feedback positive examples $Z = \{z_1, \ldots, z_j\}$, and the cross-media retrieval system computes the coordinates of the new media object in the m-dimensional isomorphic subspace as a weighted average: $Pos(V) = Pos(z_1)\beta_1 + \cdots + Pos(z_j)\beta_j$, $(\beta_1 + \cdots + \beta_j = 1)$.
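The weighted-average step in case (2) can be illustrated with a short Python sketch. The function name `locate_new_object`, the list-based coordinate format, and the sample weights are illustrative assumptions; the text does not say how the weights β are chosen.

```python
def locate_new_object(positive_coords, weights):
    """Return Pos(V) = sum_j Pos(z_j) * beta_j for the marked positive examples.

    positive_coords: list of m-dimensional coordinate lists, one per positive example.
    weights: list of beta_j values, which must sum to 1.
    """
    if not positive_coords or len(positive_coords) != len(weights):
        raise ValueError("need one weight per positive example")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    m = len(positive_coords[0])
    # Weighted sum, computed independently for each of the m subspace dimensions.
    return [
        sum(beta * coords[k] for coords, beta in zip(positive_coords, weights))
        for k in range(m)
    ]


# Two positive examples in a 2-dimensional subspace, weighted equally (assumed).
print(locate_new_object([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))
```

Because the function uses only addition and multiplication, it also accepts complex-valued coordinates.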
Beneficial effects of the present invention:
1) The method breaks the restriction of content-based multimedia retrieval to a single modality and proposes a new cross-media retrieval method. It analyzes the content features of two different modalities simultaneously and mines the statistically canonical correlation between the features;
2) The subspace mapping method not only resolves the heterogeneity between modalities but also preserves, to the greatest extent, the correlation between the multi-modal features in the subspace. This correlation is in effect semantic association information, so the method incorporates semantics while reducing feature dimensionality;
3) Media objects of different modalities can be represented by isomorphic vectors, and the similarity between vectors is computed in polar coordinates, covering both the distance within a modality and the distance between modalities.
Description of drawings
Fig. 1 is a system framework diagram of the cross-media retrieval method based on content correlation;
Fig. 2(a) is a schematic diagram of the distribution of the multimedia data set in the isomorphic subspace of the present invention before relevance feedback;
Fig. 2(b) is a schematic diagram of the distribution of the multimedia data set in the isomorphic subspace of the present invention after relevance feedback;
Fig. 3(a) shows the retrieval result obtained with the isomorphic subspace method of the present invention, using an "automobile" image as the query example;
Fig. 3(b) shows the retrieval result obtained directly from content features, using an "automobile" image as the query example;
Fig. 4(a) shows the retrieval result obtained with the isomorphic subspace method of the present invention, using a "war" image as the query example;
Fig. 4(b) shows the retrieval result obtained directly from content features, using a "war" image as the query example.
Embodiment
The low-level content features of media objects of different modalities, such as the visual features of images (color, texture, shape, etc.) and the auditory features of audio (time-domain, frequency-domain, time-frequency features, etc.), are heterogeneous in dimensionality and express different attributes, so their similarity cannot be measured directly. The present invention analyzes the heterogeneous visual and auditory features simultaneously and performs subspace mapping based on the canonical correlation between them. This resolves the heterogeneity and incomparability problems of cross-media retrieval, and the mapping preserves, to the greatest extent, the correlation between the original features. The technical scheme and concrete implementation steps of the cross-media retrieval method based on content correlation are as follows:
1. Selection and labeling of training data
Learning the canonical correlation between visual and auditory features rests on semantic relations: statistical analysis is used to mine semantic-level associations from the low-level features. The training data must therefore contain image data and audio data that express similar semantics. For example, for the semantic class "dog", pictures showing the appearance of dogs and audio clips of dogs barking are chosen as training data.
When the number of semantic classes is known but the semantic labels of the image data and audio data are not, semi-supervised learning combined with K-means clustering is used to label all images and audio data in the database and cluster them into different semantic classes. The concrete steps are as follows:
Input: unlabeled image data set Ω and audio data set Γ, number of semantic classes Z;
Output: the semantic class number of each image and each audio clip;
Step 1: for each semantic class $Z_i$, randomly label 5 image examples $A_i$ and compute the cluster centroid $ICtr_i$ of $A_i$;
Step 2: with $ICtr_i$ as the initial input of the K-means clustering algorithm, cluster the whole image data set Ω; image examples that fall in the same cluster are given the same semantic class number;
Step 3: label the audio training data Γ in the same way, using step 1 and step 2.
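The three steps above can be sketched in Python as a K-means run whose initial centroids come from the few labeled examples of each class. The names `centroid` and `seeded_kmeans`, the dictionary layout of the seeds, and the use of Euclidean distance are assumptions made for illustration; the text does not state which distance the clustering uses.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def seeded_kmeans(data, seeds, max_iter=100):
    """Assign every vector in `data` to a semantic class.

    seeds maps a class label to its few hand-labeled example vectors (Step 1);
    their centroids initialize K-means (Step 2).
    """
    class_ids = sorted(seeds)
    centers = [centroid(seeds[c]) for c in class_ids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center (Euclidean, assumed).
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            for x in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return [class_ids[lab] for lab in labels]


images = [[0, 0], [0, 1], [10, 10], [10, 11]]
labeled = {"dog": [[0, 0.5]], "bird": [[10, 10.5]]}
print(seeded_kmeans(images, labeled))
```

Running the same function on audio feature vectors with audio seeds corresponds to Step 3.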
2. Extraction of visual and auditory features
For the image data in each semantic class, low-level visual features are extracted, including the HSV color histogram, the color coherence vector (CCV), and Tamura directionality, to construct a p-dimensional image feature vector $x_p$ for each image; the image data set of the whole semantic class forms the image feature matrix $X_{(n \times p)}$. For the audio data in each semantic class, low-level auditory features are extracted, including four MPEG compressed-domain features, namely centroid (Centroid), rolloff frequency (Rolloff), spectral flux (Spectral Flux), and root mean square (RMS), to construct a q-dimensional audio feature vector $y_q$ for each audio example; the audio data set of the whole semantic class forms the audio feature matrix $Y_{(n \times q)}$. Audio clips of different durations yield feature vectors of different dimensionality, so the present invention applies fuzzy clustering and extracts the same number of cluster centroids from the original audio features as the audio index.
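To make the four audio features concrete, the Python sketch below computes them with their common textbook definitions. It is only an illustration under assumptions: it works on a plain time-domain frame and a magnitude spectrum rather than on MPEG compressed-domain data, the 0.85 rolloff fraction is an assumed default, and all function names are hypothetical.

```python
import math


def rms(frame):
    """Root mean square of a non-empty list of time-domain samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def spectral_centroid(mag):
    """Magnitude-weighted mean bin index of a magnitude spectrum."""
    total = sum(mag)
    return sum(k * m for k, m in enumerate(mag)) / total if total else 0.0


def spectral_rolloff(mag, fraction=0.85):
    """Smallest bin index below which `fraction` of the total magnitude lies."""
    threshold = fraction * sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= threshold:
            return k
    return len(mag) - 1


def spectral_flux(mag, prev_mag):
    """Sum of squared differences between two consecutive magnitude spectra."""
    return sum((a - b) ** 2 for a, b in zip(mag, prev_mag))


frame = [1.0, -1.0, 1.0, -1.0]
spectrum = [0.0, 1.0, 0.0, 1.0]
print(rms(frame), spectral_centroid(spectrum), spectral_rolloff(spectrum))
```

Computing these values per frame gives a feature sequence whose length grows with the clip duration, which is why a fixed-length index has to be derived afterwards.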
3. Isomorphic subspace mapping of media data of different modalities, accommodating multiple semantics
On the basis of canonical correlation analysis, an optimal low-dimensional subspace is learned that preserves, to the greatest extent, the correlation between the original feature matrices $X_{(n \times p)}$ and $Y_{(n \times q)}$. The algorithm steps are as follows:
Input: image feature matrix $X_{(n \times p)}$, audio feature matrix $Y_{(n \times q)}$
Output: vector representations $L_{(n \times m)}$ and $M_{(n \times m)}$ of all image data and audio data in the low-dimensional subspace
Step 1: by semi-supervised learning, use K-means clustering to divide all image data and audio data in the database into different semantic classes;
Step 2: under the constraint $v(L) = L^{T} L = A^{T} X^{T} X A = 1$; $v(M) = M^{T} M = B^{T} Y^{T} Y B = 1$, maximize the correlation coefficient $\rho = r(L, M)$. The method of Lagrange multipliers yields an equation of the form $Ax = \lambda Bx$, namely $C_{xy} C_{yy}^{-1} C_{yx} A = \lambda^{2} C_{xx} A$; solving for the eigenvalues of this equation gives the solutions for the matrices A and B;
Step 3: construct the isomorphic subspace linearly, that is, use the basis vectors A and B to map the image feature vectors and the audio feature vectors to the m-dimensional coordinates $L_{(n \times m)}$ and $M_{(n \times m)}$, respectively.
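Step 3 is a plain matrix product, which the Python sketch below spells out. The function name `project` and the nested-list matrix layout are assumptions made for illustration; the basis matrix is taken as already computed in Step 2.

```python
def project(features, basis):
    """Map n feature rows (n x p) to subspace coordinates (n x m).

    basis is a p x m matrix whose columns are the basis vectors, so the
    result corresponds to L = X A for images or M = Y B for audio.
    """
    p = len(basis)
    m = len(basis[0])
    for row in features:
        if len(row) != p:
            raise ValueError("feature length must equal the number of basis rows")
    return [
        [sum(row[i] * basis[i][k] for i in range(p)) for k in range(m)]
        for row in features
    ]


# Two 2-dimensional feature vectors projected onto a single basis vector.
print(project([[1, 2], [3, 4]], [[1], [1]]))
```

A media object outside the training set can be projected the same way by passing its single feature vector as a one-row matrix.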
4. Computing similarity with the general distance function
After the feature vectors of all image and audio data are converted into m-dimensional vectors in the low-dimensional subspace, many complex numbers appear. To compute the similarity between media data of the various modalities, the feature vectors after dimensionality reduction are expressed in polar form: $x_i' = (x_{i1}', \ldots, x_{ik}', \ldots, x_{im}')$, where $x_{ik}' = a + bi$ $(a, b \in \mathbb{R})$. The similarity between image and image, between audio and audio, and between image and audio data is therefore calculated with the general distance function as follows:
During retrieval the user supplies a query example image through the user interface. If the example is in the training database, its m-dimensional coordinates in the subspace are looked up from the subspace mapping result, the general distance function is used to compute its distance to the other audio and image data, and the k images and k audio clips nearest to the query image are returned to the user as the query result. Likewise, if the query example is an audio clip, similar audio and image objects are retrieved by the same steps.
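The retrieval step can be sketched in Python as a k-nearest-neighbor search over the shared subspace. The distance used here, the Euclidean norm of the complex difference, is a stand-in assumption and not the patent's general distance function, so it is passed in as a replaceable parameter; `to_polar`, `assumed_distance`, `retrieve`, and the sample identifiers are hypothetical names.

```python
import cmath


def to_polar(vec):
    """Express each complex coordinate a + bi as (modulus, argument)."""
    return [cmath.polar(complex(z)) for z in vec]


def assumed_distance(u, v):
    """ASSUMPTION: Euclidean norm of the complex difference of two vectors."""
    return sum(abs(complex(a) - complex(b)) ** 2 for a, b in zip(u, v)) ** 0.5


def retrieve(query, images, audios, k, dist=assumed_distance):
    """Return the ids of the k nearest images and the k nearest audio clips."""
    def nearest(collection):
        ranked = sorted(collection.items(), key=lambda item: dist(query, item[1]))
        return [name for name, _ in ranked[:k]]
    return nearest(images), nearest(audios)


images = {"img_a": [1 + 1j, 0], "img_b": [5 + 0j, 5]}
audios = {"aud_a": [1 + 0j, 0], "aud_b": [9j, 9]}
print(retrieve([1 + 1j, 0], images, audios, k=1))
print(to_polar([1j]))
```

Because images and audio share the same m-dimensional coordinates, one distance function ranks both collections, which is what lets a query of one modality return results of the other.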
The present invention supports both single-modality retrieval and cross-media retrieval: the user submits a media object of one modality as the query, the retrieval result can include media objects of other modalities, and a new query can be started from an object of another modality.
5. Relevance feedback
Content-based methods learn the canonical correlation between visual and auditory features, so that subspace mapping can be carried out with the correlation preserved as far as possible, which resolves the feature heterogeneity problem. But because of the gap between low-level content and high-level semantics, the learned result differs from the true semantics. Through relevance feedback the user marks positive and negative examples in the returned query result; semantic information is learned from the user's marks and used to revise the topological structure of the multimedia data set in the learned subspace.
Let Ω denote the image training set and A the audio training set, and define a "correction factor" $\gamma(i, j) = Pos(a_i, b_j)$ $(a_i \in \Omega, b_j \in A)$, used to revise the similarity between media objects of different modalities: $Crodis(i, j) = CCAdis(i, j) + \gamma(i, j)$. The correction factor is initialized to zero. When the user submits an image query example R, $CCAdis(i, j)$ is used to compute the k-nearest-neighbor image set $C_1$ of R in the subspace, and $Crodis(i, j)$ is used to compute the k-nearest-neighbor audio set $C_2$ of R in the subspace; the result returned by cross-media retrieval is $C_1$ and $C_2$. During user interaction, the user marks positive examples P and negative examples N in the query result through relevance feedback. For each $p_i \in P$, let
and find, according to CCAdis, the k-nearest neighbors $T = \{t_1, \ldots, t_j, \ldots, t_k\}$ of $p_i$ in the audio database A, sorted by ascending distance; then revise the γ value of each element of the set T in turn as an arithmetic progression:
For each $n_i \in N$, let
and find, according to CCAdis, the k-nearest neighbors $H = \{h_1, \ldots, h_j, \ldots, h_k\}$ of $n_i$ in the audio database A, sorted by ascending distance; then revise the γ value of each element of the set H in turn as an arithmetic progression:
Likewise, when the user submits an audio object, the correction factor $\gamma(i, j)$ is updated in the same way. The next round of retrieval ranks the returned results by the new similarity.
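The feedback loop can be sketched in Python as follows. Only the relation Crodis = CCAdis + γ, the zero initialization, the ascending-distance ordering of neighbors, and the arithmetic-progression idea come from the text. Everything else is an assumption for illustration: the function names, the dictionary layout, the step size, the rule that positive feedback lowers γ while negative feedback raises it, and the choice to give the nearest neighbor the largest correction.

```python
def crodis(ccadis, gamma, query, audio_id):
    """Corrected distance: Crodis = CCAdis + gamma (gamma defaults to zero)."""
    return ccadis[query][audio_id] + gamma.get((query, audio_id), 0.0)


def apply_feedback(audio_dist, gamma, query, marked, k, step, positive):
    """Update gamma for one marked audio result and its k nearest audio neighbors.

    audio_dist[a][b] is the CCAdis between audio clips a and b.
    ASSUMPTION: corrections shrink by `step` per neighbor rank.
    """
    sign = -1.0 if positive else 1.0  # pull closer or push away (assumed)
    others = [a for a in audio_dist[marked] if a != marked]
    neighbours = sorted(others, key=lambda a: audio_dist[marked][a])[:k]
    key = (query, marked)
    gamma[key] = gamma.get(key, 0.0) + sign * step * (k + 1)
    for rank, a in enumerate(neighbours):
        key = (query, a)
        gamma[key] = gamma.get(key, 0.0) + sign * step * (k - rank)
    return neighbours


audio_dist = {"a1": {"a1": 0.0, "a2": 1.0, "a3": 2.0, "a4": 5.0}}
ccadis = {"img": {"a1": 1.0, "a2": 1.5, "a3": 2.5, "a4": 0.8}}
gamma = {}
apply_feedback(audio_dist, gamma, "img", "a1", k=2, step=0.1, positive=True)
print(crodis(ccadis, gamma, "img", "a1") < ccadis["img"]["a1"])
```

Since γ is stored separately from CCAdis, the learned subspace coordinates stay untouched and only the cross-modal ranking changes between rounds.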
6. Locating new media objects
A single multimedia object submitted by the user is defined as a new media object. A new media object that is not in the training database can still be located directly in the trained subspace by a linear method using the subspace basis vectors, or located accurately through simple user interaction, while remaining semantically similar to the multimedia objects around it in the subspace. First a feature extraction program extracts the visual feature vector V of the example image; the new media object is then mapped into the isomorphic subspace in one of the following two ways:
On the one hand, if the semantic information expressed by the new media object is known, the vector V is mapped by linear transformation into the m-dimensional isomorphic subspace using the subspace basis vectors obtained in training, and the general distances to the other multimedia objects in the training set are computed.
On the other hand, if the semantics expressed by the new media object are unknown, content-based single-modality retrieval is used to return images similar to the query example, the user marks the feedback positive examples $Z = \{z_1, \ldots, z_j\}$, and the cross-media retrieval system computes the coordinates of the new media object in the m-dimensional isomorphic subspace as a weighted average: $Pos(V) = Pos(z_1)\beta_1 + \cdots + Pos(z_j)\beta_j$, $(\beta_1 + \cdots + \beta_j = 1)$.
Fig. 2 gives an example of the topological structure of some training data sets in the low-dimensional isomorphic subspace. The concrete steps of this example are described in detail below in conjunction with the method of the present invention:
(1) collect image data and audio data for 7 semantic classes (birds, dog, automobile, war, tiger, squirrel, monkey) as the training data set;
(2) use a feature extraction program to extract the HSV color histogram, color coherence vector (CCV), and Tamura directionality features of each image, constructing a 500-dimensional visual feature vector for each image and a 70 × 500 visual feature matrix for each of the 7 semantic classes;
(3) use a feature extraction program to extract four MPEG compressed-domain features of the audio: centroid (Centroid), rolloff frequency (Rolloff), spectral flux (Spectral Flux), and root mean square (RMS);
(4) audio examples of different durations yield feature vectors of different lengths; fuzzy clustering normalizes the audio feature vectors of different dimensionality into 40-dimensional vectors that serve as the index of each audio example, and a 70 × 40 auditory feature matrix is constructed for each of the 7 semantic classes;
(5) in the Matlab 7.0 environment, use the canonical correlation analysis function to learn, for each of the 7 semantic classes, the correlation between the visual and auditory feature matrices of its training data, then perform subspace mapping linearly, transforming the 70 × 500 and 70 × 40 feature matrices into new 70 × 40 and 70 × 40 feature matrices, respectively;
(6) Using the general distance function, calculate the distances between the 40-dimensional image feature vectors and audio feature vectors in the subspace, and return the 20 images and 20 audio clips nearest to the query example;
(7) During cross-media retrieval, the user can interact through the human-machine interface and mark the cross-media retrieval results. The system automatically learns from the positive and negative feedback examples the user submits, and the extracted semantic information is used to revise the topological structure of the multimedia data set in the isomorphic subspace; that is, the modifying-factor updates $\gamma(R,t_j)=-\tau+j\times d_1$ and $\gamma(R,h_j)=\tau-j\times d_2$ (as defined in claim 5) are used to revise the topological structure of the multimedia objects around the positive examples and around the negative examples, respectively.
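Step (5) above, learning paired bases with canonical correlation analysis and projecting both modalities into a shared m-dimensional space, can be sketched as follows. The embodiment uses a Matlab 7.0 function; the NumPy version below is a stand-in written for illustration. The name `cca_bases` and the ridge term `reg` are assumptions introduced here (some regularization is needed when, as with 70 samples and 500 visual dimensions, the covariance matrix is singular), and the whitening-plus-SVD route is one standard way to solve the CCA eigenproblem rather than a method stated in the source.

```python
import numpy as np

def cca_bases(X, Y, m, reg=1e-3):
    """Return bases A (p x m), B (q x m) and the top-m canonical correlations.

    X: (n, p) visual features, Y: (n, q) auditory features, rows paired.
    reg: small ridge term added to both covariance matrices (an assumption
    of this sketch, not part of the source).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # (regularized) canonical correlations, in descending order.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :m]
    B = Ky @ Vt.T[:, :m]
    return A, B, s[:m]
```

With the sizes of the embodiment, `cca_bases(X, Y, m=40)` on a 70 × 500 matrix `X` and a 70 × 40 matrix `Y` would give bases of shape 500 × 40 and 40 × 40; multiplying the centered feature matrices by `A` and `B` then yields the two 70 × 40 coordinate matrices.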
Figure 2 takes squirrels, birds, and automobiles as examples. It shows the theoretical distribution of the media object data sets in the isomorphic subspace obtained by the dimension-reducing mapping, as measured by CCAdis, and the corresponding distribution measured by Crodis after the distances have been corrected through relevance feedback. In Fig. 2(a), the image data set with the smallest CCAdis to the squirrel audio data set is the set of bird images. After relevance feedback, the Crodis distance between squirrel audio and squirrel images is "pulled closer" and the Crodis distance between squirrel audio and bird images is "pushed farther", while the topological relations within the squirrel images and within the squirrel audio remain essentially unchanged, as shown in Fig. 2(b).
As can be seen, the method of the present invention learns the correlation between image and audio data well, resolves the heterogeneity between media data of different modalities, and effectively realizes cross-media distance measurement; and through relevance feedback, which learns the semantic information in the user interaction process, the distribution of the multimedia data set in the subspace conforms more closely to the relations between high-level semantics.
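The "pull closer / push farther" effect of relevance feedback follows from the modifying factor γ defined in claim 5: a marked example gets ∓τ, and its k nearest neighbors get a correction that shrinks linearly to zero. A minimal sketch is given below. It assumes, for simplicity, a single index space for all candidate objects and a precomputed pairwise CCAdis matrix `dist`; the function names and the rule that a later update overwrites an earlier one are choices of this sketch, not statements from the patent.

```python
import numpy as np

def _spread(gamma, dist, marked, k, tau, sign):
    """Set gamma to sign*tau at each marked object and let the correction
    decay in equal steps d = tau/k over its k nearest neighbours."""
    step = tau / k
    for i in marked:
        gamma[i] = sign * tau
        order = np.argsort(dist[i])                    # ascending distance
        neighbours = [t for t in order if t != i][:k]
        for j, t in enumerate(neighbours, start=1):
            gamma[t] = sign * (tau - j * step)         # -tau + j*d or tau - j*d
    return gamma

def relevance_feedback(gamma, dist, positives, negatives, k, tau):
    """Update gamma(R, .) for one query R from the user's marked examples."""
    gamma = np.array(gamma, dtype=float)
    _spread(gamma, dist, positives, k, tau, sign=-1.0)  # pull closer
    _spread(gamma, dist, negatives, k, tau, sign=+1.0)  # push farther
    return gamma

def crodis(ccadis_to_query, gamma):
    """Corrected distance: Crodis = CCAdis + gamma."""
    return np.asarray(ccadis_to_query, dtype=float) + gamma
```

The next retrieval round would then rank candidates by `crodis(...)` instead of the raw CCAdis values.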
Figure 4 shows a retrieval example for the semantic "war". The concrete steps of this embodiment are described in detail below in connection with the method of the present invention:
(1) A color picture with the semantic "war" is input as the query example, and the system finds the vector representation of this picture in the isomorphic subspace;
(2) An existing data format conversion method is used to express the subspace vector of the query example in polar-coordinate form;
(3) The general distance function is used to calculate the distances between the query example and the other images and audio clips in the database, and the 10 nearest images and 10 nearest audio examples are returned;
(4) In addition, the low-level content features of the query example are used directly, without subspace mapping, and matched against the content features of the other images in the database, i.e., the content-based single-modality retrieval method is used to return the 10 most similar images, for comparison with the retrieval results obtained by the method described in the present invention.
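Step (3) above amounts to a nearest-neighbor query run separately over each modality in the shared subspace. The sketch below illustrates that flow only. The general distance function of the patent is defined on a polar-coordinate representation and its formula does not appear in this text, so plain Euclidean distance is substituted here as a clearly labeled stand-in; the function name, the `modality` label array, and the dictionary return format are likewise assumptions.

```python
import numpy as np

def retrieve(query, coords, modality, k=10):
    """Return the indices of the k nearest images and k nearest audio clips.

    query: (m,) subspace coordinates of the query example.
    coords: (n, m) subspace coordinates of all database objects.
    modality: length-n sequence of "image" / "audio" labels.
    Distance: Euclidean, used only as a placeholder for the patent's
    general distance function.
    """
    coords = np.asarray(coords, dtype=float)
    modality = np.asarray(modality)
    d = np.linalg.norm(coords - np.asarray(query, dtype=float), axis=1)
    result = {}
    for kind in ("image", "audio"):
        idx = np.flatnonzero(modality == kind)
        result[kind] = idx[np.argsort(d[idx])[:k]].tolist()
    return result
```

Calling `retrieve(q, coords, modality, k=10)` mirrors the "10 nearest images and 10 nearest audio examples" of the embodiment; the single-modality baseline of step (4) would run the same ranking on the raw visual features instead of the subspace coordinates.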
The results of this example are shown in Figure 4. The query example is a color picture of an explosion reflecting the semantic "war". Panel (a) shows the results returned by matching in the isomorphic subspace with the method described in the present invention; for comparison, panel (b) shows the similar images returned by matching directly on low-level visual features. Even with a color image as the query example, the first 10 retrieval results include black-and-white pictures that express the same semantics as the query example.
As can be seen, the method of the present invention captures the common semantics of color images and black-and-white images well, realizes mutual retrieval between black-and-white and color images, and effectively measures similarity between multimedia data whose content features differ greatly; by contrast, the content-based single-modality retrieval method can only return pictures that are visually similar to the query example.
Claims (6)
- 1. A cross-media retrieval method based on content correlation, characterized by comprising the following steps: (1) collecting objects of different modalities, namely image and audio data, from a multimedia database; (2) extracting the visual features of the image data and the auditory features of the audio data, and using canonical correlation analysis to extract the canonical correlation between the visual and auditory features; (3) using an isomorphic subspace mapping algorithm to map the visual feature vectors of the image data and the auditory feature vectors of the audio data simultaneously into a low-dimensional isomorphic subspace, realizing a unified representation of media data of different modalities; (4) defining a general distance function in polar-coordinate form to measure the correlation between media data of different modalities, and performing cross-media retrieval on this basis; (5) using a relevance feedback mechanism based on incremental learning to extract prior knowledge from user interaction and revise the topological structure of the multimedia data set in the isomorphic subspace; (6) accurately locating media objects outside the training set in the isomorphic subspace, either according to the basis vectors obtained in the subspace mapping process or through the relevance feedback mechanism.
- 2. The cross-media retrieval method based on content correlation according to claim 1, characterized in that, in said extracting of the visual features of the image data and the auditory features of the audio data and said using of canonical correlation analysis to extract the canonical correlation between the visual and auditory features: the low-level visual features of an image constitute a p-dimensional image feature vector, and the low-level auditory features of an audio clip constitute a q-dimensional audio feature vector; canonical correlation analysis is used to analyze simultaneously the visual features $X_{(n\times p)}$ of the images and the auditory features $Y_{(n\times q)}$ of the audio, and the correlation coefficient between the heterogeneous feature matrices $X_{(n\times p)}$ and $Y_{(n\times q)}$ is calculated as follows: where A and B are linear transformations; by formula 2, the correlation between the feature matrices X and Y, which have many variables, is reduced to the correlation between the fewer combined variables L and M; the distribution of the values of A and B determines the form of the spatial correlation between X and Y, and the magnitudes of the values of A and B determine the importance of the corresponding variables.
- 3. The cross-media retrieval method based on content correlation according to claim 1, characterized in that, in said using of an isomorphic subspace mapping algorithm to map the visual feature vectors of the image data and the auditory feature vectors of the audio data simultaneously into a low-dimensional isomorphic subspace, realizing a unified representation of media data of different modalities: the isomorphic subspace mapping algorithm, building on canonical correlation analysis, learns an optimal low-dimensional subspace that preserves to the greatest extent the correlation between the original feature matrices $X_{(n\times p)}$ and $Y_{(n\times q)}$. The algorithm steps are as follows: Input: image feature matrix $X_{(n\times p)}$ and audio feature matrix $Y_{(n\times q)}$. Output: the vector representations $L_{(n\times m)}$ and $M_{(n\times m)}$ of all image data and audio data in the low-dimensional subspace. Step 1: by semi-supervised learning, divide all image data and audio data in the database into different semantic categories using K-means clustering. Step 2: optimize the correlation coefficient $\rho=r(L,M)$ under the constraints of formula 3: $v(L)=L^{T}L=A^{T}X^{T}XA=1$; $v(M)=M^{T}M=B^{T}Y^{T}YB=1$ (3). The method of Lagrange multipliers yields an equation of the form $Ax=\lambda Bx$, namely $C_{xy}C_{yy}^{-1}C_{yx}A=\lambda^{2}C_{xx}A$; solving for the characteristic roots of this equation gives the solutions for the matrices A and B. Step 3: construct the isomorphic subspace by a linear method, i.e., use the basis vectors A and B to map the image feature vectors and audio feature vectors into the m-dimensional coordinates $L_{(n\times m)}$ and $M_{(n\times m)}$, respectively.
- 4. The cross-media retrieval method based on content correlation according to claim 1, characterized in that, in said defining of a general distance function in polar-coordinate form to measure the correlation between media data of different modalities and performing cross-media retrieval on this basis: the feature vectors of image and audio data in the m-dimensional subspace are defined in polar-coordinate form as $x_i'=(x_{i1}',\ldots,x_{ik}',\ldots,x_{im}')$, where $x_{ik}'=a+bi$ $(a,b\in\mathbb{R})$; the similarity between images, between audio clips, and between image and audio data is calculated with the general distance function as follows: During retrieval, the user provides a query example image through the human-machine interface. If this example is in the training database, the m-dimensional coordinates of the query example in the subspace are found from the subspace mapping result, its distances to the other audio and image data are calculated with the general distance function, and the k images and k audio clips nearest to the query image example are returned to the user as the query result. Likewise, if the query example is an audio clip, similar audio and image objects are retrieved by the same steps.
- 5. The cross-media retrieval method based on content correlation according to claim 1, characterized in that, in said relevance feedback mechanism based on incremental learning, used to extract prior knowledge from user interaction and revise the topological structure of the multimedia data set in the isomorphic subspace: the system learns, during relevance feedback, the perceptual prior knowledge provided by the user. Let Ω denote the image training set and A the audio training set. A "modifying factor" $\gamma(i,j)=\mathrm{Pos}(a_i,b_j)$ $(a_i\in\Omega,\ b_j\in A)$ is defined to revise the similarity between media objects of different modalities: $\mathrm{Crodis}(i,j)=\mathrm{CCAdis}(i,j)+\gamma(i,j)$, with the modifying factor initialized to zero. When the user submits an image query example R, $\mathrm{CCAdis}(i,j)$ is used to compute the set $C_1$ of k-nearest-neighbor images of R in the subspace, and $\mathrm{Crodis}(i,j)$ is used to compute the set $C_2$ of k-nearest-neighbor audio clips of R in the subspace; the result returned by cross-media retrieval is $C_1$ and $C_2$. During user interaction, the user marks positive examples P and negative examples N in the query result through relevance feedback. For each $p_i\in P$, set $\gamma(R,p_i)=-\tau$ $(\tau>0)$ and, according to CCAdis, find the k nearest neighbors $T=\{t_1,\ldots,t_j,\ldots,t_k\}$ of $p_i$ in the audio database A, sorted in ascending order of distance; then revise the γ value of each element of T in turn as an arithmetic progression: $\gamma(R,t_j)=-\tau+j\times d_1$ $(d_1=\tau/k)$. For each $n_i\in N$, set $\gamma(R,n_i)=\tau$ $(\tau>0)$ and, according to CCAdis, find the k nearest neighbors $H=\{h_1,\ldots,h_j,\ldots,h_k\}$ of $n_i$ in the audio database A, sorted in ascending order of distance; then revise the γ value of each element of H in turn as an arithmetic progression: $\gamma(R,h_j)=\tau-j\times d_2$ $(d_2=\tau/k)$. Likewise, when the user submits an audio object, the same method is used to update the modifying factor $\gamma(i,j)$, and the next round of retrieval ranks the returned results by the new similarity.
- 6. The cross-media retrieval method based on content correlation according to claim 1, characterized in that, in said accurately locating of media objects outside the training set in the isomorphic subspace, either according to the basis vectors obtained in the subspace mapping process or through the relevance feedback mechanism: when the query example submitted by the user does not belong to the training data set, a feature extraction program extracts the visual feature vector V of the example image, and the new media object is mapped to the isomorphic subspace in one of the following two cases: (1) if the semantic information expressed by the new media object is known, the vector V is mapped into the m-dimensional isomorphic subspace by linear transformation, using the subspace basis vectors obtained in training according to claim 3, and its general distance to the other multimedia objects in the training set is computed; (2) if the semantics expressed by the new media object are unknown, content-based single-modality retrieval is used to return images similar to the query example, the user marks the positive feedback examples $Z=\{z_1,\ldots,z_j\}$, and the cross-media retrieval system computes the coordinates of the new media object in the m-dimensional isomorphic subspace as a weighted average: $\mathrm{Pos}(V)=\mathrm{Pos}(z_1)\beta_1+\cdots+\mathrm{Pos}(z_j)\beta_j$, where $\beta_1+\cdots+\beta_j=1$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006100533904A CN100422999C (en) | 2006-09-14 | 2006-09-14 | Transmedia searching method based on content correlation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006100533904A CN100422999C (en) | 2006-09-14 | 2006-09-14 | Transmedia searching method based on content correlation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101021849A (en) | 2007-08-22 |
CN100422999C (en) | 2008-10-01 |
Family
ID=38709618
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2006100533904A Expired - Fee Related CN100422999C (en) | 2006-09-14 | 2006-09-14 | Transmedia searching method based on content correlation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100422999C (en) |
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101833565A (en) * | 2010-03-31 | 2010-09-15 | 南京大学 | Method for actively selecting related feedbacks of representative image |
CN101984424A (en) * | 2010-10-26 | 2011-03-09 | 浙江工商大学 | Mass inter-media index method |
CN101546556B (en) * | 2008-03-28 | 2011-03-23 | 展讯通信(上海)有限公司 | Classification system for identifying audio content |
CN102262670A (en) * | 2011-07-29 | 2011-11-30 | 中山大学 | Cross-media information retrieval system and method based on mobile visual equipment |
CN102521368A (en) * | 2011-12-16 | 2012-06-27 | 武汉科技大学 | Similarity matrix iteration based cross-media semantic digesting and optimizing method |
CN102640153A (en) * | 2009-12-04 | 2012-08-15 | 诺基亚公司 | Method and apparatus for providing media content searching capabilities |
CN102663447A (en) * | 2012-04-28 | 2012-09-12 | 中国科学院自动化研究所 | Cross-media searching method based on discrimination correlation analysis |
CN102693321A (en) * | 2012-06-04 | 2012-09-26 | 常州南京大学高新技术研究院 | Cross-media information analysis and retrieval method |
CN102693316A (en) * | 2012-05-29 | 2012-09-26 | 中国科学院自动化研究所 | Linear generalization regression model based cross-media retrieval method |
CN102713900A (en) * | 2009-11-03 | 2012-10-03 | 高通股份有限公司 | Data searching using spatial auditory cues |
CN102932321A (en) * | 2011-08-08 | 2013-02-13 | 索尼公司 | Information processing apparatus, information processing method, program, and information processing system |
CN103049526A (en) * | 2012-12-20 | 2013-04-17 | 中国科学院自动化研究所 | Cross-media retrieval method based on double space learning |
CN103279579A (en) * | 2013-06-24 | 2013-09-04 | 魏骁勇 | Video retrieval method based on visual space |
WO2013159356A1 (en) * | 2012-04-28 | 2013-10-31 | 中国科学院自动化研究所 | Cross-media searching method based on discrimination correlation analysis |
WO2013177751A1 (en) * | 2012-05-29 | 2013-12-05 | 中国科学院自动化研究所 | Cross-media retrieval method based on generalized linear regression model |
CN103793447A (en) * | 2012-10-26 | 2014-05-14 | 汤晓鸥 | Method and system for estimating semantic similarity among music and images |
CN103995903A (en) * | 2014-06-12 | 2014-08-20 | 武汉科技大学 | Cross-media search method based on isomorphic subspace mapping and optimization |
CN103995804A (en) * | 2013-05-20 | 2014-08-20 | 中国科学院计算技术研究所 | Cross-media topic detection method and device based on multimodal information fusion and graph clustering |
CN104077408A (en) * | 2014-07-11 | 2014-10-01 | 浙江大学 | Distributed semi-supervised content identification and classification method and device for large-scale cross-media data |
CN104166982A (en) * | 2014-06-30 | 2014-11-26 | 复旦大学 | Image optimization clustering method based on typical correlation analysis |
CN104679902A (en) * | 2015-03-20 | 2015-06-03 | 湘潭大学 | Information abstract extraction method in conjunction with cross-media fuse |
CN105574133A (en) * | 2015-12-15 | 2016-05-11 | 苏州贝多环保技术有限公司 | Multi-mode intelligent question answering system and method |
CN105930873A (en) * | 2016-04-27 | 2016-09-07 | 天津中科智能识别产业技术研究院有限公司 | Self-paced cross-modal matching method based on subspace |
CN105938561A (en) * | 2016-04-13 | 2016-09-14 | 南京大学 | Canonical-correlation-analysis-based computer data attribute reduction method |
CN106095893A (en) * | 2016-06-06 | 2016-11-09 | 北京大学深圳研究生院 | A kind of cross-media retrieval method |
CN106127305A (en) * | 2016-06-17 | 2016-11-16 | 中国科学院信息工程研究所 | A kind of for method for measuring similarity between the allos of multi-source heterogeneous data |
CN106663429A (en) * | 2014-03-10 | 2017-05-10 | 韦利通公司 | Engine, system and method of providing audio transcriptions for use in content resources |
CN107209760A (en) * | 2014-12-10 | 2017-09-26 | 凯恩迪股份有限公司 | The sub-symbol data coding of weighting |
CN107480158A (en) * | 2016-06-07 | 2017-12-15 | 百度(美国)有限责任公司 | The method and system of the matching of content item and image is assessed based on similarity score |
CN107766571A (en) * | 2017-11-08 | 2018-03-06 | 北京大学 | The search method and device of a kind of multimedia resource |
CN108228757A (en) * | 2017-12-21 | 2018-06-29 | 北京市商汤科技开发有限公司 | Image search method and device, electronic equipment, storage medium, program |
CN108885639A (en) * | 2016-03-29 | 2018-11-23 | 斯纳普公司 | Properties collection navigation and automatic forwarding |
CN109074363A (en) * | 2016-05-09 | 2018-12-21 | 华为技术有限公司 | Data query method, data query system determine method and apparatus |
CN109408648A (en) * | 2018-10-26 | 2019-03-01 | 京东方科技集团股份有限公司 | It is associated with and determines method, works recommended method |
US10275685B2 (en) | 2014-12-22 | 2019-04-30 | Dolby Laboratories Licensing Corporation | Projection-based audio object extraction from audio content |
CN109784405A (en) * | 2019-01-16 | 2019-05-21 | 山东建筑大学 | Cross-module state search method and system based on pseudo label study and semantic consistency |
CN109784287A (en) * | 2019-01-22 | 2019-05-21 | 中国科学院自动化研究所 | Information processing method, system, device based on scene class signal forehead leaf network |
CN109992676A (en) * | 2019-04-01 | 2019-07-09 | 中国传媒大学 | Across the media resource search method of one kind and searching system |
CN110019898A (en) * | 2017-08-08 | 2019-07-16 | 航天信息股份有限公司 | A kind of animation image processing system |
CN110879863A (en) * | 2018-08-31 | 2020-03-13 | 阿里巴巴集团控股有限公司 | Cross-domain search method and cross-domain search device |
CN111046166A (en) * | 2019-12-10 | 2020-04-21 | 中山大学 | Semi-implicit multi-modal recommendation method based on similarity correction |
CN111291204A (en) * | 2019-12-10 | 2020-06-16 | 河北金融学院 | Multimedia data fusion method and device |
CN111931866A (en) * | 2020-09-21 | 2020-11-13 | 平安科技(深圳)有限公司 | Medical data processing method, device, equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7185049B1 (en) * | 1999-02-01 | 2007-02-27 | At&T Corp. | Multimedia integration description scheme, method and system for MPEG-7 |
JP2001282813A (en) * | 2000-03-29 | 2001-10-12 | Toshiba Corp | Multimedia data retrieval method, index information providing method, multimedia data retrieval device, index server and multimedia data retrieval server |
CN1267838C (en) * | 2002-12-31 | 2006-08-02 | 程松林 | Sound searching method and video and audio information searching system using said method |
CN100336061C (en) * | 2003-08-08 | 2007-09-05 | 富士通株式会社 | Multimedia object searching device and methoed |
CN1529264A (en) * | 2003-10-06 | 2004-09-15 | 李少峰 | Method for searching associated multimedia content through text block position coding |
- 2006-09-14: CN application CNB2006100533904A, patent CN100422999C (en), not active: Expired - Fee Related
Cited By (67)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101546556B (en) * | 2008-03-28 | 2011-03-23 | 展讯通信(上海)有限公司 | Classification system for identifying audio content |
CN102713900A (en) * | 2009-11-03 | 2012-10-03 | 高通股份有限公司 | Data searching using spatial auditory cues |
CN102713900B (en) * | 2009-11-03 | 2014-12-10 | 高通股份有限公司 | Data searching using spatial auditory cues |
CN102640153A (en) * | 2009-12-04 | 2012-08-15 | 诺基亚公司 | Method and apparatus for providing media content searching capabilities |
CN101833565B (en) * | 2010-03-31 | 2011-10-19 | 南京大学 | Method for actively selecting related feedbacks of representative image |
CN101833565A (en) * | 2010-03-31 | 2010-09-15 | 南京大学 | Method for actively selecting related feedbacks of representative image |
CN101984424A (en) * | 2010-10-26 | 2011-03-09 | 浙江工商大学 | Mass inter-media index method |
CN102262670A (en) * | 2011-07-29 | 2011-11-30 | 中山大学 | Cross-media information retrieval system and method based on mobile visual equipment |
CN102932321A (en) * | 2011-08-08 | 2013-02-13 | 索尼公司 | Information processing apparatus, information processing method, program, and information processing system |
CN102521368A (en) * | 2011-12-16 | 2012-06-27 | 武汉科技大学 | Similarity matrix iteration based cross-media semantic digesting and optimizing method |
CN102521368B (en) * | 2011-12-16 | 2013-08-21 | 武汉科技大学 | Similarity matrix iteration based cross-media semantic digesting and optimizing method |
CN102663447A (en) * | 2012-04-28 | 2012-09-12 | 中国科学院自动化研究所 | Cross-media searching method based on discrimination correlation analysis |
WO2013159356A1 (en) * | 2012-04-28 | 2013-10-31 | 中国科学院自动化研究所 | Cross-media searching method based on discrimination correlation analysis |
CN102693316A (en) * | 2012-05-29 | 2012-09-26 | 中国科学院自动化研究所 | Linear generalization regression model based cross-media retrieval method |
CN102693316B (en) * | 2012-05-29 | 2014-03-26 | 中国科学院自动化研究所 | Linear generalization regression model based cross-media retrieval method |
WO2013177751A1 (en) * | 2012-05-29 | 2013-12-05 | 中国科学院自动化研究所 | Cross-media retrieval method based on generalized linear regression model |
CN102693321A (en) * | 2012-06-04 | 2012-09-26 | 常州南京大学高新技术研究院 | Cross-media information analysis and retrieval method |
CN103793447A (en) * | 2012-10-26 | 2014-05-14 | 汤晓鸥 | Method and system for estimating semantic similarity among music and images |
CN103793447B (en) * | 2012-10-26 | 2019-05-14 | 汤晓鸥 | The estimation method and estimating system of semantic similarity between music and image |
CN103049526A (en) * | 2012-12-20 | 2013-04-17 | 中国科学院自动化研究所 | Cross-media retrieval method based on double space learning |
CN103049526B (en) * | 2012-12-20 | 2015-08-05 | 中国科学院自动化研究所 | Based on the cross-media retrieval method of double space study |
CN103995804A (en) * | 2013-05-20 | 2014-08-20 | 中国科学院计算技术研究所 | Cross-media topic detection method and device based on multimodal information fusion and graph clustering |
CN103995804B (en) * | 2013-05-20 | 2017-02-01 | 中国科学院计算技术研究所 | Cross-media topic detection method and device based on multimodal information fusion and graph clustering |
CN103279579B (en) * | 2013-06-24 | 2016-07-06 | 魏骁勇 | The video retrieval method in view-based access control model space |
CN103279579A (en) * | 2013-06-24 | 2013-09-04 | 魏骁勇 | Video retrieval method based on visual space |
CN106663429A (en) * | 2014-03-10 | 2017-05-10 | 韦利通公司 | Engine, system and method of providing audio transcriptions for use in content resources |
CN103995903B (en) * | 2014-06-12 | 2017-04-12 | 武汉科技大学 | Cross-media search method based on isomorphic subspace mapping and optimization |
CN103995903A (en) * | 2014-06-12 | 2014-08-20 | 武汉科技大学 | Cross-media search method based on isomorphic subspace mapping and optimization |
CN104166982A (en) * | 2014-06-30 | 2014-11-26 | 复旦大学 | Image optimization clustering method based on typical correlation analysis |
CN104077408A (en) * | 2014-07-11 | 2014-10-01 | 浙江大学 | Distributed semi-supervised content identification and classification method and device for large-scale cross-media data |
CN104077408B (en) * | 2014-07-11 | 2017-09-29 | 浙江大学 | Extensive across media data distributed semi content of supervision method for identifying and classifying and device |
US11061952B2 (en) | 2014-12-10 | 2021-07-13 | Kyndi, Inc. | Weighted subsymbolic data encoding |
CN107209760A (en) * | 2014-12-10 | 2017-09-26 | 凯恩迪股份有限公司 | The sub-symbol data coding of weighting |
US10275685B2 (en) | 2014-12-22 | 2019-04-30 | Dolby Laboratories Licensing Corporation | Projection-based audio object extraction from audio content |
CN104679902B (en) * | 2015-03-20 | 2017-11-28 | 湘潭大学 | A kind of informative abstract extracting method of combination across Media Convergence |
CN104679902A (en) * | 2015-03-20 | 2015-06-03 | 湘潭大学 | Information abstract extraction method in conjunction with cross-media fuse |
CN105574133A (en) * | 2015-12-15 | 2016-05-11 | 苏州贝多环保技术有限公司 | Multi-mode intelligent question answering system and method |
US11729252B2 (en) | 2016-03-29 | 2023-08-15 | Snap Inc. | Content collection navigation and autoforwarding |
CN108885639A (en) * | 2016-03-29 | 2018-11-23 | 斯纳普公司 | Properties collection navigation and automatic forwarding |
CN105938561A (en) * | 2016-04-13 | 2016-09-14 | 南京大学 | Canonical-correlation-analysis-based computer data attribute reduction method |
CN105930873A (en) * | 2016-04-27 | 2016-09-07 | 天津中科智能识别产业技术研究院有限公司 | Self-paced cross-modal matching method based on subspace |
CN105930873B (en) * | 2016-04-27 | 2019-02-12 | 天津中科智能识别产业技术研究院有限公司 | A kind of walking across mode matching method certainly based on subspace |
CN109074363A (en) * | 2016-05-09 | 2018-12-21 | 华为技术有限公司 | Data query method, data query system determine method and apparatus |
CN106095893A (en) * | 2016-06-06 | 2016-11-09 | 北京大学深圳研究生院 | A kind of cross-media retrieval method |
CN106095893B (en) * | 2016-06-06 | 2018-11-20 | 北京大学深圳研究生院 | A kind of cross-media retrieval method |
CN107480158B (en) * | 2016-06-07 | 2021-01-12 | 百度(美国)有限责任公司 | Method and system for evaluating matching of content item and image based on similarity score |
CN107480158A (en) * | 2016-06-07 | 2017-12-15 | 百度(美国)有限责任公司 | The method and system of the matching of content item and image is assessed based on similarity score |
CN106127305A (en) * | 2016-06-17 | 2016-11-16 | 中国科学院信息工程研究所 | A kind of for method for measuring similarity between the allos of multi-source heterogeneous data |
CN106127305B (en) * | 2016-06-17 | 2019-07-16 | 中国科学院信息工程研究所 | A kind of heterologous method for measuring similarity for multi-source heterogeneous data |
CN110019898A (en) * | 2017-08-08 | 2019-07-16 | 航天信息股份有限公司 | A kind of animation image processing system |
CN107766571A (en) * | 2017-11-08 | 2018-03-06 | 北京大学 | The search method and device of a kind of multimedia resource |
CN107766571B (en) * | 2017-11-08 | 2021-02-09 | 北京大学 | Multimedia resource retrieval method and device |
CN108228757A (en) * | 2017-12-21 | 2018-06-29 | 北京市商汤科技开发有限公司 | Image search method and device, electronic equipment, storage medium, program |
CN110879863B (en) * | 2018-08-31 | 2023-04-18 | 阿里巴巴集团控股有限公司 | Cross-domain search method and cross-domain search device |
CN110879863A (en) * | 2018-08-31 | 2020-03-13 | 阿里巴巴集团控股有限公司 | Cross-domain search method and cross-domain search device |
CN109408648B (en) * | 2018-10-26 | 2021-01-22 | 京东方科技集团股份有限公司 | Association determination method and work recommendation method |
CN109408648A (en) * | 2018-10-26 | 2019-03-01 | 京东方科技集团股份有限公司 | It is associated with and determines method, works recommended method |
CN109784405B (en) * | 2019-01-16 | 2020-09-08 | 山东建筑大学 | Cross-modal retrieval method and system based on pseudo-tag learning and semantic consistency |
CN109784405A (en) * | 2019-01-16 | 2019-05-21 | 山东建筑大学 | Cross-module state search method and system based on pseudo label study and semantic consistency |
US10915815B1 (en) | 2019-01-22 | 2021-02-09 | Institute Of Automation, Chinese Academy Of Sciences | Information processing method, system and device based on contextual signals and prefrontal cortex-like network |
CN109784287A (en) * | 2019-01-22 | 2019-05-21 | 中国科学院自动化研究所 | Information processing method, system, device based on scene class signal forehead leaf network |
CN109992676B (en) * | 2019-04-01 | 2020-12-25 | 中国传媒大学 | Cross-media resource retrieval method and retrieval system |
CN109992676A (en) * | 2019-04-01 | 2019-07-09 | 中国传媒大学 | Across the media resource search method of one kind and searching system |
CN111291204A (en) * | 2019-12-10 | 2020-06-16 | 河北金融学院 | Multimedia data fusion method and device |
CN111046166A (en) * | 2019-12-10 | 2020-04-21 | 中山大学 | Semi-implicit multi-modal recommendation method based on similarity correction |
CN111291204B (en) * | 2019-12-10 | 2023-08-29 | 河北金融学院 | Multimedia data fusion method and device |
CN111931866A (en) * | 2020-09-21 | 2020-11-13 | 平安科技(深圳)有限公司 | Medical data processing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN100422999C (en) | 2008-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100422999C (en) | Transmedia searching method based on content correlation | |
Torralba et al. | 80 million tiny images: A large data set for nonparametric object and scene recognition | |
CN102521368B (en) | Similarity matrix iteration based cross-media semantic digesting and optimizing method | |
Chen et al. | CLUE: cluster-based retrieval of images by unsupervised learning | |
Chadha et al. | Comparative study and optimization of feature-extraction techniques for content based image retrieval | |
CN101539930B (en) | Search method of related feedback images | |
CN102902826B (en) | A kind of image method for quickly retrieving based on reference picture index | |
CN106203483B (en) | A kind of zero sample image classification method based on semantic related multi-modal mapping method | |
CN112905822A (en) | Deep supervision cross-modal counterwork learning method based on attention mechanism | |
JP2006510114A (en) | Representation of content in conceptual model space and method and apparatus for retrieving it | |
CN102663447B (en) | Cross-media searching method based on discrimination correlation analysis | |
CN104156433B (en) | Image retrieval method based on semantic mapping space construction | |
CN103995903B (en) | Cross-media search method based on isomorphic subspace mapping and optimization | |
CN105849720A (en) | Visual semantic complex network and method for forming network | |
CN110297931A (en) | A kind of image search method | |
CN102890700A (en) | Method for retrieving similar video clips based on sports competition videos | |
CN103336835B (en) | Image retrieval method based on weight color-sift characteristic dictionary | |
Qian et al. | HWVP: hierarchical wavelet packet descriptors and their applications in scene categorization and semantic concept retrieval | |
CN106250925B (en) | A kind of zero Sample video classification method based on improved canonical correlation analysis | |
CN101211344A (en) | Text message ergodic rapid four-dimensional visualization method | |
Barz et al. | Content-based image retrieval and the semantic gap in the deep learning era | |
Sasikala et al. | Efficient content based image retrieval system with metadata processing | |
Yen et al. | Ranked centroid projection: A data visualization approach with self-organizing maps | |
CN106951501A (en) | A kind of method for searching three-dimension model based on many figure matchings | |
Belattar et al. | CBIR using relevance feedback: comparative analysis and major challenges |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20081001 Termination date: 20120914 |