CN101303694A - Method for implementing decussation retrieval between mediums through amalgamating different modality information


Info

Publication number
CN101303694A
Authority
CN
China
Legal status
Pending
Application number
CNA2008100614455A
Other languages
Chinese (zh)
Inventor
吴飞
庄越挺
王文华
杨易
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for cross-media retrieval that fuses information from different modalities. The method comprises the following steps: 1) building relation graphs over hypermedia objects and obtaining the corresponding correlation coefficient matrix; 2) taking a media object or hypermedia object submitted by the user, from inside or outside the database, as the query example and labeling the initial matching degrees; 3) using the correlation matrix between hypermedia objects, iterating to a steady state to propagate the matching degree to unlabeled hypermedia objects, and returning the hypermedia objects, or the media objects of a specified modality within them, whose matching degree with the query example is greater than 0.6; 4) periodically adjusting the hypermedia relation graph according to the set of query examples and the set of positive examples. The invention fuses the low-level features of various media objects and propagates semantics through the relationships between media objects, which gives better retrieval results. Because the query example and the returned results may be of different modalities, and semantics are propagated through the graph, retrieval is more accurate and more widely applicable.

Description

Method for cross-media retrieval by fusing information from different modalities
Technical field
The present invention relates to cross-media retrieval between media of different modalities, and in particular to a method for cross-media retrieval by fusing information from different modalities.
Background art
With the development of the Web, the volume of information has grown sharply. Faced with such a huge amount of data, retrieval has become an important means by which people obtain information. Plain text retrieval can no longer satisfy users' increasingly complex needs: users want to retrieve not only text but also data of other modalities, such as images, video, audio and Microsoft PowerPoint slides. Existing multimedia retrieval is generally implemented through manual annotation or through matching of low-level features. Annotation requires a large amount of manual labeling, and because the volume of data is huge and keeps growing rapidly, annotation-based multimedia search is only practical for small data sets. Retrieval by matching the low-level features of multimedia objects needs little manual input, but there is a semantic gap between low-level features and semantics: visually similar images may represent completely different semantics, and semantically identical images may look completely different. A retrieval method that fuses low-level features and semantics is therefore of great significance.
Since there is currently no way to obtain the semantics of a media object directly, retrieval based on both semantics and features can only be achieved by making full use of the semantic relationships between media objects. In practice, media objects generally do not exist in isolation but are attached to hypermedia, where a hypermedia object means an object containing media objects of multiple modalities, such as a web page or a slide deck. For an image in a web page, although its semantics cannot be obtained directly, it generally has a similar or complementary semantic relationship with the text and other media objects in the same page. By exploiting the semantic relationships among the media objects within the same hypermedia object, the gap between the low-level features of media objects of different modalities can be bridged, and a relationship network of hypermedia objects based on low-level features and semantic correlation can be built. With such a network, users can query for the media objects and hypermedia they want by submitting a media object or a hypermedia object; for example, semantically similar videos can be retrieved by submitting a web page or an image. Achieving cross-media retrieval between media objects is therefore highly significant.
Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art by providing a method for cross-media retrieval that fuses information from different modalities.
The method for cross-media retrieval by fusing information from different modalities comprises the following steps:
1) building relation graphs over the hypermedia objects and obtaining the corresponding correlation matrix;
2) taking a media object or hypermedia object submitted by the user, either from the database or from outside it, as the query example, and labeling the initial matching degrees;
3) using the correlation matrix between hypermedia objects, iterating to a steady state to propagate the matching degree to unlabeled hypermedia objects, and returning the hypermedia objects whose matching degree is greater than 0.6, or the media objects of a specified modality within those hypermedia objects;
4) periodically adjusting the hypermedia object relation graph according to the user's query example set and positive example set.
The steps of building the relation graphs over the hypermedia objects and obtaining the corresponding correlation matrix are as follows:
1) Build the audio distance graph A between hypermedia objects. For any two hypermedia objects that both contain audio objects, compute the audio distance between them: form audio pairs by taking one audio object from each hypermedia object, compute the Mel-frequency cepstral coefficients (MFCC) of both audio objects, compute the low-level feature distance of every audio pair, and take the normalized distance of the pair with the smallest feature distance as the audio distance of the two hypermedia objects. If either hypermedia object contains no audio, their audio distance is set to infinity;
2) Build the image distance graph I between hypermedia objects. Color and texture features are extracted from all images in the data set: the color features comprise the color histogram, color moments and color coherence vector, and the texture features comprise coarseness, directionality and contrast. For any two hypermedia objects that both contain images, form image pairs by taking one image from each, compute the Euclidean distance between the features of each pair, and take the normalized distance of the pair with the smallest feature distance as the image distance of the two hypermedia objects. If either hypermedia object contains no image, their image distance is set to infinity;
3) Build the text distance graph T between hypermedia objects. The text objects in the hypermedia objects are vectorized with term frequency/inverse document frequency (TF-IDF), the pairwise Euclidean distances between all text objects are computed, and all distances are normalized. For any two hypermedia objects that both contain text, the smallest text feature distance between them is taken as their text distance. If either hypermedia object contains no text, their text distance is set to infinity;
4) Adjust the audio distance graph A, the image distance graph I and the text distance graph T: on each of the three graphs, compute the shortest path between every pair of vertices and replace the weight of the original edge between them with the shortest-path length;
5) Build the hypermedia object distance graph HMG. Record the precision of queries using audio alone, images alone and text alone as $P_a$, $P_i$ and $P_t$ respectively. Each vertex of HMG represents a hypermedia object, and each edge represents the distance between two hypermedia objects. With the normalization coefficient $\gamma = 1/(P_a + P_i + P_t)$, the distance between vertices $i$ and $j$ is $HMG_{ij} = \gamma\,(A_{ij}P_a + I_{ij}P_i + T_{ij}P_t)$;
6) Let the data set contain n hypermedia objects, and build an $n \times n$ matrix C representing the semantic relationship between any two hypermedia objects. $C_{ij}$ denotes the element in row i, column j of C: if i = j, then $C_{ij} = 0$; otherwise $C_{ij} = \exp\left(-\frac{HMG_{ij}^2}{2\sigma^2}\right)$, where $HMG_{ij}$ is the weight of the edge connecting hypermedia objects i and j in the hypermedia object distance graph and σ is an adjustable parameter.
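Steps 1) to 4) can be sketched in Python as below. This is a minimal illustration under several assumptions the text does not fix: features (MFCC, color/texture, TF-IDF) are assumed to be already extracted as numeric vectors; Euclidean distance is used for every modality; normalization divides by the largest finite distance; and all-pairs shortest paths are computed with Floyd-Warshall rather than the Dijkstra's algorithm named in the embodiment (on non-negative weights both give the same all-pairs shortest distances). The function names are hypothetical.

```python
import math
from typing import List, Sequence

INF = math.inf
Vector = Sequence[float]
Matrix = List[List[float]]


def floyd_warshall(dist: Matrix) -> Matrix:
    """Step 4: replace every edge weight by the shortest-path length."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def modality_distance_graph(objects: Sequence[Sequence[Vector]]) -> Matrix:
    """Steps 1-3 for one modality (audio -> A, image -> I, text -> T).

    objects[h] lists the feature vectors of every media object of this
    modality inside hypermedia object h; an empty list means "none".
    """
    n = len(objects)
    raw = [[INF] * n for _ in range(n)]
    for i in range(n):
        raw[i][i] = 0.0
        for j in range(i + 1, n):
            if objects[i] and objects[j]:
                # distance of the closest media-object pair across i and j
                d = min(math.dist(u, v) for u in objects[i] for v in objects[j])
                raw[i][j] = raw[j][i] = d
            # otherwise the pair keeps distance INF
    # normalise finite distances to [0, 1] (assumed: divide by the maximum)
    finite = [d for row in raw for d in row if d < INF]
    scale = max(finite) if finite and max(finite) > 0 else 1.0
    graph = [[d / scale if d < INF else INF for d in row] for row in raw]
    return floyd_warshall(graph)
```

Calling `modality_distance_graph` once per modality yields A, I and T. A hypermedia object with no media of a given modality stays at infinite distance from every other object in that graph, since no finite path can reach it.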
The steps of taking a media object or hypermedia object submitted by the user, from the database or from outside it, as the query example and labeling the initial matching degrees are as follows:
1) If the user submits a media object or hypermedia object that is in the database, locate it in the database and set its matching degree with the query input to 1;
2) If the user submits a media object or hypermedia object from outside the database, compute the low-level feature distances between the media objects contained in the query example and all media objects in the database; according to these distances, find the k media objects in the database closest to the query example, and set to 1 the matching degree, with respect to the query example, of every hypermedia object to which these media objects belong.
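A minimal Python sketch of this labeling step follows. It assumes every media object has already been reduced to a numeric feature vector of one modality, and that `owner[m]` gives the index of the hypermedia object containing database media object `m`; both data layouts and the function names are hypothetical. `math.dist` is the Euclidean distance from the Python standard library.

```python
import math
from typing import Iterable, List, Sequence

Vector = Sequence[float]


def initial_labels(n_hyper: int, matched: Iterable[int]) -> List[float]:
    """Y(0): matching degree 1 for each matched hypermedia object, 0 elsewhere."""
    y0 = [0.0] * n_hyper
    for h in matched:
        y0[h] = 1.0
    return y0


def knn_labels(query_feats: Sequence[Vector], db_feats: Sequence[Vector],
               owner: Sequence[int], n_hyper: int, k: int) -> List[float]:
    """Y(0) for a query from outside the database (one modality).

    For every media object in the query, find its k nearest media objects
    in the database and mark the hypermedia objects that own them with 1.
    """
    matched = set()
    for q in query_feats:
        dists = [math.dist(q, f) for f in db_feats]
        ranked = sorted(range(len(db_feats)), key=dists.__getitem__)
        matched.update(owner[m] for m in ranked[:k])
    return initial_labels(n_hyper, matched)
```

For a query media object that is itself in the database, `initial_labels(n, [owner[m]])` marks the hypermedia object it belongs to. For an external query with several modalities, one could call `knn_labels` once per modality and take the element-wise maximum of the resulting vectors.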
The step of using the correlation matrix between hypermedia objects to iterate to a steady state and propagate the matching degree to unlabeled hypermedia objects is as follows: with the label matrix $Y_{n\times 1} = [y_1, y_2, \ldots, y_n]^T$, where $y_i$ is the matching degree between the i-th hypermedia object and the query example, the formula $Y^* = (1-\alpha)(I - \alpha C)^{-1}Y(0)$, where I is the $n \times n$ identity matrix and α is an adjustable parameter, gives the matching degree of every hypermedia object with the input example once the iteration has stabilized. The hypermedia objects whose matching degree is greater than 0.6, or the media objects they contain, are returned.
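The steady state can be reached by iterating, as sketched below in plain Python. The update rule $Y(t+1) = \alpha C\,Y(t) + (1-\alpha)Y(0)$ is an assumption: it is the iteration whose fixed point is the closed-form formula above, as in manifold ranking. It converges only when the spectral radius of $\alpha C$ is below 1, a condition the text does not discuss, so the sketch raises an error instead of looping forever. Function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def propagate(C: Matrix, y0: Sequence[float], alpha: float,
              tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Iterate Y(t+1) = alpha*C*Y(t) + (1-alpha)*Y(0) until it stops changing.

    At the fixed point, Y = (1-alpha) * (I - alpha*C)^-1 * Y(0).
    """
    n = len(y0)
    y = list(y0)
    for _ in range(max_iter):
        y_next = [alpha * sum(C[i][j] * y[j] for j in range(n)) + (1 - alpha) * y0[i]
                  for i in range(n)]
        if max(abs(a - b) for a, b in zip(y_next, y)) < tol:
            return y_next
        y = y_next
    raise RuntimeError("no convergence: spectral radius of alpha*C is probably >= 1")


def retrieve(scores: Sequence[float], threshold: float = 0.6) -> List[int]:
    """Indices of hypermedia objects whose matching degree exceeds the threshold, best first."""
    return sorted((i for i, s in enumerate(scores) if s > threshold),
                  key=lambda i: -scores[i])
```

Iterating avoids forming the inverse of $I - \alpha C$ explicitly; for a 1000-object data set a direct linear solve would work equally well.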
The present invention fuses media information of different modalities, exploits the complete semantics carried by hypermedia objects, and dynamically adjusts the semantic relationships according to user feedback, and therefore achieves higher precision. The method also provides cross-media retrieval across modalities: the user can submit a hypermedia object, text, audio or an image to retrieve media objects and hypermedia objects of the same or different modalities, which makes it more flexible and more capable.
Description of the drawings
Fig. 1 is a flow chart of the method for cross-media retrieval by fusing information from different modalities;
Fig. 2 shows retrieval results of the present invention: the first 9 results returned when a user submits a web page about cats to query for images.
Detailed description of the embodiments
The steps of periodically adjusting the hypermedia object distance graph HMG according to the user's query example set and positive example set are as follows:
1) Construct a graph G(0) in which each vertex represents a hypermedia object and there are no edges between any hypermedia objects;
2) Every round of user relevance feedback is used to refine G; for example, the user feedback of round t modifies G(t-1);
3) Adjust the weight of every edge in G with a shortest-path algorithm;
4) Use G to adjust the hypermedia object distance graph, so that the hypermedia distance graph better reflects the relationships between hypermedia objects from the user's perspective.
By exploiting the low-level feature distances between media objects of the same modality and the high semantic correlation between media objects of different modalities within the same hypermedia object, the present invention builds a distance graph and a correlation matrix over all hypermedia objects in the data set, and propagates the matching degree of the query example through the relation graph according to the edge weights between vertices. This realizes cross-media retrieval between different types of media as well as content- and semantics-based hypermedia retrieval.
As shown in Fig. 1, the method for cross-media retrieval by fusing information from different modalities is detailed as follows:
1) Offline processing: this module performs semantic understanding of the media objects in the database and builds the hypermedia distance graph. It comprises four main algorithms: feature extraction, construction of the single-modality hypermedia distance graphs, construction of the hypermedia object distance graph, and construction of the correlation matrix. They are detailed as follows:
A) Media object feature extraction and distance calculation: this algorithm extracts low-level features from each type of media object with a method suited to that type, and computes the distances between media objects of the same modality. All text objects in the data set are vectorized with TF-IDF, and the Euclidean distance between every two texts is computed; for all audio objects in the data set, MFCC are used as the audio features and the distances between audio objects are computed; for all image objects, color and texture features are extracted and the pairwise Euclidean distances between images are computed. Finally, the text, image and audio distances are normalized.
B) Building the single-modality hypermedia distance graphs: this algorithm builds a hypermedia distance graph for each of the three modalities, audio, image and text. In the hypermedia audio distance graph, each vertex represents a hypermedia object, and the distance between two vertices is the distance between the two audio objects, one from each hypermedia object, whose low-level feature distance is smallest. If either hypermedia object contains no audio, the distance between the two is set to infinity. On this original audio distance graph, the shortest paths between all pairs of vertices are computed, and the distance between each pair is replaced by its shortest-path length. The hypermedia image distance graph and the hypermedia text distance graph are built in the same way as the audio distance graph.
C) Building the hypermedia object distance graph: record the precision of queries using audio alone, images alone and text alone as $P_a$, $P_i$ and $P_t$ respectively. Each vertex of the hypermedia object distance graph represents a hypermedia object, and each edge represents the distance between two hypermedia objects. With the normalization coefficient $\gamma = 1/(P_a + P_i + P_t)$, the distance between vertices $i$ and $j$ is $HMG_{ij} = \gamma\,(A_{ij}P_a + I_{ij}P_i + T_{ij}P_t)$.
D) Building the correlation matrix: let the data set contain n hypermedia objects, and build an $n \times n$ matrix C representing the semantic relationship between any two hypermedia objects. $C_{ij}$ denotes the element in row i, column j of C: if i = j, then $C_{ij} = 0$; otherwise $C_{ij} = \exp\left(-\frac{HMG_{ij}^2}{2\sigma^2}\right)$, where $HMG_{ij}$ is the weight of the edge connecting hypermedia objects i and j in the hypermedia object distance graph and σ is an adjustable parameter.
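Steps C) and D) can be sketched as follows. The sketch assumes the Gaussian kernel has a negative exponent, so that an infinite distance maps to zero correlation, and it treats a modality whose precision is 0 as contributing nothing, to avoid the undefined product of infinity and zero. The precisions $P_a$, $P_i$, $P_t$ are plain inputs here; how they are measured is not shown. Function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _weighted(d: float, p: float) -> float:
    # a zero-precision modality contributes 0 (avoids inf * 0 = nan)
    return d * p if p > 0 else 0.0


def fuse_distance_graphs(A: Matrix, I: Matrix, T: Matrix,
                         p_a: float, p_i: float, p_t: float) -> Matrix:
    """HMG_ij = gamma * (A_ij*P_a + I_ij*P_i + T_ij*P_t), gamma = 1/(P_a+P_i+P_t)."""
    gamma = 1.0 / (p_a + p_i + p_t)
    n = len(A)
    return [[gamma * (_weighted(A[i][j], p_a) + _weighted(I[i][j], p_i)
                      + _weighted(T[i][j], p_t))
             for j in range(n)] for i in range(n)]


def correlation_matrix(hmg: Matrix, sigma: float) -> Matrix:
    """C_ij = exp(-HMG_ij^2 / (2 sigma^2)) off the diagonal, 0 on it."""
    n = len(hmg)
    return [[0.0 if i == j else math.exp(-hmg[i][j] ** 2 / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]
```

Because `math.exp(-math.inf)` is `0.0`, pairs still at infinite fused distance simply receive no correlation.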
2) Retrieval: this module performs cross-media search over media objects and semantic retrieval of hypermedia. The user can submit an image, audio, text or hypermedia object as the query input to retrieve the media objects or hypermedia objects with the greatest semantic correlation to the query input. The cases are as follows:
a) When the query example submitted by the user is a hypermedia object that exists in the data set, its matching degree with the query input is first set to 1. Construct the matrix $Y_{n\times 1} = [y_1, y_2, \ldots, y_n]^T$, where $y_i$ is the matching degree between the i-th hypermedia object and the query example: $y_i = 1$ if that hypermedia object is the query input itself, and 0 otherwise. Then apply $Y^* = (1-\alpha)(I - \alpha C)^{-1}Y(0)$ to obtain the matching degree of every hypermedia object with the input example once the iteration has stabilized, and return the hypermedia objects whose matching degree is greater than 0.6, or the media objects they contain.
b) When the query example submitted by the user is a media object that exists in the data set, find the hypermedia object it belongs to, set that hypermedia object's matching degree with the query input to 1 and the matching degrees of all other hypermedia objects to 0, and then compute the steady-state matching degrees of all hypermedia objects with the query input in the same way as in case a).
c) When the query example submitted by the user is a media object outside the data set, compute the low-level feature distances between the query example and all objects in the data set, using the media object distance calculation of the preprocessing module, and find its k nearest neighbors. The matching degrees of the hypermedia objects to which these k media objects belong are set to 1 and stand in for the input query example; the remaining steps are the same as in case a).
d) When the query example submitted by the user is a hypermedia object outside the data set, first find the k nearest neighbors of each media object it contains, set to 1 the elements of Y corresponding to the hypermedia objects to which these nearest neighbors belong, and set the values of all other objects to 0; then compute the steady-state matching degrees of all objects with the query example in the same way as in case a).
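The closed form used in case a) can be connected to the phrase "iterate to a steady state" as follows, assuming the iteration is $Y(t+1) = \alpha C\,Y(t) + (1-\alpha)Y(0)$ (the text states only the closed form). Unrolling the recursion and summing the geometric (Neumann) series gives:

```latex
\begin{aligned}
Y(t) &= (\alpha C)^{t}\,Y(0) + (1-\alpha)\sum_{k=0}^{t-1} (\alpha C)^{k}\,Y(0),\\
\lim_{t\to\infty} Y(t) &= (1-\alpha)\sum_{k=0}^{\infty} (\alpha C)^{k}\,Y(0)
  = (1-\alpha)(I-\alpha C)^{-1}\,Y(0) = Y^{*}
  \quad\text{if } \rho(\alpha C) < 1 .
\end{aligned}
```

The last step requires the spectral radius $\rho(\alpha C)$ to be below 1; otherwise $(\alpha C)^t$ does not vanish and the iteration diverges.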
3) User feedback: a user feedback graph G, representing the user's view of the relationships between hypermedia objects, is built from the query example sets and positive example sets of user feedback, and is periodically used to improve the hypermedia object distance graph. The steps are as follows:
A) Construct the graph G(0): for any hypermedia objects i and j, let $G_{ij}(0) = 0$.
B) Let the query set and the positive example set of the t-th round of user feedback be $Q_t$ and $P_t$ respectively (here $P_t$ denotes a set, not the text-query precision). The user feedback graph after round t is then $G_{ij}(t) = \lambda + \log_2\left(G_{ij}(t-1) + 2\right)$, where objects i and j belong to $Q_t$ or $P_t$, and λ is an adjustable parameter greater than or equal to 1.
C) Optimize the edge weights of the user feedback graph G according to formula (2), where $G_p$ is the weight of path p in G, min denotes the minimum of its arguments, $\min_v$ denotes the smallest edge weight along the path between the two vertices, and l is the number of edges on the path:
$$G_p = \min\left(\frac{1 + \min_v}{l},\ \min_v\right) \qquad (2)$$
D) According to formula (3), combine the user feedback graph G with the hypermedia distance graph to adjust the distances between hypermedia objects in the hypermedia distance graph, and regenerate the hypermedia object correlation matrix C according to formula (1). For any hypermedia objects i and j: if i and j belong to the query set and positive example set of the r-th round of relevance feedback, then $HMG_{ij} = \omega \cdot HMG_{ij}$, where ω is a positive number less than 1 and $HMG_{ij}$ is the edge weight between hypermedia objects i and j; if i and j belong to the query set and positive example set of the r-th round of relevance feedback and G contains an edge of nonzero weight between hypermedia object k and hypermedia object j, then $HMG_{ij} = HMG_{ij} / G_{kj}$.
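Rules B) and D) can be sketched as below, assuming that "objects i and j belong to $Q_t$ or $P_t$" means every ordered pair of distinct objects drawn from $Q_t \cup P_t$. The path-weight optimization of formula (2) and the division rule $HMG_{ij}/G_{kj}$ are left out, because the conditions under which they apply are not spelled out fully. Function names are hypothetical; both functions modify their matrix in place.

```python
import math
from typing import Iterable, List, Tuple

Matrix = List[List[float]]


def _pairs(queries: Iterable[int], positives: Iterable[int]) -> List[Tuple[int, int]]:
    """All ordered pairs (i, j), i != j, drawn from Q_t ∪ P_t."""
    members = sorted(set(queries) | set(positives))
    return [(i, j) for i in members for j in members if i != j]


def update_feedback_graph(G: Matrix, queries: Iterable[int],
                          positives: Iterable[int], lam: float = 1.0) -> None:
    """Rule B: G_ij(t) = lambda + log2(G_ij(t-1) + 2) for pairs in Q_t ∪ P_t."""
    if lam < 1:
        raise ValueError("lambda must be >= 1")
    for i, j in _pairs(queries, positives):
        G[i][j] = lam + math.log2(G[i][j] + 2)


def shrink_distances(hmg: Matrix, queries: Iterable[int],
                     positives: Iterable[int], omega: float) -> None:
    """Rule D (first case): HMG_ij <- omega * HMG_ij, with 0 < omega < 1."""
    if not 0 < omega < 1:
        raise ValueError("omega must lie in (0, 1)")
    for i, j in _pairs(queries, positives):
        hmg[i][j] *= omega
```

Starting from $G(0) = 0$ with λ = 1, a pair that appears in two consecutive feedback rounds goes from 0 to 2 and then to 3, so repeated co-occurrence strengthens the edge, but only logarithmically.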
Embodiment:
Suppose there are 1000 hypermedia objects, composed of 950 images, 100 sound clips and 800 text passages. First, the color and texture features of all images are extracted, where the color features comprise the color histogram, color moments and color coherence vector and the texture features comprise coarseness, directionality and contrast, and the pairwise distances between all images are computed. For the sound clips, MFCC are extracted and the pairwise distances between all sound objects are computed. For the texts, the pairwise distances between text objects are computed after TF-IDF vectorization. Once the media object distances are computed, the image, text and audio distances are each normalized. Next, the audio distance graph A, image distance graph I and text distance graph T between hypermedia objects are built. To build A, for any pair of hypermedia objects u and v, all distances between an audio object of u and an audio object of v are found and the smallest is taken as the audio distance between u and v; if either or both of u and v contain no audio object, the audio distance between them is set to infinity. The shortest distance between every pair of vertices is then computed with Dijkstra's algorithm and used as the new weight of the edge between them. The image distance graph I and the text distance graph T are built in the same way as the audio distance graph. The precision of queries using audio alone, images alone and text alone is recorded as $P_a$, $P_i$ and $P_t$ respectively, and the audio, image and text distance graphs are merged into the hypermedia distance graph, in which each vertex represents a hypermedia object and each edge represents the distance between two hypermedia objects. With the normalization coefficient $\gamma = 1/(P_a + P_i + P_t)$, the distance between vertices i and j is $HMG_{ij} = \gamma\,(A_{ij}P_a + I_{ij}P_i + T_{ij}P_t)$. On the basis of the hypermedia distance graph, a $1000 \times 1000$ matrix C is built to represent the semantic relationship between any two hypermedia objects: $C_{ij}$, the element in row i, column j, is 0 if i = j and $\exp\left(-HMG_{ij}^2 / 0.5\right)$ otherwise. Finally, a $1000 \times 1$ matrix $Y_{1000\times 1}$ is built, where $Y_i$ represents the relevance of the i-th hypermedia object to the query; all $Y_i$ are initialized to zero.
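With $2\sigma^2 = 0.5$ (that is, σ = 0.5) as in this embodiment, and assuming the negative exponent of the Gaussian kernel, a few hypothetical fused distances map to correlations as follows:

```latex
\begin{aligned}
HMG_{ij} = 0.25 &\;\Rightarrow\; C_{ij} = e^{-0.0625/0.5} = e^{-0.125} \approx 0.8825\\
HMG_{ij} = 0.5  &\;\Rightarrow\; C_{ij} = e^{-0.25/0.5} = e^{-0.5} \approx 0.6065\\
HMG_{ij} = 1    &\;\Rightarrow\; C_{ij} = e^{-1/0.5} = e^{-2} \approx 0.1353\\
HMG_{ij} = \infty &\;\Rightarrow\; C_{ij} = 0
\end{aligned}
```

Since the single-modality distances are normalized, most fused distances fall in a range where this choice of σ still separates near and far neighbors clearly.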
Fig. 2 shows the first 9 results returned when a user submits a web page about cats to query for images. The retrieval proceeds as follows. Suppose the submitted web page contains an audio object and a passage of text. The system first computes the MFCC of this audio object, finds the 3 audio objects in the data set closest to it, and sets to 1 the elements of Y for the hypermedia objects containing these 3 audio objects. Similarly, the system computes the TF-IDF vector of the text in the input example, finds the 3 texts in the data set whose low-level features are closest to it, and sets to 1 the elements of Y for the hypermedia objects containing these 3 texts. The elements for all remaining hypermedia objects are set to 0, giving the initialized matching matrix Y(0). The final matching degree matrix $Y^*$ is computed with $Y^* = (1-0.5)(I - 0.5\,C)^{-1}Y(0)$, and the images contained in the top 9 hypermedia objects whose matching degree in $Y^*$ is greater than 0.6 are returned as the search results. As Fig. 2 shows, the precision is quite high, which indicates that the method effectively bridges the semantic gap and solves the problem of cross-media retrieval between media of different modalities.
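The text half of this example, finding the 3 texts closest to the query text, can be sketched in plain Python as below. It uses one common TF-IDF variant (relative term frequency times natural-log inverse document frequency, an assumption) together with the Euclidean distance named in the text. Tokenization is assumed to have been done already, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

SparseVec = Dict[str, float]


def idf_table(corpus: Sequence[Sequence[str]]) -> SparseVec:
    """idf(term) = ln(N / document frequency of term)."""
    n = len(corpus)
    df = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(doc: Sequence[str], idf: SparseVec) -> SparseVec:
    """Relative term frequency times idf; terms unseen in the corpus get weight 0."""
    counts = Counter(doc)
    total = max(len(doc), 1)
    return {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}


def euclidean_sparse(u: SparseVec, v: SparseVec) -> float:
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def nearest_texts(query: Sequence[str], corpus: Sequence[Sequence[str]],
                  k: int = 3) -> List[int]:
    """Indices of the k corpus texts closest to the query in TF-IDF space."""
    idf = idf_table(corpus)
    q = tfidf(query, idf)
    dists = [euclidean_sparse(q, tfidf(doc, idf)) for doc in corpus]
    return sorted(range(len(corpus)), key=dists.__getitem__)[:k]
```

The returned indices identify texts; the hypermedia objects containing them are the ones whose entries in Y(0) are set to 1.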
As the example above shows, unlike traditional retrieval methods, the present invention makes full use of the semantic correlation and complementarity among the multimedia objects contained in a hypermedia object, and allocates the influence of each modality on precision according to statistics; its precision is therefore higher than that of traditional methods. The invention can also retrieve with the complete set of media objects of different modalities fused in a hypermedia object, or retrieve media objects of any modality by submitting a single ordinary media object. Functionally, it is therefore more flexible and better able to meet users' needs.

Claims (4)

1. A method for realizing cross-retrieval between media by fusing information of different modalities, characterized by comprising the following steps:
1) building a relation graph for the hypermedia objects and obtaining the corresponding correlation matrix;
2) using a media object or hypermedia object submitted by the user, from inside or outside the database, as the query example and marking the initial matching degrees;
3) using the correlation matrix between hypermedia objects, iterating to a steady state to propagate matching degrees to the unmarked hypermedia objects, and returning the hypermedia objects whose matching degree exceeds 0.6, or the media objects of a specific modality within them;
4) periodically adjusting the hypermedia object relation graph according to the set of users' query examples and the set of positive examples.
2. The method for realizing cross-retrieval between media by fusing information of different modalities according to claim 1, characterized in that the step of building a relation graph for the hypermedia objects and obtaining the corresponding correlation matrix is as follows:
1) build the audio distance graph $A$ between hypermedia objects. For any two hypermedia objects that both contain audio objects, compute their audio distance: take one audio clip from each of the two objects as an audio pair, compute the Mel-frequency cepstral coefficients (MFCC) of both clips, compute the low-level feature distance of every such audio pair, and take the normalized distance of the pair with the smallest feature distance as the audio distance of the two hypermedia objects. If either hypermedia object contains no audio, their audio distance is set to infinity;
2) build the image distance graph $I$ between hypermedia objects. For any two hypermedia objects that both contain images, compute their image distance: take one image from each of the two objects as an image pair, extract color and texture features from both images and compute the Euclidean distance between them, and take the normalized distance of the image pair with the smallest feature distance as the image distance of the two hypermedia objects. If either hypermedia object contains no image, their image distance is set to infinity. Color and texture features are extracted for all images in the data set: the color features comprise the color histogram, color moments and the color coherence vector, and the texture features comprise coarseness, directionality and contrast;
3) build the text distance graph $T$ between hypermedia objects. For any two hypermedia objects that both contain text, compute their text distance: vectorize the text objects in the hypermedia objects with the term frequency/inverse document frequency (TF-IDF) method, compute the Euclidean distance between every pair of text objects and normalize all distances, and take the smallest text feature distance between the two hypermedia objects as their text distance. If either hypermedia object contains no text, their text distance is set to infinity;
4) adjust the audio distance graph $A$, the image distance graph $I$ and the text distance graph $T$: on each of the three graphs, compute the shortest path between every pair of points and replace the weight of the original edge between them with the shortest-path length;
5) construct the hypermedia object distance graph. Record the precision of queries using audio, images and text alone as $P_a$, $P_i$ and $P_t$, respectively. Each vertex of the hypermedia object distance graph represents a hypermedia object and each edge represents the distance between two hypermedia objects. With the normalization coefficient $\gamma = 1/(P_a + P_i + P_t)$, the distance between points $i$ and $j$ of the graph is $HMG_{ij} = \gamma (A_{ij} P_a + I_{ij} P_i + T_{ij} P_t)$;
6) let the data set contain $n$ hypermedia objects, and build an $n \times n$ matrix $C$ to represent the semantic relation between any two hypermedia objects. $C_{ij}$ denotes the element in row $i$, column $j$ of $C$: if $i = j$, $C_{ij} = 0$; otherwise $C_{ij} = \exp(HMG_{ij}^2 / 2\sigma^2)$, where $HMG_{ij}$ is the weight of the edge connecting the hypermedia objects numbered $i$ and $j$ in the hypermedia object distance graph and $\sigma$ is an adjustable parameter.
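Steps 1) to 4) of this claim can be sketched for a single modality as below. The sketch is illustrative only: `modality_distance` and `shortest_paths` are hypothetical names, feature extraction (MFCC, color/texture or TF-IDF) is assumed to have been done already, Euclidean distance is used for every modality, and the unspecified normalization is assumed to be division by the largest pairwise distance. Step 4) is implemented with the Floyd-Warshall algorithm, one standard way to compute all-pairs shortest paths; the claim does not name an algorithm.

```python
import numpy as np


def modality_distance(features, owner, n_objects):
    """Distance graph between hypermedia objects for one modality.

    features: (m, d) array, one row of low-level features per media object.
    owner[k]: index of the hypermedia object that contains media object k.
    Pairwise Euclidean distances are divided by the largest one (ASSUMED
    normalization); the distance between two hypermedia objects is the minimum
    over all media-object pairs drawn from them, and stays infinite when either
    object has no media object of this modality.
    """
    diff = features[:, None, :] - features[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    if d.max() > 0:
        d = d / d.max()
    D = np.full((n_objects, n_objects), np.inf)
    for a, i in enumerate(owner):
        for b, j in enumerate(owner):
            D[i, j] = min(D[i, j], d[a, b])
    np.fill_diagonal(D, 0.0)
    return D


def shortest_paths(D):
    """Floyd-Warshall: replace every edge weight by the all-pairs shortest-path length."""
    D = D.copy()
    for k in range(D.shape[0]):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D


# Hypermedia objects 0 and 1 contain audio (object 0 holds two clips); object 2 has none.
features = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]])
A = shortest_paths(modality_distance(features, owner=[0, 1, 0], n_objects=3))
```

Note that the shortest-path step can only shorten finite paths: an object with no media object of a modality keeps infinite distances in that modality's graph.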
3. The method for realizing cross-retrieval between media by fusing information of different modalities according to claim 1, characterized in that the step of using a media object or hypermedia object submitted by the user, from inside or outside the database, as the query example and marking the initial matching degrees is as follows:
1) if the user submits a media object or hypermedia object that is in the database, find the object in the database and mark its matching degree with the query as 1;
2) if the user submits a media object or hypermedia object that is outside the database, compute the low-level feature distance between every media object in the database and the media objects contained in the query example; according to these distances, find the $k$ media objects in the database closest to the query example, and mark as 1 the matching degree, with respect to the query example, of every hypermedia object to which these media objects belong.
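The two marking cases of this claim can be sketched as below. The names are hypothetical and the toy features are made up. The sketch assumes Euclidean distance on low-level features, compares features only within one modality (so a query with audio and text is handled by one call per modality, with the results combined by element-wise maximum), and assumes that for an in-database media object the hypermedia object containing it is the one marked. $k = 3$ follows the embodiment.

```python
import numpy as np


def label_in_database(n_objects, hypermedia_index):
    """Case 1: the query is (or belongs to) database hypermedia object hypermedia_index."""
    y0 = np.zeros(n_objects)
    y0[hypermedia_index] = 1.0
    return y0


def label_external_query(db_features, owner, query_features, n_objects, k=3):
    """Case 2: mark the owners of the k database media objects nearest to each query media object.

    db_features: (m, d) features of the database media objects of one modality.
    owner[j]: hypermedia object containing database media object j.
    query_features: (q, d) or (d,) features of the query's media objects, same modality.
    """
    y0 = np.zeros(n_objects)
    for q in np.atleast_2d(query_features):
        dist = np.linalg.norm(db_features - q, axis=1)
        for j in np.argsort(dist)[:k]:
            y0[owner[j]] = 1.0
    return y0


# One-dimensional toy features: four database media objects owned by three hypermedia objects.
db = np.array([[0.0], [1.0], [2.0], [10.0]])
y_audio = label_external_query(db, owner=[0, 0, 1, 2], query_features=[0.4], n_objects=3)
print(y_audio)  # [1. 1. 0.]
```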
4. The method for realizing cross-retrieval between media by fusing information of different modalities according to claim 1, characterized in that the step of using the correlation matrix between hypermedia objects to iterate to a steady state and propagate matching degrees to the unmarked hypermedia objects is as follows: given the marking matrix $Y_{n \times 1} = [y_1, y_2, \ldots, y_n]^T$, where $y_i$ is the matching degree between the $i$-th hypermedia object and the query example, the formula $Y^* = (1 - \alpha)(I - \alpha C)^{-1} Y^{(0)}$ gives the matching degree of each hypermedia object with the input example once the iteration has stabilized, and the hypermedia objects whose matching degree exceeds 0.6, or the media objects they contain, are returned.
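This claim speaks of iterating to a steady state but states only the fixed point. One iteration consistent with that fixed point is $Y^{(t+1)} = \alpha C Y^{(t)} + (1 - \alpha) Y^{(0)}$, the label-propagation update of graph-based manifold ranking; setting $Y^{(t+1)} = Y^{(t)} = Y^*$ and solving gives exactly $Y^* = (1 - \alpha)(I - \alpha C)^{-1} Y^{(0)}$. The sketch below assumes that update; the function name, tolerance and iteration cap are hypothetical. The iteration converges only when the spectral radius of $\alpha C$ is below 1, which the claim does not guarantee for an unnormalized $C$, so the sketch reports non-convergence instead of looping forever.

```python
import numpy as np


def propagate_iterative(C, y0, alpha=0.5, tol=1e-10, max_iter=10_000):
    """Iterate Y(t+1) = alpha * C @ Y(t) + (1 - alpha) * Y(0) until the change falls below tol.

    The fixed point is Y* = (1 - alpha) (I - alpha C)^{-1} Y(0). Convergence requires
    the spectral radius of alpha * C to be below 1.
    """
    y0 = np.asarray(y0, dtype=float)
    y = y0.copy()
    for _ in range(max_iter):
        y_next = alpha * (C @ y) + (1.0 - alpha) * y0
        if np.max(np.abs(y_next - y)) < tol:
            return y_next
        y = y_next
    raise RuntimeError("no steady state reached; check that alpha * spectral_radius(C) < 1")


C = np.array([[0.0, 1.0], [1.0, 0.0]])
y_iter = propagate_iterative(C, [1.0, 0.0])
y_closed = 0.5 * np.linalg.solve(np.eye(2) - 0.5 * C, [1.0, 0.0])
print(np.allclose(y_iter, y_closed))  # True
```

In practice the closed form is cheaper for a 1000-object data set; the iterative form is shown only to make the "steady state" in the claim concrete.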
CNA2008100614455A 2008-04-30 2008-04-30 Method for implementing decussation retrieval between mediums through amalgamating different modality information Pending CN101303694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2008100614455A CN101303694A (en) 2008-04-30 2008-04-30 Method for implementing decussation retrieval between mediums through amalgamating different modality information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2008100614455A CN101303694A (en) 2008-04-30 2008-04-30 Method for implementing decussation retrieval between mediums through amalgamating different modality information

Publications (1)

Publication Number Publication Date
CN101303694A true CN101303694A (en) 2008-11-12

Family

ID=40113602

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2008100614455A Pending CN101303694A (en) 2008-04-30 2008-04-30 Method for implementing decussation retrieval between mediums through amalgamating different modality information

Country Status (1)

Country Link
CN (1) CN101303694A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856817A (en) * 2012-11-29 2014-06-11 上海文广互动电视有限公司 Interactive play method and system of hypermedia
CN104317837A (en) * 2014-10-10 2015-01-28 浙江大学 Cross-modal searching method based on topic model
CN104573145A (en) * 2013-10-15 2015-04-29 奚伟祖 Automatic processing method for animated micro-video
CN106446010A (en) * 2016-08-23 2017-02-22 北京三体高创科技有限公司 Local retrieval method and device of 3D model on the basis of fuzzy corresponding function
CN109271904A (en) * 2018-09-03 2019-01-25 东南大学 A kind of black smoke vehicle detection method based on pixel adaptivenon-uniform sampling and Bayesian model
CN109783657A (en) * 2019-01-07 2019-05-21 北京大学深圳研究生院 Multistep based on limited text space is from attention cross-media retrieval method and system
CN110019907A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 A kind of image search method and device
CN115617957A (en) * 2022-12-19 2023-01-17 铭台(北京)科技有限公司 Intelligent document retrieval method based on big data

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856817A (en) * 2012-11-29 2014-06-11 上海文广互动电视有限公司 Interactive play method and system of hypermedia
CN103856817B (en) * 2012-11-29 2018-07-20 上海文广互动电视有限公司 The interactive playback method and system of hypermedia
CN104573145B (en) * 2013-10-15 2018-08-31 奚伟祖 The automatic processing and treating method of the micro- video of animationization
CN104573145A (en) * 2013-10-15 2015-04-29 奚伟祖 Automatic processing method for animated micro-video
CN104317837A (en) * 2014-10-10 2015-01-28 浙江大学 Cross-modal searching method based on topic model
CN104317837B (en) * 2014-10-10 2017-06-23 浙江大学 A kind of cross-module state search method based on topic model
CN106446010A (en) * 2016-08-23 2017-02-22 北京三体高创科技有限公司 Local retrieval method and device of 3D model on the basis of fuzzy corresponding function
CN110019907A (en) * 2017-12-01 2019-07-16 北京搜狗科技发展有限公司 A kind of image search method and device
CN110019907B (en) * 2017-12-01 2021-07-16 北京搜狗科技发展有限公司 Image retrieval method and device
CN109271904A (en) * 2018-09-03 2019-01-25 东南大学 A kind of black smoke vehicle detection method based on pixel adaptivenon-uniform sampling and Bayesian model
CN109271904B (en) * 2018-09-03 2022-02-15 东南大学 Black smoke vehicle detection method based on pixel adaptive segmentation and Bayesian model
CN109783657A (en) * 2019-01-07 2019-05-21 北京大学深圳研究生院 Multistep based on limited text space is from attention cross-media retrieval method and system
CN109783657B (en) * 2019-01-07 2022-12-30 北京大学深圳研究生院 Multi-step self-attention cross-media retrieval method and system based on limited text space
CN115617957A (en) * 2022-12-19 2023-01-17 铭台(北京)科技有限公司 Intelligent document retrieval method based on big data

Similar Documents

Publication Publication Date Title
CN101303694A (en) Method for implementing decussation retrieval between mediums through amalgamating different modality information
CN100388282C (en) Transmedia search method based on multi-mode information convergence analysis
CN101334796B (en) Personalized and synergistic integration network multimedia search and enquiry method
CN101950302B (en) Method for managing immense amounts of music libraries based on mobile device
US7890512B2 (en) Automatic image annotation using semantic distance learning
CN104428767B (en) For identifying the mthods, systems and devices of related entities
US8356038B2 (en) User to user recommender
CN101271476B (en) Relevant feedback retrieval method based on clustering in network image search
US20100106486A1 (en) Image-based semantic distance
Hobsbawm Behind the Times: The Decline and Fall of the Twentieth-century avant-gardes
CN109388731A (en) A kind of music recommended method based on deep neural network
WO2023272748A1 (en) Academic accurate recommendation-oriented heterogeneous scientific research information integration method and system
CN110674320B (en) Retrieval method and device and electronic equipment
CN102693321A (en) Cross-media information analysis and retrieval method
CN105205695A (en) Internet-based advertisement interactive system and method
CN109587515A (en) A kind of video playing method for predicting and device
CN105447131B (en) Internet resources relatedness determines method and apparatus
CN108304519B (en) Knowledge forest construction method based on graph database
Zhou et al. Novel SDDM rating prediction models for recommendation systems
CN109101567A (en) A kind of distributed text approximate KNN semantic search calculation method
CN108108770A (en) Moving-vision search framework based on CRBM and Fisher networks
Hong et al. Multimedia encyclopedia construction by mining web knowledge
CN116051192A (en) Method and device for processing data
CN105678340A (en) Automatic image marking method based on enhanced stack type automatic encoder
CN106484810A (en) A kind of recommendation method and system of multimedia programming

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20081112