CN101388022A - Web portrait search method for fusing text semantic and vision content - Google Patents


Info

Publication number
CN101388022A
CN101388022A (application numbers CNA2008101182533A, CN200810118253A)
Authority
CN
China
Prior art keywords
image
text
semantic
web
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008101182533A
Other languages
Chinese (zh)
Other versions
CN101388022B (en)
Inventor
赵耀
谢琳
朱振峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN2008101182533A
Publication of CN101388022A
Application granted
Publication of CN101388022B
Expired - Fee Related
Anticipated expiration

Abstract

The invention relates to a Web portrait retrieval method that fuses text semantics and visual content. The method comprises: submitting a query string to a commercial search engine server to realize connection and download based on the HTTP protocol, downloading the picture results of the commercial image search engine and the related web pages as a local image library, and at the same time extracting the key tags of the original web pages into XML files for later text processing; applying the AdaBoost face detection technique to the images, mining the high-level semantics of the web scripts containing the pictures with a vector model, and comparing empirical weights with a PLSA-based dynamic weighting method; and dynamically combining the visual analysis results and the text analysis results of the image features through a regulating factor to obtain a relevance ranking value between each image and the query, so that the image result list of the search engine is re-ranked and fed back to the user. The method achieves a higher precision rate, which increases markedly after the features are fused.

Description

A Web portrait search method fusing text semantics and visual content
Technical field
The present invention relates to a portrait search method, and in particular to a Web search method that fuses text semantics and visual content. Taking Web portrait image retrieval in the Internet environment as its object, the invention studies in some depth the techniques for integrating Web text semantic mining with image visual content discrimination, and implements a prototype system for Web portrait image retrieval in the Internet environment.
Background technology
With the continuous development of computer technology, network technology and mass-storage technology, and the popularization of personal computers and digital photographic equipment, the amount of multimedia information retrievable on the Internet is growing at an astonishing rate. Images in particular, being intuitive and rich in information, are widely used and uploaded to the Internet. While the rapid growth of Internet information provides abundant resources, it also poses a challenge: the vast and varied information is scattered everywhere and, lacking proper organization and management, is often hard to use effectively, which to some extent wastes resources. An urgent need people therefore face is how to locate and obtain resources of interest from this sea of information quickly and effectively. This need has driven the emergence and development of information retrieval technology.
With the development of related disciplines, the research focus of information retrieval has changed considerably, moving from text-based information retrieval to content-based image/video and audio retrieval and multimedia retrieval, and on to multimedia retrieval oriented to the World Wide Web. Retrieval is a highly practical applied technology that can be used in many fields, such as scientific novelty checking, news and advertising, security and pursuit, design and production, and entertainment. Image retrieval, as a key branch of information retrieval, has developed together with multimedia information processing, databases and the Internet. Today, with resources flooding the Internet, the design and application of content-based image retrieval systems in the Internet environment is a vigorous research direction; in-depth research in this direction has great theoretical value and broad application prospects, and its results will positively promote the formation and development of this type of information industry in China.
Among the various kinds of multimedia information, images occupy an important position. Studies show that more than 80% of the information humans perceive about the world comes from vision. As an important information carrier, images are intuitive, vivid and rich in content, and form an important part of multimedia. At the same time, images are not only the most used media format on the network apart from text, but also the most convenient way to express other multimedia information. Image retrieval technology has therefore gradually become a very active research field since the 1970s and, driven by the two research fields of database systems and computer vision, has split into two research angles: text-based and content-based.
Text-based information retrieval has been studied thoroughly over the past decades and successfully applied in commercial search engines. In the late 1970s, text retrieval techniques were applied to early image retrieval: the main approach was to annotate image files with keywords or text headers and some additional information, and then retrieve the images by keyword. This in effect turns image retrieval into a text retrieval problem, namely Text-Based Image Retrieval (TBIR). Most current commercial Web image search engines, such as Google, AltaVista and Lycos, mainly adopt the TBIR approach, and their performance is greatly improved by network analysis techniques such as the well-known PageRank technique used by Google. However, TBIR analyzes only the text information and ignores the visual content of the picture. With the surge of Web images, describing images with textual information such as annotation keywords can no longer meet the requirements of networked information retrieval, and its limitations become increasingly prominent: 1) annotating images with text must be done manually, which is tedious and labor-intensive, and manual annotation cannot keep up with the explosive growth of multimedia information and the propagation speed of the network, so annotation must change from manual to automatic; 2) the content of some images is far more than a small amount of text annotation can fully express ("a picture is worth a thousand words"), while the content of abstract images and the like is hard to express in words at all; moreover, different people may understand the same image differently, and the same person may understand the same image differently under different circumstances, all of which makes text annotation unavoidably subjective and inaccurate.
Therefore, in the early 1990s, Content-Based Image Retrieval (CBIR) emerged. Unlike text-retrieval-based systems that annotate images manually, content-based retrieval mainly uses the visual content features of the image itself as its index, such as low-level visual features like color, texture, shape and spatial relations. At retrieval time, the user submits an "example image" that represents his need as the query, and the system returns other images that are similar to it in visual features as the retrieval result. The reason CBIR is superior to the traditional keyword-based retrieval method is that it merges image understanding, pattern recognition and computer vision theory, and combines knowledge from fields such as artificial intelligence, object-oriented technology, cognitive psychology and databases; this research was a great leap in the evolution of image retrieval.
In recent years many research institutions and organizations at home and abroad have studied CBIR in depth and developed some valuable general-purpose systems. Well-known foreign examples include QBIC (Query By Image Content), the content-based information retrieval system developed by IBM's Almaden Research Center; Virage, developed by Virage Inc.; Photobook, developed by the MIT Media Lab; VisualSEEK, developed jointly by the Image and Advanced Television Laboratory of the Department of Electrical Engineering and the Center for Telecommunications Research at Columbia University; and MARS (Multimedia Analysis and Retrieval System) of the University of Illinois at Urbana-Champaign (UIUC). Representative domestic systems include MIRES, the feature-based multimedia information retrieval system developed jointly by the Institute of Computing Technology of the Chinese Academy of Sciences and the National Library of China, and Photo Navigator, PhotoEngine and WebscopeCBR developed by Zhejiang University.
CBIR has attracted wide attention since its birth, and more and more researchers have devoted themselves to this work. The problem with content-based retrieval, however, is that most current CBIR systems describe an image with global low-level visual features, and these features have no regular correlation with people's subjective judgment of the image's high-level semantics. Although the extraction of image visual features now has solid theoretical support, the retrieval results are still unsatisfactory. The reason is that low-level visual features have no necessary relation to high-level semantics, so in many cases two pictures of completely different types may have similar low-level features. Especially when the low-level visual features and the high-level semantics are inconsistent, a CBIR system often cannot give satisfactory results. This is the so-called "semantic gap" problem, which is also the bottleneck for the further development of CBIR.
To address this problem, researchers proposed using human-computer interaction to assist retrieval; the typical technique is Relevance Feedback (RF). Relevance feedback lets the user evaluate the returned results and readjust the current query, so that the returned results better match the user's subjective need. However, precisely because this technique requires the user to interact once or even repeatedly, it increases the complexity of the system and also places a certain burden on the user.
Image-related retrieval techniques have been researched and developed for more than a decade and remain an important research topic. Psychologically, people apply multiple criteria when judging similarity between images, both semantic criteria and visual-feature criteria, and different people differ in their criteria; a good retrieval system must be able to model this subjective diversity. Since text-based image retrieval and content-based image retrieval emphasize image semantics and visual features respectively, each shows its own advantages but is also severely restricted by the "semantic gap" problem, which hinders further improvement of image retrieval performance.
Since the 1990s, with the rapid development of network technology and the popularization of digital photographic equipment, using images on web pages has become very easy. Images greatly enrich the attractiveness of web pages, strengthen the user's visual understanding of information, and have become an indispensable part of today's web pages; these Web images are an important source from which users obtain pictures of interest. Research has therefore turned to CBIR in the network environment, and how to effectively collect these image documents according to user needs has become a focus of current research, which also raises new challenges. However, if the CBIR method is simply moved to the network environment without other improvements, the "semantic gap" problem in CBIR still cannot be solved.
Although researchers in many countries have achieved some results in the CBIR field, it is hard to find a successful commercial CBIR system because of the restriction of the "semantic gap". At present, image search engines based on keyword queries still occupy the market. Nevertheless, these text-based image search engines also have a "semantic gap" problem, which here appears as the gap between people's understanding of an image and the varied styles of Web script markup. This causes many problems: too many search results, unstable ordering of results, semantically similar pictures not appearing next to each other, unsatisfactory relevance of the retrieved pictures, and considerable redundant information.
Researchers have found, however, that Web pictures have characteristics different from pictures in traditional databases: besides the picture itself, the web page containing it may also carry rich textual descriptions, such as the picture title, the picture URL, the alternative text (ALT) and the surrounding text, all of which help to reveal its high-level semantic information. In general, text reveals the high-level semantics of a picture more easily than its low-level visual features, which is also why text has played such a huge role in commercial search engines. But many web pages are designed and written without following standards; authors lay them out as they please and do not provide the necessary tags or proper annotation. Combining the visual content of the picture with the textual description of the web page can therefore give a more comprehensive and objective relevance evaluation, and this is an effective way to improve the performance of Web image retrieval. Studies have shown that a typical network user enters on average only one or two query words as keywords when using a search engine, and browses on average only the first three pages of the returned results. Restricted to portrait retrieval, the user is naturally accustomed to entering only a name as the query keyword and hopes to find pictures of the query object within the first few pages. Therefore, moving the better-matching results forward and feeding them back to the user, without increasing the user's burden, better meets current practical application requirements.
At present, researchers have studied this multi-feature fusion in depth and proposed many methods that combine visual and text features. For example, Cascia et al. proposed in 1998 an image retrieval system for the WWW environment that combined text and visual cues with a linear vector; Zhao proposed in 2002 to use LSI to perform semantic analysis on documents represented with both text and visual features, and demonstrated the great improvement that introducing latent semantic analysis brings to a CBIR system; Y. Alp Aslandogan et al. proposed in 2000 a person-picture Web search agent named "Diogenes", which examined the Dempster-Shafer method for combining multiple cues; Cortina, proposed by Quack et al. in 2004, focused on large-scale images and also introduced relevance feedback; Jing et al. proposed in 2005 a framework for image retrieval combining keywords and visual features, which likewise requires relevance feedback; and to relieve the burden that relevance feedback places on the user, He et al. in 2006 used a feature combination method based on association rules and clustering to propose one-step search, among many others. Therefore, for the "semantic gap" problem of image retrieval, if image vision and the relevant text information can be combined, the two complement each other and retrieval performance can be improved. Some researchers are also applying ideas such as artificial intelligence, neural networks, concept learning and data mining to the description and retrieval of visual media in conjunction with the MPEG-7 international standard, studying visual media search engines and related techniques on the so-called next-generation network.
At the current research level, CBIR technology is mainly aimed at the retrieval of generic images, relying mainly on similarity matching of low-level features of generic images and assisted by high-level content features. Describing high-level image features requires knowledge of specialized fields and involves the accurate recognition of special images, such as fingerprint recognition, face recognition, iris recognition and gait recognition; this class of recognition constitutes a currently very active branch of image recognition technology, biometric identification. Research on the retrieval of such special images is also in full swing, and research systems such as the portrait retrieval system Diogenes have appeared.
In general, content-based image search engine technology is still quite immature, and many problems remain to be solved both in theory and in practice, especially image feature description, generality of design, optimization of system functions and practicability on the Internet.
Summary of the invention
The object of the invention is to avoid the above shortcomings of the prior art and to provide a Web portrait search method that fuses text semantics and visual content. Taking Web portrait image retrieval in the Internet environment as its object, the invention studies in some depth the techniques for integrating Web text semantic mining with image visual content discrimination, and implements a prototype system for Web portrait image retrieval in the Internet environment.
The object of the invention can be achieved by the following measures:
A Web portrait search method fusing text semantics and visual content, which uses text and visual features in combination; the concrete steps of the method are as follows:
Step 1: network crawling forms a local original image library
Submit a "query string" to a commercial search engine server to realize connection and download based on the HTTP protocol, download the picture results of the commercial image search engine and the related web pages as a local image library, and at the same time extract the key tags of the original web pages to form XML files for later text processing;
Step 2: mine the picture content and the text semantics
Adopt the AdaBoost face detection technique, which currently has good detection performance and high speed; on the other hand, mine the high-level semantics of the page scripts that contain the pictures with a vector model, and compare empirical weights with a PLSA-based dynamic weighting method;
Step 3: dynamic fusion of visual and text features
Through a regulating factor, dynamically combine the visual and textual feature analysis results of the images to obtain a relevance ranking value between each image and the query, so that the image result list of the search engine is re-ranked and fed back to the user.
Compared with the prior art, the invention has the following advantages: contrast experiments against the baseline result list show that the designed retrieval ranking method fusing image visual content and high-level text semantics has better retrieval performance, and the precision of the first few pages in particular is better. Usually the user is only interested in the results at the front, so the advantage of the invention is all the greater.
Description of drawings
Fig. 1 is the overall system diagram of the invention;
Fig. 2 is the system interface diagram of the invention;
Fig. 3 shows retrieval and re-ranking results;
Fig. 4 shows the PLSA aspect model;
Fig. 5 shows the text ranking results;
Fig. 6 shows the feature fusion ranking results.
Embodiment
Under the specific application background of the Web, text-based image retrieval avoids to a certain extent the difficult problem of recognizing complex visual elements, matches the retrieval habits people are familiar with, makes full use of Web page context and hypertext structure information, and is simple to implement. However, because it is still confined to describing images with index terms within the scope of text retrieval, problems such as topic ambiguity, inconsistent indexing and inability to understand the picture content easily arise. CBIR is just the opposite: it mainly indexes images by analyzing the characteristic elements of the visual pattern, such as the color histogram of each image, and therefore has a certain objectivity; but CBIR algorithms are complex and costly to implement. In the present invention, therefore, we combine text and visual features and implement a prototype system for Web portrait image retrieval. The concrete steps are as follows:
Step 1: network crawling forms a local original image library
By submitting a "query string" to a commercial search engine server to realize connection and download based on the HTTP protocol, the invention breaks away from the traditional Socket-based programming mode of VC, reduces the amount of labor and improves efficiency. The invention downloads the picture results of the commercial image search engine and the related web pages as a local image library, and at the same time extracts the key tags of the original web pages to form XML files for later text processing.
Step 2: mine the picture content and the text semantics
Because the invention is aimed at portrait pictures, from the viewpoint of image visual content we adopt the AdaBoost face detection technique, which currently has good detection performance and high speed, to determine whether a picture contains a person. On the other hand, we mine the high-level semantics of the page scripts that contain the pictures with a vector model, and compare empirical weights with a PLSA-based dynamic weighting method.
Step 3: dynamic fusion of visual and text features
Through a regulating factor, dynamically combine the visual and textual feature analysis results of the images to obtain a relevance ranking value between each image and the query, so that the image result list of the search engine is re-ranked and fed back to the user.
Contrast experiments against the baseline result list show that the designed retrieval ranking method fusing image visual content and high-level text semantics has better retrieval performance, and the precision of the first few pages in particular is better. Usually the user is only interested in the results at the front, so the advantage of the invention is all the greater.
The invention is further described below with reference to the drawings and specific embodiments.
According to the technical scheme introduced above, we implemented a prototype image retrieval system following the framework of the invention, as shown in Fig. 2, the system interface diagram of the invention. The interface consists mainly of four parts, marked with dashed frames in four different colors (red, green, blue and purple) and numbered consecutively. The red frame No. 1 and the green frame No. 2 on the left are the parameter setting and control parts of the system, which provide interfaces for the user to enter and select parameters.
Red frame No. 1 is the parameter setting and control area of the network crawling and download module. In the setting area the user can enter a keyword, select the commercial image search engine from which to link and download the original image library, set the number of retry attempts when a download connection fails (more attempts make it more likely that the system downloads the raw data completely, but take more time), set the number of pictures to download, and enter the local path where the image library is saved. In the control area the user can start and stop crawling and downloading, and exit the system. Green frame No. 2 is the selection and control area for the re-ranking algorithms and provides interfaces to four algorithms.
Blue frame No. 3 and purple frame No. 4 are the picture display areas: frame No. 3 shows the original image sequence downloaded from the commercial image search engine, and frame No. 4 shows the image sequence after analysis and re-ranking by a selected algorithm. Below frames No. 3 and No. 4 there are page-turning controls so that the user can conveniently browse the picture sequences and compare them before and after re-ranking.
Fig. 3 shows an example of the actual results produced by running the invention.
The specific embodiments of the invention are described in detail below with reference to the drawings.
As shown in Fig. 1, the overall system diagram of the invention, the overall system flow comprises the following parts:
1. Network crawling forms the local image library
The invention crawls on the basis of search engines such as Google and Baidu, which reduces the workload: there is no need to spend much time sorting the crawled data afterwards, because Google and Baidu have already indexed all the pictures obtained from each website, which guarantees that the crawled data are all pictures and contain no other data. In addition, the program performs text analysis only on the web page where the picture is located and does no processing on other pages linked to it, so the "spider" only needs to crawl one further level (depth) for each link starting from the seed nodes (Google, Baidu, etc.).
In this step (see the solid box at the top of Fig. 1), three classes of data are downloaded for each result: the thumbnail, the original image, and the original web page containing the original image. Together these three classes of data form the local raw image database. When the original web page is downloaded, the tag content closely related to the image is extracted from the page script to form an XML file for the subsequent text semantic mining.
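As an illustration of this step, the following is a minimal Python sketch of the download-and-extract flow, assuming the image URL and its source-page URL have already been obtained from the commercial search engine's result page; the helper names, the XML field names and the crude tag-extraction rules are illustrative and are not taken from the patent.

```python
# Minimal sketch of step 1: download the image and its source page over HTTP,
# then save the image-related tag fields into an XML file for later text mining.
import os
import re
import urllib.request
import xml.etree.ElementTree as ET

def fetch(url: str) -> bytes:
    """Download a URL over HTTP, retrying a few times (as the GUI option allows)."""
    for _ in range(3):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except OSError:
            continue
    return b""

def extract_semantic_units(page_html: str, img_url: str, page_url: str) -> dict:
    """Pull the seven tag fields later used by the Boolean semantic vector (crude regexes)."""
    alt = re.search(r'<img[^>]*src="%s"[^>]*alt="([^"]*)"' % re.escape(img_url),
                    page_html, re.I)
    title = re.search(r"<title>(.*?)</title>", page_html, re.I | re.S)
    meta_kw = re.search(r'<meta[^>]*name="keywords"[^>]*content="([^"]*)"', page_html, re.I)
    meta_desc = re.search(r'<meta[^>]*name="description"[^>]*content="([^"]*)"', page_html, re.I)
    return {
        "picture_title": title.group(1).strip() if title else "",
        "picture_alt": alt.group(1) if alt else "",
        "picture_url": img_url,
        "page_url": page_url,
        "meta_keywords": meta_kw.group(1) if meta_kw else "",
        "meta_description": meta_desc.group(1) if meta_desc else "",
        "surrounding_text": re.sub(r"<[^>]+>", " ", page_html)[:2000],  # crude context
    }

def save_result(out_dir: str, idx: int, img_url: str, page_url: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    # original image -> local image library
    with open(os.path.join(out_dir, f"{idx}.jpg"), "wb") as f:
        f.write(fetch(img_url))
    # key tags of the original web page -> XML for later text semantic mining
    units = extract_semantic_units(fetch(page_url).decode("utf-8", "ignore"),
                                   img_url, page_url)
    root = ET.Element("image", id=str(idx))
    for name, value in units.items():
        ET.SubElement(root, name).text = value
    ET.ElementTree(root).write(os.path.join(out_dir, f"{idx}.xml"),
                               encoding="utf-8", xml_declaration=True)
```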
2. Mining the picture content and the text semantics
On the basis of the image database thus formed, the invention uses two independent modules to discriminate the picture content and to mine the high-level text semantics. The AdaBoost face detection functions provided by OpenCV are used to discriminate the picture content, while a vector model is used to mine the high-level text semantics, weighted both with empirical weights and dynamically with a PLSA model.
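A minimal sketch of the face check, using the Haar-cascade (AdaBoost-trained) frontal-face detector that ships with OpenCV's Python bindings; the detection parameters below are common defaults, not values specified by the patent.

```python
# Minimal sketch of the visual-content check: does the picture contain a face?
# Uses the AdaBoost-trained Haar cascade bundled with OpenCV.
import cv2

_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def contains_face(image_path: str) -> bool:
    img = cv2.imread(image_path)
    if img is None:
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(30, 30))
    return len(faces) > 0

# The visual component v1 of the fused descriptor F = (t1..t7, v1) would then be
# v1 = 1 if contains_face(path) else 0.
```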
The concrete flow of the high-level text semantic mining is as follows:
(1) Boolean document vector model:
In the invention, the title of the picture, the alternative (ALT) text of the picture, the URL of the picture, the URL of the original web page, the keywords and description attributes of the META tag, and the text surrounding the picture are used to construct a Boolean vector of Web semantic information units $T = (t_1, \ldots, t_{n_t})$, where $n_t = 7$ and $t_j \in \{0, 1\}$ indicates whether the query text appears in the corresponding semantic information unit.
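A minimal sketch of building this Boolean vector from the fields saved in the XML file; the unit names follow the earlier download sketch and are assumptions.

```python
# Minimal sketch of the Boolean semantic vector T = (t1, ..., t7): each component
# is 1 if the query string appears in the corresponding semantic information unit.
UNIT_NAMES = ["picture_title", "picture_alt", "picture_url", "page_url",
              "meta_keywords", "meta_description", "surrounding_text"]

def boolean_vector(units: dict, query: str) -> list:
    q = query.lower()
    return [1 if q in (units.get(name, "") or "").lower() else 0
            for name in UNIT_NAMES]
```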
(2) Empirical weight vector:
Since web authors differ in background and making style, the semantic information units extracted from the web script differ in how much relevant semantic information they imply; each semantic information unit is therefore assigned a weight to reflect this difference. Let $W_T = (w_1^t, \ldots, w_{n_t}^t)$ be the weight vector of the semantic information units, where $w_j^t$ is the weight corresponding to the semantic information unit $t_j$. By observing the portrait-containing Web pages returned by the crawling module (specifically, the XML documents of semantic information units), the invention empirically sets
$W_T = (w_1^t, w_2^t, \ldots, w_7^t) = (1.5, 2.0, 0.8, 0.8, 1.0, 1.0, 0.5)$
The larger $w_j^t$ is, the more important the corresponding semantic information unit. The semantic relevance $R_T$ of a Web document is then
$R_T = \sum_{j=1}^{n_t} t_j \cdot w_j^t$
According to the semantic relevance $R_T$ of each Web document, the script semantic ranking of the different web documents can be realized.
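A minimal sketch of the empirical-weight relevance computation, using the weight values quoted above.

```python
# Minimal sketch of R_T = sum_j t_j * w_j^t with the fixed empirical weights.
W_T = [1.5, 2.0, 0.8, 0.8, 1.0, 1.0, 0.5]

def semantic_relevance(t: list, w: list = W_T) -> float:
    return sum(tj * wj for tj, wj in zip(t, w))

# Documents are then ranked by descending R_T.
```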
(3) Dynamic weighting with PLSA
As the PLSA aspect model of Fig. 4 shows, PLSA is a latent-variable model for co-occurrence data: it associates each observed word $w \in W = \{w_1, w_2, \ldots, w_M\}$ and document $d \in D = \{d_1, d_2, \ldots, d_N\}$ with an unobserved latent class topic $z \in Z = \{z_1, z_2, \ldots, z_K\}$. The following probabilities are also defined:
1) a document $d_i$ is selected with probability $P(d_i)$;
2) a latent class $z_k$ is chosen with probability $P(z_k \mid d_i)$;
3) a word $w_j$ is generated with probability $P(w_j \mid z_k)$.
The latent class topic z can therefore be marginalized out to obtain the joint probability of an observed pair $(d_i, w_j)$, which can be expressed as
(a) $P(d_i, w_j) = P(d_i) P(w_j \mid d_i)$, with $P(w_j \mid d_i) = \sum_{z \in Z} P(w_j \mid z) P(z \mid d_i)$
(b) $P(d_i, w_j) = \sum_{z \in Z} P(z) P(d_i \mid z) P(w_j \mid z)$
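A small numerical illustration of the symmetric aspect model (b): given the three probability tables, the document-word joint probability is a sum over the latent topics. The table sizes and values below are placeholders, not data from the patent.

```python
# Illustration of the symmetric PLSA aspect model (b):
# P(d_i, w_j) = sum_z P(z) * P(d_i | z) * P(w_j | z).
import numpy as np

K, N, M = 3, 5, 8                                        # latent topics, documents, words
P_z = np.full(K, 1.0 / K)                                # P(z)
P_d_given_z = np.random.dirichlet(np.ones(N), size=K)    # K x N, each row sums to 1
P_w_given_z = np.random.dirichlet(np.ones(M), size=K)    # K x M, each row sums to 1

# Joint distribution over (document, word) pairs: an N x M matrix.
P_dw = np.einsum("k,kn,km->nm", P_z, P_d_given_z, P_w_given_z)
assert abs(P_dw.sum() - 1.0) < 1e-9                      # a proper joint distribution
```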
The aspect model (b) of PLSA (see Fig. 4) is introduced into the invention: the semantic information units extracted from the original web page (i.e., the key tag fields) play the role of the observed words W, i.e. $w \to f$, and the query word replaces the latent variable, i.e. $z \to q$.
Because, under the actual experimental conditions of the invention, the query word (the latent variable) is known, the invention chooses the symmetric parametrization (b) of Fig. 4 and takes the joint probability of document and tag field as its research object, giving the field-document joint probability
$P(d_i, f_j) = \sum_{q \in Q} P(q) P(d_i \mid q) P(f_j \mid q)$
where Q is the set of query words. For a query word q, the prior probability P(q) can be regarded as a constant. The problem then reduces, for a given query word q, to the class-conditional probabilities $P(f_j \mid q)$ of the semantic information units and $P(d_i \mid q)$ of the documents. $P(f_j \mid q)$ is the term frequency (TF) with which the query word appears in semantic information unit $f_j$ over the image library, i.e. $P(f_j \mid q) = n / N_d$, where $N_d$ is the total number of documents in the image library and $n$ is the number of documents in which the query word appears in that unit.
After parsing the web script, the vector-type semantic description corresponding to the i-th web document $d_i$ is $d_i = \{f_{i,1}, f_{i,2}, \ldots, f_{i,7}\}$, whose j-th element is defined as
$f_{i,j} = tf_j \cdot Portion_{i,j}$
[the defining formula of $Portion_{i,j}$ appears only as an image in the original publication]
where $m_i$ denotes the number of units among the 1st to 6th semantic information units (i.e., excluding the surrounding-text unit) in which the keyword appears, and $totalNum_i$ and $keyNum_i$ denote, respectively, the total number of words in the surrounding-text field and the number of occurrences of the keyword in it. Here $tf_j$ reflects "term frequency" information that does not depend on any single document, while $Portion_{i,j}$ is a scale factor reflecting the association among the semantic information units inside the i-th document.
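A minimal sketch of the class-conditional probability $P(f_j \mid q) = n / N_d$ estimated over the downloaded library; the $Portion_{i,j}$ factor is not reproduced here because its exact definition appears only in the figure.

```python
# Sketch of P(f_j | q) = n / N_d: for each of the seven semantic information units,
# the fraction of documents in the downloaded library whose unit contains the query.
# `docs` is a list of per-document Boolean vectors t (see boolean_vector above).
def field_given_query(docs: list) -> list:
    n_d = len(docs)
    return [sum(t[j] for t in docs) / n_d for j in range(7)]
```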
For a given query word q, suppose that the $N_d$ documents returned by the crawling module follow a Gaussian distribution; then
$P(d_i \mid q) = G(d_i; \mu_d, \sigma_d)$
Thus, for each query, the field-document joint probability formula above yields the Field-Document joint distribution matrix whose elements are $P(f_j, d_i)$.
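A minimal sketch of assembling the Field-Document joint distribution matrix under these assumptions; treating the Gaussian $P(d_i \mid q)$ as a distribution over the rank position returned by the crawler is an interpretation, and the mean and standard deviation are placeholders.

```python
# Sketch of the Field-Document joint distribution matrix: with a single known query
# and constant P(q), each element is proportional to P(d_i | q) * P(f_j | q).
import numpy as np

def joint_matrix(docs: list) -> np.ndarray:
    n_d = len(docs)
    p_f = np.array(field_given_query(docs))      # length-7 vector of P(f_j | q)
    ranks = np.arange(n_d, dtype=float)
    mu, sigma = 0.0, n_d / 3.0                   # assumed Gaussian over rank position
    p_d = np.exp(-0.5 * ((ranks - mu) / sigma) ** 2)
    p_d /= p_d.sum()                             # normalize P(d_i | q)
    return np.outer(p_d, p_f)                    # N_d x 7 matrix of P(f_j, d_i)
```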
In (2) we gave a semantic relevance measure for the web script that uses fixed empirical weights for all queries. For different queries, however, a more direct scheme is to weight the different semantic information units dynamically, so as to adapt to different types of query words. The invention therefore further proposes two PLSA-based schemes for dynamically weighting the Boolean semantic description vector.
1. Independent weight vector method: for the i-th web document, take the element $P(f_j, d_i)$ of the Field-Document joint distribution matrix P as the weight of its j-th semantic information unit, i.e. $w_j^i = P(f_j, d_i)$. The semantic relevance of the i-th web document is then
$R_T^i = \sum_{j=1}^{n_t} t_j^i \cdot w_j^i$
Because the weight vector used for each document differs and is determined by the document's own PLSA statistical probabilities, this combination is called the independent weight vector method.
2. Statistical weight vector method: all documents use the same statistically derived weight vector; that is, for the j-th semantic information unit, its expectation over all documents (the column-wise average of the matrix P) is taken as the weight of that field:
$w_j = E_i[P(f_j, d_i)] = \frac{1}{N_d} \sum_{i=1}^{N_d} P(f_j, d_i)$
The semantic relevance of the i-th web document is thus
$R_T^i = \sum_{j=1}^{n_t} t_j^i \cdot w_j$
Because the same statistically derived weight vector is used for all texts in this mode, it is called the statistical weight vector method. Note that for different queries the weights $w_j$ still change dynamically. Both dynamic weighting methods use statistics to measure the semantic relevance of the web script.
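A minimal sketch of the two PLSA-based weighting schemes, operating on the joint matrix P ($N_d \times 7$, e.g. from the sketch above) and a Boolean matrix T whose i-th row is the Boolean vector of document i.

```python
# Sketch of the two PLSA-based dynamic weighting schemes.
import numpy as np

def relevance_independent(P: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Independent weight vectors: w_j^i = P(f_j, d_i), one weight row per document."""
    return (T * P).sum(axis=1)

def relevance_statistical(P: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Statistical weight vector: w_j = E_i[P(f_j, d_i)], shared by all documents."""
    w = P.mean(axis=0)
    return T @ w
```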
(4) Combining the visual content with the high-level text semantics
In the invention, in order to combine the visual content discrimination result with the text semantics, the visual content vector is also given a weight vector $W_V = (w_1^v, \ldots, w_{n_v}^v)$; here $n_v = 1$, so $W_V = (w_1^v)$. In the same way as for the semantic feature vector, it is linearly combined with the weight vector $W_T$ of the semantic information units to obtain the total weight vector $W = (W_T, W_V) = (w_1^t, \ldots, w_7^t, w_1^v)$ and the fused Boolean feature description vector $F = (T, V) = (t_1, \ldots, t_7, v_1)$.
The final relevance R is then obtained as the dot product of the feature description vector and the total weight vector: $R = F \cdot W$.
According to the final relevance, a ranking that fuses the visual content discrimination with the Boolean semantic description vector can be realized.
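A minimal sketch of the final fusion; the value of the visual weight $w_1^v$, which plays the role of the regulating factor, is an assumed placeholder.

```python
# Sketch of the final fusion: concatenate the text weights with a visual weight and
# take the dot product with the fused Boolean descriptor F = (t1..t7, v1).
import numpy as np

def fused_relevance(t: list, has_face: bool, w_text: list, w_v: float = 2.0) -> float:
    F = np.array(t + [1 if has_face else 0], dtype=float)   # fused descriptor
    W = np.array(list(w_text) + [w_v])                       # total weight vector
    return float(F @ W)                                      # R = F . W

# Images are re-ranked by descending fused relevance and returned to the user;
# w_text may be the empirical weights or either PLSA-based dynamic weight vector.
```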
Experimental results
Fig. 5 (text ranking results) and Fig. 6 (feature fusion ranking results) show the comparison experiments of the ranking methods proposed by the invention. K = 15 English name queries were selected at random as query words: andrea, bruce, fred, gaby, jane, lynette, maria, peter, robinson, simon, wesley, eva, jackcafferty, brucelee, williamshakespeare. As the figures show, the experiment achieves a clearly higher precision than the original ordering, especially after feature fusion.

Claims (1)

1. A Web portrait search method fusing text semantics and visual content, characterized in that the method uses text and visual features in combination, and its concrete steps are as follows:
Step 1: network crawling forms a local original image library
Submit a "query string" to a commercial search engine server to realize connection and download based on the HTTP protocol, download the picture results of the commercial image search engine and the related web pages as a local image library, and at the same time extract the key tags of the original web pages to form XML files for later text processing;
Step 2: mine the picture content and the text semantics
Adopt the AdaBoost face detection technique, which currently has good detection performance and high speed; on the other hand, mine the high-level semantics of the page scripts that contain the pictures with a vector model, and compare empirical weights with a PLSA-based dynamic weighting method;
Step 3: dynamic fusion of visual and text features
Through a regulating factor, dynamically combine the visual and textual feature analysis results of the images to obtain a relevance ranking value between each image and the query, so that the image result list of the search engine is re-ranked and fed back to the user.
CN2008101182533A 2008-08-12 2008-08-12 Web portrait search method for fusing text semantic and vision content Expired - Fee Related CN101388022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101182533A CN101388022B (en) 2008-08-12 2008-08-12 Web portrait search method for fusing text semantic and vision content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101182533A CN101388022B (en) 2008-08-12 2008-08-12 Web portrait search method for fusing text semantic and vision content

Publications (2)

Publication Number Publication Date
CN101388022A true CN101388022A (en) 2009-03-18
CN101388022B CN101388022B (en) 2010-06-09

Family

ID=40477446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101182533A Expired - Fee Related CN101388022B (en) 2008-08-12 2008-08-12 Web portrait search method for fusing text semantic and vision content

Country Status (1)

Country Link
CN (1) CN101388022B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262642A (en) * 2011-01-28 2011-11-30 北京理工大学 Web image search engine and realizing method thereof
CN102289430A (en) * 2011-06-29 2011-12-21 北京交通大学 Method for analyzing latent semantics of fusion probability of multi-modality data
CN104182430A (en) * 2013-05-28 2014-12-03 腾讯科技(深圳)有限公司 Method and device for displaying image in text message
CN104361104A (en) * 2014-11-24 2015-02-18 中国科学技术大学 Efficient image retrieval result quality evaluation method
CN104376105A (en) * 2014-11-26 2015-02-25 北京航空航天大学 Feature fusing system and method for low-level visual features and text description information of images in social media
CN104933029A (en) * 2015-06-23 2015-09-23 天津大学 Text image joint semantics analysis method based on probability theme model
CN105849720A (en) * 2013-11-30 2016-08-10 北京市商汤科技开发有限公司 Visual semantic complex network and method for forming network
CN105912684A (en) * 2016-04-15 2016-08-31 湘潭大学 Cross-media retrieval method based on visual features and semantic features
CN106874862A (en) * 2017-01-24 2017-06-20 复旦大学 People counting method based on submodule technology and semi-supervised learning
CN107480196A (en) * 2017-07-14 2017-12-15 中国科学院自动化研究所 A kind of multi-modal lexical representation method based on dynamic fusion mechanism
CN107766853A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 A kind of generation, display methods and the electronic equipment of the text message of image
CN107977948A (en) * 2017-07-25 2018-05-01 北京联合大学 A kind of notable figure fusion method towards sociogram's picture
CN108334627A (en) * 2018-02-12 2018-07-27 北京百度网讯科技有限公司 Searching method, device and the computer equipment of new media content
CN110019867A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Image search method, system and index structuring method and medium
CN110110116A (en) * 2019-04-02 2019-08-09 浙江工业大学 A kind of trademark image retrieval method for integrating depth convolutional network and semantic analysis
WO2019169872A1 (en) * 2018-03-09 2019-09-12 北京百度网讯科技有限公司 Method and device for searching for content resource, and server
US10958958B2 (en) 2018-08-21 2021-03-23 International Business Machines Corporation Intelligent updating of media data in a computing environment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4312148A1 (en) * 2022-07-29 2024-01-31 Amadeus S.A.S. Method of identifying ranking and processing information obtained from a document

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100414548C (en) * 2006-09-22 2008-08-27 南京搜拍信息技术有限公司 Search system and technique comprehensively using information of graphy and character
CN101211341A (en) * 2006-12-29 2008-07-02 上海芯盛电子科技有限公司 Image intelligent mode recognition and searching method

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262642B (en) * 2011-01-28 2013-07-10 北京理工大学 Web image search engine and realizing method thereof
CN102262642A (en) * 2011-01-28 2011-11-30 北京理工大学 Web image search engine and realizing method thereof
CN102289430A (en) * 2011-06-29 2011-12-21 北京交通大学 Method for analyzing latent semantics of fusion probability of multi-modality data
CN102289430B (en) * 2011-06-29 2013-11-13 北京交通大学 Method for analyzing latent semantics of fusion probability of multi-modality data
CN104182430A (en) * 2013-05-28 2014-12-03 腾讯科技(深圳)有限公司 Method and device for displaying image in text message
CN105849720B (en) * 2013-11-30 2019-05-21 北京市商汤科技开发有限公司 Vision semanteme composite network and the method for being used to form the network
CN105849720A (en) * 2013-11-30 2016-08-10 北京市商汤科技开发有限公司 Visual semantic complex network and method for forming network
CN104361104A (en) * 2014-11-24 2015-02-18 中国科学技术大学 Efficient image retrieval result quality evaluation method
CN104361104B (en) * 2014-11-24 2018-01-30 中国科学技术大学 A kind of efficient image searching result quality evaluating method
CN104376105A (en) * 2014-11-26 2015-02-25 北京航空航天大学 Feature fusing system and method for low-level visual features and text description information of images in social media
CN104376105B (en) * 2014-11-26 2017-08-25 北京航空航天大学 The Fusion Features system and method for image low-level visual feature and text description information in a kind of Social Media
CN104933029A (en) * 2015-06-23 2015-09-23 天津大学 Text image joint semantics analysis method based on probability theme model
CN105912684B (en) * 2016-04-15 2019-07-26 湘潭大学 The cross-media retrieval method of view-based access control model feature and semantic feature
CN105912684A (en) * 2016-04-15 2016-08-31 湘潭大学 Cross-media retrieval method based on visual features and semantic features
CN107766853A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 A kind of generation, display methods and the electronic equipment of the text message of image
CN107766853B (en) * 2016-08-16 2021-08-06 阿里巴巴集团控股有限公司 Image text information generation and display method and electronic equipment
CN106874862B (en) * 2017-01-24 2021-06-04 复旦大学 Crowd counting method based on sub-model technology and semi-supervised learning
CN106874862A (en) * 2017-01-24 2017-06-20 复旦大学 People counting method based on submodule technology and semi-supervised learning
CN107480196A (en) * 2017-07-14 2017-12-15 中国科学院自动化研究所 A kind of multi-modal lexical representation method based on dynamic fusion mechanism
CN107480196B (en) * 2017-07-14 2020-02-07 中国科学院自动化研究所 Multi-modal vocabulary representation method based on dynamic fusion mechanism
CN107977948A (en) * 2017-07-25 2018-05-01 北京联合大学 A kind of notable figure fusion method towards sociogram's picture
CN107977948B (en) * 2017-07-25 2019-12-24 北京联合大学 Salient map fusion method facing community image
CN110019867A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Image search method, system and index structuring method and medium
CN108334627A (en) * 2018-02-12 2018-07-27 北京百度网讯科技有限公司 Searching method, device and the computer equipment of new media content
CN108334627B (en) * 2018-02-12 2022-09-23 北京百度网讯科技有限公司 Method and device for searching new media content and computer equipment
WO2019169872A1 (en) * 2018-03-09 2019-09-12 北京百度网讯科技有限公司 Method and device for searching for content resource, and server
US10958958B2 (en) 2018-08-21 2021-03-23 International Business Machines Corporation Intelligent updating of media data in a computing environment
CN110110116B (en) * 2019-04-02 2021-04-06 浙江工业大学 Trademark image retrieval method integrating deep convolutional network and semantic analysis
CN110110116A (en) * 2019-04-02 2019-08-09 浙江工业大学 A kind of trademark image retrieval method for integrating depth convolutional network and semantic analysis

Also Published As

Publication number Publication date
CN101388022B (en) 2010-06-09

Similar Documents

Publication Publication Date Title
CN101388022B (en) Web portrait search method for fusing text semantic and vision content
Salloum et al. Mining social media text: extracting knowledge from Facebook
Beinglass et al. Articulated object recognition, or: How to generalize the generalized hough transform
JP3598742B2 (en) Document search device and document search method
US9262532B2 (en) Ranking entity facets using user-click feedback
CN110597981B (en) Network news summary system for automatically generating summary by adopting multiple strategies
CN110968782B (en) User portrait construction and application method for learner
CN103226578B (en) Towards the website identification of medical domain and the method for webpage disaggregated classification
CN103631794B (en) A kind of method, apparatus and equipment for being ranked up to search result
CN106815297A (en) A kind of academic resources recommendation service system and method
CN102955848B (en) A kind of three-dimensional model searching system based on semanteme and method
CN103455487B (en) The extracting method and device of a kind of search term
CN104268148B (en) A kind of forum page Information Automatic Extraction method and system based on time string
Zhou et al. Conceptlearner: Discovering visual concepts from weakly labeled image collections
TWI695277B (en) Automatic website data collection method
CN101359332A (en) Design method for visual search interface with semantic categorization function
CN104484431A (en) Multi-source individualized news webpage recommending method based on field body
CN106503211A (en) Information issues the method that the mobile edition of class website is automatically generated
Tekli An overview of cluster-based image search result organization: background, techniques, and ongoing challenges
CN110970112A (en) Method and system for constructing knowledge graph for nutrition and health
Chau et al. Comparison of two approaches to building a vertical search tool: a case study in the nanotechnology domain
Luo et al. Product review information extraction based on adjective opinion words
CN103136223A (en) Method and device for mining query with similar requirements
Saenko et al. Filtering abstract senses from image search results
Banu et al. A novel ensemble vision based deep web data extraction technique for web mining applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100609

Termination date: 20120812