CN101388022A - Web portrait search method for fusing text semantic and vision content - Google Patents


Info

Publication number
CN101388022A
CN101388022A (application numbers CNA2008101182533A, CN200810118253A)
Authority
CN
China
Prior art keywords
image
text
semantic
web
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008101182533A
Other languages
Chinese (zh)
Other versions
CN101388022B (en)
Inventor
赵耀
谢琳
朱振峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN2008101182533A
Publication of CN101388022A
Application granted
Publication of CN101388022B
Expired - Fee Related
Anticipated expiration

Abstract

The invention relates to a Web portrait retrieval method that fuses text semantics and visual content. The method comprises: submitting a query string to a commercial search engine server to realize connection and download based on the HTTP protocol, downloading the picture results of the commercial image search engine and the related web pages as a local image library, and at the same time extracting the key tags of the original web pages into XML files for later text processing; applying the AdaBoost face detection technique to the images, mining the high-level semantics of the web scripts containing the pictures with a vector model, and comparing empirical weights with a PLSA-based dynamic weighting method; and dynamically combining the visual analysis results and the text analysis results of the image features through a regulating factor to obtain a relevance ranking value between each image and the query, so that the image result list of the search engine is re-ranked and fed back to the user. The method achieves a higher precision rate, which increases markedly after the features are fused.

Description

A Web portrait search method fusing text semantics and visual content
Technical field
The present invention relates to a portrait search method, and in particular to a Web search method that fuses text semantics and visual content. Taking Web portrait image retrieval in the Internet environment as its object, the invention studies in some depth the techniques for integrating Web text semantic mining with image visual content discrimination, and implements a prototype system for Web portrait image retrieval in the Internet environment.
Background technology
With the continuous development of computer technology, network technology and mass-storage technology, and the popularization of personal computers and digital photographic equipment, the amount of multimedia information retrievable on the Internet is growing at an astonishing rate. Images in particular, being intuitive and rich in information, are widely used and uploaded to the Internet. While the rapid growth of Internet information provides abundant resources, it also poses a challenge: the vast and varied information is scattered everywhere and, lacking proper organization and management, is often hard to use effectively, which to some extent wastes resources. An urgent need people therefore face is how to locate and obtain resources of interest from this sea of information quickly and effectively. This need has driven the emergence and development of information retrieval technology.
With the development of related disciplines, the research focus of information retrieval has changed considerably, moving from text-based information retrieval to content-based image/video and audio retrieval and multimedia retrieval, and on to multimedia retrieval oriented to the World Wide Web. Retrieval is a highly practical applied technology that can be used in many fields, such as scientific novelty checking, news and advertising, security and pursuit, design and production, and entertainment. Image retrieval, as a key branch of information retrieval, has developed together with multimedia information processing, databases and the Internet. Today, with resources flooding the Internet, the design and application of content-based image retrieval systems in the Internet environment is a vigorous research direction; in-depth research in this direction has great theoretical value and broad application prospects, and its results will positively promote the formation and development of this type of information industry in China.
Among the various kinds of multimedia information, images occupy an important position. Studies show that more than 80% of the information humans perceive about the world comes from vision. As an important information carrier, images are intuitive, vivid and rich in content, and form an important part of multimedia. At the same time, images are not only the most used media format on the network apart from text, but also the most convenient way to express other multimedia information. Image retrieval technology has therefore gradually become a very active research field since the 1970s and, driven by the two research fields of database systems and computer vision, has split into two research angles: text-based and content-based.
Text-based information retrieval has been studied thoroughly over the past decades and successfully applied in commercial search engines. In the late 1970s, text retrieval techniques were applied to early image retrieval: the main approach was to annotate image files with keywords or text headers and some additional information, and then retrieve the images by keyword. This in effect turns image retrieval into a text retrieval problem, namely Text-Based Image Retrieval (TBIR). Most current commercial Web image search engines, such as Google, AltaVista and Lycos, mainly adopt the TBIR approach, and their performance is greatly improved by network analysis techniques such as the well-known PageRank technique used by Google. However, TBIR analyzes only the text information and ignores the visual content of the picture. With the surge of Web images, describing images with textual information such as annotation keywords can no longer meet the requirements of networked information retrieval, and its limitations become increasingly prominent: 1) annotating images with text must be done manually, which is tedious and labor-intensive, and manual annotation cannot keep up with the explosive growth of multimedia information and the propagation speed of the network, so annotation must change from manual to automatic; 2) the content of some images is far more than a small amount of text annotation can fully express ("a picture is worth a thousand words"), while the content of abstract images and the like is hard to express in words at all; moreover, different people may understand the same image differently, and the same person may understand the same image differently under different circumstances, all of which makes text annotation unavoidably subjective and inaccurate.
Therefore, in the early 1990s, Content-Based Image Retrieval (CBIR) emerged. Unlike text-retrieval-based systems that annotate images manually, content-based retrieval mainly uses the visual content features of the image itself as its index, such as low-level visual features like color, texture, shape and spatial relations. At retrieval time, the user submits an "example image" that represents his need as the query, and the system returns other images that are similar to it in visual features as the retrieval result. The reason CBIR is superior to the traditional keyword-based retrieval method is that it merges image understanding, pattern recognition and computer vision theory, and combines knowledge from fields such as artificial intelligence, object-oriented technology, cognitive psychology and databases; this research was a great leap in the evolution of image retrieval.
In recent years many research institutions and organizations at home and abroad have studied CBIR in depth and developed some valuable general-purpose systems. Well-known foreign examples include QBIC (Query By Image Content), the content-based information retrieval system developed by IBM's Almaden Research Center; Virage, developed by Virage Inc.; Photobook, developed by the MIT Media Lab; VisualSEEK, developed jointly by the Image and Advanced Television Laboratory of the Department of Electrical Engineering and the Center for Telecommunications Research at Columbia University; and MARS (Multimedia Analysis and Retrieval System) of the University of Illinois at Urbana-Champaign (UIUC). Representative domestic systems include MIRES, the feature-based multimedia information retrieval system developed jointly by the Institute of Computing Technology of the Chinese Academy of Sciences and the National Library of China, and Photo Navigator, PhotoEngine and WebscopeCBR developed by Zhejiang University.
CBIR has attracted wide attention since its birth, and more and more researchers have devoted themselves to this work. The problem with content-based retrieval, however, is that most current CBIR systems describe an image with global low-level visual features, and these features have no regular correlation with people's subjective judgment of the image's high-level semantics. Although the extraction of image visual features now has solid theoretical support, the retrieval results are still unsatisfactory. The reason is that low-level visual features have no necessary relation to high-level semantics, so in many cases two pictures of completely different types may have similar low-level features. Especially when the low-level visual features and the high-level semantics are inconsistent, a CBIR system often cannot give satisfactory results. This is the so-called "semantic gap" problem, which is also the bottleneck for the further development of CBIR.
To address this problem, researchers proposed using human-computer interaction to assist retrieval; the typical technique is Relevance Feedback (RF). Relevance feedback lets the user evaluate the returned results and readjust the current query, so that the returned results better match the user's subjective need. However, precisely because this technique requires the user to interact once or even repeatedly, it increases the complexity of the system and also places a certain burden on the user.
Image-related retrieval techniques have been researched and developed for more than a decade and remain an important research topic. Psychologically, people apply multiple criteria when judging similarity between images, both semantic criteria and visual-feature criteria, and different people differ in their criteria; a good retrieval system must be able to model this subjective diversity. Since text-based image retrieval and content-based image retrieval emphasize image semantics and visual features respectively, each shows its own advantages but is also severely restricted by the "semantic gap" problem, which hinders further improvement of image retrieval performance.
Since the 1990s, with the rapid development of network technology and the popularization of digital photographic equipment, using images on web pages has become very easy. Images greatly enrich the attractiveness of web pages, strengthen the user's visual understanding of information, and have become an indispensable part of today's web pages; these Web images are an important source from which users obtain pictures of interest. Research has therefore turned to CBIR in the network environment, and how to effectively collect these image documents according to user needs has become a focus of current research, which also raises new challenges. However, if the CBIR method is simply moved to the network environment without other improvements, the "semantic gap" problem in CBIR still cannot be solved.
Although researchers in many countries have achieved some results in the CBIR field, it is hard to find a successful commercial CBIR system because of the restriction of the "semantic gap". At present, image search engines based on keyword queries still occupy the market. Nevertheless, these text-based image search engines also have a "semantic gap" problem, which here appears as the gap between people's understanding of an image and the varied styles of Web script markup. This causes many problems: too many search results, unstable ordering of results, semantically similar pictures not appearing next to each other, unsatisfactory relevance of the retrieved pictures, and considerable redundant information.
Researchers have found, however, that Web pictures have characteristics different from pictures in traditional databases: besides the picture itself, the web page containing it may also carry rich textual descriptions, such as the picture title, the picture URL, the alternative text (ALT) and the surrounding text, all of which help to reveal its high-level semantic information. In general, text reveals the high-level semantics of a picture more easily than its low-level visual features, which is also why text has played such a huge role in commercial search engines. But many web pages are designed and written without following standards; authors lay them out as they please and do not provide the necessary tags or proper annotation. Combining the visual content of the picture with the textual description of the web page can therefore give a more comprehensive and objective relevance evaluation, and this is an effective way to improve the performance of Web image retrieval. Studies have shown that a typical network user enters on average only one or two query words as keywords when using a search engine, and browses on average only the first three pages of the returned results. Restricted to portrait retrieval, the user is naturally accustomed to entering only a name as the query keyword and hopes to find pictures of the query object within the first few pages. Therefore, moving the better-matching results forward and feeding them back to the user, without increasing the user's burden, better meets current practical application requirements.
At present, researchers have studied this multi-feature fusion in depth and proposed many methods that combine visual and text features. For example, Cascia et al. proposed in 1998 an image retrieval system for the WWW environment that combined text and visual cues with a linear vector; Zhao proposed in 2002 to use LSI to perform semantic analysis on documents represented with both text and visual features, and demonstrated the great improvement that introducing latent semantic analysis brings to a CBIR system; Y. Alp Aslandogan et al. proposed in 2000 a person-picture Web search agent named "Diogenes", which examined the Dempster-Shafer method for combining multiple cues; Cortina, proposed by Quack et al. in 2004, focused on large-scale images and also introduced relevance feedback; Jing et al. proposed in 2005 a framework for image retrieval combining keywords and visual features, which likewise requires relevance feedback; and to relieve the burden that relevance feedback places on the user, He et al. in 2006 used a feature combination method based on association rules and clustering to propose one-step search, among many others. Therefore, for the "semantic gap" problem of image retrieval, if image vision and the relevant text information can be combined, the two complement each other and retrieval performance can be improved. Some researchers are also applying ideas such as artificial intelligence, neural networks, concept learning and data mining to the description and retrieval of visual media in conjunction with the MPEG-7 international standard, studying visual media search engines and related techniques on the so-called next-generation network.
At the current research level, CBIR technology is mainly aimed at the retrieval of generic images, relying mainly on similarity matching of low-level features of generic images and assisted by high-level content features. Describing high-level image features requires knowledge of specialized fields and involves the accurate recognition of special images, such as fingerprint recognition, face recognition, iris recognition and gait recognition; this class of recognition constitutes a currently very active branch of image recognition technology, biometric identification. Research on the retrieval of such special images is also in full swing, and research systems such as the portrait retrieval system Diogenes have appeared.
In general, content-based image search engine technology is still quite immature, and many problems remain to be solved both in theory and in practice, especially image feature description, generality of design, optimization of system functions and practicability on the Internet.
Summary of the invention
The object of the invention is to avoid the above shortcomings of the prior art and to provide a Web portrait search method that fuses text semantics and visual content. Taking Web portrait image retrieval in the Internet environment as its object, the invention studies in some depth the techniques for integrating Web text semantic mining with image visual content discrimination, and implements a prototype system for Web portrait image retrieval in the Internet environment.
The object of the invention can be achieved by the following measures:
A Web portrait search method fusing text semantics and visual content, which uses text and visual features in combination; the concrete steps of the method are as follows:
Step 1: network crawling forms a local original image library
Submit a "query string" to a commercial search engine server to realize connection and download based on the HTTP protocol, download the picture results of the commercial image search engine and the related web pages as a local image library, and at the same time extract the key tags of the original web pages to form XML files for later text processing;
Step 2: mine the picture content and the text semantics
Adopt the AdaBoost face detection technique, which currently has good detection performance and high speed; on the other hand, mine the high-level semantics of the page scripts that contain the pictures with a vector model, and compare empirical weights with a PLSA-based dynamic weighting method;
Step 3: dynamic fusion of visual and text features
Through a regulating factor, dynamically combine the visual and textual feature analysis results of the images to obtain a relevance ranking value between each image and the query, so that the image result list of the search engine is re-ranked and fed back to the user.
Compared with the prior art, the invention has the following advantages: contrast experiments against the baseline result list show that the designed retrieval ranking method fusing image visual content and high-level text semantics has better retrieval performance, and the precision of the first few pages in particular is better. Usually the user is only interested in the results at the front, so the advantage of the invention is all the greater.
Description of drawings
Fig. 1 is the overall system diagram of the invention;
Fig. 2 is the system interface diagram of the invention;
Fig. 3 shows retrieval and re-ranking results;
Fig. 4 shows the PLSA aspect model;
Fig. 5 shows the text ranking results;
Fig. 6 shows the feature fusion ranking results.
Embodiment
Under the specific application background of the Web, text-based image retrieval avoids to a certain extent the difficult problem of recognizing complex visual elements, matches the retrieval habits people are familiar with, makes full use of Web page context and hypertext structure information, and is simple to implement. However, because it is still confined to describing images with index terms within the scope of text retrieval, problems such as topic ambiguity, inconsistent indexing and inability to understand the picture content easily arise. CBIR is just the opposite: it mainly indexes images by analyzing the characteristic elements of the visual pattern, such as the color histogram of each image, and therefore has a certain objectivity; but CBIR algorithms are complex and costly to implement. In the present invention, therefore, we combine text and visual features and implement a prototype system for Web portrait image retrieval. The concrete steps are as follows:
Step 1: network crawling forms a local original image library
By submitting a "query string" to a commercial search engine server to realize connection and download based on the HTTP protocol, the invention breaks away from the traditional Socket-based programming mode of VC, reduces the amount of labor and improves efficiency. The invention downloads the picture results of the commercial image search engine and the related web pages as a local image library, and at the same time extracts the key tags of the original web pages to form XML files for later text processing.
Step 2: mine the picture content and the text semantics
Because the invention is aimed at portrait pictures, from the viewpoint of image visual content we adopt the AdaBoost face detection technique, which currently has good detection performance and high speed, to determine whether a picture contains a person. On the other hand, we mine the high-level semantics of the page scripts that contain the pictures with a vector model, and compare empirical weights with a PLSA-based dynamic weighting method.
Step 3: dynamic fusion of visual and text features
Through a regulating factor, dynamically combine the visual and textual feature analysis results of the images to obtain a relevance ranking value between each image and the query, so that the image result list of the search engine is re-ranked and fed back to the user.
Contrast experiments against the baseline result list show that the designed retrieval ranking method fusing image visual content and high-level text semantics has better retrieval performance, and the precision of the first few pages in particular is better. Usually the user is only interested in the results at the front, so the advantage of the invention is all the greater.
The invention is further described below with reference to the drawings and specific embodiments.
According to the technical scheme introduced above, we implemented a prototype image retrieval system following the framework of the invention, as shown in Fig. 2, the system interface diagram of the invention. The interface consists mainly of four parts, marked with dashed frames in four different colors (red, green, blue and purple) and numbered consecutively. The red frame No. 1 and the green frame No. 2 on the left are the parameter setting and control parts of the system, which provide interfaces for the user to enter and select parameters.
Red frame No. 1 is the parameter setting and control area of the network crawling and download module. In the setting area the user can enter a keyword, select the commercial image search engine from which to link and download the original image library, set the number of retry attempts when a download connection fails (more attempts make it more likely that the system downloads the raw data completely, but take more time), set the number of pictures to download, and enter the local path where the image library is saved. In the control area the user can start and stop crawling and downloading, and exit the system. Green frame No. 2 is the selection and control area for the re-ranking algorithms and provides interfaces to four algorithms.
Blue frame No. 3 and purple frame No. 4 are the picture display areas: frame No. 3 shows the original image sequence downloaded from the commercial image search engine, and frame No. 4 shows the image sequence after analysis and re-ranking by a selected algorithm. Below frames No. 3 and No. 4 there are page-turning controls so that the user can conveniently browse the picture sequences and compare them before and after re-ranking.
Fig. 3 shows an example of the actual results produced by running the invention.
The specific embodiments of the invention are described in detail below with reference to the drawings.
As shown in Fig. 1, the overall system diagram of the invention, the overall system flow comprises the following parts:
1. Network crawling forms the local image library
The invention crawls on the basis of search engines such as Google and Baidu, which reduces the workload: there is no need to spend much time sorting the crawled data afterwards, because Google and Baidu have already indexed all the pictures obtained from each website, which guarantees that the crawled data are all pictures and contain no other data. In addition, the program performs text analysis only on the web page where the picture is located and does no processing on other pages linked to it, so the "spider" only needs to crawl one further level (depth) for each link starting from the seed nodes (Google, Baidu, etc.).
In this step (see the solid box at the top of Fig. 1), three classes of data are downloaded for each result: the thumbnail, the original image, and the original web page containing the original image. Together these three classes of data form the local raw image database. When the original web page is downloaded, the tag content closely related to the image is extracted from the page script to form an XML file for the subsequent text semantic mining.
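As an illustration of this step, the following is a minimal Python sketch of the download-and-extract flow, assuming the image URL and its source-page URL have already been obtained from the commercial search engine's result page; the helper names, the XML field names and the crude tag-extraction rules are illustrative and are not taken from the patent.

```python
# Minimal sketch of step 1: download the image and its source page over HTTP,
# then save the image-related tag fields into an XML file for later text mining.
import os
import re
import urllib.request
import xml.etree.ElementTree as ET

def fetch(url: str) -> bytes:
    """Download a URL over HTTP, retrying a few times (as the GUI option allows)."""
    for _ in range(3):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read()
        except OSError:
            continue
    return b""

def extract_semantic_units(page_html: str, img_url: str, page_url: str) -> dict:
    """Pull the seven tag fields later used by the Boolean semantic vector (crude regexes)."""
    alt = re.search(r'<img[^>]*src="%s"[^>]*alt="([^"]*)"' % re.escape(img_url),
                    page_html, re.I)
    title = re.search(r"<title>(.*?)</title>", page_html, re.I | re.S)
    meta_kw = re.search(r'<meta[^>]*name="keywords"[^>]*content="([^"]*)"', page_html, re.I)
    meta_desc = re.search(r'<meta[^>]*name="description"[^>]*content="([^"]*)"', page_html, re.I)
    return {
        "picture_title": title.group(1).strip() if title else "",
        "picture_alt": alt.group(1) if alt else "",
        "picture_url": img_url,
        "page_url": page_url,
        "meta_keywords": meta_kw.group(1) if meta_kw else "",
        "meta_description": meta_desc.group(1) if meta_desc else "",
        "surrounding_text": re.sub(r"<[^>]+>", " ", page_html)[:2000],  # crude context
    }

def save_result(out_dir: str, idx: int, img_url: str, page_url: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    # original image -> local image library
    with open(os.path.join(out_dir, f"{idx}.jpg"), "wb") as f:
        f.write(fetch(img_url))
    # key tags of the original web page -> XML for later text semantic mining
    units = extract_semantic_units(fetch(page_url).decode("utf-8", "ignore"),
                                   img_url, page_url)
    root = ET.Element("image", id=str(idx))
    for name, value in units.items():
        ET.SubElement(root, name).text = value
    ET.ElementTree(root).write(os.path.join(out_dir, f"{idx}.xml"),
                               encoding="utf-8", xml_declaration=True)
```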
2. Mining the picture content and the text semantics
On the basis of the image database thus formed, the invention uses two independent modules to discriminate the picture content and to mine the high-level text semantics. The AdaBoost face detection functions provided by OpenCV are used to discriminate the picture content, while a vector model is used to mine the high-level text semantics, weighted both with empirical weights and dynamically with a PLSA model.
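A minimal sketch of the face check, using the Haar-cascade (AdaBoost-trained) frontal-face detector that ships with OpenCV's Python bindings; the detection parameters below are common defaults, not values specified by the patent.

```python
# Minimal sketch of the visual-content check: does the picture contain a face?
# Uses the AdaBoost-trained Haar cascade bundled with OpenCV.
import cv2

_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def contains_face(image_path: str) -> bool:
    img = cv2.imread(image_path)
    if img is None:
        return False
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(30, 30))
    return len(faces) > 0

# The visual component v1 of the fused descriptor F = (t1..t7, v1) would then be
# v1 = 1 if contains_face(path) else 0.
```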
The concrete flow of the high-level text semantic mining is as follows:
(1) Boolean document vector model:
In the invention, the title of the picture, the alternative (ALT) text of the picture, the URL of the picture, the URL of the original web page, the keywords and description attributes of the META tag, and the text surrounding the picture are used to construct a Boolean vector of Web semantic information units $T = (t_1, \ldots, t_{n_t})$, where $n_t = 7$ and $t_j \in \{0, 1\}$ indicates whether the query text appears in the corresponding semantic information unit.
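A minimal sketch of building this Boolean vector from the fields saved in the XML file; the unit names follow the earlier download sketch and are assumptions.

```python
# Minimal sketch of the Boolean semantic vector T = (t1, ..., t7): each component
# is 1 if the query string appears in the corresponding semantic information unit.
UNIT_NAMES = ["picture_title", "picture_alt", "picture_url", "page_url",
              "meta_keywords", "meta_description", "surrounding_text"]

def boolean_vector(units: dict, query: str) -> list:
    q = query.lower()
    return [1 if q in (units.get(name, "") or "").lower() else 0
            for name in UNIT_NAMES]
```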
(2) Empirical weight vector:
Since web authors differ in background and making style, the semantic information units extracted from the web script differ in how much relevant semantic information they imply; each semantic information unit is therefore assigned a weight to reflect this difference. Let $W_T = (w_1^t, \ldots, w_{n_t}^t)$ be the weight vector of the semantic information units, where $w_j^t$ is the weight corresponding to the semantic information unit $t_j$. By observing the portrait-containing Web pages returned by the crawling module (specifically, the XML documents of semantic information units), the invention empirically sets
$W_T = (w_1^t, w_2^t, \ldots, w_7^t) = (1.5, 2.0, 0.8, 0.8, 1.0, 1.0, 0.5)$
The larger $w_j^t$ is, the more important the corresponding semantic information unit. The semantic relevance $R_T$ of a Web document is then
$R_T = \sum_{j=1}^{n_t} t_j \cdot w_j^t$
According to the semantic relevance $R_T$ of each Web document, the script semantic ranking of the different web documents can be realized.
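A minimal sketch of the empirical-weight relevance computation, using the weight values quoted above.

```python
# Minimal sketch of R_T = sum_j t_j * w_j^t with the fixed empirical weights.
W_T = [1.5, 2.0, 0.8, 0.8, 1.0, 1.0, 0.5]

def semantic_relevance(t: list, w: list = W_T) -> float:
    return sum(tj * wj for tj, wj in zip(t, w))

# Documents are then ranked by descending R_T.
```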
(3) Dynamic weighting with PLSA
As the PLSA aspect model of Fig. 4 shows, PLSA is a latent-variable model for co-occurrence data: it associates each observed word $w \in W = \{w_1, w_2, \ldots, w_M\}$ and document $d \in D = \{d_1, d_2, \ldots, d_N\}$ with an unobserved latent class topic $z \in Z = \{z_1, z_2, \ldots, z_K\}$. The following probabilities are also defined:
1) a document $d_i$ is selected with probability $P(d_i)$;
2) a latent class $z_k$ is chosen with probability $P(z_k \mid d_i)$;
3) a word $w_j$ is generated with probability $P(w_j \mid z_k)$.
The latent class topic z can therefore be marginalized out to obtain the joint probability of an observed pair $(d_i, w_j)$, which can be expressed as
(a) $P(d_i, w_j) = P(d_i) P(w_j \mid d_i)$, with $P(w_j \mid d_i) = \sum_{z \in Z} P(w_j \mid z) P(z \mid d_i)$
(b) $P(d_i, w_j) = \sum_{z \in Z} P(z) P(d_i \mid z) P(w_j \mid z)$
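A small numerical illustration of the symmetric aspect model (b): given the three probability tables, the document-word joint probability is a sum over the latent topics. The table sizes and values below are placeholders, not data from the patent.

```python
# Illustration of the symmetric PLSA aspect model (b):
# P(d_i, w_j) = sum_z P(z) * P(d_i | z) * P(w_j | z).
import numpy as np

K, N, M = 3, 5, 8                                        # latent topics, documents, words
P_z = np.full(K, 1.0 / K)                                # P(z)
P_d_given_z = np.random.dirichlet(np.ones(N), size=K)    # K x N, each row sums to 1
P_w_given_z = np.random.dirichlet(np.ones(M), size=K)    # K x M, each row sums to 1

# Joint distribution over (document, word) pairs: an N x M matrix.
P_dw = np.einsum("k,kn,km->nm", P_z, P_d_given_z, P_w_given_z)
assert abs(P_dw.sum() - 1.0) < 1e-9                      # a proper joint distribution
```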
The aspect model (b) of PLSA (see Fig. 4) is introduced into the invention: the semantic information units extracted from the original web page (i.e., the key tag fields) play the role of the observed words W, i.e. $w \to f$, and the query word replaces the latent variable, i.e. $z \to q$.
Because, under the actual experimental conditions of the invention, the query word (the latent variable) is known, the invention chooses the symmetric parametrization (b) of Fig. 4 and takes the joint probability of document and tag field as its research object, giving the field-document joint probability
$P(d_i, f_j) = \sum_{q \in Q} P(q) P(d_i \mid q) P(f_j \mid q)$
where Q is the set of query words. For a query word q, the prior probability P(q) can be regarded as a constant. The problem then reduces, for a given query word q, to the class-conditional probabilities $P(f_j \mid q)$ of the semantic information units and $P(d_i \mid q)$ of the documents. $P(f_j \mid q)$ is the term frequency (TF) with which the query word appears in semantic information unit $f_j$ over the image library, i.e. $P(f_j \mid q) = n / N_d$, where $N_d$ is the total number of documents in the image library and $n$ is the number of documents in which the query word appears in that unit.
After parsing the web script, the vector-type semantic description corresponding to the i-th web document $d_i$ is $d_i = \{f_{i,1}, f_{i,2}, \ldots, f_{i,7}\}$, whose j-th element is defined as
$f_{i,j} = tf_j \cdot Portion_{i,j}$
[the defining formula of $Portion_{i,j}$ appears only as an image in the original publication]
where $m_i$ denotes the number of units among the 1st to 6th semantic information units (i.e., excluding the surrounding-text unit) in which the keyword appears, and $totalNum_i$ and $keyNum_i$ denote, respectively, the total number of words in the surrounding-text field and the number of occurrences of the keyword in it. Here $tf_j$ reflects "term frequency" information that does not depend on any single document, while $Portion_{i,j}$ is a scale factor reflecting the association among the semantic information units inside the i-th document.
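A minimal sketch of the class-conditional probability $P(f_j \mid q) = n / N_d$ estimated over the downloaded library; the $Portion_{i,j}$ factor is not reproduced here because its exact definition appears only in the figure.

```python
# Sketch of P(f_j | q) = n / N_d: for each of the seven semantic information units,
# the fraction of documents in the downloaded library whose unit contains the query.
# `docs` is a list of per-document Boolean vectors t (see boolean_vector above).
def field_given_query(docs: list) -> list:
    n_d = len(docs)
    return [sum(t[j] for t in docs) / n_d for j in range(7)]
```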
For a given query word q, suppose that the $N_d$ documents returned by the crawling module follow a Gaussian distribution; then
$P(d_i \mid q) = G(d_i; \mu_d, \sigma_d)$
Thus, for each query, the field-document joint probability formula above yields the Field-Document joint distribution matrix whose elements are $P(f_j, d_i)$.
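A minimal sketch of assembling the Field-Document joint distribution matrix under these assumptions; treating the Gaussian $P(d_i \mid q)$ as a distribution over the rank position returned by the crawler is an interpretation, and the mean and standard deviation are placeholders.

```python
# Sketch of the Field-Document joint distribution matrix: with a single known query
# and constant P(q), each element is proportional to P(d_i | q) * P(f_j | q).
import numpy as np

def joint_matrix(docs: list) -> np.ndarray:
    n_d = len(docs)
    p_f = np.array(field_given_query(docs))      # length-7 vector of P(f_j | q)
    ranks = np.arange(n_d, dtype=float)
    mu, sigma = 0.0, n_d / 3.0                   # assumed Gaussian over rank position
    p_d = np.exp(-0.5 * ((ranks - mu) / sigma) ** 2)
    p_d /= p_d.sum()                             # normalize P(d_i | q)
    return np.outer(p_d, p_f)                    # N_d x 7 matrix of P(f_j, d_i)
```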
In (2) we gave a semantic relevance measure for the web script that uses fixed empirical weights for all queries. For different queries, however, a more direct scheme is to weight the different semantic information units dynamically, so as to adapt to different types of query words. The invention therefore further proposes two PLSA-based schemes for dynamically weighting the Boolean semantic description vector.
1. Independent weight vector method: for the i-th web document, take the element $P(f_j, d_i)$ of the Field-Document joint distribution matrix P as the weight of its j-th semantic information unit, i.e. $w_j^i = P(f_j, d_i)$. The semantic relevance of the i-th web document is then
$R_T^i = \sum_{j=1}^{n_t} t_j^i \cdot w_j^i$
Because the weight vector used for each document differs and is determined by the document's own PLSA statistical probabilities, this combination is called the independent weight vector method.
2. Statistical weight vector method: all documents use the same statistically derived weight vector; that is, for the j-th semantic information unit, its expectation over all documents (the column-wise average of the matrix P) is taken as the weight of that field:
$w_j = E_i[P(f_j, d_i)] = \frac{1}{N_d} \sum_{i=1}^{N_d} P(f_j, d_i)$
The semantic relevance of the i-th web document is thus
$R_T^i = \sum_{j=1}^{n_t} t_j^i \cdot w_j$
Because the same statistically derived weight vector is used for all texts in this mode, it is called the statistical weight vector method. Note that for different queries the weights $w_j$ still change dynamically. Both dynamic weighting methods use statistics to measure the semantic relevance of the web script.
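A minimal sketch of the two PLSA-based weighting schemes, operating on the joint matrix P ($N_d \times 7$, e.g. from the sketch above) and a Boolean matrix T whose i-th row is the Boolean vector of document i.

```python
# Sketch of the two PLSA-based dynamic weighting schemes.
import numpy as np

def relevance_independent(P: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Independent weight vectors: w_j^i = P(f_j, d_i), one weight row per document."""
    return (T * P).sum(axis=1)

def relevance_statistical(P: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Statistical weight vector: w_j = E_i[P(f_j, d_i)], shared by all documents."""
    w = P.mean(axis=0)
    return T @ w
```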
(4) Combining the visual content with the high-level text semantics
In the invention, in order to combine the visual content discrimination result with the text semantics, the visual content vector is also given a weight vector $W_V = (w_1^v, \ldots, w_{n_v}^v)$; here $n_v = 1$, so $W_V = (w_1^v)$. In the same way as for the semantic feature vector, it is linearly combined with the weight vector $W_T$ of the semantic information units to obtain the total weight vector $W = (W_T, W_V) = (w_1^t, \ldots, w_7^t, w_1^v)$ and the fused Boolean feature description vector $F = (T, V) = (t_1, \ldots, t_7, v_1)$.
The final relevance R is then obtained as the dot product of the feature description vector and the total weight vector: $R = F \cdot W$.
According to the final relevance, a ranking that fuses the visual content discrimination with the Boolean semantic description vector can be realized.
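A minimal sketch of the final fusion; the value of the visual weight $w_1^v$, which plays the role of the regulating factor, is an assumed placeholder.

```python
# Sketch of the final fusion: concatenate the text weights with a visual weight and
# take the dot product with the fused Boolean descriptor F = (t1..t7, v1).
import numpy as np

def fused_relevance(t: list, has_face: bool, w_text: list, w_v: float = 2.0) -> float:
    F = np.array(t + [1 if has_face else 0], dtype=float)   # fused descriptor
    W = np.array(list(w_text) + [w_v])                       # total weight vector
    return float(F @ W)                                      # R = F . W

# Images are re-ranked by descending fused relevance and returned to the user;
# w_text may be the empirical weights or either PLSA-based dynamic weight vector.
```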
Experimental results
Fig. 5 (text ranking results) and Fig. 6 (feature fusion ranking results) show the comparison experiments of the ranking methods proposed by the invention. K = 15 English name queries were selected at random as query words: andrea, bruce, fred, gaby, jane, lynette, maria, peter, robinson, simon, wesley, eva, jackcafferty, brucelee, williamshakespeare. As the figures show, the experiment achieves a clearly higher precision than the original ordering, especially after feature fusion.

Claims (1)

1. A Web portrait search method fusing text semantics and visual content, characterized in that the method uses text and visual features in combination, and its concrete steps are as follows:
Step 1: network crawling forms a local original image library
Submit a "query string" to a commercial search engine server to realize connection and download based on the HTTP protocol, download the picture results of the commercial image search engine and the related web pages as a local image library, and at the same time extract the key tags of the original web pages to form XML files for later text processing;
Step 2: mine the picture content and the text semantics
Adopt the AdaBoost face detection technique, which currently has good detection performance and high speed; on the other hand, mine the high-level semantics of the page scripts that contain the pictures with a vector model, and compare empirical weights with a PLSA-based dynamic weighting method;
Step 3: dynamic fusion of visual and text features
Through a regulating factor, dynamically combine the visual and textual feature analysis results of the images to obtain a relevance ranking value between each image and the query, so that the image result list of the search engine is re-ranked and fed back to the user.
CN2008101182533A 2008-08-12 2008-08-12 Web portrait search method for fusing text semantic and vision content Expired - Fee Related CN101388022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101182533A CN101388022B (en) 2008-08-12 2008-08-12 Web portrait search method for fusing text semantic and vision content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101182533A CN101388022B (en) 2008-08-12 2008-08-12 Web portrait search method for fusing text semantic and vision content

Publications (2)

Publication Number Publication Date
CN101388022A true CN101388022A (en) 2009-03-18
CN101388022B CN101388022B (en) 2010-06-09

Family

ID=40477446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101182533A Expired - Fee Related CN101388022B (en) 2008-08-12 2008-08-12 Web portrait search method for fusing text semantic and vision content

Country Status (1)

Country Link
CN (1) CN101388022B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262642A (en) * 2011-01-28 2011-11-30 北京理工大学 Web image search engine and realizing method thereof
CN102289430A (en) * 2011-06-29 2011-12-21 北京交通大学 Method for analyzing latent semantics of fusion probability of multi-modality data
CN104182430A (en) * 2013-05-28 2014-12-03 腾讯科技(深圳)有限公司 Method and device for displaying image in text message
CN104361104A (en) * 2014-11-24 2015-02-18 中国科学技术大学 Efficient image retrieval result quality evaluation method
CN104376105A (en) * 2014-11-26 2015-02-25 北京航空航天大学 Feature fusing system and method for low-level visual features and text description information of images in social media
CN104933029A (en) * 2015-06-23 2015-09-23 天津大学 Text image joint semantics analysis method based on probability theme model
CN105849720A (en) * 2013-11-30 2016-08-10 北京市商汤科技开发有限公司 Visual semantic complex network and method for forming network
CN105912684A (en) * 2016-04-15 2016-08-31 湘潭大学 Cross-media retrieval method based on visual features and semantic features
CN106874862A (en) * 2017-01-24 2017-06-20 复旦大学 People counting method based on submodule technology and semi-supervised learning
CN107480196A (en) * 2017-07-14 2017-12-15 中国科学院自动化研究所 A kind of multi-modal lexical representation method based on dynamic fusion mechanism
CN107766853A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 A kind of generation, display methods and the electronic equipment of the text message of image
CN107977948A (en) * 2017-07-25 2018-05-01 北京联合大学 A kind of notable figure fusion method towards sociogram's picture
CN108334627A (en) * 2018-02-12 2018-07-27 北京百度网讯科技有限公司 Searching method, device and the computer equipment of new media content
CN110019867A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Image search method, system and index structuring method and medium
CN110110116A (en) * 2019-04-02 2019-08-09 浙江工业大学 A kind of trademark image retrieval method for integrating depth convolutional network and semantic analysis
WO2019169872A1 (en) * 2018-03-09 2019-09-12 北京百度网讯科技有限公司 Method and device for searching for content resource, and server
US10958958B2 (en) 2018-08-21 2021-03-23 International Business Machines Corporation Intelligent updating of media data in a computing environment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4312148A1 (en) * 2022-07-29 2024-01-31 Amadeus S.A.S. Method of identifying ranking and processing information obtained from a document

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100414548C (en) * 2006-09-22 2008-08-27 南京搜拍信息技术有限公司 Search system and technique comprehensively using information of graphy and character
CN101211341A (en) * 2006-12-29 2008-07-02 上海芯盛电子科技有限公司 Image intelligent mode recognition and searching method

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102262642B (en) * 2011-01-28 2013-07-10 北京理工大学 Web image search engine and realizing method thereof
CN102262642A (en) * 2011-01-28 2011-11-30 北京理工大学 Web image search engine and realizing method thereof
CN102289430A (en) * 2011-06-29 2011-12-21 北京交通大学 Method for analyzing latent semantics of fusion probability of multi-modality data
CN102289430B (en) * 2011-06-29 2013-11-13 北京交通大学 Method for analyzing latent semantics of fusion probability of multi-modality data
CN104182430A (en) * 2013-05-28 2014-12-03 腾讯科技(深圳)有限公司 Method and device for displaying image in text message
CN105849720B (en) * 2013-11-30 2019-05-21 北京市商汤科技开发有限公司 Vision semanteme composite network and the method for being used to form the network
CN105849720A (en) * 2013-11-30 2016-08-10 北京市商汤科技开发有限公司 Visual semantic complex network and method for forming network
CN104361104A (en) * 2014-11-24 2015-02-18 中国科学技术大学 Efficient image retrieval result quality evaluation method
CN104361104B (en) * 2014-11-24 2018-01-30 中国科学技术大学 A kind of efficient image searching result quality evaluating method
CN104376105A (en) * 2014-11-26 2015-02-25 北京航空航天大学 Feature fusing system and method for low-level visual features and text description information of images in social media
CN104376105B (en) * 2014-11-26 2017-08-25 北京航空航天大学 The Fusion Features system and method for image low-level visual feature and text description information in a kind of Social Media
CN104933029A (en) * 2015-06-23 2015-09-23 天津大学 Text image joint semantics analysis method based on probability theme model
CN105912684B (en) * 2016-04-15 2019-07-26 湘潭大学 The cross-media retrieval method of view-based access control model feature and semantic feature
CN105912684A (en) * 2016-04-15 2016-08-31 湘潭大学 Cross-media retrieval method based on visual features and semantic features
CN107766853A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 A kind of generation, display methods and the electronic equipment of the text message of image
CN107766853B (en) * 2016-08-16 2021-08-06 阿里巴巴集团控股有限公司 Image text information generation and display method and electronic equipment
CN106874862B (en) * 2017-01-24 2021-06-04 复旦大学 Crowd counting method based on sub-model technology and semi-supervised learning
CN106874862A (en) * 2017-01-24 2017-06-20 复旦大学 People counting method based on submodule technology and semi-supervised learning
CN107480196A (en) * 2017-07-14 2017-12-15 中国科学院自动化研究所 A kind of multi-modal lexical representation method based on dynamic fusion mechanism
CN107480196B (en) * 2017-07-14 2020-02-07 中国科学院自动化研究所 Multi-modal vocabulary representation method based on dynamic fusion mechanism
CN107977948A (en) * 2017-07-25 2018-05-01 北京联合大学 A kind of notable figure fusion method towards sociogram's picture
CN107977948B (en) * 2017-07-25 2019-12-24 北京联合大学 Salient map fusion method facing community image
CN110019867A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Image search method, system and index structuring method and medium
CN108334627A (en) * 2018-02-12 2018-07-27 北京百度网讯科技有限公司 Searching method, device and the computer equipment of new media content
CN108334627B (en) * 2018-02-12 2022-09-23 北京百度网讯科技有限公司 Method and device for searching new media content and computer equipment
WO2019169872A1 (en) * 2018-03-09 2019-09-12 北京百度网讯科技有限公司 Method and device for searching for content resource, and server
US10958958B2 (en) 2018-08-21 2021-03-23 International Business Machines Corporation Intelligent updating of media data in a computing environment
CN110110116B (en) * 2019-04-02 2021-04-06 浙江工业大学 Trademark image retrieval method integrating deep convolutional network and semantic analysis
CN110110116A (en) * 2019-04-02 2019-08-09 浙江工业大学 A kind of trademark image retrieval method for integrating depth convolutional network and semantic analysis

Also Published As

Publication number Publication date
CN101388022B (en) 2010-06-09

Similar Documents

Publication Publication Date Title
CN101388022B (en) Web portrait search method for fusing text semantic and vision content
Salloum et al. Mining social media text: extracting knowledge from Facebook
Beinglass et al. Articulated object recognition, or: How to generalize the generalized hough transform
JP3598742B2 (en) Document search device and document search method
US9262532B2 (en) Ranking entity facets using user-click feedback
CN110597981B (en) Network news summary system for automatically generating summary by adopting multiple strategies
CN110968782B (en) User portrait construction and application method for learner
CN103226578B (en) Towards the website identification of medical domain and the method for webpage disaggregated classification
CN103631794B (en) A kind of method, apparatus and equipment for being ranked up to search result
CN106815297A (en) A kind of academic resources recommendation service system and method
CN102955848B (en) A kind of three-dimensional model searching system based on semanteme and method
CN103455487B (en) The extracting method and device of a kind of search term
CN104268148B (en) A kind of forum page Information Automatic Extraction method and system based on time string
Zhou et al. Conceptlearner: Discovering visual concepts from weakly labeled image collections
TWI695277B (en) Automatic website data collection method
CN101359332A (en) Design method for visual search interface with semantic categorization function
CN104484431A (en) Multi-source individualized news webpage recommending method based on field body
CN106503211A (en) Information issues the method that the mobile edition of class website is automatically generated
Tekli An overview of cluster-based image search result organization: background, techniques, and ongoing challenges
CN110970112A (en) Method and system for constructing knowledge graph for nutrition and health
Chau et al. Comparison of two approaches to building a vertical search tool: a case study in the nanotechnology domain
Luo et al. Product review information extraction based on adjective opinion words
CN103136223A (en) Method and device for mining query with similar requirements
Saenko et al. Filtering abstract senses from image search results
Banu et al. A novel ensemble vision based deep web data extraction technique for web mining applications

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100609

Termination date: 20120812