CN101246502B - Method and system for searching pictures in network - Google Patents

Method and system for searching pictures in network

Info

Publication number
CN101246502B
CN101246502B (application CN2008100880561A / CN200810088056A)
Authority
CN
China
Prior art keywords
classification
picture
word
weight
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2008100880561A
Other languages
Chinese (zh)
Other versions
CN101246502A (en)
Inventor
田密
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tencent Computer Systems Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN2008100880561A priority Critical patent/CN101246502B/en
Publication of CN101246502A publication Critical patent/CN101246502A/en
Application granted granted Critical
Publication of CN101246502B publication Critical patent/CN101246502B/en
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for searching for pictures on a network, comprising the steps of: determining the classification of a query word according to a preset word classification library; searching for each picture relevant to the query word and, for each picture, obtaining the classification weight of the website hosting the picture for that classification according to a preset website classification library; obtaining the description weight of the web page hosting the picture for that classification according to a preset web page classification library; and calculating the comprehensive relevance of each picture from the classification weight and the description weight and extracting pictures whose comprehensive relevance exceeds a threshold. The invention also discloses a system for searching for pictures on a network. The invention addresses the prior-art problems of weak relevance between the retrieved pictures and the query word and of poor user experience; it yields pictures that are closely related to the query word and improves the user experience.

Description

Method and system for searching for pictures on a network
Technical field
The present invention relates to the field of picture search, and in particular to a method and system for searching for pictures on a network.
Background technology
Searching for pictures relevant to a query word is an important application of network search engines. During a search, the search engine mainly judges whether a picture is closely related to the query word according to the relevance between the picture's description text and the query word; if so, the picture is extracted. However, because description text can be ambiguous or simply wrong, the fact that a picture's description text is directly related to the query word does not guarantee that the picture itself is closely related to the query word, so the retrieved pictures often fail to satisfy the user's needs.
For example, "Tiger" can be the description text of an animal picture, but also of a picture of a certain golf star; "apple" can be the description text of a fruit picture, but also of a picture related to a certain well-known technology company. When a user enters the query word "Tiger" hoping to find animal pictures, a search engine that matches on description text alone may well return pictures of the golf star. Likewise, when a user enters "apple" hoping to find fruit pictures, the engine may return pictures of the technology company.
As another example, a picture of a sheep may carry the description text "horse", and a self-portrait posted by a female netizen as a joke may carry the description text "beauty". Pictures extracted purely from description text can therefore be disorderly and unreliable.
At present, the most common remedy is to score major websites manually, roughly dividing the websites on the Internet into "professional websites", "ordinary websites" and "junk websites". During a search, when the relevance between the description text and the query word is similar, pictures from professional websites are weighted more heavily than those from ordinary websites, and pictures from ordinary websites more heavily than those from junk websites; the pictures are then ranked and displayed by weight.
However, labeling a website as professional does not mean it is professional for every query word: a professional website is typically authoritative only for one class of query words. For example, a website devoted to pop stars, when queried with the word "horse", returns pictures of a singer whose name contains "horse" rather than pictures of the animal, so the retrieved pictures correlate poorly with the query word and the user experience suffers.
There are tens of thousands of websites on the Internet. Querying and scoring each one manually not only consumes a great deal of manpower, but also covers only a limited number of websites, so coverage is low and the quality of picture search suffers.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for searching for pictures on a network, so as to solve the prior-art problems of poor relevance between the retrieved pictures and the query word and of poor user experience. The method makes the retrieved pictures closely related to the query word and improves the user experience.
Another object of the present invention is to provide a system for searching for pictures on a network, so as to solve the prior-art problems of poor relevance between the retrieved pictures and the query word and of poor user experience. The system makes the retrieved pictures closely related to the query word and improves the user experience.
The present invention discloses a method for searching for pictures on a network, the method comprising: comparing against a preset word classification library to determine the classification of a query word, the word classification library containing the characteristic words of each classification; searching for each picture relevant to the query word, and comparing against a preset website classification library to obtain, for each picture, the classification weight of the website hosting the picture with respect to that classification, the website classification library containing the classification weight of each website for each classification; comparing against a preset web page classification library to obtain, for each picture, the description weight of the web page hosting the picture with respect to that classification, the web page classification library containing the description weight of each web page for each classification; and calculating the comprehensive relevance of each picture from the classification weight and the description weight, and extracting pictures whose comprehensive relevance exceeds a threshold.
Preferably, before comparing against the preset web page classification library, the method further comprises: dividing the picture search domain into several classifications; setting classification description words for each classification; and calculating, from those classification description words, the description weight of each web page on the Internet for each classification, to form the web page classification library.
Preferably, the description weight of each web page on the Internet for each classification is calculated from the classification description words as follows: count the number of times each classification description word of a given classification occurs in the web page and multiply by a corresponding coefficient; score the positions at which the classification description words occur in the web page and multiply by a corresponding coefficient; and sum the products to obtain the description weight of the web page for that classification.
Preferably, before comparing against the preset website classification library, the method further comprises: dividing the picture search domain into several classifications; setting classification reference words for each classification; and calculating, from those classification reference words, the classification weight of each website on the Internet for each classification, to form the website classification library.
Preferably, the classification weight of each website on the Internet for each classification is calculated from the classification reference words as follows: count the number of times the classification reference words of a given classification occur in the website and multiply by a corresponding coefficient; count the total number of pictures in the website associated with those classification reference words and multiply by a corresponding coefficient; calculate the proportion of the website's pictures that are so associated and multiply by a corresponding coefficient; and sum the three products to obtain the classification weight of the website for that classification.
Preferably, the classification weight of each website on the Internet for each classification may instead be calculated as follows: count the number of times the classification reference words of a given classification occur in the website, multiply by a corresponding coefficient, and add 1; count the total number of pictures in the website associated with those classification reference words, multiply by a corresponding coefficient, and add 1; calculate the proportion of the website's pictures that are so associated, multiply by a corresponding coefficient, and add 1; then multiply the three results together and subtract 1 to obtain the classification weight of the website for that classification.
Preferably, before comparing against the preset word classification library, the method further comprises: counting the number of occurrences of each word in each website; and, for each word, extracting the websites in which the word occurs more than a preset number of times, obtaining the classification for which those websites have the highest classification weight, and assigning the word to that classification, to form the word classification library.
Preferably, the method further comprises: extracting the website in which the word occurs most often, obtaining the classification for which that website has the highest classification weight, taking that classification as the primary classification of the word, and taking the other classifications as secondary classifications of the word.
Preferably, the method further comprises: when the query word belongs to at least two classifications, establishing links for the pictures of the query word's secondary classifications; and displaying the pictures of the primary classification together with the links of the secondary classifications.
Preferably, the method further comprises: counting the number of times the pictures of each classification are clicked; and obtaining the classification whose pictures are clicked most often and displaying the pictures of that classification.
The present invention also discloses a system for searching for pictures on a network, the system comprising a query word classification module, a classification weight computation module, a description weight computation module, a comprehensive relevance computation module, and a picture extraction module: the query word classification module is configured to compare against a preset word classification library to determine the classification of a query word, the word classification library containing the characteristic words of each classification; the classification weight computation module is configured to search for each picture relevant to the query word and to compare against a preset website classification library to obtain, for each picture, the classification weight of the website hosting the picture with respect to that classification, the website classification library containing the classification weight of each website for each classification; the description weight computation module is configured to compare against a preset web page classification library to obtain, for each picture, the description weight of the web page hosting the picture with respect to that classification, the web page classification library containing the description weight of each web page for each classification; the comprehensive relevance computation module is configured to calculate the comprehensive relevance of each picture from the classification weight and the description weight; and the picture extraction module is configured to extract pictures whose comprehensive relevance exceeds a threshold.
Preferably, the system further comprises a picture search domain division module, a classification description word setting module, and a web page classification library composition module: the picture search domain division module is configured to divide the picture search domain into several classifications; the classification description word setting module is configured to set classification description words for each classification; and the web page classification library composition module is configured to calculate, from those classification description words, the description weight of each web page on the Internet for each classification, to form the web page classification library.
Preferably, the system further comprises a classification reference word setting module and a website classification library composition module: the classification reference word setting module is configured to set classification reference words for each classification; and the website classification library composition module is configured to calculate, from those classification reference words, the classification weight of each website on the Internet for each classification, to form the website classification library.
Preferably, the system further comprises a word statistics module and a word classification library composition module: the word statistics module is configured to count the number of occurrences of each word in each website; and the word classification library composition module is configured to, for each word, extract the websites in which the word occurs more than a preset number of times, obtain the classification for which those websites have the highest classification weight, and assign the word to that classification, to form the word classification library.
Compared with the prior art, the present invention has the following advantages:
The present invention refines the classification weight of each website and the description weight of each web page down to individual classifications. For the pictures retrieved for a query word, the comprehensive relevance between each picture and the query word is calculated from the classification weight of the hosting website and the description weight of the hosting web page with respect to the classification of the query word. Because this comprehensive relevance takes into account the classification of the query word and the degree of specialization of the website and web page in that classification, the retrieved pictures are closely related to the query word and the user experience is improved.
Description of drawings
Fig. 1 is a flowchart of a first embodiment of the method for searching for pictures on a network according to the present invention;
Fig. 2 is a flowchart of presetting the web page classification library according to the present invention;
Fig. 3 is a flowchart of presetting the website classification library according to the present invention;
Fig. 4 is a flowchart of presetting the word classification library according to the present invention;
Fig. 5 is a flowchart of a second embodiment of the method for searching for pictures on a network according to the present invention;
Fig. 6 is a flowchart of a third embodiment of the method for searching for pictures on a network according to the present invention;
Fig. 7 is a schematic diagram of a first embodiment of the system for searching for pictures on a network according to the present invention;
Fig. 8 is a schematic diagram of a second embodiment of the system for searching for pictures on a network according to the present invention.
Embodiment
To make the above objects, features and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the drawings and specific embodiments.
The present invention classifies query words and websites by subject and adds the match between the query word's classification and the website's classification to the calculation of a picture's comprehensive relevance. A high classification weight of the hosting website for the query word's classification indicates that the website is highly relevant to the query word; a high description weight of the hosting web page for the query word's classification indicates that the web page is highly relevant to the query word. Extracting pictures from websites and web pages with relatively high classification and description weights ensures that the pictures fall within the subject the user wants and are closely related to the query word.
Referring to Fig. 1, a first embodiment of the method for searching for pictures on a network according to the present invention is shown; the specific steps are as follows.
Step S101: preset a word classification library, a website classification library, and a web page classification library. The domains commonly covered by picture search are divided into several classifications; the principle of division is that the classifications are distinct and overlap little, for example an "animals and plants" classification, a "people" classification, a "scenery" classification, a "military" classification, and so on.
The word classification library contains the characteristic words of each classification. The words in the library should be relatively comprehensive and cover the query words users commonly use. The library can be built by recording user query words and classifying them, or by collecting common expressions on the network and classifying each word.
The website classification library contains the classification weight of each website on the Internet for each classification; the classification weight reflects the confidence and degree of specialization of the website for that classification.
The web page classification library contains the description weight of each web page on the Internet for each classification; the description weight reflects the confidence and degree of specialization of the web page for that classification.
Step S102: compare against the word classification library to determine the classification of the query word. The query word entered by the user is extracted and compared with the words in the word classification library to determine its classification.
Step S103: search for each picture relevant to the query word. The web search server searches the network for pictures directly related to the query word, for example by judging whether a picture's description text is relevant to the query word and retrieving the pictures whose description text is directly related to it.
Step S104: obtain from the website classification library the classification weight of the website hosting each picture for this classification. For each picture, the hosting website is identified and looked up in the website classification library to obtain its classification weight for this classification.
Step S105: obtain from the web page classification library the description weight of the web page hosting each picture for this classification. For each picture, the hosting web page is identified and looked up in the web page classification library to obtain its description weight for this classification.
Step S106: calculate the comprehensive relevance of each picture from the classification weight and the description weight. The formula is as follows:
W = a × WsiteRank + b × PageRank;
where WsiteRank is the classification weight of the website, PageRank is the description weight of the web page, and a and b are coefficients whose values can be adjusted to suit the classification.
Of course, the present invention may also take the picture's description text and other relevance factors into account by adding them to the calculation of the picture's comprehensive relevance.
Step S107: extract the pictures whose comprehensive relevance exceeds the threshold. The comprehensive relevance of each picture is compared with a preset threshold; if it is greater, the picture is extracted and sent to the client; if it is smaller, the picture is discarded.
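As an illustration of steps S104 to S107, the following Python sketch looks up the two weights, combines them with the formula of step S106, and filters by the threshold of step S107. The data structures, field names and coefficient values are assumptions made for illustration only and are not specified by the patent.

from dataclasses import dataclass

@dataclass
class Picture:
    url: str
    site: str   # website hosting the picture
    page: str   # web page hosting the picture

def comprehensive_relevance(pic, query_class, site_library, page_library, a=0.6, b=0.4):
    """Step S106: W = a * website classification weight + b * web page description weight."""
    wsite_rank = site_library.get(pic.site, {}).get(query_class, 0.0)
    page_rank = page_library.get(pic.page, {}).get(query_class, 0.0)
    return a * wsite_rank + b * page_rank

def extract_pictures(pictures, query_class, site_library, page_library, threshold=0.5):
    """Step S107: keep only pictures whose comprehensive relevance exceeds the threshold."""
    return [pic for pic in pictures
            if comprehensive_relevance(pic, query_class, site_library, page_library) > threshold]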
The present invention refines the classification weight of each website and the description weight of each web page down to individual classifications. For the pictures retrieved for a query word, the comprehensive relevance between each picture and the query word is calculated from the classification weight of the hosting website and the description weight of the hosting web page with respect to the classification of the query word. Because this comprehensive relevance takes into account the classification of the query word and the degree of specialization of the website and web page in that classification, the extracted pictures concentrate on the classification of the query word and their relevance to the query word is improved.
The present invention sets classification description words for each classification and calculates the description weight of each web page on the Internet for each classification from the number of occurrences of the classification description words in the web page and the positions at which they occur. The core idea of the description weight calculation is: the more classification description words of a given classification a web page hits, and the more important the positions at which they occur, the greater the confidence that the web page belongs to that classification.
Referring to Fig. 2, the process of presetting the web page classification library according to the present invention is shown; it comprises the following steps.
Step S201: divide the picture search domain into several classifications. The domains commonly covered by picture search are divided into several classifications; the principle of division is that the classifications are distinct and overlap little.
Step S202: set classification description words for each classification. Several classification description words are designated for each classification; a classification description word can be understood as the name of a sub-classification, describing a common theme within the classification. For example, words such as "football", "basketball" and "table tennis" serve as classification description words of the "sports" classification. Classification description words can be obtained from the classified navigation directory pages of professional websites.
Step S203: calculate the description weight of each web page on the Internet for each classification from the classification description words. The formula can be:
Weight(page, class) = Σ_{i=1..n} Weight(Location[i]);
Weight(Location) = a × Weight(hit word) + b × Weight(hit word loc);
where Weight(hit word) is the number of times a classification description word occurs, Weight(hit word loc) scores the position at which the classification description word occurs, and a and b are coefficients whose values can be adjusted to suit the classification and the position.
The importance of positions within a web page can be divided into three tiers: the first tier covers positions such as navigation text, the second tier covers positions such as the page title, and the third tier covers positions such as surrounding text. The higher the tier, the larger the corresponding coefficient. For example, if a web page hits the classification description word "mammal" of the "animals and plants" classification and the word appears in the navigation field "Home >> picture materials >> animals >> mammals", then the page's description weight for the "animals and plants" classification is large and the page is very likely a professional page of that classification.
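A minimal Python sketch of this description-weight calculation, assuming a three-tier position scoring; the tier weights and coefficient values are illustrative assumptions rather than values taken from the patent.

# Higher tiers (navigation > title > surrounding text) get larger position weights.
LOCATION_WEIGHT = {"navigation": 3.0, "title": 2.0, "surrounding_text": 1.0}

def page_description_weight(occurrences, a=1.0, b=1.0):
    """occurrences: list of (hit_count, location) pairs, one per classification
    description word found on the page. Implements
    Weight(page, class) = sum over i of [a * hit_count_i + b * location_weight_i]."""
    total = 0.0
    for hit_count, location in occurrences:
        total += a * hit_count + b * LOCATION_WEIGHT.get(location, 1.0)
    return total

# Example: "mammal" appears twice in the navigation bar, "bird" once in the title.
print(page_description_weight([(2, "navigation"), (1, "title")]))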
Step S204: form the web page classification library. The description weight of each web page on the Internet for each classification is summarized into a table, which serves as the web page classification library and is stored in the web search server.
The present invention calculates the description weight of a web page for a classification from the number of classification description words that occur in the page and the positions at which they occur, so the description weight reflects well the confidence and degree of specialization of the web page for that classification.
The present invention sets classification reference words for each classification and calculates the classification weight of each website on the Internet for each classification from those words. The basic idea of the classification weight calculation is: the more classification reference words of a given classification a website hits, the more pictures are associated with those words, and the larger the proportion of the website's pictures that are so associated, the greater the confidence that the website belongs to that classification.
Referring to Fig. 3, the process of presetting the website classification library is shown; the specific steps are as follows.
Step S301: divide the picture search domain into several classifications. The domains commonly covered by picture search are divided into several classifications; the principle of division is that the classifications are distinct and overlap little.
Step S302: set classification reference words for each classification. A classification reference word belongs uniquely to a certain classification and is a word that concentrates the characteristics of that classification; for example, common animal and plant names serve as classification reference words of the "animals and plants" classification, and famous scenic spots serve as classification reference words of the "scenery" classification. Classification reference words can be obtained from the classified browsing pages of professional websites.
Step S303: calculate the classification weight of each website on the Internet for each classification from the classification reference words. The formula can be:
Weight(site, class) = (1 + α·Weight(word num)) × (1 + β·Weight(pic num)) × (1 + γ·Weight(pic percent)) − 1;
where Weight(word num) is the number of classification reference words hit by the website, Weight(pic num) is the number of pictures in the website hit by the classification reference words, Weight(pic percent) is the proportion of the website's total pictures that are hit, and α, β and γ are coefficients.
The formula can also be:
Weight(site, class) = a × Weight(word num) + b × Weight(pic num) + c × Weight(pic percent);
where Weight(word num) is the number of classification reference words hit by the website, Weight(pic num) is the number of pictures in the website hit by the classification reference words, Weight(pic percent) is the proportion of the website's total pictures that are hit, and a, b and c are coefficients.
For example, suppose a website hits 50 classification reference words of the "animals and plants" classification, those 50 words hit 800 pictures in total, and the website contains only 1000 pictures, so the hit pictures account for 80%. The website's classification weight for the "animals and plants" classification is then very large, and the website is very likely a professional website of that classification.
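A Python sketch of the multiplicative form of the website classification weight, applied to the "animals and plants" example above; the coefficient values are illustrative assumptions.

def site_classification_weight(word_num, pic_num, pic_percent,
                               alpha=0.01, beta=0.001, gamma=1.0):
    """Weight(site, class) = (1 + a*word_num)(1 + b*pic_num)(1 + c*pic_percent) - 1."""
    return ((1 + alpha * word_num)
            * (1 + beta * pic_num)
            * (1 + gamma * pic_percent)) - 1

# The example from the text: 50 reference words hit, 800 of the site's 1000 pictures associated.
print(site_classification_weight(word_num=50, pic_num=800, pic_percent=800 / 1000))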
Step S304: form the website classification library. The classification weight of each website on the Internet for each classification is summarized into a table, which serves as the website classification library and is stored in the web search server.
The present invention takes into account the number of classification reference words hit in a website, the number of pictures hit by those words, and the proportion of pictures hit, so the classification weight of the website reflects well the confidence and degree of specialization of the website for that classification.
The present invention can also record user query words and, based on a word repository, count the number of occurrences of each word in each website; a word whose occurrence count exceeds a preset value is assigned to the classification for which that website has the highest classification weight.
Referring to Fig. 4, the process of presetting the word classification library according to the present invention is shown; the specific steps are as follows.
Step S401: count the number of occurrences of each word in each website. The present invention obtains the words by recording user query words, by retrieving words from a word repository, or by extracting words from professional websites, and counts the number of occurrences of each word in each website.
Step S402: compare the occurrence count with a preset value; if it is greater, obtain the classification for which the website has the highest classification weight and assign the word to that classification; if it is smaller, discard the word.
For example, if the word "apple" occurs 50 times in a certain website, which is greater than the preset value of 30, and that website's classification weight is highest for the "fruit" classification, then the word "apple" is assigned to the "fruit" classification.
Step S403: extract the website in which the word occurs most often, obtain the classification for which that website has the highest classification weight, take that classification as the primary classification of the word, and take the other classifications as secondary classifications of the word.
Step S404: form the word classification library. The words assigned to each classification, together with the classification reference words set for each classification as described above, form the word classification library.
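The following Python sketch illustrates steps S401 to S404 under simplifying assumptions: the occurrence counts are given as input, the threshold of 30 follows the "apple" example above, and all data structures are illustrative rather than prescribed by the patent.

def build_word_classification_library(word_site_counts, site_library, threshold=30):
    """word_site_counts: {word: {site: occurrence_count}}
    site_library: {site: {classification: classification_weight}}
    Returns {word: {"primary": classification, "secondary": [classifications]}}."""
    library = {}
    for word, site_counts in word_site_counts.items():
        candidates = []
        for site, count in site_counts.items():
            if count > threshold:                          # step S402
                weights = site_library.get(site, {})
                if weights:
                    candidates.append((count, max(weights, key=weights.get)))
        if not candidates:
            continue                                       # word discarded
        candidates.sort(reverse=True)                      # step S403: most occurrences first
        primary = candidates[0][1]
        secondary = sorted({cls for _, cls in candidates[1:]} - {primary})
        library[word] = {"primary": primary, "secondary": secondary}
    return library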
By judging the classification of a word from its occurrence counts in websites, the present invention makes the relevance of the word reflect the characteristics of the websites in which it is concentrated. By continually adding user query words to the word classification library, the library comes to cover the query words users commonly use, giving wide coverage. With this library, the query words entered by users can be classified accurately.
A query word may belong to only one classification or to several. If it belongs to only one classification, pictures can be returned directly according to that classification; but if it belongs to several, returning pictures for only one fixed classification degrades the experience of users who want to see pictures of the other classifications. For query words that belong to several classifications, the present invention establishes, for each classification, a set of search results indexed and ranked with that classification given priority, which gives more flexibility when presenting the retrieved pictures.
The present invention can directly present the retrieved pictures of the query word's primary classification and provide peer links for the other classifications; when the user wants to view the pictures of another classification, clicking the corresponding link displays the pictures of that classification.
Referring to Fig. 5, a second embodiment of the method for searching for pictures on a network according to the present invention is shown; the specific steps are as follows.
Step S501: preset the word classification library, the website classification library, and the web page classification library.
Step S502: compare against the word classification library to determine the classification of the query word. The query word entered by the user is extracted and compared with the words in the word classification library to determine its classification.
Step S503: the web search server searches the network for pictures directly related to the query word.
Step S504: obtain from the website classification library the classification weight of the website hosting each picture for this classification.
Step S505: obtain from the web page classification library the description weight of the web page hosting each picture for this classification.
Step S506: calculate the comprehensive relevance of each picture from the classification weight and the description weight.
Step S507: extract the pictures whose comprehensive relevance exceeds the threshold.
Step S508: judge whether the query word has several classifications; if not, display the extracted pictures directly; if so, go to step S509.
Step S509: store the pictures of each secondary classification of the query word in the server, establish a peer link for each, and display the pictures of the primary classification together with the peer links. For example, when the user enters the query word "apple", the retrieved pictures of the primary classification "animals and plants" are displayed, together with a link such as "Do you want to see pictures of apple in the digital products classification?".
Step S510: when the user clicks the link, the server retrieves the pictures of that classification. For example, clicking the link "Do you want to see pictures of apple in the digital products classification?" returns the retrieved pictures of the "digital products" classification.
By directly displaying the retrieved pictures of the query word's primary classification and providing peer links for the pictures of each secondary classification, the present invention keeps the retrieved pictures comprehensive without cluttering the display, making it convenient for the user to browse.
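A simplified Python sketch of the display logic in steps S508 to S510, assuming the word classification library sketched earlier; the returned structure and link text are placeholders and are not part of the patent.

def present_results(query_word, word_library, pictures_by_class):
    """pictures_by_class: {classification: [pictures already filtered by threshold]}."""
    entry = word_library[query_word]
    primary, secondary = entry["primary"], entry["secondary"]
    if not secondary:
        # Only one classification: display the extracted pictures directly (step S508).
        return {"pictures": pictures_by_class.get(primary, []), "links": []}
    # Several classifications: show the primary classification's pictures plus one peer
    # link per secondary classification (step S509); clicking a link would make the
    # server return that classification's pictures (step S510).
    links = [f"See pictures of '{query_word}' in the {cls} classification"
             for cls in secondary]
    return {"pictures": pictures_by_class.get(primary, []), "links": links}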
The present invention can also count the number of times the pictures of each classification are clicked, obtain the classification whose pictures are clicked most often, and directly display the retrieved pictures of that classification, so that the user sees the desired pictures quickly and conveniently.
Referring to Fig. 6, a third embodiment of the method for searching for pictures on a network according to the present invention is shown; the specific steps are as follows.
Step S601: preset the word classification library, the website classification library, and the web page classification library, and count the number of times the pictures of each classification are clicked. The number of times the user clicked pictures of each classification after previous searches with this query word is counted and recorded in the word classification library.
Step S602: compare against the word classification library to determine the primary classification of the query word. The query word entered by the user is extracted and compared with the words in the word classification library to determine its classification.
Step S603: the web search server searches the network for pictures directly related to the query word.
Step S604: obtain from the website classification library the classification weight of the website hosting each picture for this classification.
Step S605: obtain from the web page classification library the description weight of the web page hosting each picture for this classification.
Step S606: calculate the comprehensive relevance of each picture from the classification weight and the description weight.
Step S607: extract the pictures whose comprehensive relevance exceeds the threshold.
Step S608: obtain the classification whose pictures are clicked most often and display the pictures of that classification. For example, if the user enters the query word "apple" and, after previous searches with "apple", mostly clicked pictures of the "digital products" classification, then the pictures of the "digital products" classification are displayed directly.
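A Python sketch of the click-count selection in step S608; the click-log format is an assumption made for illustration.

from collections import Counter

def classification_to_display(query_word, click_log, primary_class):
    """click_log: {query_word: Counter({classification: clicks})} recorded from past searches.
    Returns the classification whose pictures were clicked most often, falling back to
    the primary classification when no clicks have been recorded."""
    counts = click_log.get(query_word)
    if not counts:
        return primary_class
    return counts.most_common(1)[0][0]

# Example: past users of "apple" clicked "digital products" pictures most often.
log = {"apple": Counter({"digital products": 120, "fruit": 40})}
print(classification_to_display("apple", log, primary_class="fruit"))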
By recording the classification whose pictures are clicked most often after a query word has been used, the present invention learns which classification of pictures the user needs most and displays those pictures directly, so the user can view them quickly and conveniently.
Based on the above method for searching for pictures on a network, the present invention also provides a system for searching for pictures on a network. The system makes the retrieved pictures closely related to the query word and improves the user experience.
Referring to Fig. 7, a first embodiment of the system for searching for pictures on a network according to the present invention is shown, comprising a query word classification module 71, a picture search module 72, a classification weight computation module 73, a description weight computation module 74, a comprehensive relevance computation module 75, and a picture extraction module 76.
The query word classification module 71 compares against the preset word classification library to determine the classification of the query word. It extracts the query word entered by the user, compares it with the words in the word classification library to determine its classification, and sends this classification information to the classification weight computation module 73 and the description weight computation module 74.
The picture search module 72 searches for each picture directly related to the query word and sends the results to the classification weight computation module 73 and the description weight computation module 74.
The classification weight computation module 73 compares against the preset website classification library, obtains the classification weight of the website hosting each picture for this classification, and sends it to the comprehensive relevance computation module 75.
The description weight computation module 74 compares against the preset web page classification library, obtains the description weight of the web page hosting each picture for this classification, and sends it to the comprehensive relevance computation module 75.
The comprehensive relevance computation module 75 calculates the comprehensive relevance of each picture from the classification weight and the description weight and sends the result to the picture extraction module 76.
The picture extraction module 76 extracts, from the pictures found by the picture search module 72, those whose comprehensive relevance exceeds the threshold.
Referring to Fig. 8, a second embodiment of the system for searching for pictures on a network according to the present invention is shown, comprising a query word classification module 71, a picture search module 72, a classification weight computation module 73, a description weight computation module 74, a comprehensive relevance computation module 75, a picture extraction module 76, a picture search domain division module 77, a classification description word setting module 78, a web page classification library composition module 79, a classification reference word setting module 80, a website classification library composition module 81, a word statistics module 82, and a word classification library composition module 83.
The picture search domain division module 77 divides the picture search domain into several classifications; the principle of division is that the classifications are distinct and overlap little. It sends the division result to the classification description word setting module 78 and the classification reference word setting module 80.
The classification description word setting module 78 sets classification description words for each classification; a classification description word can be understood as the name of a sub-classification, describing a common theme within the classification. It sends the classification description words to the web page classification library composition module 79.
The web page classification library composition module 79 calculates, from the classification description words, the description weight of each web page on the Internet for each classification, forms the web page classification library, and sends it to the description weight computation module 74.
The classification reference word setting module 80 sets classification reference words for each classification; a classification reference word belongs uniquely to a certain classification and concentrates the characteristics of that classification. It sends the classification reference words to the website classification library composition module 81.
The website classification library composition module 81 calculates, from the classification reference words, the classification weight of each website on the Internet for each classification, forms the website classification library, and sends it to the classification weight computation module 73.
The word statistics module 82 counts the number of occurrences of each word in each website. It obtains the words by recording user query words, by retrieving words from a word repository, or by extracting words from professional websites, counts the occurrences of each word in each website, and sends the statistics to the word classification library composition module 83.
The word classification library composition module 83, for each word, extracts the websites in which the word occurs more than the preset number of times, obtains the classification for which those websites have the highest classification weight, assigns the word to that classification, forms the word classification library, and sends it to the query word classification module 71.
The functions and effects of the query word classification module 71, the picture search module 72, the classification weight computation module 73, the description weight computation module 74, the comprehensive relevance computation module 75, and the picture extraction module 76 in this embodiment are the same as in the embodiment shown in Fig. 7 and are not repeated here.
The method and system for searching for pictures on a network provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the present invention; the above description of the embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, a person of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific embodiments and the scope of application. In summary, the contents of this description should not be construed as limiting the present invention.

Claims (14)

1. A method for searching for pictures on a network, characterized in that the method comprises:
comparing against a preset word classification library to determine the classification of a query word, the word classification library comprising the characteristic words of each classification;
searching for each picture relevant to the query word, and comparing against a preset website classification library to obtain, for each picture, the classification weight of the website hosting the picture with respect to the above classification, the website classification library comprising the classification weight of each website for said classification;
comparing against a preset web page classification library to obtain, for each picture, the description weight of the web page hosting the picture with respect to the above classification, the web page classification library comprising the description weight of each web page for said classification;
calculating the comprehensive relevance of each picture from the classification weight and the description weight, and extracting pictures whose comprehensive relevance exceeds a threshold.
2. the method for claim 1 is characterized in that, before the Web page classifying storehouse that contrast is preset, also comprises:
Dividing the picture searching field is some classification;
Be each classification setting classified description speech;
Utilize above-mentioned classified description speech to calculate on the internet each webpage respectively, form the Web page classifying storehouse at the description weight of each classification.
3. The method according to claim 2, characterized in that the description weight of each web page on the Internet for each classification is calculated from the classification description words as follows:
counting the number of times each classification description word of a given classification occurs in the web page and multiplying it by a corresponding coefficient;
scoring the positions at which the classification description words occur in the web page and multiplying the score by a corresponding coefficient;
summing the above products to obtain the description weight of the web page for that classification.
4. the method for claim 1 is characterized in that, before the websites collection storehouse that contrast is preset, also comprises:
Dividing the picture searching field is some classification;
Be each classification setting classification benchmark speech;
Utilize above-mentioned classification benchmark speech to calculate on the internet each website respectively, form the websites collection storehouse at the classification weight of each classification.
5. The method according to claim 4, characterized in that the classification weight of each website on the Internet for each classification is calculated from the classification reference words as follows:
counting the number of times the classification reference words of a given classification occur in the website and multiplying it by a corresponding coefficient;
counting the total number of pictures in the website associated with the classification reference words and multiplying it by a corresponding coefficient;
calculating the proportion of the website's pictures that are so associated and multiplying it by a corresponding coefficient;
summing the above three products to obtain the classification weight of the website for that classification.
6. The method according to claim 4, characterized in that the classification weight of each website on the Internet for each classification is calculated from the classification reference words as follows:
counting the number of times the classification reference words of a given classification occur in the website, multiplying it by a corresponding coefficient, and adding 1;
counting the total number of pictures in the website associated with the classification reference words, multiplying it by a corresponding coefficient, and adding 1;
calculating the proportion of the website's pictures that are so associated, multiplying it by a corresponding coefficient, and adding 1;
multiplying together the three results obtained above and subtracting 1 to obtain the classification weight of the website for that classification.
7. the method for claim 1 is characterized in that, before the word's kinds storehouse that contrast is preset, also comprises:
Add up the occurrence number of each word respectively in each website;
At each word, extract the website of this word occurrence number greater than default value, obtain the highest classification of above-mentioned websites collection weight, incorporate the word of this word into for this classification, form the word's kinds storehouse.
8. The method according to claim 7, characterized in that the method further comprises:
extracting the website in which the word occurs most often, obtaining the classification for which that website has the highest classification weight, taking that classification as the primary classification of the word, and taking the other classifications as secondary classifications of the word.
9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
when the query word belongs to at least two classifications, establishing links for the pictures of the secondary classifications of the query word;
displaying the pictures of the primary classification and the links of the secondary classifications.
10. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
counting the number of times the pictures of each classification are clicked;
obtaining the classification whose pictures are clicked most often and displaying the pictures of that classification.
11. A system for searching for pictures on a network, characterized in that the system comprises a query word classification module, a picture search module, a classification weight computation module, a description weight computation module, a comprehensive relevance computation module, and a picture extraction module, wherein:
the query word classification module is configured to compare against a preset word classification library to determine the classification of a query word, the word classification library comprising the characteristic words of each classification;
the picture search module is configured to search for each picture relevant to the query word;
the classification weight computation module is configured to compare against a preset website classification library to obtain, for each picture, the classification weight of the website hosting the picture with respect to the above classification, the website classification library comprising the classification weight of each website for said classification;
the description weight computation module is configured to compare against a preset web page classification library to obtain, for each picture, the description weight of the web page hosting the picture with respect to the above classification, the web page classification library comprising the description weight of each web page for said classification;
the comprehensive relevance computation module is configured to calculate the comprehensive relevance of each picture from the classification weight and the description weight;
the picture extraction module is configured to extract pictures whose comprehensive relevance exceeds a threshold.
12. The system according to claim 11, characterized in that the system further comprises a picture search domain division module, a classification description word setting module, and a web page classification library composition module, wherein:
the picture search domain division module is configured to divide the picture search domain into several classifications;
the classification description word setting module is configured to set classification description words for each classification;
the web page classification library composition module is configured to calculate, from the classification description words, the description weight of each web page on the Internet for each classification, to form the web page classification library.
13. The system according to claim 12, characterized in that the system further comprises a classification reference word setting module and a website classification library composition module, wherein:
the classification reference word setting module is configured to set classification reference words for each classification;
the website classification library composition module is configured to calculate, from the classification reference words, the classification weight of each website on the Internet for each classification, to form the website classification library.
14. The system according to claim 13, characterized in that the system further comprises a word statistics module and a word classification library composition module, wherein:
the word statistics module is configured to count the number of occurrences of each word in each website;
the word classification library composition module is configured to, for each word, extract the websites in which the word occurs more than a preset number of times, obtain the classification for which those websites have the highest classification weight, and assign the word to that classification, to form the word classification library.
CN2008100880561A 2008-03-27 2008-03-27 Method and system for searching pictures in network Active CN101246502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008100880561A CN101246502B (en) 2008-03-27 2008-03-27 Method and system for searching pictures in network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008100880561A CN101246502B (en) 2008-03-27 2008-03-27 Method and system for searching pictures in network

Publications (2)

Publication Number Publication Date
CN101246502A CN101246502A (en) 2008-08-20
CN101246502B true CN101246502B (en) 2010-07-21

Family

ID=39946953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008100880561A Active CN101246502B (en) 2008-03-27 2008-03-27 Method and system for searching pictures in network

Country Status (1)

Country Link
CN (1) CN101246502B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102081601B (en) * 2009-11-27 2013-01-09 北京金山软件有限公司 Field word identification method and device
CN102819595A (en) * 2012-08-10 2012-12-12 北京星网锐捷网络技术有限公司 Web page classification method, web page classification device and network equipment
CN103678400B (en) * 2012-09-21 2017-12-01 腾讯科技(深圳)有限公司 Web page classification method and device based on collective search behavior
CN103294825B (en) * 2013-06-21 2016-08-31 刘俊 Image file search system and method
CN103324760B (en) * 2013-07-11 2016-08-17 中国农业大学 Commentary document is used to automatically generate the method and system of Nutrition and health education video
CN104881428B (en) * 2015-04-02 2019-03-29 广州神马移动信息科技有限公司 A kind of hum pattern extraction, search method and the device of hum pattern webpage
CN106570116B (en) * 2016-11-01 2020-05-22 北京百度网讯科技有限公司 Search result aggregation method and device based on artificial intelligence
CN106649563B (en) * 2016-11-10 2022-02-25 新华三技术有限公司 Website classification dictionary construction method and device
CN107067032B (en) * 2017-03-30 2020-04-07 东软集团股份有限公司 Data classification method and device
CN110807138B (en) * 2019-09-10 2022-07-05 国网电子商务有限公司 Method and device for determining search object category

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102722483A (en) * 2011-03-29 2012-10-10 百度在线网络技术(北京)有限公司 Method, apparatus and equipment for determining candidate-item sequence of input method
CN102722483B (en) * 2011-03-29 2017-07-25 百度在线网络技术(北京)有限公司 For determining method, device and equipment that the candidate item of input method sorts

Also Published As

Publication number Publication date
CN101246502A (en) 2008-08-20

Similar Documents

Publication Publication Date Title
CN101246502B (en) Method and system for searching pictures in network
CN105912669B (en) Method and device for complementing search terms and establishing individual interest model
CN106372249B (en) A kind of clicking rate predictor method, device and electronic equipment
CN102982153B (en) A kind of information retrieval method and device thereof
US9858308B2 (en) Real-time content recommendation system
CN105868237A (en) Multimedia data recommendation method and server
Brown Ranking journals using social science research network downloads
CN104298719A (en) Method and system for conducting user category classification and advertisement putting based on social behavior
CN102779136A (en) Method and device for information search
CN101299217B (en) Method, apparatus and system for processing map information
CN105653562B (en) The calculation method and device of correlation between a kind of content of text and inquiry request
CN108334610A (en) A kind of newsletter archive sorting technique, device and server
Coleman Identifying the “players” in sports analytics research
US20080015819A1 (en) Athletic Performance Data System and Method
CN103336848B (en) A kind of sort method of information of classifying
Smucker et al. Overview of the TREC 2012 Crowdsourcing Track.
CN106777282B (en) The sort method and device of relevant search
CN101196923A (en) Category-based advertising system and method
CN103235796B (en) Search method and system based on user click behavior
CN107122467A (en) The retrieval result evaluation method and device of a kind of search engine, computer-readable medium
CN105893390A (en) Application program processing method and electronic equipment
CN105824961B (en) A kind of label determines method and device
CN103020066A (en) Method and device for recognizing search demand
CN103268330A (en) User interest extraction method based on image content
CN104899335A (en) Method for performing sentiment classification on network public sentiment of information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20151223

Address after: Floors 5-10, Fiyta Building, High-tech Zone South Road, Nanshan District, Shenzhen, Guangdong 518057

Patentee after: Shenzhen Tencent Computer System Co., Ltd.

Address before: Room 410, East Block 2, SEG Science and Technology Park, Zhenxing Road, Futian District, Shenzhen, Guangdong 518044

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.