CN104731948B - High-quality image search resource collection method and device - Google Patents

High-quality image search resource collection method and device

Info

Publication number
CN104731948B
CN104731948B CN201510149926.1A CN201510149926A CN104731948B CN 104731948 B CN104731948 B CN 104731948B CN 201510149926 A CN201510149926 A CN 201510149926A CN 104731948 B CN104731948 B CN 104731948B
Authority
CN
China
Prior art keywords
query
webpage
good
picture
quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510149926.1A
Other languages
Chinese (zh)
Other versions
CN104731948A (en)
Inventor
陶哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd
Priority to CN201510149926.1A
Publication of CN104731948A
Application granted
Publication of CN104731948B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention provides a method and device for collecting high-quality image search resources. The method comprises: searching on a Query to obtain original image search resources; processing the original image search resources according to preset rules to screen out the high-quality image search resources for the Query; and collecting the high-quality image search resources and recording them as the image search resources corresponding to the Query. With this method and device, data can be collected more effectively from search resources that have already been acquired.

Description

Method and device for collecting high-quality image search resources
Technical field
The present invention relates to the field of Internet search, and in particular to a method and device for collecting high-quality image search resources.
Background
With the continuous development of network technology, the Internet has become ever more closely tied to users' lives. In daily life, large numbers of users search for information through search engines. A search engine is a system that automatically gathers information from the Internet, organizes it, and makes it available for users to query. Information on the Internet is vast, varied, and unordered: individual pieces of information are like islands in a vast sea, web links are the bridges criss-crossing between those islands, and the search engine draws a clear information map that users can consult at any time.
On the one hand, the system provides time-sensitive data; on the other hand, it provides more data for the online engine to rank (Rank). Either way, the main purpose is to improve the quality and relevance of search results. In particular, when crawl resources are fixed, it is essential to crawl only the data that is of higher quality and that best complements the engine's existing data. The question is therefore how to collect data more effectively, especially image search resources, which carry a large amount of information that is hard to recognize. For vertical search in particular, the data often comes from webpages already crawled by web search; because this data already exists, search resources can be collected through data mining.
In practice, online relevance is evaluated per Query, and the results a user sees (such as images) are also organized per Query. Collecting image resources is therefore essentially a matter of improving the relevance of the search results for a given Query, yet the related art provides no specific method for doing so.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a device for collecting high-quality image search resources, and a corresponding method, that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a method for collecting high-quality image search resources is provided, including:
searching on a Query to obtain original image search resources;
processing the original image search resources according to preset rules to screen out the high-quality image search resources for the Query;
collecting the high-quality image search resources and recording them as the image search resources corresponding to the Query.
Optionally, processing the original image search resources according to preset rules to screen out the high-quality search resources for the Query includes:
calculating the probability P(Image=Good | Query) that each image search resource is a high-quality search resource;
comparing the calculated P(Image=Good | Query) of each image search resource with a preset high-quality resource threshold;
screening out the search resources whose P(Image=Good | Query) exceeds the high-quality resource threshold as the high-quality search resources for the Query.
Optionally, calculating, among the original image search resources, the probability P(Image=Good | Query) that each image search resource is a high-quality resource includes:
traversing the images in the original image search resources;
when an image is reached, obtaining the attribute information of the image;
calculating P(Image=Good | Query) of the image from its attribute information.
Optionally, processing the original search resources according to preset rules to screen out the high-quality search resources for the Query includes:
screening out the high-quality image search resources among the original image search resources;
further screening out, from the high-quality image search resources thus obtained, a partial set of high-quality image search resources for the Query;
calculating P(Image=Good | Query) for each high-quality image search resource in the partial set;
comparing the calculated P(Image=Good | Query) of each high-quality image search resource with a preset high-quality resource threshold;
screening out the search resources whose P(Image=Good | Query) exceeds the high-quality resource threshold as the high-quality search resources for the Query.
Optionally, further screening out, from the high-quality image search resources thus obtained, the partial set of high-quality image search resources for the Query includes:
obtaining the text description of each image through browsing behavior;
calculating in turn the similarity between the Query and the text description of each image;
further screening out the partial set of high-quality image search resources for the Query according to the calculated similarity.
Optionally, if the text description of an image contains the Query, the image belongs to the partial set of high-quality image search resources for the Query.
Optionally, calculating, among the original image search resources, the probability P(Image=Good | Query) that each image search resource is a high-quality search resource includes:
looking up, in the search history, the webpages that contain the original image search resources;
screening out, from the webpages found, those whose P(Page=Good | Query) exceeds a preset webpage threshold;
calculating P(Image=Good | Query) for each image on the screened webpages.
Optionally, screening out, from the webpages found, those whose P(Page=Good | Query) exceeds the preset webpage threshold includes:
screening out, from the webpages found, the webpages corresponding to the Query;
traversing the webpages corresponding to the Query;
when a webpage is reached, obtaining the attribute information of the webpage;
judging from the attribute information of the webpage whether its P(Page=Good | Query) exceeds the preset webpage threshold.
Optionally, P(Page=Good | Query) is determined according to the following steps:
looking up, in the search log, the first click behavior of anonymous users on each webpage within a certain time period;
looking up, in the search log, the second click behavior of anonymous users for the Query within a certain time period;
comparing the similarity between the first click behavior and the second click behavior;
determining P(Page=Good | Query) from the similarity between the two.
Optionally, comparing the similarity between the first click behavior and the second click behavior includes: comparing the similarity between the first click behavior and the second click behavior according to click time and/or number of clicks.
Optionally, P(Page=Good | Query) is determined according to the following steps:
obtaining the text description of each webpage through browsing behavior;
calculating in turn the similarity between the Query and the text description of each webpage;
determining P(Page=Good | Query) from the calculated similarity.
Optionally, the text description of each webpage includes at least one of the following: the title, body text, and summary of the webpage.
According to another aspect of the present invention, a device for collecting high-quality image search resources is also provided, including:
a search module, adapted to search on a Query to obtain original image search resources;
a screening module, adapted to process the original image search resources according to preset rules and screen out the high-quality image search resources for the Query;
a collection module, adapted to collect the high-quality image search resources and record them as the image search resources corresponding to the Query.
Optionally, the screening module is further adapted to:
calculate, among the original image search resources, the probability P(Image=Good | Query) that each image search resource is a high-quality search resource;
compare the calculated P(Image=Good | Query) of each image search resource with a preset high-quality resource threshold;
screen out the search resources whose P(Image=Good | Query) exceeds the high-quality resource threshold as the high-quality search resources for the Query.
Optionally, the screening module is further adapted to:
traverse the images in the original image search resources;
when an image is reached, obtain the attribute information of the image;
calculate P(Image=Good | Query) of the image from its attribute information.
Optionally, the screening module is further adapted to:
screen out the high-quality image search resources among the original image search resources;
further screen out, from the high-quality image search resources thus obtained, a partial set of high-quality image search resources for the Query;
calculate P(Image=Good | Query) for each high-quality image search resource in the partial set;
compare the calculated P(Image=Good | Query) of each high-quality image search resource with a preset high-quality resource threshold;
screen out the search resources whose P(Image=Good | Query) exceeds the high-quality resource threshold as the high-quality search resources for the Query.
Optionally, the screening module is further adapted to:
obtain the text description of each image through browsing behavior;
calculate in turn the similarity between the Query and the text description of each image;
further screen out the partial set of high-quality image search resources for the Query according to the calculated similarity.
Optionally, the screening module is further adapted to:
look up, in the search history, the webpages that contain the original image search resources;
screen out, from the webpages found, those whose P(Page=Good | Query) exceeds a preset webpage threshold;
calculate P(Image=Good | Query) for each image on the screened webpages.
Optionally, the screening module is further adapted to:
screen out, from the webpages found, the webpages corresponding to the Query;
traverse the webpages corresponding to the Query;
when a webpage is reached, obtain the attribute information of the webpage;
judge from the attribute information of the webpage whether its P(Page=Good | Query) exceeds the preset webpage threshold.
Optionally, the screening module is further adapted to determine P(Page=Good | Query) according to the following steps:
looking up, in the search log, the first click behavior of anonymous users on each webpage within a certain time period;
looking up, in the search log, the second click behavior of anonymous users for the Query within a certain time period;
comparing the similarity between the first click behavior and the second click behavior;
determining P(Page=Good | Query) from the similarity between the two.
Optionally, the screening module is further adapted to compare the similarity between the first click behavior and the second click behavior according to click time and/or number of clicks.
Optionally, the screening module is further adapted to determine P(Page=Good | Query) according to the following steps:
obtaining the text description of each webpage through browsing behavior;
calculating in turn the similarity between the Query and the text description of each webpage;
determining P(Page=Good | Query) from the calculated similarity.
In embodiments of the present invention, a search is performed on a Query to obtain original image search resources; these are then screened to select the high-quality image search resources, which are recorded as the image search resources corresponding to the Query. The original image search resources obtained by searching on the Query are relatively rough: they include high-quality image search resources but also a large number of non-high-quality ones. The present invention explains which high-quality image search resources to collect directly from the standpoint of improving search result relevance: with search resources fixed, it screens out the high-quality image search resources corresponding to the identified Query. When a user queries with the identified Query, the high-quality image search resources are presented preferentially, which improves the relevance of the identified Query. Because the embodiments focus on collecting data more effectively from search resources that have already been obtained, they are especially instructive for vertical search.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
From the following detailed description of specific embodiments of the invention in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages, and features of the invention.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a method for collecting high-quality image search resources according to an embodiment of the present invention; and
Fig. 2 shows a schematic structural diagram of a device for collecting high-quality image search resources according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
To solve the above technical problem, an embodiment of the present invention provides a method for collecting high-quality image search resources. Fig. 1 shows a flowchart of the method according to an embodiment of the present invention. Referring to Fig. 1, the method includes at least steps S102 to S106:
Step S102: search on a Query to obtain original image search resources;
Step S104: process the original image search resources according to preset rules and screen out the high-quality image search resources for the Query;
Step S106: collect the high-quality image search resources and record them as the image search resources corresponding to the Query.
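As an illustrative, non-limiting sketch, steps S102 to S106 can be arranged as a three-stage pipeline. The function and variable names below (`collect_for_query`, `fake_search`, `fake_rule`, `index`) are hypothetical and do not come from the patent; the search backend and the screening rule are passed in as callables because the description leaves both open.

```python
from typing import Callable, Dict, Iterable, List


def collect_for_query(
    query: str,
    search: Callable[[str], Iterable[str]],
    is_high_quality: Callable[[str, str], bool],
    index: Dict[str, List[str]],
) -> List[str]:
    """Run S102 (search), S104 (screen) and S106 (record) for one Query."""
    originals = list(search(query))                                       # S102
    selected = [img for img in originals if is_high_quality(query, img)]  # S104
    index[query] = selected                                               # S106
    return selected


# Toy stand-ins (assumptions): a fixed result list and a keyword-based rule.
def fake_search(query: str) -> List[str]:
    return ["cat_hd.jpg", "cat_blurry.jpg", "dog_hd.jpg"]


def fake_rule(query: str, image: str) -> bool:
    return query in image and "hd" in image


index: Dict[str, List[str]] = {}
result = collect_for_query("cat", fake_search, fake_rule, index)
print(result)
```

Keeping the screening rule as a parameter lets either of the screening variants described for step S104 be plugged in without changing the pipeline.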
In embodiments of the present invention, a search is performed on a Query to obtain original image search resources; these are then screened to select the high-quality image search resources, which are recorded as the image search resources corresponding to the Query. The original image search resources obtained by searching on the Query are relatively rough: they include high-quality image search resources but also a large number of non-high-quality ones. The present invention explains which high-quality image search resources to collect directly from the standpoint of improving search result relevance: with search resources fixed, it screens out the high-quality image search resources corresponding to the identified Query. When a user queries with the identified Query, the high-quality image search resources are presented preferentially, which improves the relevance of the identified Query. Because the embodiments focus on collecting data more effectively from search resources that have already been obtained, they are especially instructive for vertical search.
Step S104 processes the original image search resources according to preset rules and screens out the high-quality search resources for the Query. Specifically, one preferred way of screening high-quality image search resources from the original image search resources is as follows:
Step A: calculate the probability P(Image=Good | Query) that each image search resource is a high-quality search resource, where P(Image=Good | Query) is the probability that an image resource returned for the search term Query is a high-quality resource;
Step B: compare the calculated P(Image=Good | Query) of each image search resource with a preset high-quality resource threshold;
Step C: screen out the search resources whose P(Image=Good | Query) exceeds the high-quality resource threshold as the high-quality search resources for the Query.
An embodiment of the present invention provides one specific implementation of step A, including:
First, traverse the images in the original image search resources;
Next, when an image is reached, obtain the attribute information of the image;
Finally, calculate P(Image=Good | Query) of the image from its attribute information.
The attribute information of an image includes all attribute information related to the image, such as its sharpness, name, content (people, flowers and plants, houses, rocks, nature, daytime, night, and so on), caption, size, pixel count, hue, and saturation.
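A minimal sketch of steps A to C follows, assuming a hand-written scoring function over two of the listed attributes (sharpness and pixel count). The patent does not say how the attributes are combined, so `p_good`, its weights, and the `ImageAttrs` fields are assumptions for illustration only; this stand-in also ignores the Query itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageAttrs:
    name: str
    sharpness: float  # assumed to be normalized to [0, 1]
    width: int
    height: int


def p_good(attrs: ImageAttrs) -> float:
    """Hypothetical stand-in for P(Image=Good | Query) computed from attributes."""
    size_score = min(attrs.width * attrs.height / 1_000_000, 1.0)
    return 0.5 * attrs.sharpness + 0.5 * size_score


def screen(images: List[ImageAttrs], threshold: float) -> List[str]:
    kept = []
    for img in images:         # traverse the original image search resources
        score = p_good(img)    # step A: score from attribute information
        if score > threshold:  # steps B and C: compare with the threshold, keep
            kept.append(img.name)
    return kept


images = [
    ImageAttrs("a.jpg", sharpness=0.9, width=2000, height=1000),
    ImageAttrs("b.jpg", sharpness=0.2, width=200, height=100),
]
selected = screen(images, threshold=0.6)
print(selected)
```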
Step S104 may also use another way of screening high-quality search resources, including:
Step A: screen out the high-quality image search resources among the original image search resources;
Step B: further screen out, from the high-quality image search resources thus obtained, a partial set of high-quality image search resources for the Query;
Step C: calculate P(Image=Good | Query) for each high-quality image search resource in the partial set;
Step D: compare the calculated P(Image=Good | Query) of each high-quality image search resource with a preset high-quality resource threshold;
Step E: screen out the search resources whose P(Image=Good | Query) exceeds the high-quality resource threshold as the high-quality search resources for the Query.
When implementing step B, P(Image=Good | Query) can be calculated for the image resources already obtained, and the high-quality image resources for the Query can then be screened out according to this probability.
To associate an image with the Query, a similarity is calculated. Since the information in an image is reflected by its text description, P(Query | Image=Good) can be calculated by computing the similarity between the image's text description and the Query and then determining P(Query | Image=Good) from that similarity. Specifically: obtain the text description of each image through browsing behavior; calculate in turn the similarity between the identified Query and the text description of each image; determine P(Query | Image=Good) from the calculated similarity, and thereby further screen out the partial set of high-quality image search resources for the Query.
Note that if the text description of an image contains the Query, the image is necessarily a high-quality image search resource for the Query. If the text of the image contains an important part of a Query, a relevance score between the Query and the image's text description is calculated, and the image is selected if the score exceeds a certain threshold.
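The two selection rules above can be sketched as follows. The relevance measure used here (the fraction of whitespace-separated Query terms found in the text) and the default threshold are assumptions chosen for brevity; the patent gives its own term-weighted score later in the description.

```python
from typing import List


def relevance(query_terms: List[str], text: str) -> float:
    """Fraction of the Query's terms that occur in the text (assumed measure)."""
    if not query_terms:
        return 0.0
    hits = sum(1 for term in query_terms if term in text)
    return hits / len(query_terms)


def select_by_text(query: str, text: str, threshold: float = 0.5) -> bool:
    if query in text:  # rule 1: the text contains the whole Query
        return True
    # rule 2: partial match, selected only when the score exceeds the threshold
    return relevance(query.split(), text) > threshold


print(select_by_text("red panda", "a red panda eating bamboo"))
print(select_by_text("red panda cub", "a panda cub in the snow"))
print(select_by_text("red panda", "blue whale"))
```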
In addition, since all attributes of an image are obtained by parsing webpages, data collection must also solve one more problem: for a specific Query, which pages need to be parsed?
From the webpage analysis perspective, an embodiment of the present invention provides another way of calculating, among the original image search resources, the probability P(Image=Good | Query) that each image search resource is a high-quality search resource, specifically including:
looking up, in the search history, the webpages that contain the original image search resources;
screening out, from the webpages found, those whose P(Page=Good | Query) exceeds a preset webpage threshold;
calculating P(Image=Good | Query) for each image on the screened webpages.
Screening out, from the webpages found, those whose P(Page=Good | Query) exceeds the preset webpage threshold further includes:
screening out, from the webpages found, the webpages corresponding to the Query;
traversing the webpages corresponding to the Query;
when a webpage is reached, obtaining the attribute information of the webpage;
judging from the attribute information of the webpage whether its P(Page=Good | Query) exceeds the preset webpage threshold.
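The page-first variant can be sketched as a filter over pages followed by gathering their images for scoring. The estimator `p_page_good` is a hypothetical callable, and the URLs and scores below are made-up toy data; how the page probability is obtained is left to the click-based or text-based methods described in this document.

```python
from typing import Callable, Dict, List


def images_from_good_pages(
    query: str,
    pages: Dict[str, List[str]],               # page URL -> images on that page
    p_page_good: Callable[[str, str], float],  # hypothetical P(Page=Good | Query)
    page_threshold: float,
) -> List[str]:
    candidates: List[str] = []
    for url, images in pages.items():          # traverse the pages found
        if p_page_good(url, query) > page_threshold:
            candidates.extend(images)          # only these images get scored next
    return candidates


# Toy data (assumptions): precomputed page scores for the Query "cat".
scores = {"http://a.example/cats": 0.9, "http://b.example/misc": 0.3}
pages = {
    "http://a.example/cats": ["cat1.jpg", "cat2.jpg"],
    "http://b.example/misc": ["ad.gif"],
}
candidates = images_from_good_pages("cat", pages, lambda url, q: scores[url], 0.5)
print(candidates)
```

Filtering pages first limits the number of pages that must be parsed, which is the question the passage above raises.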
An embodiment of the present invention provides a preferred way of determining P(Page=Good | Query), with the following specific steps:
look up, in the search log, the first click behavior of anonymous users on each webpage within a certain time period;
look up, in the search log, the second click behavior of anonymous users for the Query within a certain time period;
compare the similarity between the first click behavior and the second click behavior;
determine P(Page=Good | Query) from the similarity between the two.
Comparing the similarity between the first click behavior and the second click behavior specifically includes: comparing the two according to click time and/or number of clicks.
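One possible way to compare two click behaviors by click time and number of clicks is to bucket clicks by hour of day and take the cosine similarity of the two histograms. This particular measure is an assumption; the patent only states that click time and/or click count are compared.

```python
import math
from collections import Counter
from typing import Iterable


def hourly_histogram(click_hours: Iterable[int]) -> Counter:
    """Count clicks per hour of day (0-23)."""
    return Counter(click_hours)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy logs (assumptions): hours at which anonymous users clicked.
page_clicks = hourly_histogram([9, 9, 10, 21])    # first click behavior (webpage)
query_clicks = hourly_histogram([9, 10, 10, 21])  # second click behavior (Query)
similarity = cosine(page_clicks, query_clicks)
print(round(similarity, 4))
```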
It is worth noting that an embodiment of the present invention may also determine P(Page=Good | Query) according to the following steps:
obtain the text description of each webpage through browsing behavior;
calculate in turn the similarity between the Query and the text description of each webpage;
determine P(Page=Good | Query) from the calculated similarity.
The text description of each webpage includes at least one of the following: the title, body text, and summary of the webpage.
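As a hedged illustration of the text-based route, the sketch below compares the Query with the title, body text, and summary of a page using plain Jaccard similarity over lowercase whitespace tokens and keeps the best field. The tokenization, the field names of the `page` dictionary, and the choice of the maximum are all assumptions rather than the patent's method.

```python
from typing import Dict


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def page_text_similarity(query: str, page: Dict[str, str]) -> float:
    fields = [page.get(name, "") for name in ("title", "body", "summary")]
    return max(jaccard(query, field) for field in fields)


page = {
    "title": "Red panda photos",
    "body": "A gallery of red panda photos taken in Sichuan",
    "summary": "Photo gallery",
}
sim = page_text_similarity("red panda", page)
print(round(sim, 4))
```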
The thresholds mentioned above are determined by those skilled in the art according to factors such as the parameter type each threshold corresponds to and the specific implementation environment; their numerical values can be set for each specific implementation.
The method for collecting high-quality image resources based on user behavior features provided by the embodiments of the present invention is now explained from another angle.
The problem solved by this embodiment can be defined as: for a given Query (one that has already occurred or may occur in the future), which images should be collected to improve search relevance?
First, Query mining is performed to determine the Queries whose relevance needs improving. Query mining determines the target Query set to be addressed, that is, the Queries whose relevance is to be improved by collecting supplementary data. Queries for which users are dissatisfied, search results are poor, and click-through rates are low are the target Queries of this embodiment.
On the other hand, many Queries with poor search results are simply not very expressive in themselves, and these need to be filtered out. In addition, the Queries should have some predictive power: the data recalled through them should bring relevance improvements to other Queries (especially new ones).
Specifically, whether the relevance of a Query is good can be estimated from its click (Click) / retrieval (Srp) ratio: the relevance Relative(Query) of the Query is estimated from this ratio according to the following formula.
Confidence(srp(Query)) is a piecewise-linear function of the number of times the Query is retrieved; its basic value is a Boolean value of 0 or 1, so Queries retrieved too few times are simply discarded.
Because many Queries with poor relevance may themselves express an unclear intent, the main reason for their low score is actually the Query itself. Queries that are not sufficiently indicative are filtered out. Whether a Query is meaningful is judged here from the text, by calculating the text score TextScore(Query) of the Query.
Here, DF(Term_i) is the document frequency (Doc Frequency) of a Term of the Query, W(Term_i) is the part-of-speech scoring function of the Term, and P(I) is a piecewise-linear function of the number of Terms.
From the above analysis, the target Queries of this embodiment are those satisfying Relative(Query) < R1 && TextScore(Query) > S1.
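The selection rule Relative(Query) < R1 && TextScore(Query) > S1 can be sketched as below. The formulas for Relative(Query) and TextScore(Query) are not reproduced in this text, so the sketch makes two loud assumptions: Relative(Query) is taken to be the plain click/retrieval ratio gated by a 0/1 confidence step, and TextScore(Query) is supplied as a precomputed number. The `min_searches` cutoff and all sample figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryStats:
    query: str
    clicks: int
    searches: int      # Srp: number of times the Query was retrieved
    text_score: float  # TextScore(Query), assumed to be precomputed


def confidence(searches: int, min_searches: int = 100) -> int:
    """0/1 step, an assumed simplification of Confidence(srp(Query))."""
    return 1 if searches >= min_searches else 0


def target_queries(stats: List[QueryStats], r1: float, s1: float) -> List[str]:
    targets = []
    for s in stats:
        if confidence(s.searches) == 0:
            continue                      # retrieved too rarely: discard
        relative = s.clicks / s.searches  # assumed form of Relative(Query)
        if relative < r1 and s.text_score > s1:
            targets.append(s.query)
    return targets


stats = [
    QueryStats("red panda wallpaper", clicks=50, searches=1000, text_score=0.8),
    QueryStats("asdf", clicks=10, searches=1000, text_score=0.1),
    QueryStats("cute cat", clicks=900, searches=1000, text_score=0.9),
    QueryStats("rare query", clicks=0, searches=5, text_score=0.9),
]
targets = target_queries(stats, r1=0.2, s1=0.5)
print(targets)
```

In this toy data only the first Query has both a low click ratio and a meaningful text score, so it alone is a target.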
Once the target Queries are identified, the problem addressed by the embodiment can be stated briefly as: "given a Query, collect the good images relevant to this Query", that is, collect the images with a relatively large P(Image=Good | Query). In other words, for a target Query, collect the high-quality images relevant to that Query.
By probability theory, P(Image=Good | Query) = P(Query | Image=Good) * P(Image=Good) / P(Query).
For a specific Query, P(Query) is a fixed value and can be ignored, so, up to a constant factor, the problem becomes: P(Image=Good | Query) ∝ P(Query | Image=Good) * P(Image=Good).
P(Image=Good) is the probability that the Image itself is Good; P(Query | Image=Good) is the probability that the Query occurs for a specific Image object, that is, the similarity between the Image and the Query.
P(Image=Good) can be estimated from the Image's text and/or from the webpage where the Image is located; P(Query | Image=Good) is the similarity between the Image and the Query, which can be estimated from text similarity.
According to the preceding analysis, P(Image=Good | Query) ∝ P(Query | Image=Good) * P(Image=Good).
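Because P(Query) is the same for every image under one Query, dropping it does not change the order of the images. The sketch below ranks images by the product P(Query | Image=Good) * P(Image=Good); the probability values are made-up inputs for illustration, and `rank_images` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def rank_images(
    p_query_given_good: Dict[str, float],
    p_good: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank images by P(Query | Image=Good) * P(Image=Good), highest first."""
    scores = {img: p_query_given_good[img] * p_good[img] for img in p_good}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_images(
    p_query_given_good={"a.jpg": 0.9, "b.jpg": 0.4, "c.jpg": 0.7},
    p_good={"a.jpg": 0.5, "b.jpg": 0.9, "c.jpg": 0.8},
)
order = [img for img, _ in ranking]
print(order)
```

Note that the product is an unnormalized score, so any threshold applied to it must be chosen on that scale rather than as a true probability.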
Before the image has been crawled, the quality of the Image can only be estimated by other means:
Score(Image) = Score(Page) * Score(Text)
Score(Page) is a piecewise-linear function of the score of the Page on which the Image is located, and Score(Text) is the score of the text associated with the Image.
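A minimal sketch of Score(Image) = Score(Page) * Score(Text) is given below. The breakpoints of the piecewise-linear function are hypothetical, since the description does not give them, and the raw page score and text score are assumed to be numbers in the range 0 to 1.

```python
from bisect import bisect_right
from typing import List, Tuple


def piecewise_linear(x: float, points: List[Tuple[float, float]]) -> float:
    """Interpolate linearly between sorted (x, y) breakpoints; clamp outside the range."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)  # index of the first breakpoint strictly right of x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical breakpoints: the page score rises quickly at first, then flattens.
PAGE_BREAKPOINTS = [(0.0, 0.0), (0.5, 0.8), (1.0, 1.0)]


def score_image(raw_page_score: float, text_score: float) -> float:
    """Score(Image) = Score(Page) * Score(Text), with Score(Page) piecewise-linear."""
    return piecewise_linear(raw_page_score, PAGE_BREAKPOINTS) * text_score
```

A piecewise-linear mapping of this kind lets the page score saturate, so that a very strong page cannot by itself compensate for weak image text.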
The relevance between the Query and the Image is represented here by the relevance of the text (in practice, the initial rank of new image data is also calculated from text).
If the text of the Image completely contains a certain Query, the Image is selected as a high-quality image.
If the text of the Image contains the important Terms of a certain Query, the relevance score of the Query and the Text is calculated according to the following formula; if the score exceeds a certain threshold, the Image is selected as a high-quality image:
W2(Term) = Type(Term) * Pos(Term) * Length(Term)
Where Type(Term) is the type score of the Term, Pos(Term) is the position score of the Term, and Length(Term) is the length score of the Term.
W1(Term) is the maximum of this Term's scores in the Query and in the Text.
Score(Text, Query) is the intersection of the Query and the Text divided by their union.
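Two of the quantities defined above can be sketched directly: Score(Text, Query) as intersection over union of the two Term sets, and W2(Term) as a product of three component scores. The sketch assumes the Query and the Text have already been segmented into Terms; the function names are illustrative, and the complete relevance formula that combines W1, W2, and Score(Text, Query) is not reproduced here.

```python
from typing import Iterable


def score_text_query(query_terms: Iterable[str], text_terms: Iterable[str]) -> float:
    """Score(Text, Query): size of the Term intersection divided by size of the union."""
    q, t = set(query_terms), set(text_terms)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def w2(type_score: float, pos_score: float, length_score: float) -> float:
    """W2(Term) = Type(Term) * Pos(Term) * Length(Term)."""
    return type_score * pos_score * length_score


def text_contains_query(query_terms: Iterable[str], text_terms: Iterable[str]) -> bool:
    """True when every Term of the Query occurs in the image text."""
    return set(query_terms) <= set(text_terms)


overlap = score_text_query(["red", "apple"], ["red", "apple", "photo", "hd"])
```

In this example the image text contains the whole Query, so by the first rule the image would be selected directly; the overlap score matters for images whose text contains only some of the Query's Terms.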
Furthermore, since all attributes of an Image are obtained by PA parsing of web pages, data collection must also solve one more problem: "for a specific Query, which pages need PA parsing?"
That is, the attributes of a resource are obtained by PA parsing of web pages, and for a given Query the pages that need PA parsing are determined from:
P(Page=Good | Query) = P(Query | Page=Good) * P(Page=Good) / P(Query)
Likewise, because P(Query) is a fixed value for a specific Query and can be ignored, the problem becomes, up to this constant factor:
P(Page=Good | Query) = P(Query | Page=Good) * P(Page=Good)
P(Page=Good) is the probability that the Page itself is Good, and P(Query | Page=Good) is the degree of similarity between the Page and the Query, which can be estimated from text or from click behavior.
Based on the same inventive concept, an embodiment of the present invention further provides a collection device for high-quality image search resources. Fig. 2 shows a schematic structural diagram of the collection device for high-quality image search resources according to an embodiment of the present invention. Referring to Fig. 2, the device at least includes:
A search module 210, adapted to perform a search for a Query to obtain original image search resources;
A screening module 220, coupled to the search module 210 and adapted to process the original image search resources according to preset rules and select from them the high-quality image search resources for the Query;
A collection module 230, coupled to the screening module 220 and adapted to collect the high-quality image search resources and record them as the image search resources corresponding to the Query.
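The cooperation of the three modules can be sketched as a small pipeline. This is an illustration under stated assumptions only: the Collector class, its search and score callables, and the in-memory dictionary used as the record are hypothetical stand-ins, and the preset rule is taken to be a simple threshold on a score.

```python
from typing import Callable, Dict, List


class Collector:
    """Hypothetical stand-in for the device: search, then screen, then record."""

    def __init__(self,
                 search: Callable[[str], List[str]],
                 score: Callable[[str, str], float],
                 threshold: float) -> None:
        self.search = search        # role of search module 210
        self.score = score          # scoring rule used by screening module 220
        self.threshold = threshold  # preset high-quality resource threshold
        self.index: Dict[str, List[str]] = {}  # record kept by collection module 230

    def collect(self, query: str) -> List[str]:
        raw = self.search(query)                                # original resources
        good = [img for img in raw
                if self.score(img, query) > self.threshold]     # screening
        self.index[query] = good                                # recording
        return good


# Usage with made-up search results and scores.
collector = Collector(
    search=lambda q: ["a.jpg", "b.jpg"],
    score=lambda img, q: 0.9 if img == "a.jpg" else 0.1,
    threshold=0.5,
)
collected = collector.collect("cat")
```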
In a preferred embodiment, the screening module 220 is further adapted to:
Calculate, among the original image search resources, the probability P(Image=Good | Query) that each image search resource is a high-quality search resource;
Compare the calculated P(Image=Good | Query) of each image search resource with a preset high-quality resource threshold;
Select the search resources whose P(Image=Good | Query) is greater than the high-quality resource threshold as the high-quality search resources for the Query.
In a preferred embodiment, the screening module 220 is further adapted to:
Traverse the images in the original image search resources;
When a given image is reached, obtain the attribute information of that image;
Calculate the P(Image=Good | Query) of the image according to its attribute information.
In a preferred embodiment, the screening module 220 is further adapted to:
Select the high-quality image search resources from the original image search resources;
From the selected high-quality image search resources, further select the subset of high-quality image search resources for the Query;
Within this subset, calculate the P(Image=Good | Query) of each high-quality image search resource;
Compare the calculated P(Image=Good | Query) of each high-quality image search resource with a preset high-quality resource threshold;
Select the search resources whose P(Image=Good | Query) is greater than the high-quality resource threshold as the high-quality search resources for the Query.
In a preferred embodiment, the screening module 220 is further adapted to:
Obtain the text description information of each image by traversal;
Calculate, in turn, the similarity between the Query and the text description information of each image;
Further select the subset of high-quality image search resources for the Query according to the calculated similarity.
In a preferred embodiment, the screening module 220 is further adapted to:
Query the search history records for web pages containing the original image search resources;
Select, from the queried web pages, the web pages whose P(Page=Good | Query) is greater than a preset web page threshold;
Calculate the P(Image=Good | Query) of each image on the selected web pages.
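The page-first order of these steps can be sketched as a two-stage filter: pages are screened first, and image scores are computed only for images on the pages that pass. The function name, the mapping from page to images, and the two scoring callables are assumptions made for illustration.

```python
from typing import Callable, Dict, List


def two_stage_screen(pages: Dict[str, List[str]],
                     page_score: Callable[[str], float],
                     image_score: Callable[[str], float],
                     page_threshold: float,
                     image_threshold: float) -> List[str]:
    """pages maps a page URL to the ids of the images found on that page.

    page_score stands in for P(Page=Good | Query) and image_score for
    P(Image=Good | Query); both are assumed to be estimated elsewhere.
    """
    selected: List[str] = []
    for url, images in pages.items():
        if page_score(url) <= page_threshold:
            continue  # page rejected: its images are never scored
        selected.extend(img for img in images if image_score(img) > image_threshold)
    return selected


# Made-up scores: page p2 is rejected, so image c is never considered.
pages = {"p1": ["a", "b"], "p2": ["c"]}
page_scores = {"p1": 0.9, "p2": 0.2}
image_scores = {"a": 0.8, "b": 0.3, "c": 0.9}
kept = two_stage_screen(pages, page_scores.get, image_scores.get, 0.5, 0.5)
```

Screening pages first limits the number of pages that need parsing, which matches the concern raised earlier about deciding which pages to parse for a given Query.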
In a preferred embodiment, the screening module 220 is further adapted to:
Select the web pages corresponding to the Query from the queried web pages;
Traverse the web pages corresponding to the Query;
When a given web page is reached, obtain the attribute information of that web page;
Judge, according to the attribute information of the web page, whether the web page satisfies P(Page=Good | Query) greater than the preset web page threshold.
In a preferred embodiment, the screening module 220 is further adapted to determine P(Page=Good | Query) according to the following steps:
Search the search log for a first click behavior, namely the clicks of anonymous users on each web page within a certain time period;
Search the search log for a second click behavior, namely the clicks of anonymous users for the Query within a certain time period;
Compare the similarity of the first click behavior and the second click behavior;
Determine P(Page=Good | Query) according to the similarity of the two.
In a preferred embodiment, the screening module 220 is further adapted to compare the similarity of the first click behavior and the second click behavior according to click time and/or number of clicks.
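The description does not fix a particular similarity measure for the two click behaviors. As one possible sketch, each click behavior can be summarized as a vector of click counts per time bucket (for example, per day) and the two vectors compared with cosine similarity. The choice of cosine similarity, the bucketing, and the function name are all assumptions.

```python
from math import sqrt
from typing import Sequence


def click_similarity(page_clicks: Sequence[float], query_clicks: Sequence[float]) -> float:
    """Cosine similarity of two click-count vectors over the same time buckets."""
    if len(page_clicks) != len(query_clicks):
        raise ValueError("both vectors must cover the same time buckets")
    dot = sum(x * y for x, y in zip(page_clicks, query_clicks))
    norm = sqrt(sum(x * x for x in page_clicks)) * sqrt(sum(y * y for y in query_clicks))
    return dot / norm if norm else 0.0


# Made-up daily click counts over three days: the same shape at a different volume.
similarity = click_similarity([1, 2, 3], [2, 4, 6])
```

Cosine similarity compares the shape of the two click patterns over time and ignores their overall volume, so a page with few clicks can still match a popular Query if the clicks rise and fall together.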
In a preferred embodiment, the screening module 220 is further adapted to determine P(Page=Good | Query) according to the following steps:
Obtain the text description information of each web page by traversal;
Calculate, in turn, the similarity between the Query and the text description information of each web page;
Determine P(Page=Good | Query) according to the calculated similarity.
The collection method and device for high-quality image search resources provided by the embodiments of the present invention can achieve the following beneficial effects:
In the embodiments of the present invention, a search is performed for a Query to obtain original image search resources; the original image search resources are then screened, and the high-quality image search resources among them are selected and recorded as the image search resources corresponding to the Query. The original image search resources obtained by searching for a Query are relatively coarse: they contain high-quality image search resources, but also a large number of non-high-quality ones. The present invention specifies which high-quality image search resources to collect directly from the perspective of improving search-result relevance and, with the search resources fixed, selects from them the high-quality image search resources corresponding to the identified Query. When a user searches with the identified Query, the high-quality image search resources are presented to the user preferentially, which improves the relevance of the results for the identified Query. Because the embodiments of the present invention focus on collecting data more effectively from the search resources already obtained, they are especially instructive for vertical search.
Numerous specific details are set forth in the description provided herein. It is to be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, the method of the disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single embodiment disclosed above. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments can be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such a signal may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order. These words may be interpreted as names.
Thus, those skilled in the art will recognize that, although multiple exemplary embodiments of the present invention have been shown and described in detail herein, many other variations or modifications consistent with the principles of the invention can still be directly determined or derived from the present disclosure without departing from the spirit and scope of the invention. Accordingly, the scope of the present invention should be understood and deemed to cover all such other variations or modifications.
The invention discloses A1, a collection method for high-quality image search resources, including:
Performing a search for a Query to obtain original image search resources;
Processing the original image search resources according to preset rules, and selecting from them the high-quality image search resources for the Query;
Collecting the high-quality image search resources and recording them as the image search resources corresponding to the Query.
A2, the method according to A1, wherein processing the original image search resources according to preset rules and selecting from them the high-quality search resources for the Query includes:
Calculating the probability P(Image=Good | Query) that each image search resource is a high-quality search resource;
Comparing the calculated P(Image=Good | Query) of each image search resource with a preset high-quality resource threshold;
Selecting the search resources whose P(Image=Good | Query) is greater than the high-quality resource threshold as the high-quality search resources for the Query.
A3, the method according to A2, wherein calculating, among the original image search resources, the probability P(Image=Good | Query) that each image search resource is a high-quality resource includes:
Traversing the images in the original image search resources;
When a given image is reached, obtaining the attribute information of that image;
Calculating the P(Image=Good | Query) of the image according to its attribute information.
A4, the method according to A1, wherein processing the original image search resources according to preset rules and selecting from them the high-quality search resources for the Query includes:
Selecting the high-quality image search resources from the original image search resources;
From the selected high-quality image search resources, further selecting the subset of high-quality image search resources for the Query;
Within this subset, calculating the P(Image=Good | Query) of each high-quality image search resource;
Comparing the calculated P(Image=Good | Query) of each high-quality image search resource with a preset high-quality resource threshold;
Selecting the search resources whose P(Image=Good | Query) is greater than the high-quality resource threshold as the high-quality search resources for the Query.
A5, the method according to A4, wherein further selecting, from the selected high-quality image search resources, the subset of high-quality image search resources for the Query includes:
Obtaining the text description information of each image by traversal;
Calculating, in turn, the similarity between the Query and the text description information of each image;
Further selecting the subset of high-quality image search resources for the Query according to the calculated similarity.
A6, the method according to A5, wherein, if the text description information of an image contains the Query, the image belongs to the subset of high-quality image search resources for the Query.
A7, the method according to any one of A2 to A6, wherein calculating, among the original image search resources, the probability P(Image=Good | Query) that each image search resource is a high-quality search resource includes:
Querying the search history records for web pages containing the original image search resources;
Selecting, from the queried web pages, the web pages whose P(Page=Good | Query) is greater than a preset web page threshold;
Calculating the P(Image=Good | Query) of each image on the selected web pages.
A8, the method according to A7, wherein selecting, from the queried web pages, the web pages whose P(Page=Good | Query) is greater than the preset web page threshold includes:
Selecting the web pages corresponding to the Query from the queried web pages;
Traversing the web pages corresponding to the Query;
When a given web page is reached, obtaining the attribute information of that web page;
Judging, according to the attribute information of the web page, whether the web page satisfies P(Page=Good | Query) greater than the preset web page threshold.
A9, the method according to A7 or A8, wherein P(Page=Good | Query) is determined according to the following steps:
Searching the search log for a first click behavior, namely the clicks of anonymous users on each web page within a certain time period;
Searching the search log for a second click behavior, namely the clicks of anonymous users for the Query within a certain time period;
Comparing the similarity of the first click behavior and the second click behavior;
Determining P(Page=Good | Query) according to the similarity of the two.
A10, the method according to A9, wherein comparing the similarity of the first click behavior and the second click behavior includes: comparing the similarity of the first click behavior and the second click behavior according to click time and/or number of clicks.
A11, the method according to A7 or A8, wherein P(Page=Good | Query) is determined according to the following steps:
Obtaining the text description information of each web page by traversal;
Calculating, in turn, the similarity between the Query and the text description information of each web page;
Determining P(Page=Good | Query) according to the calculated similarity.
A12, the method according to A11, wherein the text description information of each web page includes at least one of the following: the title, the body text, and the summary of the web page.
The invention also discloses B13, a collection device for high-quality image search resources, including:
A search module, adapted to perform a search for a Query to obtain original image search resources;
A screening module, adapted to process the original image search resources according to preset rules and select from them the high-quality image search resources for the Query;
A collection module, adapted to collect the high-quality image search resources and record them as the image search resources corresponding to the Query.
B14, the device according to B13, wherein the screening module is further adapted to:
Calculate, among the original image search resources, the probability P(Image=Good | Query) that each image search resource is a high-quality search resource;
Compare the calculated P(Image=Good | Query) of each image search resource with a preset high-quality resource threshold;
Select the search resources whose P(Image=Good | Query) is greater than the high-quality resource threshold as the high-quality search resources for the Query.
B15, the device according to B14, wherein the screening module is further adapted to:
Traverse the images in the original image search resources;
When a given image is reached, obtain the attribute information of that image;
Calculate the P(Image=Good | Query) of the image according to its attribute information.
B16, the device according to B13, wherein the screening module is further adapted to:
Select the high-quality image search resources from the original image search resources;
From the selected high-quality image search resources, further select the subset of high-quality image search resources for the Query;
Within this subset, calculate the P(Image=Good | Query) of each high-quality image search resource;
Compare the calculated P(Image=Good | Query) of each high-quality image search resource with a preset high-quality resource threshold;
Select the search resources whose P(Image=Good | Query) is greater than the high-quality resource threshold as the high-quality search resources for the Query.
B17, the device according to B16, wherein the screening module is further adapted to:
Obtain the text description information of each image by traversal;
Calculate, in turn, the similarity between the Query and the text description information of each image;
Further select the subset of high-quality image search resources for the Query according to the calculated similarity.
B18, the device according to any one of B14 to B17, wherein the screening module is further adapted to:
Query the search history records for web pages containing the original image search resources;
Select, from the queried web pages, the web pages whose P(Page=Good | Query) is greater than a preset web page threshold;
Calculate the P(Image=Good | Query) of each image on the selected web pages.
B19, the device according to B18, wherein the screening module is further adapted to:
Select the web pages corresponding to the Query from the queried web pages;
Traverse the web pages corresponding to the Query;
When a given web page is reached, obtain the attribute information of that web page;
Judge, according to the attribute information of the web page, whether the web page satisfies P(Page=Good | Query) greater than the preset web page threshold.
B20, the device according to B18 or B19, wherein the screening module is further adapted to determine P(Page=Good | Query) according to the following steps:
Searching the search log for a first click behavior, namely the clicks of anonymous users on each web page within a certain time period;
Searching the search log for a second click behavior, namely the clicks of anonymous users for the Query within a certain time period;
Comparing the similarity of the first click behavior and the second click behavior;
Determining P(Page=Good | Query) according to the similarity of the two.
B21, the device according to B20, wherein the screening module is further adapted to compare the similarity of the first click behavior and the second click behavior according to click time and/or number of clicks.
B22, the device according to B18 or B19, wherein the screening module is further adapted to determine P(Page=Good | Query) according to the following steps:
Obtaining the text description information of each web page by traversal;
Calculating, in turn, the similarity between the Query and the text description information of each web page;
Determining P(Page=Good | Query) according to the calculated similarity.

Claims (13)

1. A collection method for high-quality image search resources, including:
Performing a search for a Query to obtain original image search resources;
Querying the search history records for web pages containing the original image search resources; selecting, from the queried web pages, the web pages whose probability P(Page=Good | Query) of being a high-quality web page resource is greater than a preset web page threshold; calculating the probability P(Image=Good | Query) that each image search resource on the selected web pages is a high-quality image search resource;
Comparing each calculated probability P(Image=Good | Query) with a preset high-quality resource threshold;
Selecting the image search resources whose P(Image=Good | Query) is greater than the high-quality resource threshold as the high-quality image search resources for the Query;
Collecting the high-quality image search resources and recording them as the image search resources corresponding to the Query.
2. The method according to claim 1, wherein calculating the probability P(Image=Good | Query) includes:
Traversing the images in each image search resource;
When a given image is reached, obtaining the attribute information of that image;
Calculating P(Image=Good | Query) according to the attribute information of the image.
3. The method according to claim 1, wherein selecting, from the queried web pages, the web pages whose P(Page=Good | Query) is greater than the preset web page threshold includes:
Selecting the web pages corresponding to the Query from the queried web pages;
Traversing the web pages corresponding to the Query;
When a given web page is reached, obtaining the attribute information of that web page;
Judging, according to the attribute information of the web page, whether the web page satisfies P(Page=Good | Query) greater than the preset web page threshold.
4. The method according to claim 1, wherein P(Page=Good | Query) is determined according to the following steps:
Searching the search log for a first click behavior, namely the clicks of anonymous users on each web page within a certain time period;
Searching the search log for a second click behavior, namely the clicks of anonymous users for the Query within a certain time period;
Comparing the similarity of the first click behavior and the second click behavior;
Determining P(Page=Good | Query) according to the similarity of the two.
5. The method according to claim 4, wherein comparing the similarity of the first click behavior and the second click behavior includes: comparing the similarity of the first click behavior and the second click behavior according to click time and/or number of clicks.
6. The method according to claim 1, wherein P(Page=Good | Query) is determined according to the following steps:
Obtaining the text description information of each web page by traversal;
Calculating, in turn, the similarity between the Query and the text description information of each web page;
Determining P(Page=Good | Query) according to the calculated similarity.
7. The method according to claim 6, wherein the text description information of each web page includes at least one of the following: the title, the body text, and the summary of the web page.
8. A collection device for high-quality image search resources, including:
A search module, adapted to perform a search for a Query to obtain original image search resources;
A screening module, adapted to query the search history records for web pages containing the original image search resources; select, from the queried web pages, the web pages whose probability P(Page=Good | Query) of being a high-quality web page resource is greater than a preset web page threshold; calculate the probability P(Image=Good | Query) that each image search resource on the selected web pages is a high-quality image search resource; compare each calculated probability P(Image=Good | Query) with a preset high-quality resource threshold; and select the image search resources whose P(Image=Good | Query) is greater than the high-quality resource threshold as the high-quality image search resources for the Query;
A collection module, adapted to collect the high-quality image search resources and record them as the image search resources corresponding to the Query.
9. The device according to claim 8, wherein the screening module is further adapted to calculate the probability P(Image=Good | Query) according to the following steps:
Traversing the images in each image search resource;
When a given image is reached, obtaining the attribute information of that image;
Calculating P(Image=Good | Query) according to the attribute information of the image.
10. The device according to claim 8, wherein the screening module is further adapted to select, from the queried web pages, the web pages whose P(Page=Good | Query) is greater than the preset web page threshold according to the following steps:
Selecting the web pages corresponding to the Query from the queried web pages;
Traversing the web pages corresponding to the Query;
When a given web page is reached, obtaining the attribute information of that web page;
Judging, according to the attribute information of the web page, whether the web page satisfies P(Page=Good | Query) greater than the preset web page threshold.
11. The device according to claim 8, wherein the screening module is further adapted to determine P(Page=Good | Query) according to the following steps:
Searching the search log for a first click behavior, namely the clicks of anonymous users on each web page within a certain time period;
Searching the search log for a second click behavior, namely the clicks of anonymous users for the Query within a certain time period;
Comparing the similarity of the first click behavior and the second click behavior;
Determining P(Page=Good | Query) according to the similarity of the two.
12. The device according to claim 11, wherein the screening module is further adapted to compare the similarity of the first click behavior and the second click behavior according to click time and/or number of clicks.
13. The device according to claim 8, wherein the screening module is further adapted to determine P(Page=Good | Query) according to the following steps:
Obtaining the text description information of each web page by traversal;
Calculating, in turn, the similarity between the Query and the text description information of each web page;
Determining P(Page=Good | Query) according to the calculated similarity.
CN201510149926.1A 2015-03-31 2015-03-31 High-quality image search resource collection method and device Expired - Fee Related CN104731948B (en)

Publications (2)

Publication Number Publication Date
CN104731948A CN104731948A (en) 2015-06-24
CN104731948B true CN104731948B (en) 2017-05-03

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8774526B2 (en) * 2010-02-08 2014-07-08 Microsoft Corporation Intelligent image search results summarization and browsing
CN101976252B (en) * 2010-10-26 2012-10-10 百度在线网络技术(北京)有限公司 Picture display system and display method thereof
CN102750385B (en) * 2012-06-29 2014-05-07 南京邮电大学 Correlation-quality sequencing image retrieval method based on tag retrieval
CN103744970B (en) * 2014-01-10 2016-11-23 北京奇虎科技有限公司 A kind of method and device of the descriptor determining picture
CN104331513A (en) * 2014-11-24 2015-02-04 中国科学技术大学 High-efficiency prediction method for image retrieval performance

Also Published As

Publication number Publication date
CN104731948A (en) 2015-06-24

Similar Documents

Publication Publication Date Title
Rosenberg A new critical estimate of named species-level diversity of the recent Mollusca
US8989450B1 (en) Scoring items
CN105930470B (en) A kind of document retrieval method based on feature weight analytical technology
GB2509773A (en) Automatic genre determination of web content
JP2006107433A (en) System and method for incorporating anchor text into ranking of search result
CN105930473B (en) A kind of similar documents search method based on random forest technology
WO2018113468A1 (en) Search term recommendation method, device, program and medium
CN103617213B (en) Method and system for identifying newspage attributive characters
CN109582849A (en) A kind of Internet resources intelligent search method of knowledge based map
CN107180093A (en) Information search method and device and ageing inquiry word recognition method and device
US20160103913A1 (en) Method and system for calculating a degree of linkage for webpages
CN105095175A (en) Method and device for obtaining truncated web title
CN106599215A (en) Question generation method and question generation system based on deep learning
CN106874335A (en) Behavioral data processing method, device and server
US20160188537A1 (en) Suggesting patterns in unstructured documents
US20150302090A1 (en) Method and System for the Structural Analysis of Websites
CN103605744B (en) The analysis method and device of site search engine data on flows
US20110295861A1 (en) Searching using taxonomy
CN105786810A (en) Method and device for establishment of category mapping relation
CN104731948B (en) High-quality image search resource collection method and device
CN107908649A (en) A kind of control method of text classification
CN104317903B (en) The recognition methods of the chapters and sections integrality of chapters and sections formula text and device
Strona et al. Predicting what helminth parasites a fish species should have using parasite co-occurrence modeler (PaCo)
CN103744852B (en) Snap processing method, snapshot display method, server, browser and system
CN106649750B (en) Searching method and device for multi-meaning term entry

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170503

Termination date: 20210331