CN103226601B - Method and apparatus for picture search - Google Patents

Method and apparatus for picture search

Info

Publication number
CN103226601B
Authority
CN
China
Prior art keywords
query
interest
point
search
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310148051.4A
Other languages
Chinese (zh)
Other versions
CN103226601A (en
Inventor
黄际洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310148051.4A priority Critical patent/CN103226601B/en
Publication of CN103226601A publication Critical patent/CN103226601A/en
Application granted granted Critical
Publication of CN103226601B publication Critical patent/CN103226601B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and apparatus for picture search. The method comprises an offline mining phase and an online search phase. In the offline mining phase, the following steps are executed for each query in a search behavior log: S11, collecting the current query and its related queries from the search behavior log to form the search term set of the current query; S12, normalizing the queries in that search term set that express the same semantics into one point of interest, thereby obtaining the points of interest of the current query; S13, storing the points of interest of the current query in a point-of-interest database. The online search phase comprises: S21, querying the point-of-interest database to determine the points of interest of the query currently input by the user; S22, obtaining the picture search results of those points of interest, and presenting each point of interest together with its picture search results on the search results page of the query currently input by the user. The invention can satisfy users' picture search needs more quickly and save network resources.

Description

Method and apparatus for picture search
[Technical field]
The present invention relates to the technical field of information search, and in particular to a method and apparatus for picture search.
[Background art]
With the continuous development of computer and network technologies, search engines have become an important means for people to obtain information, and picture search, as one kind of search, is used more and more widely. For example, when a user enters a search term (query) in the search box of a picture search engine, the engine finds pictures whose surrounding text contains the keyword, ranks the pictures found, and returns them to the user.
In the prior art, a picture search engine returns a massive list of retrieved pictures ranked by relevance, and the user has to check the results one by one to see whether each is the picture he or she needs. For example, when the user enters "Liu", the pictures found that relate to Liu are returned, as shown in Fig. 1.
On the one hand, checking a massive list of picture search results one by one often requires turning many pages, which wastes both the user's time and network resources.
On the other hand, when searching, a user usually has to work out a query that describes his or her need as accurately as possible, but often cannot do so in one attempt. Instead, after entering a query and checking the returned picture search results, the user, if the results do not meet the need, enters another query by extending or changing the previous one and checks the results again. This process may be repeated many times, which likewise wastes both the user's time and network resources. For example, after entering "Liu" and finding that the picture search results do not meet the need, the user will try changed queries such as "Liu's life photos", "Liu's concert", and "Liu's wife".
[Summary of the invention]
In view of this, the present invention provides a method and apparatus for picture search, so as to satisfy users' picture search needs more quickly and save network resources.
The specific technical solution is as follows:
A method for picture search, the method comprising:
an offline mining phase, in which the following steps are executed for each query in a search behavior log:
S11, collecting the current query and the related queries of the current query from the search behavior log to form the search term set of the current query;
S12, normalizing the queries in the search term set of the current query that express the same semantics into one point of interest, thereby obtaining the points of interest of the current query;
S13, storing the points of interest of the current query in a point-of-interest database;
an online search phase:
S21, querying the point-of-interest database to determine the points of interest of the query currently input by the user;
S22, obtaining the picture search results of the points of interest of the query currently input by the user, and presenting each point of interest and its picture search results on the search results page of the query currently input by the user.
According to a preferred embodiment of the present invention, the related queries of the current query include: synonymous queries of the current query, queries containing the current query, and queries containing a synonymous query of the current query.
According to a preferred embodiment of the present invention, collecting the current query and the related queries of the current query from the search behavior log is:
collecting the current query and its related queries from the sessions in the search behavior log that contain the current query.
According to a preferred embodiment of the present invention, step S12 further comprises: determining the search heat of each point of interest according to the search counts of the queries from which the point of interest is derived;
in step S13, the search heat of each point of interest is further stored in the point-of-interest database;
in step S22, the points of interest are ranked on the search results page according to their search heat.
According to a preferred embodiment of the present invention, step S12 further comprises: determining the category to which each point of interest in the search term set of the current query belongs;
in step S13, the category to which each point of interest belongs is further stored in the point-of-interest database;
in step S22, the category to which each point of interest belongs is further presented on the search results page.
According to a preferred embodiment of the present invention, on the search results page, the categories are ranked according to at least one of the user's historical search behavior and the search heat of each category;
the search heat of each category is determined by the search counts of the queries from which the points of interest in the category are derived.
According to a preferred embodiment of the present invention, determining the points of interest of the query currently input by the user in step S21 comprises:
querying whether the point-of-interest database contains a query that expresses the same semantics as the query currently input by the user and, if so, determining the points of interest of that query in the point-of-interest database.
According to a preferred embodiment of the present invention, whether two queries express the same semantics is determined as follows:
performing word segmentation and stop-word removal on the two queries;
comparing the two processed queries; if the part by which one query exceeds the other is a semantically redundant phrase, the two queries are considered to express the same semantics; or, if the differing parts of the two queries are synonyms, the two queries are considered to express the same semantics.
According to a preferred embodiment of the present invention, if the point-of-interest database contains no query that expresses the same semantics as the query currently input by the user, querying whether the point-of-interest database contains a point of interest that expresses the same semantics as the query currently input by the user and, if so, proceeding to step S23;
S23, obtaining the picture search results of the point of interest that expresses the same semantics as the query currently input by the user, and presenting them on the search results page of the query currently input by the user.
According to a preferred embodiment of the present invention, step S23 further comprises: determining the other points of interest in the category to which the point of interest expressing the same semantics as the query currently input by the user belongs in the point-of-interest database, and further presenting the picture search results of those other points of interest on the search results page, wherein the picture search results of the point of interest expressing the same semantics as the query currently input by the user are ranked first.
An apparatus for picture search, the apparatus comprising an offline mining unit and an online search unit;
wherein the offline mining unit processes each query in a search behavior log and comprises:
a search term set determining subunit, configured to collect the current query and the related queries of the current query from the search behavior log to form the search term set of the current query;
a point-of-interest extraction subunit, configured to normalize the queries expressing the same semantics in the search term set of the current query into one point of interest, obtain the points of interest of the current query, and store them in a point-of-interest database;
the online search unit comprises:
a point-of-interest determining subunit, configured to query the point-of-interest database and determine the points of interest of the query currently input by the user;
a picture search subunit, configured to obtain the picture search results of the points of interest of the query currently input by the user, and present each point of interest and its picture search results on the search results page of the query currently input by the user.
According to a preferred embodiment of the present invention, the related queries of the current query include: synonymous queries of the current query, queries containing the current query, and queries containing a synonymous query of the current query.
According to a preferred embodiment of the present invention, the search term set determining subunit collects the current query and its related queries from the sessions in the search behavior log that contain the current query.
According to a preferred embodiment of the present invention, the point-of-interest extraction subunit is further configured to determine the search heat of each point of interest according to the search counts of the queries from which the point of interest is derived, and to further store the search heat of each point of interest in the point-of-interest database;
the picture search subunit is further configured to rank the points of interest on the search results page according to their search heat.
According to a preferred embodiment of the present invention, the point-of-interest extraction subunit is further configured to determine the category to which each point of interest in the search term set of the current query belongs, and to further store the category to which each point of interest belongs in the point-of-interest database;
the picture search subunit is further configured to present the category to which each point of interest belongs on the search results page.
According to a preferred embodiment of the present invention, on the search results page, the categories are ranked according to at least one of the user's historical search behavior and the search heat of each category;
the search heat of each category is determined by the search counts of the queries from which the points of interest in the category are derived.
According to a preferred embodiment of the present invention, when determining the points of interest of the query currently input by the user, the point-of-interest determining subunit specifically performs:
querying whether the point-of-interest database contains a query that expresses the same semantics as the query currently input by the user and, if so, determining the points of interest of that query in the point-of-interest database.
According to a preferred embodiment of the present invention, whether two queries express the same semantics is specifically determined as follows:
performing word segmentation and stop-word removal on the two queries;
comparing the two processed queries; if the part by which one query exceeds the other is a semantically redundant phrase, the two queries are considered to express the same semantics; or, if the differing parts of the two queries are synonyms, the two queries are considered to express the same semantics.
According to a preferred embodiment of the present invention, if the point-of-interest database contains no query that expresses the same semantics as the query currently input by the user, the point-of-interest determining subunit further queries whether the point-of-interest database contains a point of interest that expresses the same semantics as the query currently input by the user;
when the point-of-interest determining subunit finds such a point of interest in the point-of-interest database, the picture search subunit obtains the picture search results of the point of interest that expresses the same semantics as the query currently input by the user and presents them on the search results page of the query currently input by the user.
According to a preferred embodiment of the present invention, the picture search subunit is further configured to determine the other points of interest in the category to which the point of interest expressing the same semantics as the query currently input by the user belongs in the point-of-interest database, and to further present the picture search results of those other points of interest on the search results page, wherein the picture search results of the point of interest expressing the same semantics as the query currently input by the user are ranked first.
As can be seen from the above technical solutions, the present invention mines the points of interest of a query in advance; when a user searches for the query, the picture search results of its points of interest are obtained, and each point of interest and its picture search results are presented on the search results page of the query. This helps the user quickly find the desired pictures and meets the user's picture needs, without checking a disordered mass of picture results one by one or repeatedly entering changed queries, thereby saving network resources.
[Brief description of the drawings]
Fig. 1 shows the picture search results returned for the input "Liu" in the prior art;
Fig. 2 is a structural diagram of the system to which the present invention applies;
Fig. 3 is a flowchart of the method provided by Embodiment 1 of the present invention;
Fig. 4 is an example of the picture search results provided by Embodiment 1 of the present invention;
Fig. 5 is a structural diagram of the apparatus provided by Embodiment 2 of the present invention.
[Detailed description of the embodiments]
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the drawings and specific embodiments.
Observation shows that a user searching for a query usually has several points of interest. For example, after searching for "Liu", users often go on to search for "Liu's wife", "Liu's girlfriend", "Liu's concert", "Liu's life photos", and so on. On the one hand, this shows that the user did not get the desired pictures from a single search and therefore kept changing the query; on the other hand, it shows that the query "Liu" has several points of interest, and a user is usually interested in only one or a few of them. The core idea of the present invention is therefore to mine the points of interest of a query in advance and then, when a user searches for that query, present picture search results according to those points of interest, so that the user can quickly find the desired pictures. That is, the invention is implemented in two phases: the first phase is offline mining of query points of interest, and the second phase is online picture search for a query. The invention is described in detail below through specific embodiments.
The system to which the present invention applies is briefly described first. As shown in Fig. 2, after the user enters a query in the search box, the browser or client on the user equipment sends the query to the search server; the search server performs the picture search and returns the search results to the browser or client, which presents them to the user. The method provided by the present invention is carried out by an apparatus arranged on the search server side.
Embodiment 1
Fig. 3 is a flowchart of the method provided by Embodiment 1 of the present invention. The method mainly comprises two phases: steps 301 to 304 are the first phase, offline mining of query points of interest, and are executed for each query in the search behavior log; steps 305 to 306 are the second phase, online picture search for a query. As shown in Fig. 3:
Step 301: collect a query searched by users and the related queries of that query from the search behavior log to form the search term set of the query.
The related queries of a query may include, but are not limited to: synonymous queries of the query, queries containing the query, and queries containing a synonymous query of the query.
Preferably, the query and its related queries are collected only from the sessions in the search behavior log that contain the query. For example, when collecting the search term set for "Liu", queries are collected only from sessions that contain "Liu". A session that contains the related query "Liu's wife" but does not contain the query "Liu" itself cannot be used to build the search term set of "Liu". These sessions are not limited to those of a single user; sessions from different users are in fact needed.
As an example, suppose that 5 sessions in the user behavior log contain the query "Liu", and that the searches for "Liu" and its related queries in these 5 sessions are as shown in Tables 1 to 5.
Table 1
Time      Query
22:51:25  Liu
22:52:27  Liu's son
22:53:07  Liu's daughter
22:53:45  Liu's wife
22:54:17  Liu's ex-girlfriend
Table 2
Time      Query
22:42:41  Liu's wife So-and-so
22:43:15  Liu's spouse
22:43:30  Liu's wife
22:44:05  Liu's ex-girlfriend
22:47:26  Liu
22:48:27  Liu at age 20
22:49:03  Liu's youthful photos
Table 3
Time      Query
10:32:03  Liu's concert
10:54:09  Liu's 00 concert
10:54:23  Liu's 01 concert
11:17:36  Liu's 02 concert
11:18:40  Liu's Shanghai concert
11:28:22  Liu's wife
11:42:24  Liu
Table 4
Time      Query
10:32:03  Liu's old photos
10:54:09  Liu
10:55:17  Liu's childhood photos
10:55:56  Photos of Liu as a teenager
Table 5
The queries shown in Tables 1 to 5 are thus collected to form the search term set of "Liu", in which the search count of each query is also recorded.
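The collection rule of step 301 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `collect_search_terms`, the representation of a session as a list of query strings, and the use of substring matching to recognize related queries are all hypothetical choices made for the example.

```python
from collections import Counter


def collect_search_terms(sessions, query, synonyms=()):
    """Build the search term set of `query` with per-query search counts.

    sessions: iterable of sessions, each a list of query strings (assumed format).
    synonyms: synonymous queries of `query` (assumed to be supplied by the caller).
    """
    related_keys = {query, *synonyms}
    counts = Counter()
    for session in sessions:
        # Only sessions that contain the query itself are used.
        if query not in session:
            continue
        for q in session:
            # Keep the query, its synonyms, and queries containing either of them.
            if any(key in q for key in related_keys):
                counts[q] += 1
    return counts


sessions = [
    ["Liu", "Liu's son", "Liu's wife"],
    ["Liu's wife", "Liu's concert"],      # no bare "Liu": this session is skipped
    ["weather", "Liu", "Liu's wife"],
]
print(collect_search_terms(sessions, "Liu"))
```

Counting inside the same pass keeps the search count of every collected query available for the later heat calculation.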
Step 302: normalize the queries in the search term set of the query that express the same semantics into one point of interest, obtaining the points of interest of the query.
Normalization may proceed as follows. First, word segmentation and stop-word removal are performed on each query in the search term set. The processed queries are then compared: if the part by which one query exceeds another is a semantically redundant phrase, the two queries are considered to express the same semantics; or, if the differing parts of two queries are synonyms in a synonym dictionary, the two queries are considered to express the same semantics. A semantically redundant phrase may be a weak qualifier, a modifier, a preposition, or the like. For example, if the extra part of one query relative to another is a proper noun such as a number, place name, or personal name, it usually acts as a weak qualifier or modifier of its context and has little effect on the semantics, so it is treated as a semantically redundant phrase. Queries that express the same semantics are normalized to the same expression: where one query has an extra part relative to another, both may be normalized to the shorter query; where the queries differ by synonyms in the synonym dictionary, both may be normalized to the word designated by the dictionary. Other ways of normalizing to the same expression are not excluded; these are only examples. Each expression obtained by normalization represents one point of interest, and the search count of a point of interest is the sum of the search counts of the queries from which it is derived.
For example, after word segmentation and stop-word removal, the two queries "Liu's wife" and "Liu's wife So-and-so" become "Liu wife" and "Liu wife So-and-so". The part by which the latter exceeds the former, "So-and-so", is a proper noun, so the two queries are considered to have the same semantics and are both normalized to "Liu's wife", which serves as one point of interest.
As another example, the two queries "Liu's wife" and "Liu's spouse" differ only in the words "wife" and "spouse", which are synonyms, so the two queries are considered to have the same semantics and are both normalized to "Liu's wife".
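The normalization rule can be illustrated with a small Python sketch. It is an assumption-laden toy, not the method of the patent: whitespace splitting stands in for a real word segmenter, the `STOP_WORDS`, `SYNONYMS`, and `REDUNDANT` tables are hypothetical hand-written stand-ins for real dictionaries, and queries are compared as unordered word sets.

```python
STOP_WORDS = {"of", "the"}                 # hypothetical stop-word list
SYNONYMS = {"spouse": "wife"}              # hypothetical synonym dictionary: word -> designated word
REDUNDANT = {"so-and-so", "shanghai"}      # hypothetical semantically redundant phrases


def preprocess(query):
    """Segment on whitespace, drop stop words, map synonyms to the designated word."""
    tokens = [t for t in query.lower().split() if t not in STOP_WORDS]
    return [SYNONYMS.get(t, t) for t in tokens]


def same_semantics(a, b):
    """True if the queries differ only by synonyms or by redundant extra words."""
    ta, tb = set(preprocess(a)), set(preprocess(b))
    if ta == tb:                           # identical, or differing only by synonyms
        return True
    shorter, longer = sorted((ta, tb), key=len)
    # The longer query must contain the shorter one, and every extra word
    # must be a semantically redundant phrase.
    return shorter <= longer and (longer - shorter) <= REDUNDANT


def normalize(counts):
    """Merge same-semantics queries; return {point_of_interest: summed search count}."""
    points = {}
    # Visit shorter queries first so that they become the representative expression.
    for query, n in sorted(counts.items(), key=lambda kv: len(preprocess(kv[0]))):
        for rep in points:
            if same_semantics(rep, query):
                points[rep] += n
                break
        else:
            points[" ".join(preprocess(query))] = n
    return points


counts = {"liu wife": 3, "liu wife so-and-so": 1, "liu spouse": 1,
          "liu concert": 1, "liu shanghai concert": 1}
print(normalize(counts))
```

Processing shorter queries first mirrors the option of normalizing to the shorter query, and mapping synonyms during preprocessing mirrors normalizing to the word designated by the synonym dictionary.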
In addition, this step may further determine the category to which each point of interest in the search term set belongs, so as to obtain the points of interest under each category of the query. In this case, the step may be implemented in either of the following two ways:
First way: first classify the queries in the search term set of the query, then within each category normalize the queries expressing the same semantics into one point of interest.
Second way: first normalize the queries in the search term set of the query that express the same semantics into one point of interest, then classify the resulting points of interest.
Classification is a relatively mature existing technique; classifiers such as a maximum entropy classifier or a support vector machine (SVM) may be used. In practice, a classification system can be built according to application needs and quality requirements, and the classifier is trained for that system on a training corpus annotated in advance. The trained classifier can then classify each query or point of interest.
Because the first way classifies the queries before they are normalized, its classification is more accurate, so the first way is preferred. Continuing the example above using the first way:
The queries in the search term set formed from Tables 1 to 5 are classified, with the classification results shown in Table 6.
Table 6
Then, within each category, the queries expressing the same semantics are normalized into one point of interest, each point of interest being represented by the expression obtained by normalization. The results are shown in Table 7, where the search count of a point of interest is the sum of the search counts of the queries from which it is derived.
Table 7
Step 303: determine the search heat of each point of interest according to the search counts of the queries.
The search heat of each point of interest is determined mainly for ranking search results in the later picture search process, as described in the subsequent steps. Note that if the later ranking of search results is not based on the search heat of the points of interest, this step may be skipped.
The search heat of a point of interest may be the ratio of the search count of that point of interest to the sum of the search counts of all points of interest in the search term set.
If the category of each point of interest was determined in step 302, this step further determines the search heat of each category. The search heat of a category may be the ratio of the sum of the search counts of all points of interest in that category to the sum of the search counts of all points of interest in the search term set.
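Both ratios are easy to compute once the search counts are known. The following Python sketch shows one way to do so; the function name `search_heat`, the dictionary inputs, and the sample counts and category labels are illustrative assumptions and are not taken from the tables of the patent.

```python
from collections import Counter


def search_heat(point_counts, category_of):
    """Return (point_heat, category_heat) as fractions of all searches.

    point_counts: {point_of_interest: search count}
    category_of:  {point_of_interest: category}
    """
    total = sum(point_counts.values())
    # Heat of a point = its search count / total search count of all points.
    point_heat = {p: n / total for p, n in point_counts.items()}

    # Heat of a category = summed search count of its points / the same total.
    category_counts = Counter()
    for p, n in point_counts.items():
        category_counts[category_of[p]] += n
    category_heat = {c: n / total for c, n in category_counts.items()}
    return point_heat, category_heat


# Made-up sample data for illustration only.
point_counts = {"liu wife": 5, "liu son": 1, "liu concert": 2}
category_of = {"liu wife": "relatives", "liu son": "relatives",
               "liu concert": "entertainment"}
point_heat, category_heat = search_heat(point_counts, category_of)
print(point_heat)
print(category_heat)
```

Because both ratios share the same denominator, the heat of a category equals the sum of the heats of its points of interest.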
After the search heat of the categories and points of interest in Table 7 is calculated, the results are as shown in Table 8.
Table 8
Step 304: store the mined points of interest of the query and the corresponding heat information in the point-of-interest database.
After the mining of steps 301 to 304 above, the points of interest of many queries can be mined from the search behavior log, and the points of interest of each query and the heat information of each point of interest are stored in the point-of-interest database.
If the category of each point of interest has been determined, the points of interest of each query, the category of each point of interest, and the search heat of each point of interest and category are stored in the point-of-interest database.
The points of interest of each query and the heat information of each point of interest may be stored as a file, such as query_interests, as shown in Table 9.
Table 9
Note that for the query "Liu" there is a file liumoumou_interests consisting of its points of interest and the heat information of each point of interest. For the query "Liu's wife", after the mining of steps 301 to 304, there is likewise a file liumoumou'swife_interests consisting of the points of interest of "Liu's wife" and the heat information of each point of interest. That is, a query may serve as a point of interest of another query and may also have points of interest of its own.
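One possible way to realize the per-query file is sketched below in Python. The text only names the file pattern query_interests; the JSON layout, the record fields `point`, `category`, and `heat`, and the helper names `save_interests` and `load_interests` are assumptions made for this example.

```python
import json
from pathlib import Path


def save_interests(db_dir, query_key, points):
    """Write one `<query_key>_interests` file holding the query's points of interest.

    points: list of records such as
            {"point": ..., "category": ..., "heat": ...} (assumed layout).
    """
    path = Path(db_dir) / f"{query_key}_interests"
    path.write_text(json.dumps(points, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_interests(db_dir, query_key):
    """Return the stored records, or None if the query was never mined."""
    path = Path(db_dir) / f"{query_key}_interests"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

With this layout, a query that is also a point of interest of another query simply has a file of its own in addition to appearing as a record in the other query's file.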
Step 305: after receiving the query input by the user, query the point-of-interest database and determine the points of interest of the query.
In this step, the point-of-interest database is queried for a query that expresses the same semantics as the query input by the user. If such a query exists, its points of interest in the point-of-interest database are determined, and step 306 is then executed. If no such query exists, the point-of-interest database is queried for a point of interest that expresses the same semantics as the query input by the user, and that point of interest and the query to which it belongs are determined. If neither a query nor a point of interest expressing the same semantics as the user's query exists, picture search results for the query are returned using the prior-art search method.
Whether two queries express the same semantics is determined in the same way as described in step 302: after word segmentation and stop-word removal are performed on the two queries, if the part by which one query exceeds the other is a semantically redundant phrase, the two queries are considered to express the same semantics; or, if the differing parts of the two queries are synonyms in the synonym dictionary, the two queries are considered to express the same semantics. A semantically redundant phrase may be a weak qualifier, a modifier, a preposition, or the like. For example, if the extra part of one query relative to another is a proper noun such as a number, place name, or personal name, it usually acts as a weak qualifier or modifier of its context and has little effect on the semantics, so it is treated as a semantically redundant phrase.
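The three-way lookup of step 305 can be sketched as follows in Python. This is a simplified illustration: the in-memory dictionary `db`, the function name `lookup`, the returned tuples, and the default case-insensitive comparison (a placeholder for the semantic comparison described above) are all assumptions.

```python
def _same_text(a, b):
    """Placeholder comparison; a real system would use the semantic test of step 302."""
    return a.strip().lower() == b.strip().lower()


def lookup(db, user_query, same=_same_text):
    """Resolve `user_query` against the point-of-interest database.

    db: {mined_query: [point_of_interest, ...]} (assumed in-memory form).
    Returns (kind, mined_query, points), where kind is "query", "point", or "none".
    """
    # 1. A mined query with the same semantics: use all of its points of interest.
    for mined_query, points in db.items():
        if same(mined_query, user_query):
            return ("query", mined_query, points)
    # 2. Otherwise, a point of interest with the same semantics, plus its owning query.
    for mined_query, points in db.items():
        for point in points:
            if same(point, user_query):
                return ("point", mined_query, [point])
    # 3. Nothing matched: the caller falls back to conventional picture search.
    return ("none", None, [])


db = {"liu": ["liu wife", "liu concert"]}
print(lookup(db, "Liu"))
print(lookup(db, "liu concert"))
print(lookup(db, "weather"))
```

Checking mined queries before points of interest matters because a query such as "Liu's wife" can be both a point of interest of "Liu" and a mined query with points of interest of its own.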
Step 306: obtaining the search result of each point of interest, the search temperature according to each point of interest arranges each point of interest Sequence, the ranking results according to each point of interest show the picture search result of each point of interest and each point of interest currently defeated in user Enter in the search results pages of query.
That is, the picture search result returned through the embodiment of the present invention is actually the picture searching of each point of interest As a result, for example user inputs " Liu ", return is no longer picture searching knot that search engine scans for " Liu " Fruit, but after being scanned for each point of interest of " Liu ", then carry out the picture search result after integration sequence.
When returning to search result, since a query usually corresponds to multiple points of interest, it is therefore desirable to be clicked through to each interest Row sequence.Search temperature in the step 306 of the present embodiment using each point of interest is foundation, it is well understood that according to search temperature Sequence from high to low is ranked up.This mode is roughly the same it is assumed that i.e. based on search interest of the user for picture The interested thing of most users, a new user are often also interested.
In addition, the classification to which each point of interest corresponding to the query belongs may be further determined, and the classification of each point of interest displayed in the search results page at the same time. The classifications may first be sorted by the search temperature of each classification, and the points of interest within each classification then sorted by their own search temperature. The search results page displays the search results classification by classification in that order, and within each classification in order of the search temperature of each point of interest. Because page space is limited, the number of classifications displayed, the number of points of interest per classification, and the number of search results per point of interest may all be limited; content that is not displayed can be shown when the user clicks a "more" option in the search results page.
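The two-level ordering with display limits can be sketched as follows. This is an illustration only: the dictionary layout, the function name `layout_results`, and the temperature values in `SAMPLE_POIS` are made up for the sketch, and the search temperature of a classification is assumed here to be the sum of the temperatures of its points of interest.

```python
# Illustrative data only; the temperature values are invented for this sketch.
SAMPLE_POIS = [
    {"name": "Liu's wife", "category": "kith and kin", "temperature": 0.30},
    {"name": "Liu's ex-girlfriend", "category": "kith and kin", "temperature": 0.20},
    {"name": "Liu's son", "category": "kith and kin", "temperature": 0.05},
    {"name": "Liu's concert", "category": "entertainment", "temperature": 0.25},
    {"name": "Liu's tattoo", "category": "styling", "temperature": 0.10},
]


def layout_results(pois, max_categories=4, max_pois_per_category=2):
    """Return [(category, [poi names])] in display order, honoring page limits."""
    by_category = {}
    for poi in pois:
        by_category.setdefault(poi["category"], []).append(poi)
    # Assumption: a classification's temperature is the sum over its points.
    ranked = sorted(
        by_category.items(),
        key=lambda item: sum(p["temperature"] for p in item[1]),
        reverse=True,
    )
    page = []
    for category, members in ranked[:max_categories]:
        members = sorted(members, key=lambda p: p["temperature"], reverse=True)
        shown = [p["name"] for p in members[:max_pois_per_category]]
        page.append((category, shown))  # the rest would sit behind "more"
    return page
```

Calling `layout_results(SAMPLE_POIS)` keeps at most two points of interest per classification, so "Liu's son" is left for the "more" option.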
As shown in Fig. 4, the user inputs "Liu" in the search box of picture search. The classifications to which the points of interest belong are, from high search temperature to low: kith and kin, entertainment, photos from different periods, and styling. In the classification "kith and kin", the points of interest from high search temperature to low are: Liu's wife, Liu's ex-girlfriend, Liu's older picture, Liu's son, Liu's daughter. In the classification "entertainment": Liu's concert. In the classification "photos from different periods": Liu's childhood photos, Liu's youthful photos, photos of Liu as a teenager, Liu's appearance at age 20. In the classification "styling": Liu's tattoo, Liu's bob. Assuming the page is limited to displaying at most 4 classifications and at most 2 points of interest per classification, the returned picture search result is as shown in Fig. 4. In the figure, the classifications and points of interest are displayed in the form of an "axis" as an example; the present invention does not limit the specific display form. As this search results page shows, the user can easily tell which point of interest and which classification each picture belongs to, which helps the user locate the desired pictures without checking one by one through a disorderly mass of pictures. Moreover, if the user is actually interested in one particular point of interest of the query, for example the user input the query "Liu" but actually wants pictures of "Liu's wife", there is no need to change the input query repeatedly; the user only needs to look in the picture search results corresponding to the point of interest "Liu's wife".
In addition, when the classifications are sorted, there are other sorting approaches besides the search temperature of the classifications. For example, the user's degree of interest in each classification may be determined from the user's historical search behavior, and the classifications sorted by that degree of interest; the user's historical search behavior may be modeled in advance to determine the user's degree of interest in the classification to which each point of interest belongs. This is in effect a personalized sorting approach. The search temperature of each classification and the user's degree of interest in each classification may also be combined to sort the classifications. Of course, if the user has no search history, the user's historical search behavior need not be considered as a sorting factor.
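One possible way to combine the two factors is a weighted sum, sketched below. The text does not specify how the combination is done, so the linear blend, the weight `alpha`, the use of per-classification history frequency as the degree of interest, and the function name `rank_categories` are all assumptions made for illustration.

```python
from collections import Counter


def rank_categories(category_temperature, user_history, alpha=0.5):
    """Order classifications by a blend of global temperature and user interest.

    category_temperature: {category: search temperature in [0, 1]}
    user_history: categories of the user's past searches (may be empty)
    alpha: weight of the global search temperature (hypothetical parameter)
    """
    if not user_history:
        # No search history: fall back to search temperature alone.
        return sorted(category_temperature, key=category_temperature.get, reverse=True)
    counts = Counter(user_history)
    total = sum(counts.values())

    def score(category):
        interest = counts[category] / total  # share of the user's past searches
        return alpha * category_temperature[category] + (1 - alpha) * interest

    return sorted(category_temperature, key=score, reverse=True)
```

A user who mostly searched styling pictures in the past would see "styling" move ahead of globally hotter classifications, while a user without history gets the plain temperature order.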
If it is determined in step 305 that the interest point database contains no query expressing the same semantics as the query input by the user, but it is determined that the interest point database contains a point of interest expressing the same semantics as that query, then the query input by the user is itself a definite point of interest. For example, the user inputs "Liu's ex-girlfriend", which hits a point of interest of "Liu". In that case the search results of "Liu's ex-girlfriend" can be displayed directly in the search results page. In addition, the picture search results of the other points of interest in the classification to which this point of interest belongs may be displayed at the same time; in the ordering, the point of interest expressing the same semantics as the query input by the user is placed first, and the other points of interest in the same classification are sorted by their search temperature. That is, besides the search results of "Liu's ex-girlfriend", the picture search results of other points of interest in the same classification "Liu's kith and kin", such as "Liu's wife" and "Liu's daughter", may also be returned. Furthermore, the search results of the points of interest in other classifications may also be displayed in the search results page; the classification to which the point of interest expressing the same semantics as the query input by the user belongs is placed first, and the other classifications are sorted by classification search temperature and/or the user's historical behavior.
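The ordering for this direct-hit case can be sketched as follows. The data layout and the function name `order_for_direct_hit` are hypothetical, and only the search-temperature factor is used for the remaining classifications (the user-history factor is left out of this sketch).

```python
def order_for_direct_hit(hit_name, pois, category_temperature):
    """Put the matched point of interest and its classification first.

    hit_name: name of the point of interest matched by the user's query
    pois: list of {"name", "category", "temperature"} dicts
    category_temperature: {category: search temperature}
    """
    hit_category = next(p["category"] for p in pois if p["name"] == hit_name)
    # False sorts before True, so the hit classification comes first;
    # the remaining classifications follow by descending temperature.
    categories = sorted(
        category_temperature,
        key=lambda c: (c != hit_category, -category_temperature[c]),
    )
    page = []
    for category in categories:
        members = [p for p in pois if p["category"] == category]
        # Same trick inside a classification: the hit first, then by temperature.
        members.sort(key=lambda p: (p["name"] != hit_name, -p["temperature"]))
        page.append((category, [p["name"] for p in members]))
    return page
```

Using a tuple key keeps the special case and the regular ordering in a single sort instead of splicing lists by hand.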
The above describes the method provided by the present invention. The apparatus provided by the present invention is described below through Embodiment 2.
Embodiment two,
Fig. 5 is a structural diagram of the apparatus provided by Embodiment 2 of the present invention. The apparatus is located at the search server side and specifically includes an offline mining unit 00 and an online search unit 10, where the offline mining unit 00 implements the offline mining of query points of interest and the online search unit 10 implements the online picture search for queries.
The offline mining unit 00 processes each query in the search behavior log in turn, taking each query as the current query for mining. Specifically, the offline mining unit 00 includes a search term collection determining subunit 01 and an interest point extraction subunit 02.
The search term collection determining subunit 01 collects the current query and the related queries of the current query from the search behavior log to form the collection of search terms of the current query.
The related queries of the current query include: synonymous queries of the current query, queries containing the current query, and queries containing a synonymous query of the current query.
Preferably, the search term collection determining subunit 01 collects the current query and its related queries from the sessions in the search behavior log that contain the current query. These sessions are not limited to the sessions of a single user; sessions of different users are needed even more.
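The session-based collection can be sketched as follows. The session representation (one list of query strings per session), the `synonyms` argument, the substring test for "contains", and the way search counts are tallied are assumptions made for this illustration; the text does not fix any of them.

```python
from collections import Counter


def collect_search_terms(current, sessions, synonyms=()):
    """Collect the current query and its related queries from sessions.

    current: the current query string
    sessions: iterable of sessions, each a list of query strings (any user)
    synonyms: known synonymous queries of `current` (hypothetical input)
    Returns a Counter mapping each collected query to its number of searches.
    """
    anchors = {current, *synonyms}
    collected = Counter()
    for session in sessions:
        if current not in session:
            continue  # only sessions containing the current query are used
        for query in session:
            # Related: the query itself, a synonym, or any query containing one.
            if any(anchor in query for anchor in anchors):
                collected[query] += 1
    return collected
```

Sessions from many users are simply concatenated into `sessions`; unrelated queries in the same session (for example a weather lookup) are skipped by the containment test.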
In addition, the number of searches of each query is also recorded in the collection of search terms.
The interest point extraction subunit 02 normalizes the queries expressing the same semantics in the collection of search terms of the current query into one point of interest, obtains the points of interest of the current query, and stores the points of interest of the current query in the interest point database.
The normalization process may include: first, segmenting each query in the collection of search terms and removing stop words; then comparing the processed queries, where two queries are considered to express the same semantics if the part by which one query exceeds the other is a semantically redundant phrase, or if the parts in which the two queries differ are synonyms in the thesaurus; and finally normalizing the queries that express the same semantics into the same expression. For example, where one query has an extra part compared with another, both can be normalized to the shorter query; where the queries differ by synonyms in the thesaurus, they can be normalized to the word specified by the thesaurus. Other ways of normalizing to the same expression are of course not excluded; these are only examples. Each expression obtained after normalization represents one point of interest, and the number of searches of a point of interest is the sum of the numbers of searches of the queries normalized into that point of interest.
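A simplified sketch of this normalization maps every query to a canonical expression and sums the search counts per expression. The lexicons and function names below are hypothetical, and the sketch takes a shortcut: it always strips redundant terms and replaces synonyms, rather than comparing queries pairwise as the text describes.

```python
from collections import defaultdict

# Hypothetical lexicons for illustration only.
STOP_WORDS = {"the", "of"}
REDUNDANT_TERMS = {"2012", "pictures"}
THESAURUS = {"spouse": "wife"}  # synonym -> word designated by the thesaurus


def canonical_form(query):
    """Drop stop words and redundant terms, then map synonyms to one word."""
    words = [w for w in query.lower().split() if w not in STOP_WORDS]
    words = [THESAURUS.get(w, w) for w in words if w not in REDUNDANT_TERMS]
    return " ".join(words)


def normalize(search_counts):
    """Merge queries into points of interest and sum their search counts.

    search_counts: {query: number of searches}
    Returns {point of interest: summed number of searches}.
    """
    points = defaultdict(int)
    for query, count in search_counts.items():
        points[canonical_form(query)] += count
    return dict(points)
```

For example, "liu spouse" and "liu wife pictures" both collapse into the point of interest "liu wife", whose search count becomes the sum of the three source queries.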
The online search unit 10 specifically includes a point-of-interest determining subunit 11 and a picture search subunit 12.
After the search server obtains the query currently input by the user, sent from the browser or client of the user equipment, the point-of-interest determining subunit 11 queries the interest point database and determines the points of interest of the query currently input by the user. That is, it checks whether the interest point database contains a query expressing the same semantics as the query input by the user, and determines the points of interest corresponding to that query in the interest point database.
The method by which the point-of-interest determining subunit 11 determines whether two queries express the same semantics is the same as that used by the interest point extraction subunit 02: after the two queries are segmented and stop words are removed, the two queries are considered to express the same semantics if the part by which one query exceeds the other is a semantically redundant phrase, or if the parts in which the two queries differ are synonyms in the thesaurus.
The picture search subunit 12 then obtains the picture search results of the points of interest of the query currently input by the user, and displays each point of interest and its picture search result in the search results page of that query. That is, the picture search result returned at this point is actually the picture search result of each point of interest.
Since one query usually corresponds to multiple points of interest, the points of interest need to be sorted, and the sorting basis may be the search temperature of each point of interest. To this end, the interest point extraction subunit 02 also determines the search temperature of each point of interest from the numbers of searches of the queries from which the point of interest is derived, and further stores the search temperature of each point of interest in the interest point database. The search temperature of a point of interest may be the ratio of the number of searches of that point of interest to the sum of the numbers of searches of all points of interest in the collection of search terms.
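The ratio described above is straightforward to compute; a minimal sketch follows. The function name and the dictionary input are illustrative choices, and the zero-total guard is an addition of this sketch.

```python
def poi_temperature(poi_counts):
    """Return {point of interest: search temperature}.

    poi_counts: {point of interest: number of searches}
    Temperature = searches of the point / searches of all points.
    """
    total = sum(poi_counts.values())
    if total == 0:
        return {poi: 0.0 for poi in poi_counts}
    return {poi: count / total for poi, count in poi_counts.items()}


def rank_by_temperature(poi_counts):
    """Points of interest from high search temperature to low."""
    temps = poi_temperature(poi_counts)
    return sorted(temps, key=temps.get, reverse=True)
```

Because every temperature shares the same denominator, ranking by temperature gives the same order as ranking by raw search count; the ratio matters when values are stored and compared across collections.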
The picture search subunit 12 also sorts the points of interest in the search results page by the search temperature of each point of interest, from high search temperature to low. This approach rests on the assumption that users' search interests in pictures are roughly the same: what most users are interested in, a new user is often interested in as well.
In addition, the interest point extraction subunit 02 may also determine the classification to which each point of interest in the collection of search terms of the current query belongs, and further store the classification of each point of interest in the interest point database. Specifically, the queries in the collection of search terms may first be classified and the queries expressing the same semantics within each classification then normalized into one point of interest; alternatively, the queries expressing the same semantics in the collection of search terms may first be normalized into points of interest and the resulting points of interest then classified.
Classification is an existing, relatively mature technology, and classifiers such as a maximum entropy classifier or an SVM may be used. In practice, a classification system can be built according to application needs and effect requirements, and a classifier trained for that classification system. The classifier can be trained on a pre-labeled training corpus, and the trained classifier can then classify each query or point of interest. At search time, the picture search subunit 12 displays the classification of each point of interest in the search results page.
Preferably, where classifications exist, the picture search subunit 12 may sort the classifications in the search results page according to at least one of the user's historical search behavior and the search temperature of each classification; for the specific sorting approach, see the description of step 306 in Embodiment 1.
When the interest point extraction subunit 02 determines the search temperature of each classification, the search temperature of a classification may be determined from the numbers of searches of the queries from which the points of interest in that classification are derived. Specifically, the search temperature of a classification may be the ratio of the sum of the numbers of searches of all points of interest in the classification to the sum of the numbers of searches of all points of interest in the collection of search terms.
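Written as a formula, the ratio for a classification can be expressed as below. The symbols are introduced here for illustration and do not appear in the original text: T(c) is the search temperature of classification c, n_p is the number of searches of point of interest p, and P is the set of all points of interest in the collection of search terms. The counts in the worked example are made up.

```latex
\[
T(c) = \frac{\sum_{p \in c} n_p}{\sum_{p \in P} n_p}
\]
% Worked example with invented counts: two points of interest in
% "kith and kin" (50 and 30 searches) and one in "entertainment" (20).
\[
T(\text{kith and kin}) = \frac{50 + 30}{50 + 30 + 20} = 0.8,
\qquad
T(\text{entertainment}) = \frac{20}{50 + 30 + 20} = 0.2
\]
```

The temperatures of all classifications sum to 1, since each point of interest belongs to exactly one classification in this example.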
The following case also exists: if the interest point database contains no query expressing the same semantics as the query currently input by the user, the point-of-interest determining subunit 11 further checks whether the interest point database contains a point of interest expressing the same semantics as the query currently input by the user. When the point-of-interest determining subunit 11 finds such a point of interest in the interest point database, the picture search subunit 12 may obtain the picture search result of the point of interest expressing the same semantics as the query currently input by the user and display it in the search results page of that query.
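The two-stage lookup can be sketched as follows. The in-memory dictionaries standing in for the interest point database, the `normalize_key` helper (a crude stand-in for the same-semantics test), and the returned labels are all hypothetical.

```python
def normalize_key(query):
    """Hypothetical stand-in for semantic matching: case and spacing only."""
    return " ".join(query.lower().split())


def find_points_of_interest(user_query, query_index, poi_index):
    """Look up the user's query in two stages.

    query_index: {stored query: [its points of interest]}
    poi_index: set of stored point-of-interest expressions
    Returns (match kind, list of points of interest).
    """
    key = normalize_key(user_query)
    # Stage 1: a stored query with the same semantics yields all its points.
    if key in query_index:
        return "query", list(query_index[key])
    # Stage 2: otherwise the input may itself be a stored point of interest.
    if key in poi_index:
        return "poi", [key]
    return "none", []
```

A "poi" result corresponds to the case described above, where the user's input is already a definite point of interest and its picture search result can be shown directly.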
The picture search subunit 12 may also further determine the other points of interest in the classification to which the point of interest expressing the same semantics as the query currently input by the user belongs in the interest point database, and further display the picture search results of those other points of interest in the search results page, with the picture search result of the point of interest expressing the same semantics as the query currently input by the user placed first. In addition, the search results of the points of interest in other classifications may also be displayed in the search results page at the same time; the classification to which the point of interest expressing the same semantics as the query input by the user belongs is placed first, and the other classifications are sorted by classification search temperature and/or the user's historical behavior.
In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (20)

1. A picture search method, characterized in that the method comprises:
an offline mining phase, in which each query in a search behavior log is taken as a current query to perform:
S11: collecting the current query and related queries of the current query from the search behavior log to form a collection of search terms of the current query;
S12: normalizing the queries expressing the same semantics in the collection of search terms of the current query into one point of interest, to obtain the points of interest of the current query;
S13: storing the points of interest of the current query in an interest point database;
an online search phase:
S21: querying the interest point database to determine the points of interest of a query currently input by a user;
S22: obtaining picture search results of the points of interest of the query currently input by the user, and displaying each point of interest and the picture search result of each point of interest in a search results page of the query currently input by the user.
2. The method according to claim 1, characterized in that the related queries of the current query include: synonymous queries of the current query, queries containing the current query, and queries containing a synonymous query of the current query.
3. The method according to claim 1 or 2, characterized in that collecting the current query and the related queries of the current query from the search behavior log is:
collecting the current query and the related queries of the current query from sessions in the search behavior log that contain the current query.
4. The method according to claim 1, characterized in that step S12 further comprises: determining the search temperature of each point of interest from the numbers of searches of the queries from which the point of interest is derived;
in step S13, the search temperature of each point of interest is further stored in the interest point database;
in step S22, the points of interest are sorted in the search results page by the search temperature of each point of interest.
5. The method according to claim 1 or 4, characterized in that step S12 further comprises: determining the classification to which each point of interest of the current query belongs;
in step S13, the classification to which each point of interest belongs is further stored in the interest point database;
in step S22, the classification to which each point of interest belongs is further displayed in the search results page.
6. The method according to claim 5, characterized in that, in the search results page, the classifications are sorted according to at least one of the user's historical search behavior and the search temperature of each classification;
the search temperature of each classification is determined from the numbers of searches of the queries from which the points of interest in the classification are derived.
7. The method according to claim 1, characterized in that determining the points of interest of the query currently input by the user in step S21 comprises:
checking whether the interest point database contains a query expressing the same semantics as the query currently input by the user, and if so, determining the points of interest of the query in the interest point database that expresses the same semantics as the query currently input by the user.
8. The method according to claim 1 or 7, characterized in that whether two queries express the same semantics is determined specifically by:
segmenting the two queries and removing stop words;
comparing the two processed queries, and considering the two queries to express the same semantics if the part by which one query exceeds the other is a semantically redundant phrase, or if the parts in which the two queries differ are synonyms.
9. The method according to claim 7, characterized in that, if the interest point database contains no query expressing the same semantics as the query currently input by the user, it is checked whether the interest point database contains a point of interest expressing the same semantics as the query currently input by the user, and if so, step S23 is performed;
S23: obtaining the picture search result of the point of interest expressing the same semantics as the query currently input by the user, and displaying it in the search results page of the query currently input by the user.
10. The method according to claim 9, characterized in that step S23 further comprises: determining the other points of interest in the classification to which the point of interest expressing the same semantics as the query currently input by the user belongs in the interest point database, and further displaying the picture search results of the other points of interest in the search results page, with the picture search result of the point of interest expressing the same semantics as the query currently input by the user placed first.
11. A picture search apparatus, characterized in that the apparatus comprises an offline mining unit and an online search unit;
wherein the offline mining unit processes each query in a search behavior log as a current query, and comprises:
a search term collection determining subunit, configured to collect the current query and related queries of the current query from the search behavior log to form a collection of search terms of the current query;
an interest point extraction subunit, configured to normalize the queries expressing the same semantics in the collection of search terms of the current query into one point of interest, obtain the points of interest of the current query, and store the points of interest of the current query in an interest point database;
the online search unit comprises:
a point-of-interest determining subunit, configured to query the interest point database and determine the points of interest of a query currently input by a user;
a picture search subunit, configured to obtain picture search results of the points of interest of the query currently input by the user, and display each point of interest and the picture search result of each point of interest in a search results page of the query currently input by the user.
12. The apparatus according to claim 11, characterized in that the related queries of the current query include: synonymous queries of the current query, queries containing the current query, and queries containing a synonymous query of the current query.
13. The apparatus according to claim 11 or 12, characterized in that the search term collection determining subunit collects the current query and the related queries of the current query from sessions in the search behavior log that contain the current query.
14. The apparatus according to claim 11, characterized in that the interest point extraction subunit is further configured to determine the search temperature of each point of interest from the numbers of searches of the queries from which the point of interest is derived, and to further store the search temperature of each point of interest in the interest point database;
the picture search subunit is further configured to sort the points of interest in the search results page by the search temperature of each point of interest.
15. The apparatus according to claim 11 or 14, characterized in that the interest point extraction subunit is further configured to determine the classification to which each point of interest of the current query belongs, and to further store the classification to which each point of interest belongs in the interest point database;
the picture search subunit is further configured to display the classification to which each point of interest belongs in the search results page.
16. The apparatus according to claim 15, characterized in that, in the search results page, the classifications are sorted according to at least one of the user's historical search behavior and the search temperature of each classification;
the search temperature of each classification is determined from the numbers of searches of the queries from which the points of interest in the classification are derived.
17. The apparatus according to claim 11, characterized in that, when determining the points of interest of the query currently input by the user, the point-of-interest determining subunit specifically performs:
checking whether the interest point database contains a query expressing the same semantics as the query currently input by the user, and if so, determining the points of interest of the query in the interest point database that expresses the same semantics as the query currently input by the user.
18. The apparatus according to claim 11 or 17, characterized in that whether two queries express the same semantics is determined specifically in the following way:
segmenting the two queries and removing stop words;
comparing the two processed queries, and considering the two queries to express the same semantics if the part by which one query exceeds the other is a semantically redundant phrase, or if the parts in which the two queries differ are synonyms.
19. The apparatus according to claim 17, characterized in that, if the interest point database contains no query expressing the same semantics as the query currently input by the user, the point-of-interest determining subunit further checks whether the interest point database contains a point of interest expressing the same semantics as the query currently input by the user;
when the point-of-interest determining subunit finds that the interest point database contains a point of interest expressing the same semantics as the query currently input by the user, the picture search subunit obtains the picture search result of that point of interest and displays it in the search results page of the query currently input by the user.
20. The apparatus according to claim 19, characterized in that the picture search subunit is further configured to determine the other points of interest in the classification to which the point of interest expressing the same semantics as the query currently input by the user belongs in the interest point database, and to further display the picture search results of the other points of interest in the search results page, with the picture search result of the point of interest expressing the same semantics as the query currently input by the user placed first.
CN201310148051.4A 2013-04-25 2013-04-25 A kind of method and apparatus of picture searching Active CN103226601B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310148051.4A CN103226601B (en) 2013-04-25 2013-04-25 A kind of method and apparatus of picture searching

Publications (2)

Publication Number Publication Date
CN103226601A CN103226601A (en) 2013-07-31
CN103226601B true CN103226601B (en) 2019-03-29

Family

ID=48837046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310148051.4A Active CN103226601B (en) 2013-04-25 2013-04-25 A kind of method and apparatus of picture searching

Country Status (1)

Country Link
CN (1) CN103226601B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902679B (en) * 2014-03-21 2018-07-10 百度在线网络技术(北京)有限公司 Method and apparatus are recommended in search
CN104008180B (en) * 2014-06-09 2017-04-12 北京奇虎科技有限公司 Association method of structural data with picture, association device thereof
CN105159884B (en) * 2015-09-23 2018-06-29 百度在线网络技术(北京)有限公司 The method for building up and device of industry dictionary and industry recognition methods and device
CN109471969A (en) * 2018-10-31 2019-03-15 广东小天才科技有限公司 Application search method, device and equipment
CN111026937B (en) 2019-11-13 2021-02-19 百度在线网络技术(北京)有限公司 Method, device and equipment for extracting POI name and computer storage medium
CN112100480A (en) * 2020-09-15 2020-12-18 北京百度网讯科技有限公司 Search method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102087669A (en) * 2011-03-11 2011-06-08 北京汇智卓成科技有限公司 Intelligent search engine system based on semantic association
CN102254039A (en) * 2011-08-11 2011-11-23 武汉安问科技发展有限责任公司 Searching engine-based network searching method
CN102289436A (en) * 2010-06-18 2011-12-21 阿里巴巴集团控股有限公司 Method and device for determining weighted value of search term and method and device for generating search results
CN102999625A (en) * 2012-12-05 2013-03-27 北京海量融通软件技术有限公司 Method for realizing semantic extension on retrieval request

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009117830A1 (en) * 2008-03-27 2009-10-01 Hotgrinds Canada System and method for query expansion using tooltips

Also Published As

Publication number Publication date
CN103226601A (en) 2013-07-31

Similar Documents

Publication Publication Date Title
CN107193803B (en) Semantic-based specific task text keyword extraction method
CN102043833B (en) Search method and device based on query word
US8312034B2 (en) Concept bridge and method of operating the same
US8812534B2 (en) Machine assisted query formulation
CN103226601B (en) A kind of method and apparatus of picture searching
JP6355840B2 (en) Stopword identification method and apparatus
KR20160124079A (en) Systems and methods for in-memory database search
CN109948154B (en) Character acquisition and relationship recommendation system and method based on mailbox names
CN106294358A (en) Information search method and system
US20120130999A1 (en) Method and Apparatus for Searching Electronic Documents
JP2013168177A (en) Information provision program, information provision apparatus, and provision method of retrieval service
JP5315726B2 (en) Information providing method, information providing apparatus, and information providing program
US9195940B2 (en) Jabba-type override for correcting or improving output of a model
CN114238735B (en) Intelligent internet data acquisition method
Moumtzidou et al. Discovery of environmental nodes in the web
Gupta et al. Search bot: Search intention based filtering using decision tree based technique
US11726972B2 (en) Directed data indexing based on conceptual relevance
CN112989163A (en) Vertical search method and system
JP4484957B1 (en) Retrieval expression generation device, retrieval expression generation method, and program
CN116738065B (en) Enterprise searching method, device, equipment and storage medium
CN112084290B (en) Data retrieval method, device, equipment and storage medium
CN106156141B (en) Method and device for constructing semantic query word template
CN118113806A (en) Interpretable event context generation method for large model retrieval enhancement generation
Elshater et al. Web service discovery for large scale iot deployments
JP2021047553A (en) Document search device, document search method, and document search program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant