Summary of the invention
Generally, the timeliness triggered by breaking events and hot topics is called burst timeliness. From the perspective of a search service, a burst-timeliness event is usually accompanied by a burst of resources and a burst of related searches, that is, a marked increase in related news and in search volume.
Based on this observation, the present application builds a time-series statistical analysis model by counting, in real time, the number of online resources related to a user's search and the number of related searches, and judges from the time-series model whether the user has a timeliness demand. The analysis may be a white noise test or a trend analysis based on several preset time points. When the series fails the white noise test or the trend analysis reveals an abrupt change, the change point in the series is then located, and the intensity and trend of the timeliness are judged by comparing the series data before and after the change point, which further guides the re-ranking of the search results.
The invention can therefore judge a user's timeliness demand quickly and accurately even when the query contains no timeliness feature words.
In one aspect of the invention, a search method is disclosed, comprising: receiving a search request from a user to obtain a search term; obtaining the search results obtained based on the search term, the number of search results, and the number of searches related to the search term; performing time series analysis on the number of search results and the number of searches within a certain time period to judge the timeliness of the search term; and, in response to judging that the search term has timeliness, adjusting the order of the search results with timeliness as the re-ranking basis.
In this way, time series analysis of only the number of search results and the number of search requests is enough to infer the timeliness intent of a user search that contains no time words, which improves the accuracy of the returned content and thus the user's search experience.
Preferably, the step of performing time series analysis on the number of search results and the number of searches to judge the timeliness feature of the search term may comprise: dividing the number of search results and the number of searches within the time period by a time interval to generate first time series data; and performing a white noise test on the first time series data and judging the timeliness feature of the search term from the result of the white noise test.
The "time interval" may be, for example, one day; for example, the data of two months are divided by day to obtain the first time series data. Analysing the existing count data with an ordinary white noise test in this way gives the search method of the invention a relatively simple implementation.
Preferably, the step of performing a white noise test on the first time series data and judging the timeliness feature of the search term from the result of the white noise test may comprise: assuming that the $Q_{LB}$ statistic of the first time series data $x_1, x_2, x_3, \ldots, x_n$ follows a chi-square distribution:
$$Q_{LB} = n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^{2}}{n-k} \sim \chi^{2}(m)$$
where $n$ is the number of values obtained by dividing the time period by the time interval, $m$ is the number of degrees of freedom, and $\hat{\rho}_k$ is the autocorrelation coefficient; and, in response to the P-value of the $Q_{LB}$ statistic being less than a first threshold serving as the agreed significance level, judging that the search term has a timeliness feature.
In this way, the white noise test is carried out with the Q value and P-value familiar in the art, which gives the search method of the invention a relatively simple implementation.
Preferably, the step of performing time series analysis on the number of search results to judge the timeliness feature of the search term may comprise: selecting preset time points at different intervals from the current time, and calculating the numbers of search results $M_1, M_2, \ldots, M_{j-1}, M_j$ in the periods from the current time to each preset time point, where $j$ is the number of preset time points and the longest of the periods is no longer than the time period over which the number of search results is obtained; calculating the mean value $M_m$ of the per-interval values from the end of the longest period to the end of the time period; and calculating the ratios of $M_1, M_2, \ldots, M_{j-1}, M_j$ to $M_m$:
$$R_i = \frac{M_i}{M_m}, \quad i = 1, 2, \ldots, j$$
and, in response to any value of $R$ being greater than a second threshold, judging that the search term has a timeliness feature.
In this way, trend analysis of the existing search result counts at preset time points judges the timeliness of the search term with simple calculations, which provides another simple implementation of the object of the application.
Preferably, if the search term is judged to have timeliness, the time series analysis may further comprise change point detection, to find the position of the change point that characterises the timeliness.
Because the method not only judges timeliness but also locates the change point, search results can be returned to the user more accurately, which improves the user's search experience.
Preferably, the change point detection may be performed on the first time series data described above, and comprises: finding a value of $k$ for which all five of the following values are greater than a third threshold, so as to determine the position of the change point:
$\mathrm{diff}_1 = x_k - x_{k-1}$
$\mathrm{diff}_2 = x_k - x_{k-2}$
$\mathrm{diff}_3 = x_k - x_{k-3}$
$\mathrm{diff}_4 = x_{k+1} - x_{k-1}$
$\mathrm{diff}_5 = x_{k+2} - x_{k-1}$
In this way, the change point can be located by simple difference tests, and the search results can be re-ranked more accurately according to the position of the change point, so that the returned content better meets the user's needs.
Preferably, according to the value of $k$ found, the first time series data may further be divided into two independent time series $S_1 = x_1, x_2, \ldots, x_{k-1}$ and $S_2 = x_k, x_{k+1}, \ldots, x_n$; whether the timeliness is strengthening, decaying or levelling off is judged from the data models of $S_1$ and $S_2$; and the order of the search results is adjusted with the strengthening, decaying or levelling off of the timeliness as the re-ranking basis.
In this way, by modelling the data before and after point $k$ separately, the trend of the timeliness can be judged, and the search results can be re-ranked still more accurately.
According to a further aspect of the invention, a search apparatus is disclosed, comprising: a receiving unit for receiving a search request from a user to obtain a search term; an obtaining unit for obtaining the search results obtained based on the search term, the number of search results, and the number of searches related to the search term; a time series analysis unit for performing time series analysis on the number of search results and the number of searches within a certain time period to judge the timeliness of the search term; and a re-ranking unit for adjusting, in response to judging that the search term has timeliness, the order of the search results with timeliness as the re-ranking basis.
Preferably, the time series analysis unit may further be used for dividing the number of search results and the number of searches within the time period by a time interval to generate first time series data, and for performing a white noise test on the first time series data and judging the timeliness of the search term from the result of the white noise test.
Preferably, the time series analysis unit may also be used for: assuming that the $Q_{LB}$ statistic of the first time series data $x_1, x_2, x_3, \ldots, x_n$ follows a chi-square distribution:
$$Q_{LB} = n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^{2}}{n-k} \sim \chi^{2}(m)$$
where $n$ is the number of values obtained by dividing the time period by the time interval, $m$ is the number of degrees of freedom, and $\hat{\rho}_k$ is the autocorrelation coefficient; and, in response to the P-value of the $Q_{LB}$ statistic being less than a first threshold serving as the agreed significance level, judging that the search term has a timeliness feature.
Preferably, the time series analysis unit may also be used for: selecting preset time points at different intervals from the current time, and calculating the numbers of search results $M_1, M_2, \ldots, M_{j-1}, M_j$ in the periods from the current time to each preset time point, where $j$ is the number of preset time points and the longest of the periods is no longer than the time period over which the number of search results is obtained; calculating the mean value $M_m$ of the per-interval values from the end of the longest period to the end of the time period; and calculating the ratios of $M_1, M_2, \ldots, M_{j-1}, M_j$ to $M_m$:
$$R_i = \frac{M_i}{M_m}, \quad i = 1, 2, \ldots, j$$
and, in response to any value of $R$ being greater than a second threshold, judging that the search term has a timeliness feature.
Preferably, if the time series analysis unit judges that the search term has timeliness, the time series analysis unit may further perform change point detection, to find the position of the change point that characterises the timeliness.
Preferably, the change point detection may comprise finding a value of $k$ for which all five of the following values are greater than a third threshold, so as to determine the position of the change point:
$\mathrm{diff}_1 = x_k - x_{k-1}$
$\mathrm{diff}_2 = x_k - x_{k-2}$
$\mathrm{diff}_3 = x_k - x_{k-3}$
$\mathrm{diff}_4 = x_{k+1} - x_{k-1}$
$\mathrm{diff}_5 = x_{k+2} - x_{k-1}$
Preferably, according to the value of $k$ found, the time series analysis unit may further divide the first time series data into two independent time series $S_1 = x_1, x_2, \ldots, x_{k-1}$ and $S_2 = x_k, x_{k+1}, \ldots, x_n$, and judge from the data models of $S_1$ and $S_2$ whether the timeliness is strengthening, decaying or levelling off; and the re-ranking unit may adjust the order of the search results with the strengthening, decaying or levelling off of the timeliness as the re-ranking basis.
According to another aspect of the invention, a search server is disclosed, comprising: a memory for storing network information in association with search terms, and users' search records for the search terms; a receiving device for receiving a search request from a user; a processor, connected to the memory and the receiving device, for obtaining a search term from the search request received by the receiving device, obtaining from the memory the search results obtained based on the search term, the number of search results, and the number of searches related to the search term, performing time series analysis on the number of search results and the number of searches within a certain time period to judge the timeliness of the search term, and, in response to judging that the search term has timeliness, adjusting the order of the search results with timeliness as the re-ranking basis; and a sending device for sending to the user's client device the search results whose order has been adjusted with timeliness as the re-ranking basis.
Hardware support is thus provided for the search method according to the invention.
Detailed description of the embodiments
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey its scope to those skilled in the art.
Fig. 1 is a schematic flowchart of a search method according to an embodiment of the invention.
In step S110, a search request from a user is received to obtain a search term.
In step S120, the search results obtained based on the search term, the number of search results, and the number of searches related to the search term are obtained.
In step S130, time series analysis is performed on the number of search results and the number of searches within a certain time period, to judge the timeliness of the search term.
In step S140, in response to judging that the search term has timeliness, the order of the search results is adjusted with timeliness as the re-ranking basis.
Thus, the timeliness of the search term can be judged solely by time series analysis of the number of search results and the number of searches within a certain period, and the order of the search results can be adjusted accordingly.
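The four steps can be wired together as in the minimal sketch below. All names (`search_with_timeliness`, `extract_term`, `fetch`, `is_time_sensitive`, the `published` field) are hypothetical and introduced only for illustration. The text does not specify how the order is adjusted; sorting newest-first is an assumption made here to keep the example concrete.

```python
from typing import Callable, List, Sequence, Tuple


def search_with_timeliness(
    request: str,
    extract_term: Callable[[str], str],
    fetch: Callable[[str], Tuple[List[dict], Sequence[float]]],
    is_time_sensitive: Callable[[Sequence[float]], bool],
) -> List[dict]:
    """Hypothetical wiring of steps S110 to S140."""
    # S110: obtain the search term from the raw request.
    term = extract_term(request)
    # S120: obtain the results plus the count series (results and searches).
    results, series = fetch(term)
    # S130: time series analysis decides whether the term is time-sensitive.
    if is_time_sensitive(series):
        # S140: re-rank; newest-first is an assumption, not the source's rule.
        results = sorted(results, key=lambda r: r["published"], reverse=True)
    return results
```

The analysis step is passed in as a callable so that a white noise test, a trend analysis, or a combination of both can be substituted without changing the flow.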
Here, time series analysis refers to the concept known in the art, namely "fitting a set of observations arranged in chronological order (called a time series) to a parametric model and analysing it".
Step S110 is explained further. A search engine recalls documents according to the user's search request. Search requests vary widely, and a purely literal match would likely miss a large number of relevant results. The request therefore needs some processing, for example removing unimportant words from it and suitably transforming some of its words, so as to increase the number of recalled results.
The process of obtaining the search term from the search request is not the main subject of the invention and is not described further here.
In a preferred embodiment, step S130 may comprise dividing the number of search results and the number of searches within the certain time period by a certain time interval to generate first time series data. The "time interval" may be, for example, one day, and the "time period" may be, for example, two months (taken as 60 days for ease of calculation), so that, for example, the data of the last two months can be divided by day to generate the first time series data. The preparation of the search result counts and search counts is described in detail below with examples.
The invention judges the user's timeliness demand mainly from two sets of data: one is the quantity of information related to the user's request (for example, the number of related news items), and the other is the number of similar searches made by other users. In a preferred embodiment, these two sets of data may be built in advance into an information index and a query log index, respectively, to facilitate subsequent counting and recall.
In a preferred embodiment, the information index may comprise two parts: a day-level update index and a real-time update index. The data source of the information index may be, for example, news seed pages selected manually and by machine; a crawler crawls the pages linked from the seed pages in real time, and the index is built from the crawled news data.
In a preferred embodiment, the query log index may comprise two parts: a day-level update index and an hour-level update index. The data of the query log may come from the real-time query logs of online user searches.
In a preferred embodiment, the information index may index the information pages (for example, news pages crawled by the crawler) of the two months preceding the current time, and the value obtained is the number of related information pages.
In a preferred embodiment, the query log index may correspondingly cover the query logs of all user searches in the two months preceding the current time, and the value obtained is the number of related or similar queries.
In a preferred embodiment, the day-level update index may be an index updated by day for 1 day, 2 days, 3 days, ..., 60 days. The hour-level update may, for example, be performed at 1 hour, 3 hours, 6 hours, 9 hours, 12 hours, 15 hours, 18 hours, 21 hours and 24 hours.
It should be understood that the two months, the division by day and the three-hour updates above are merely examples given for ease of description, and those skilled in the art can choose different values for a specific implementation. The "certain time period" over which the raw data are obtained may be any suitable period other than two months, such as two weeks, one month or half a year. The "time interval" used to divide that period may be any interval other than one day, such as half a day, two days or three days. The "hour-level update" may, for example, take place every hour, every two hours, or even at unequal intervals. Evidently, all such variations fall within the scope covered by the principle of the invention.
In a preferred embodiment, the quantity of resources recalled by the user's query and the quantity of related searches are divided by a certain time interval to obtain a vector indexed by time; the data are sorted by distance from the current time to generate the time series data $x_1, x_2, x_3, \ldots, x_n$. For example, the data of the last two months (taken as 60 days here for ease of calculation) are divided by day to obtain the time series data $x_1, x_2, x_3, \ldots, x_n$, where $n$ is an integer between 1 and 60.
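The daily division can be sketched as follows. This is an illustration only: the function name `daily_counts` and its parameters are hypothetical, the input is assumed to be a list of event timestamps (crawled pages or logged queries), and index 0 is taken to be the day closest to the current time, following the "sorted by distance from the current time" wording above.

```python
from datetime import datetime, timedelta


def daily_counts(timestamps, now, days=60):
    """Count events per day over the last `days` days.

    Index 0 holds the most recent 24 hours, index 1 the 24 hours before
    that, and so on. Events outside the window are ignored.
    """
    counts = [0] * days
    for ts in timestamps:
        age = now - ts
        if timedelta(0) <= age < timedelta(days=days):
            counts[age.days] += 1
    return counts


# Example: five events observed at different ages relative to `now`.
now = datetime(2024, 1, 31, 12, 0)
events = [now - timedelta(hours=h) for h in (1, 5, 30, 49, 200)]
series = daily_counts(events, now, days=10)
```

Reversing the returned list gives the same data in chronological order, which some of the later analyses may find more convenient.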
When analysing the first time series data, the following basic assumption is made first: under normal random queries, the recalled relevant results and the related queries follow a normal distribution and the resulting time series is white noise; when a query has a timeliness feature, the recalled time series contains a change point, and the distributions of the data before and after the change point differ markedly.
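Subsequently, basic statistics can be computed for the time series data $x_1, x_2, x_3, \ldots, x_n$. The basic statistics may include the mean, the variance, the autocovariance and the autocorrelation coefficient.
Mean: $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t$
Variance: $\hat{\sigma}^{2} = \frac{1}{n}\sum_{t=1}^{n}(x_t - \bar{x})^{2}$
Autocovariance: $\hat{\gamma}_k = \frac{1}{n}\sum_{t=1}^{n-k}(x_t - \bar{x})(x_{t+k} - \bar{x})$
Autocorrelation coefficient: $\hat{\rho}_k = \hat{\gamma}_k / \hat{\gamma}_0$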
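The four basic statistics named above can be computed as in the following sketch. It uses the common sample estimators with a $1/n$ normalisation; that normalisation, like the function names, is an assumption made for illustration and may differ from the estimators intended in the original.

```python
def mean(xs):
    """Sample mean of the series."""
    return sum(xs) / len(xs)


def autocovariance(xs, k):
    """Lag-k sample autocovariance with 1/n normalisation (assumed)."""
    n = len(xs)
    mu = mean(xs)
    return sum((xs[t] - mu) * (xs[t + k] - mu) for t in range(n - k)) / n


def variance(xs):
    """Sample variance, equal to the lag-0 autocovariance."""
    return autocovariance(xs, 0)


def autocorrelation(xs, k):
    """Lag-k autocorrelation coefficient: autocovariance ratio to lag 0."""
    return autocovariance(xs, k) / autocovariance(xs, 0)


series = [2.0, 4.0, 6.0, 8.0]
print(mean(series), variance(series), autocorrelation(series, 1))
```

Note that `autocorrelation` is undefined for a constant series, because the lag-0 autocovariance is then zero.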
In a preferred embodiment, step S130 may further comprise performing a white noise test on the first time series data and judging the timeliness of the search term from the result of the white noise test.
In a preferred embodiment, it is assumed that the $Q_{LB}$ statistic of the first time series data $x_1, x_2, x_3, \ldots, x_n$ follows a chi-square distribution:
$$Q_{LB} = n(n+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^{2}}{n-k} \sim \chi^{2}(m)$$
where $n$ is the number of values obtained by dividing the time period by the time interval, $m$ is the number of degrees of freedom, and $\hat{\rho}_k$ is the autocorrelation coefficient; in response to the P-value of the $Q_{LB}$ statistic being less than a first threshold serving as the agreed significance level, the search term is judged to have a timeliness feature.
Specifically, the values aggregated at day level can be used directly here. When only 30 days of data are used and today is the 30th, the series tested consists of the data for the 30th, 29th, 28th, ..., 3rd, 2nd and 1st of the month.
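The white noise test can be sketched with the Ljung-Box statistic $Q_{LB} = n(n+2)\sum_{k=1}^{m}\hat{\rho}_k^{2}/(n-k)$ and its chi-square P-value. The sketch below uses only the standard library, so it restricts $m$ to even values, for which the chi-square survival function has a closed form; that restriction, the default `m=6` and `alpha=0.05`, and all function names are assumptions for illustration.

```python
import math


def autocorrelation(xs, k):
    """Lag-k sample autocorrelation; returns 0.0 for a constant series."""
    n = len(xs)
    mu = sum(xs) / n
    c0 = sum((x - mu) ** 2 for x in xs)
    if c0 == 0:
        return 0.0
    ck = sum((xs[t] - mu) * (xs[t + k] - mu) for t in range(n - k))
    return ck / c0


def ljung_box_q(xs, m):
    """Ljung-Box Q statistic over lags 1..m."""
    n = len(xs)
    return n * (n + 2) * sum(
        autocorrelation(xs, k) ** 2 / (n - k) for k in range(1, m + 1)
    )


def chi2_sf_even(q, m):
    """P(X > q) for chi-square with an even number m of degrees of freedom."""
    if m <= 0 or m % 2:
        raise ValueError("this sketch supports positive even m only")
    half = q / 2.0
    term, total = 1.0, 1.0
    for i in range(1, m // 2):
        term *= half / i          # (q/2)^i / i!
        total += term
    return math.exp(-half) * total


def has_timeliness(xs, m=6, alpha=0.05):
    """True when the series is rejected as white noise (P-value < alpha)."""
    return chi2_sf_even(ljung_box_q(xs, m), m) < alpha


burst = [5] * 20 + [50] * 10      # flat counts followed by a sustained jump
print(has_timeliness(burst))
```

The statistic depends only on products of pairs of observations at a fixed lag, so it gives the same value whether the series is ordered oldest-first or newest-first.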
In a preferred embodiment, step S130 may comprise selecting preset time points at different intervals from the current time, and calculating the numbers of search results $M_1, M_2, \ldots, M_{j-1}, M_j$ in the periods from the current time to each preset time point, where $j$ is the number of preset time points and the longest of the periods is no longer than the time period over which the number of search results is obtained; calculating the mean value $M_m$ of the per-interval values from the end of the longest period to the end of the time period; and calculating the ratios of $M_1, M_2, \ldots, M_{j-1}, M_j$ to $M_m$:
$$R_i = \frac{M_i}{M_m}, \quad i = 1, 2, \ldots, j$$
In response to any value of $R$ being greater than a second threshold, the search term is judged to have a timeliness feature.
Specifically, in a preferred embodiment, when the original time series contains the data of the last two months (for example, 60 days), the following five preset time points may be selected: 1 hour, 3 hours, 1 day, 3 days and 7 days before the current time. The data within each of the periods delimited by these five time points are summed. For the sixth period, namely day 7 to day 60, the mean $M_m$ of the basic time series data within the period is computed. In a preferred embodiment, one maximum and one minimum may be removed when computing the mean, to prevent abnormal data from making the statistics deviate substantially from the true model. The average resource quantities of the six periods are thus obtained:
$M_{h1}, M_{h3}, M_{d1}, M_{d3}, M_{d7}, M_m$
The ratios of the first five periods to the sixth, background period are then computed:
$$R_{h1} = \frac{M_{h1}}{M_m},\quad R_{h3} = \frac{M_{h3}}{M_m},\quad R_{d1} = \frac{M_{d1}}{M_m},\quad R_{d3} = \frac{M_{d3}}{M_m},\quad R_{d7} = \frac{M_{d7}}{M_m}$$
These ratios are then analysed: if any of the $R$ values is greater than a specific threshold, the search results are considered to exhibit a timeliness feature.
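The ratio test can be sketched as follows. The text sums the counts within each recent window and compares them with a background mean; so that windows of different lengths are comparable, this sketch first converts each window sum into a per-day rate, which is an assumption made here rather than something the text states. The function names, window labels and thresholds are likewise hypothetical.

```python
def trimmed_mean(values):
    """Mean after dropping one maximum and one minimum."""
    if len(values) < 3:
        raise ValueError("need at least three values to trim both ends")
    kept = sorted(values)[1:-1]
    return sum(kept) / len(kept)


def trend_ratios(window_counts, window_days, background_daily):
    """Ratio of each recent window's daily rate to the background mean M_m.

    window_counts: total count per recent window, keyed by label.
    window_days: length of each window in days, keyed by the same labels.
    background_daily: daily counts of the background period (e.g. day 7-60).
    """
    m_m = trimmed_mean(background_daily)
    if m_m == 0:
        raise ValueError("background mean is zero; ratio undefined")
    return {
        label: (window_counts[label] / window_days[label]) / m_m
        for label in window_counts
    }


def has_timeliness(ratios, threshold):
    """True if any ratio R exceeds the (second) threshold."""
    return any(r > threshold for r in ratios.values())


background = [10, 12, 8, 100, 0, 10]       # 100 and 0 are trimmed away
counts = {"h1": 5, "d1": 30, "d7": 70}
lengths = {"h1": 1 / 24, "d1": 1.0, "d7": 7.0}
ratios = trend_ratios(counts, lengths, background)
print(ratios, has_timeliness(ratios, threshold=2.0))
```

In this example the last hour runs at roughly twelve times the background rate while the last seven days match it, which is the pattern expected shortly after a burst begins.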
It can thus be seen that time series analysis can be performed on the number of search results and the number of searches within a certain time period, and that the timeliness of the search term can be judged by a white noise test on those two counts. Trend analysis at preset time points can also be performed on the number of search results, as a supplement to or a replacement for the judgment by white noise test described above.
In a preferred embodiment, after the user's current query has been judged to have a timeliness feature in the manner described above, the time series data can be analysed more finely to determine the intensity and trend of the timeliness.
In a preferred embodiment, after the search term has been judged to have timeliness, change point detection may be performed on the first time series model to find the position of the change point that characterises the timeliness.
When the data fail the white noise test and/or the distributions of the time series data split by the preset time points differ markedly, the current user query can be considered very likely to be time-sensitive. According to the earlier assumption, the time series data corresponding to such a time-sensitive query can then be considered to contain a change point; finding the change point and analysing how the distribution of the series data changes before and after it helps to analyse and judge the intensity and trend of the timeliness.
In a preferred embodiment, difference detection can be used to find the change point. Specifically, it can be calculated whether the difference values at certain time points have changed sharply. Here, the time series data $x_1, x_2, x_3, \ldots, x_n$ of the preceding example can still be used. When $n$ is an integer between 1 and 60, the change point is searched for in the corresponding data of the most recent 60 days, at the granularity of one day. In a preferred embodiment, only the data of the last 30 days may be used in the calculation.
In a preferred embodiment, the position of the change point of the time series can be roughly determined by computing the following five difference values:
$\mathrm{diff}_1 = x_k - x_{k-1}$
$\mathrm{diff}_2 = x_k - x_{k-2}$
$\mathrm{diff}_3 = x_k - x_{k-3}$
$\mathrm{diff}_4 = x_{k+1} - x_{k-1}$
$\mathrm{diff}_5 = x_{k+2} - x_{k-1}$
When the absolute values of these five values are all greater than a specific threshold, it can be inferred that this point is a change point of the time series. That is, the distribution of the time series data undergoes a structural change at this point.
It should be understood that more or fewer difference values (for example, 3 or 7) may also be computed depending on the specific implementation, and all such variants fall within the scope covered by the principle of the invention.
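The five-difference test can be sketched as below. The function name `find_change_points` and the sample data are hypothetical, the series is assumed to be in chronological order (oldest first), and indices are 0-based, so the loop skips the first three and last two positions where the differences are not defined.

```python
def find_change_points(xs, threshold):
    """Return every 0-based index k whose five differences all exceed
    `threshold` in absolute value."""
    points = []
    for k in range(3, len(xs) - 2):
        diffs = (
            xs[k] - xs[k - 1],      # diff_1
            xs[k] - xs[k - 2],      # diff_2
            xs[k] - xs[k - 3],      # diff_3
            xs[k + 1] - xs[k - 1],  # diff_4
            xs[k + 2] - xs[k - 1],  # diff_5
        )
        if all(abs(d) > threshold for d in diffs):
            points.append(k)
    return points


daily = [5, 6, 5, 4, 5, 40, 42, 45, 44, 43]
print(find_change_points(daily, threshold=20))
```

Comparing the two points after `k` with the point just before it (`diff_4`, `diff_5`) filters out single-day spikes, because a spike that falls back immediately fails those two checks.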
After the change point has been found, the first time series data can be divided, according to the value of $k$ found, into two independent time series $S_1 = x_1, x_2, \ldots, x_{k-1}$ and $S_2 = x_k, x_{k+1}, \ldots, x_n$; whether the timeliness is strengthening, decaying or levelling off is judged from the data models of $S_1$ and $S_2$, and that judgment is used as the re-ranking basis for the search results.
Generally, if the mean of the data after the change point is much larger than the mean of the data before it, the resources for the query can be considered to have burst after the moment of the change point. If the differences after the change point are continuously positive, the event or object corresponding to the search term can be considered to be still developing; if the second-order differences are also continuously positive, there is further evidence that its timeliness is steadily strengthening. If the differences are continuously negative after some point, the quantity of the corresponding resources or the number of user searches can be considered to be declining, which indicates that the timeliness of the current query has decayed. If positive and negative differences are roughly balanced, the timeliness can be considered to be slowly levelling off.
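The rules of thumb above can be turned into a small classifier, sketched below. The text gives no numeric criteria, so the parameters `tail` (how many trailing negative differences count as "continuously negative") and `balance` (how close the positive and negative counts must be) are assumptions, as are the function names and labels. The series is assumed to be chronological, with `k` the 0-based change point index.

```python
def first_differences(xs):
    """Differences between consecutive values."""
    return [b - a for a, b in zip(xs, xs[1:])]


def burst_ratio(xs, k):
    """Mean after the change point divided by the mean before it."""
    before, after = xs[:k], xs[k:]
    mean_before = sum(before) / len(before)
    if mean_before == 0:
        return float("inf")
    return (sum(after) / len(after)) / mean_before


def classify_trend(xs, k, tail=3, balance=0.25):
    """Label the post-change-point segment S2 = xs[k:]."""
    d1 = first_differences(xs[k:])
    if len(d1) < tail:
        return "undetermined"
    if all(d > 0 for d in d1):
        d2 = first_differences(d1)
        # Positive second differences: growth itself is speeding up.
        if d2 and all(d > 0 for d in d2):
            return "accelerating"
        return "enhancing"
    if all(d < 0 for d in d1[-tail:]):
        return "decaying"
    pos = sum(d > 0 for d in d1)
    neg = sum(d < 0 for d in d1)
    if abs(pos - neg) <= balance * len(d1):
        return "steady"
    return "undetermined"


print(classify_trend([5, 5, 5, 5, 60, 70, 50, 40, 30], k=4))
```

A re-ranking component could, for instance, weight recency more heavily for "accelerating" or "enhancing" queries than for "decaying" ones; how the label is used is left open by the text.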
The search method and its preferred embodiments have been described above with reference to Fig. 1. In the apparatus described below, the functions of the units and components are the same as those of the corresponding steps described above with reference to Fig. 1 and the subsequent preferred embodiments. To avoid repetition, the description below focuses on the structure and components the apparatus may have; some details are omitted, and reference may be made to the corresponding description above.
Fig. 2 is a schematic block diagram of a search apparatus 20 according to an embodiment of the invention. The search apparatus 20 may comprise a receiving unit 100, an obtaining unit 200, a time series analysis unit 300 and a re-ranking unit 400.
The receiving unit 100 may receive a search request from a user to obtain a search term.
The obtaining unit 200 may obtain the search results obtained based on the search term, the number of search results, and the number of searches related to the search term.
The time series analysis unit 300 may perform time series analysis on the number of search results and the number of searches within a certain time period, to judge the timeliness of the search term.
The re-ranking unit 400 may, in response to judging that the search term has timeliness, adjust the order of the search results with timeliness as the re-ranking basis.
In a preferred embodiment, the time series analysis unit 300 may divide the number of search results and the number of searches within the time period by a time interval to generate first time series data, perform a white noise test on the first time series data, and judge the timeliness feature of the search term from the result of the white noise test. The white noise test performed by the time series analysis unit 300 may be the same as the specific method described above and is not repeated here.
In a preferred embodiment, the time series analysis unit 300 may also perform the trend analysis at preset time points described above.
In a preferred embodiment, the time series analysis unit 300 may also perform change point detection. When it judges that the search term has timeliness, the time series analysis unit 300 may perform change point detection on the first time series model to find the position of the change point that characterises the timeliness.
In a preferred embodiment, the change point detection may likewise find the change point by the difference value method described above for the search method, and generate two different time series models, before and after the change point, to judge the trend of the timeliness; this is not repeated here.
In a preferred embodiment, the re-ranking unit 400 may use the strengthening, decaying or levelling off of the timeliness as the re-ranking basis for the search results.
The functional-module implementation of the search method according to the invention has been described above with reference to Fig. 2. The hardware support of the corresponding apparatus is described below with reference to Fig. 3.
Fig. 3 is a hardware composition diagram of a search server 30 according to an embodiment of the invention. The search server 30 may comprise a processor 31, a memory 32, a receiving device 33 and a sending device 34.
The memory 32 may store network information in association with search terms, and users' search records for the search terms.
The receiving device 33 may receive a search request from a user.
The processor 31 is connected to the memory 32, the receiving device 33 and the sending device 34. The processor 31 may process the search request received by the receiving device 33 to obtain a search term; obtain from the memory 32 the search results obtained based on the search term, the number of search results, and the number of searches related to the search term; perform time series analysis on the number of search results and the number of searches within a certain time period to judge the timeliness of the search term; and, in response to judging that the search term has timeliness, adjust the order of the search results with timeliness as the re-ranking basis.
The sending device 34 may send to the user's client device the search results whose order has been adjusted with timeliness as the re-ranking basis.
The search server 30 and the search apparatus 20 of Fig. 2 may be the same device, characterised in terms of hardware and of functional modules respectively, or may be different devices. Both can implement the method described in the example of Fig. 1 and its preferred embodiments.
The search method and apparatus according to the invention have been described in detail above with reference to the accompanying drawings.
In addition, the method according to the invention may also be implemented as a computer program comprising computer program code instructions for performing the steps defined in the above method of the invention. Alternatively, the method according to the invention may be implemented as a computer program product comprising a computer-readable medium on which is stored a computer program for performing the functions defined in the above method of the invention. Those skilled in the art will also understand that the various illustrative logical blocks, modules, circuits and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the drawings show possible architectures, functions and operations of systems and methods according to multiple embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which comprises one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Various embodiments of the invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application or their improvement over technology found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.