CN106484671B - A kind of recognition methods of timeliness inquiry content - Google Patents
- Publication number
- CN106484671B (application CN201510526945.1A)
- Authority
- CN
- China
- Prior art keywords
- timeliness
- inquiry
- content
- document
- period
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The present invention provides a method for identifying time-sensitive queries. The method builds an index of a time-sensitive document resource, counts the number of times the query content occurs in that resource, and performs a timeliness judgment on the query content, thereby identifying time-sensitive query content. The method identifies time-sensitive queries quickly and comprehensively; it has low resource requirements and is suitable for both common queries and long-tail queries; it improves recall; it still identifies time-sensitive queries whose burst is already declining; and it provides a timeliness strength for each query, so that downstream modules can apply different strategies according to that strength. Together these properties support the accuracy and reliability of identification.
Description
Technical field
The present invention relates to the field of query content recognition, and in particular to a method for identifying time-sensitive queries.
Background art
In the current big-data era of information explosion, search engines have become an indispensable means of obtaining information. A user enters a query into a search engine, receives search results, and looks for the required information among them. In some cases the user's query is strongly time-sensitive. For example, a user who entered "World Cup" during the 2014 World Cup in Brazil was mainly interested in content about that tournament rather than in information about earlier World Cups. In this case the search engine should first determine that "World Cup" was a time-sensitive query at that time, and then show newer relevant results to the user first. According to statistics, queries with a timeliness requirement account for as much as about 30% of all queries. Identifying time-sensitive queries is therefore very important for improving search result quality.
Existing methods for identifying time-sensitive queries are typically based on search engine query logs: they compare the query volume of a given query in two consecutive time periods, and treat the query as time-sensitive if the volume increases markedly. Existing judgment methods include:
(1) Absolute increase in query volume between the two periods
If the increase in query volume from the earlier period to the later period exceeds a threshold, the query is considered time-sensitive. The drawback of this method is that it is insensitive to long-tail queries: for example, when the query volume rises from 100 to 200, the volume doubles but the difference is only 100.
(2) Relative increase in query volume between the two periods
If the ratio of the increase in query volume to the query volume of the first period exceeds a threshold, the query is considered time-sensitive. This avoids the drawback of the first method but is too sensitive to long-tail queries: for example, when the query volume rises from 5 to 10, the volume doubles although the difference is only 5.
(3) Angle of the trend lines of the two periods
This method was proposed in a Chinese invention patent (No. CN201410211458.1), in which the second period is set to be a part of the first period. The method holds that a query is time-sensitive if its volume grows slowly in the first period and rapidly in the second.
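The weaknesses of methods (1) and (2) can be illustrated with a minimal Python sketch. The function names and the threshold values (1000 and 0.5) are assumptions chosen for illustration only; the source does not specify them.

```python
def absolute_increase(prev_count, curr_count, threshold):
    """Method (1): flag the query when the absolute increase exceeds a threshold."""
    return (curr_count - prev_count) > threshold


def relative_increase(prev_count, curr_count, ratio_threshold):
    """Method (2): flag the query when increase / previous volume exceeds a threshold."""
    if prev_count == 0:
        # No earlier volume to compare against; treat any occurrence as an increase.
        return curr_count > 0
    return (curr_count - prev_count) / prev_count > ratio_threshold


# Example from the text: 100 -> 200 doubles the volume but adds only 100,
# so an absolute threshold tuned for popular queries (assumed: 1000) misses it.
missed_by_absolute = not absolute_increase(100, 200, threshold=1000)

# Example from the text: 5 -> 10 also doubles, so a ratio test (assumed: 0.5)
# fires even though only 5 extra queries were observed.
flagged_by_ratio = relative_increase(5, 10, ratio_threshold=0.5)
```

Under these assumed thresholds the first test misses a doubling long-tail query, while the second flags a query on the strength of five additional searches.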
The existing methods have the following drawbacks:
(1) They decide whether a query shows a burst trend from search engine log statistics. Search engine logs are an expensive resource that usually only a few large search engine vendors possess, which greatly limits the applicability of these methods.
(2) They usually count query volume over the entire query string. A query that is similar to a bursting query in the search log, but does not itself appear as a whole in the log, cannot be identified, which reduces recall. For example, around May 27, 2015, "Huang Xiaoming Baby marriage registration" was a popular search in the search log, whereas "Baby marriage registration" may not have been. If volume is counted directly by whole query string, "Baby marriage registration" is not identified.
(3) Trend-based methods identify time-sensitive queries fairly easily while query volume is rising (from trough to peak), but tend to miss them while query volume is falling (coming down from the peak). In general the query is still time-sensitive at that point, because a hot topic always has some continuity. For example, under the method proposed in patent No. CN201410211458.1, if query volume grows quickly in the first period and then grows slowly or even declines in the second period, the judgment condition is not met.
Summary of the invention
In view of this, the present invention provides a method for identifying time-sensitive queries. The method has low resource requirements and is suitable for both common queries and long-tail queries; it improves recall; it still identifies time-sensitive queries whose burst is declining; and it provides a timeliness strength for each query, so that downstream modules can apply different strategies according to that strength. It thereby supports comprehensive, accurate, and reliable identification.
The object of the present invention is achieved through the following technical solution:
A method for identifying time-sensitive queries, the method comprising:
Step 1. Building an index of a time-sensitive document resource;
Step 2. Computing statistics of the query content in the time-sensitive document resource: the number of occurrences, and the mean and variance of the number of occurrences;
Step 3. Performing a timeliness judgment on the query content, thereby identifying time-sensitive query content.
Preferably, the time-sensitive document resource in step 1 is a set of time-sensitive documents;
each time-sensitive document is a search engine query log or a news document.
Preferably, step 1 comprises:
1-1. Adding new time-sensitive documents to the time-sensitive document resource in real time, and recording the time at which each time-sensitive document is added to the resource;
1-2. Performing Chinese word segmentation on the current time-sensitive document to obtain a segmentation result;
1-3. Adding the time-sensitive document to the index of the time-sensitive document resource in real time, according to the segmentation result.
Preferably, step 2 comprises:
2-1. Performing Chinese word segmentation on the query content to obtain query terms;
2-2. Searching the time-sensitive document resource through the index to obtain the time-sensitive documents that contain all of the query terms;
2-3. Computing the number of times the query content occurs in the time-sensitive document resource, and the mean and variance of that number.
Preferably, step 2-3 comprises:
a. Taking the current period as the end point, marking off one cycle backward in time, wherein the cycle is divided into multiple periods of equal duration, and the total number of periods, including the one current period, is N+1;
b. For each period $T_i$ ($-N \le i \le 0$), including the current period, counting the number of times $C_i$ that the query content occurs in the time-sensitive documents;
c. Over the historical periods, which exclude the current period ($-N \le i \le -1$), calculating the mean $\bar{C}$ and the standard deviation $SD$ of the number of times the query content occurs in the time-sensitive documents:
$$\bar{C} = \frac{1}{N}\sum_{i=-N}^{-1} C_i, \qquad SD = \sqrt{\frac{1}{N}\sum_{i=-N}^{-1}\left(C_i - \bar{C}\right)^2}$$
Preferably, step 3 comprises:
3-1. Judging whether the number of occurrences $C_0$ of the query content in the current period is greater than a threshold, wherein the threshold is determined according to the scale of the resource library;
If so, proceeding to 3-2;
If not, identifying the query content as non-time-sensitive query content;
3-2. Judging whether the number of occurrences $C_0$ in the current period, the mean $\bar{C}$, and the standard deviation $SD$ satisfy $C_0 > \bar{C} + \alpha \cdot SD$, where $\alpha$ is an empirical coefficient greater than 1;
If so, identifying the query content as time-sensitive query content, and determining its timeliness strength from the ratio of $C_0$ to $\bar{C}$;
If not, proceeding to 3-3;
3-3. Within an interval that precedes the current period and lies inside the cycle, counting the number of occurrences $C_j$ of the query content in each period of the interval, wherein the interval contains M periods, M < N, and $-M \le j \le -1$;
3-4. Judging whether the interval contains a period for which $C_j > \bar{C} + \alpha \cdot SD$;
If so, identifying the query content as time-sensitive query content, and determining its timeliness strength from the ratio of $C_j$ to $\bar{C}$;
If not, identifying the query content as non-time-sensitive query content.
As can be seen from the above technical solution, the present invention provides a method for identifying time-sensitive queries: it builds an index of the time-sensitive document resource, counts the number of times the query content occurs in that resource, and performs a timeliness judgment on the query content, thereby identifying time-sensitive query content. The method identifies time-sensitive queries quickly and comprehensively; it has low resource requirements and is suitable for both common queries and long-tail queries; it improves recall; it still identifies time-sensitive queries whose burst is declining; and it provides a timeliness strength for each query, so that downstream modules can apply different strategies according to that strength. Together these properties support the accuracy and reliability of identification.
Compared with the closest prior art, the technical solution provided by the present invention has the following advantages:
1. The technical solution builds an index of the time-sensitive document resource, counts the number of times the query content occurs in that resource, and performs a timeliness judgment on the query content, thereby identifying time-sensitive query content. The method identifies time-sensitive queries quickly and comprehensively; it has low resource requirements and is suitable for both common queries and long-tail queries; it improves recall; it still identifies time-sensitive queries whose burst is declining; and it provides a timeliness strength for each query, so that downstream modules can apply different strategies according to that strength. Together these properties support the accuracy and reliability of identification.
2. The technical solution places low demands on resources: the resource can be search engine logs or a collection of news documents, and the latter is easier to obtain than the former.
3. The technical solution counts the occurrences of a query by retrieval rather than by whole-string statistics, which increases recall.
4. The technical solution is insensitive to the absolute query volume of a query, and is suitable for both common queries and long-tail queries.
5. The technical solution still identifies time-sensitive queries that are in the decline phase of a burst.
6. The technical solution provides the timeliness strength of a query, which lets downstream modules apply different strategies according to that strength.
7. The technical solution is widely applicable and has significant social and economic benefits.
Brief description of the drawings
Fig. 1 is a flow diagram of the method for identifying time-sensitive queries of the present invention;
Fig. 2 is a flow diagram of step 1 of the method;
Fig. 3 is a flow diagram of step 2 of the method;
Fig. 4 is a flow diagram of step 3 of the method.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the protection scope of the present invention.
As shown in Fig. 1, the present invention provides a method for identifying time-sensitive queries, comprising:
Step 1. Building an index of a time-sensitive document resource;
Step 2. Computing statistics of the query content in the time-sensitive document resource: the number of occurrences, and the mean and variance of the number of occurrences;
Step 3. Performing a timeliness judgment on the query content, thereby identifying time-sensitive query content.
Preferably, the time-sensitive document resource in step 1 is a set of time-sensitive documents;
each time-sensitive document is a search engine query log or a news document.
As shown in Fig. 2, step 1 comprises:
1-1. Adding new time-sensitive documents to the time-sensitive document resource in real time, and recording the time at which each time-sensitive document is added to the resource;
1-2. Performing Chinese word segmentation on the current time-sensitive document to obtain a segmentation result;
1-3. Adding the time-sensitive document to the index of the time-sensitive document resource in real time, according to the segmentation result.
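Steps 1-1 to 1-3 can be sketched as a small in-memory inverted index. This is a minimal illustration, not the implementation described by the patent: the class name `TimelinessIndex`, the integer timestamps, and the whitespace `tokenize` function (a stand-in for a real Chinese word segmenter) are all assumptions.

```python
from collections import defaultdict


def tokenize(text):
    # Stand-in for a Chinese word segmenter (assumption): split on whitespace.
    return text.split()


class TimelinessIndex:
    """Hypothetical in-memory index over time-sensitive documents."""

    def __init__(self):
        self.added_at = {}                # doc_id -> time the document was added
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, text, timestamp):
        self.added_at[doc_id] = timestamp      # step 1-1: record the time of addition
        for term in tokenize(text):            # step 1-2: segment the document
            self.postings[term].add(doc_id)    # step 1-3: add it to the index


index = TimelinessIndex()
index.add("d1", "Huang Xiaoming Baby marriage registration", timestamp=0)
index.add("d2", "Baby marriage registration news", timestamp=1)
```

Storing the time of addition alongside the postings is what later allows occurrences to be counted per period.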
As shown in Fig. 3, step 2 comprises:
2-1. Performing Chinese word segmentation on the query content to obtain query terms;
2-2. Searching the time-sensitive document resource through the index to obtain the time-sensitive documents that contain all of the query terms;
2-3. Counting the number of times the query content occurs in the time-sensitive documents.
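Retrieval by query terms, followed by counting per period, might look like the following sketch. The data layout (`postings` mapping each term to a set of document ids, `added_at` mapping each id to an integer timestamp) and all names are assumptions made for illustration.

```python
from collections import Counter


def matching_docs(postings, query_terms):
    """Step 2-2: ids of documents that contain every query term."""
    sets = [postings.get(term, set()) for term in query_terms]
    return set.intersection(*sets) if sets else set()


def counts_per_period(doc_ids, added_at, now, period_length, n_history):
    """Step 2-3: return [C_-N, ..., C_-1, C_0] for the matching documents."""
    counts = Counter()
    for doc_id in doc_ids:
        age = now - added_at[doc_id]
        if age < 0:
            continue                      # document is newer than "now"; ignore it
        i = -(age // period_length)       # 0 = current period, -1 = previous one, ...
        if i >= -n_history:
            counts[i] += 1
    return [counts[i] for i in range(-n_history, 1)]


# Toy data (assumed): three documents, periods of 10 time units, N = 3.
postings = {"Baby": {"d1", "d2", "d3"}, "registration": {"d1", "d2"}, "Huang": {"d1"}}
added_at = {"d1": 95, "d2": 99, "d3": 85}

hits = matching_docs(postings, ["Baby", "registration"])
counts_query = counts_per_period(hits, added_at, now=100, period_length=10, n_history=3)
counts_baby = counts_per_period(matching_docs(postings, ["Baby"]), added_at,
                                now=100, period_length=10, n_history=3)
```

Because matching is done on terms rather than on the whole query string, the shorter query "Baby registration" is counted in every document that also matches the longer bursting query.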
Step 2-3 comprises:
a. Taking the current period as the end point, marking off one cycle backward in time, wherein the cycle is divided into multiple periods of equal duration, and the total number of periods, including the current period, is N+1; the period granularity is hours or days, depending on application needs;
b. For each period $T_i$ ($-N \le i \le 0$), including the current period, counting the number of times $C_i$ that the query content occurs in the time-sensitive document resource;
c. Over the historical periods, which exclude the current period ($-N \le i \le -1$), calculating the mean $\bar{C}$ and the standard deviation $SD$ of the number of times the query content occurs in the time-sensitive document resource:
$$\bar{C} = \frac{1}{N}\sum_{i=-N}^{-1} C_i, \qquad SD = \sqrt{\frac{1}{N}\sum_{i=-N}^{-1}\left(C_i - \bar{C}\right)^2}$$
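The historical mean and standard deviation can be computed with Python's standard `statistics` module, as in this short sketch. Using the population standard deviation (division by N) is an assumption; the source text does not preserve the original formula, and the function name `history_stats` is hypothetical.

```python
from statistics import mean, pstdev


def history_stats(counts):
    """counts = [C_-N, ..., C_-1, C_0]; return (mean, SD) over the historical periods."""
    history = counts[:-1]                 # exclude the current period C_0
    return mean(history), pstdev(history)


# Eight historical periods followed by a current period with 40 occurrences.
c_bar, sd = history_stats([2, 4, 4, 4, 5, 5, 7, 9, 40])
```

The current count is deliberately left out of the statistics, so a burst in the current period does not inflate the baseline it is compared against.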
As shown in Fig. 4, step 3 comprises:
3-1. Judging whether the number of occurrences $C_0$ of the query content in the current period is greater than a threshold, wherein the threshold is determined according to the scale of the resource library, for example 10, 20, or 50, so that queries with very few occurrences are not misidentified;
If so, proceeding to 3-2;
If not, identifying the query content as non-time-sensitive query content, because it occurs too rarely;
3-2. Judging whether the number of occurrences $C_0$ in the current period, the mean $\bar{C}$, and the standard deviation $SD$ satisfy $C_0 > \bar{C} + \alpha \cdot SD$, where $\alpha$ is an empirical coefficient greater than 1, for example 1.5, 2, or 2.5;
If so, identifying the query content as time-sensitive query content, and determining its timeliness strength from the ratio of $C_0$ to $\bar{C}$. This condition states that the number of occurrences in the current period is far above the historical average; it mainly serves to identify time-sensitive queries that have just burst;
If not, proceeding to 3-3;
3-3. Within an interval that precedes the current period and lies inside the cycle, counting the number of occurrences $C_j$ of the query content in each period of the interval, wherein the interval contains M periods, M < N, and $-M \le j \le -1$;
For example, the cycle is the 1 month before the current period and one period is 1 day, so N = 30; the interval is a span within that month that contains M = 10 periods, that is, the interval is 10 days;
3-4. Judging whether the interval contains a period for which $C_j > \bar{C} + \alpha \cdot SD$;
If so, identifying the query content as time-sensitive query content, and determining its timeliness strength from the ratio of $C_j$ to $\bar{C}$. This condition indicates that the query burst within the interval; since time-sensitive queries usually have some continuity, the query is considered still time-sensitive in the current period;
If not, identifying the query content as non-time-sensitive query content.
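The decision procedure of steps 3-1 to 3-4 can be sketched as a single function. Several points are assumptions rather than statements from the source: the burst test is read as "count > mean + alpha × SD", the standard deviation is the population form, the default values (`min_count=10`, `alpha=2.0`, `gap=10`) are picked from the example ranges in the text, and when several interval periods qualify the largest count is used for the strength.

```python
from statistics import mean, pstdev


def classify(counts, min_count=10, alpha=2.0, gap=10):
    """counts = [C_-N, ..., C_-1, C_0]; return (is_time_sensitive, strength)."""
    history, c0 = counts[:-1], counts[-1]
    if c0 <= min_count:                       # step 3-1: too few occurrences
        return False, 0.0

    c_bar, sd = mean(history), pstdev(history)
    limit = c_bar + alpha * sd                # assumed form of the burst condition

    def strength(count):
        # Timeliness strength as the ratio of a count to the historical mean.
        return count / c_bar if c_bar > 0 else float("inf")

    if c0 > limit:                            # step 3-2: the query has just burst
        return True, strength(c0)

    peak = max(history[-gap:])                # step 3-3: the M most recent periods
    if peak > limit:                          # step 3-4: burst inside the interval
        return True, strength(peak)
    return False, 0.0


just_burst = classify([10] * 30 + [50])
declining = classify([10] * 27 + [200, 120, 60] + [30])
steady = classify([10, 12] * 15 + [12])
too_rare = classify([1] * 30 + [5])
```

In the `declining` case the current count of 30 no longer exceeds the limit, but the peak of 200 three periods earlier does, so the query is still reported as time-sensitive. This is the behaviour that trend-based methods miss.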
The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, a person of ordinary skill in the art may still modify the specific embodiments of the invention or replace them with equivalents, and any such modification or equivalent replacement that does not depart from the spirit and scope of the present invention falls within the scope of the pending claims of the present invention.
Claims (1)
1. A method for identifying time-sensitive queries, characterized in that the method comprises:
Step 1. Building an index of a time-sensitive document resource;
Step 2. Computing statistics of the query content in the time-sensitive document resource: the number of occurrences, and the mean and variance of the number of occurrences;
Step 3. Performing a timeliness judgment on the query content, thereby identifying time-sensitive query content;
The time-sensitive document resource in step 1 is a set of time-sensitive documents;
Each time-sensitive document is a search engine query log or a news document;
Step 1 comprises:
1-1. Adding new time-sensitive documents to the time-sensitive document resource in real time, and recording the time at which each time-sensitive document is added to the resource;
1-2. Performing Chinese word segmentation on the current time-sensitive document to obtain a segmentation result;
1-3. Adding the time-sensitive document to the index of the time-sensitive document resource in real time, according to the segmentation result;
Step 2 comprises:
2-1. Performing Chinese word segmentation on the query content to obtain query terms;
2-2. Searching the time-sensitive document resource through the index to obtain the time-sensitive documents that contain all of the query terms;
2-3. Computing the number of times the query content occurs in the time-sensitive document resource, and the mean and variance of that number;
Step 2-3 comprises:
a. Taking the current period as the end point, marking off one cycle backward in time, wherein the cycle is divided into multiple periods of equal duration, and the total number of periods, including the one current period, is N+1;
b. For each period $T_i$ ($-N \le i \le 0$), including the current period, counting the number of times $C_i$ that the query content occurs in the time-sensitive documents;
c. Over the historical periods, which exclude the current period ($-N \le i \le -1$), calculating the mean $\bar{C}$ and the standard deviation $SD$ of the number of times the query content occurs in the time-sensitive documents:
$$\bar{C} = \frac{1}{N}\sum_{i=-N}^{-1} C_i, \qquad SD = \sqrt{\frac{1}{N}\sum_{i=-N}^{-1}\left(C_i - \bar{C}\right)^2}$$
Step 3 comprises:
3-1. Judging whether the number of occurrences $C_0$ of the query content in the current period is greater than a threshold, wherein the threshold is determined according to the scale of the resource library;
If so, proceeding to 3-2;
If not, identifying the query content as non-time-sensitive query content;
3-2. Judging whether the number of occurrences $C_0$ in the current period, the mean $\bar{C}$, and the standard deviation $SD$ satisfy $C_0 > \bar{C} + \alpha \cdot SD$, where $\alpha$ is an empirical coefficient greater than 1;
If so, identifying the query content as time-sensitive query content, and determining its timeliness strength from the ratio of $C_0$ to $\bar{C}$;
If not, proceeding to 3-3;
3-3. Within an interval that precedes the current period and lies inside the cycle, counting the number of occurrences $C_j$ of the query content in each period of the interval, wherein the interval contains M periods, M < N, and $-M \le j \le -1$;
3-4. Judging whether the interval contains a period for which $C_j > \bar{C} + \alpha \cdot SD$;
If so, identifying the query content as time-sensitive query content, and determining its timeliness strength from the ratio of $C_j$ to $\bar{C}$;
If not, identifying the query content as non-time-sensitive query content.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510526945.1A CN106484671B (en) | 2015-08-25 | 2015-08-25 | A kind of recognition methods of timeliness inquiry content |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510526945.1A CN106484671B (en) | 2015-08-25 | 2015-08-25 | A kind of recognition methods of timeliness inquiry content |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106484671A CN106484671A (en) | 2017-03-08 |
CN106484671B true CN106484671B (en) | 2019-05-28 |
Family
ID=58233171
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510526945.1A Active CN106484671B (en) | 2015-08-25 | 2015-08-25 | A kind of recognition methods of timeliness inquiry content |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106484671B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106933809A (en) * | 2017-03-27 | 2017-07-07 | 三角兽(北京)科技有限公司 | Information processor and information processing method |
CN107180093B (en) * | 2017-05-15 | 2020-05-19 | 北京奇艺世纪科技有限公司 | Information searching method and device and timeliness query word identification method and device |
CN111324805B (en) * | 2018-12-13 | 2024-02-13 | 北京搜狗科技发展有限公司 | Query intention determining method and device, searching method and searching engine |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101609445A (en) * | 2009-07-16 | 2009-12-23 | 复旦大学 | Crucial sub-method for extracting topic based on temporal information |
CN101645066A (en) * | 2008-08-05 | 2010-02-10 | 北京大学 | Method for monitoring novel words on Internet |
CN103049443A (en) * | 2011-10-12 | 2013-04-17 | 腾讯科技(深圳)有限公司 | Method and device for mining hot-spot words |
CN103942265A (en) * | 2014-03-26 | 2014-07-23 | 北京奇虎科技有限公司 | Method and device for pushing webpages containing news information |
Also Published As
Publication number | Publication date |
---|---|
CN106484671A (en) | 2017-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11003726B2 (en) | Method, apparatus, and system for recommending real-time information | |
US10810499B2 (en) | Method and apparatus for recommending social media information | |
US10423648B2 (en) | Method, system, and computer readable medium for interest tag recommendation | |
CN105765573B (en) | Improvements in website traffic optimization | |
US9619564B2 (en) | Method and system for providing recommended terms | |
TW201541267A (en) | Method and device of selecting promotion keywords | |
WO2015196793A1 (en) | Hotspot information analysis method and device and computer storage medium | |
CN106250513A (en) | A kind of event personalization sorting technique based on event modeling and system | |
CN103336766A (en) | Short text garbage identification and modeling method and device | |
TW201428513A (en) | Query word fusion method, commodity information release method and search method and system | |
CN104298719A (en) | Method and system for conducting user category classification and advertisement putting based on social behavior | |
CN106484671B (en) | A kind of recognition methods of timeliness inquiry content | |
CN104424308A (en) | Web page classification standard acquisition method and device and web page classification method and device | |
CN102193936A (en) | Data classification method and device | |
CN105654201B (en) | Advertisement traffic prediction method and device | |
CN105068991A (en) | Big data based public sentiment discovery method | |
WO2017012222A1 (en) | Time-sensitivity processing requirement identification method, device, apparatus and non-volatile computer storage medium | |
CN104317784A (en) | Cross-platform user identification method and cross-platform user identification system | |
CN104933191A (en) | Spam comment recognition method and system based on Bayesian algorithm and terminal | |
CN107609192A (en) | The supplement searching method and device of a kind of search engine | |
US20140250116A1 (en) | Identifying time sensitive ambiguous queries | |
CN107766446A (en) | Method for pushing, device, storage medium and the processor of information | |
CN105183765A (en) | Big data-based topic extraction method | |
CN103279483B (en) | A kind of topic Epidemic Scope appraisal procedure towards micro-blog and system | |
CN103530796A (en) | Active period detection method and active period detection system of application program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20170426 Address after: 100086 Beijing, Haidian District, North Third Ring Road West, No. 43, building 5, floor 08-09, No. 2 Applicant after: BEIJING ZHONGSOU CLOUD BUSINESS NETWORK TECHNOLOGY CO., LTD. Address before: Shou Heng Technology Building No. 51 Beijing 100191 Haidian District Xueyuan Road room 0902 Applicant before: Beijing Zhongsou Network Technology Co,Ltd |
|
GR01 | Patent grant | ||