CN106484671A - A method for identifying time-sensitive query content - Google Patents



Publication number
CN106484671A
Authority
CN
China
Prior art keywords
timeliness
query
content
document
query content
Legal status
Granted
Application number
CN201510526945.1A
Other languages
Chinese (zh)
Other versions
CN106484671B (en)
Inventor
吴尉林
许欢庆
郭永福
陈沛
Current Assignee
Beijing Zhongsou Cloud Business Network Technology Co ltd
Original Assignee
Beijing Zhongsou Network Technology Co ltd
Priority date: 2015-08-25
Filing date: 2015-08-25
Publication date: 2017-03-08
Application filed by Beijing Zhongsou Network Technology Co ltd
Priority to CN201510526945.1A
Publication of CN106484671A
Application granted
Publication of CN106484671B
Expired - Fee Related (current legal status)
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method for identifying time-sensitive query content. The method builds an index of time-sensitive document resources, counts the number of times the query content occurs in those resources, and judges the time sensitivity of the query content, thereby identifying time-sensitive query content. The proposed method identifies time-sensitive queries quickly and comprehensively; it has low resource requirements and applies to both common and long-tail queries; it increases recall; it still identifies time-sensitive queries that are in the declining phase after a burst; it reports the time-sensitivity strength of a query, so that downstream modules can apply different strategies according to that strength; and it ensures accurate and reliable identification.

Description

A method for identifying time-sensitive query content
Technical field
The present invention relates to the field of query content recognition, and in particular to a method for identifying time-sensitive query content.
Background art
In the current big data era of information explosion, search engines have become an indispensable means for people to obtain information. Users enter queries into a search engine to obtain search results and find the information they need among them. In some cases a user's query is strongly time-sensitive. For example, during the 2014 Brazil World Cup, users who entered "World Cup" were mainly interested in content about the Brazil World Cup rather than information about previous World Cups. In this case, the search engine should first judge that "World Cup" was a time-sensitive query at that moment, and then give priority to showing users the more recent relevant results. According to statistics, queries with a time-sensitive need account for up to about 30% of all queries. Identifying time-sensitive queries is therefore very important for improving the quality of search results.
Existing methods for identifying time-sensitive queries are typically based on how the volume of a given query in the search engine query log changes between two consecutive time periods: if the query volume increases markedly, the query is considered time-sensitive. Existing judgment methods include:
(1) Increase in query volume between the two periods
If the increase in query volume between the two periods exceeds a threshold, the query is considered time-sensitive. The drawback of this method is that it is insensitive to long-tail queries: for example, if a query's volume rises from 100 to 200, the volume has doubled, but the difference is only 100.
(2) Ratio of the query volume increase between the two periods
If the ratio of the increase in query volume to the query volume in the first period exceeds a threshold, the query is considered time-sensitive. This method avoids the drawback of the first method but is overly sensitive to long-tail queries: for example, if a query's volume rises from 5 to 10, the volume has doubled even though the difference is only 5.
(3) Angle between the trajectories of two time periods
This method was proposed in a Chinese invention patent (Patent No. CN201410211458.1), in which the second time period is part of the first time period. The method holds that if the query volume grows slowly over the first time period but rapidly over the second, the query is time-sensitive.
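To make the contrast between methods (1) and (2) concrete, here is a minimal Python sketch that applies both rules to the two examples in the text. The thresholds `ABS_THRESHOLD = 1000` and `RATIO_THRESHOLD = 0.5` and the function names are hypothetical, chosen only for illustration; the prior-art methods do not specify them.

```python
# Hypothetical thresholds, chosen only for illustration.
ABS_THRESHOLD = 1000   # method (1): minimum absolute increase in query volume
RATIO_THRESHOLD = 0.5  # method (2): minimum increase relative to the first period


def flags_by_increase(before: int, after: int) -> bool:
    """Method (1): flag the query if the absolute increase exceeds a threshold."""
    return after - before > ABS_THRESHOLD


def flags_by_ratio(before: int, after: int) -> bool:
    """Method (2): flag the query if increase / first-period volume exceeds a threshold."""
    if before == 0:
        return after > 0  # illustrative choice: any traffic from nothing counts as growth
    return (after - before) / before > RATIO_THRESHOLD


if __name__ == "__main__":
    # The two examples from the text: in both, the query volume doubles.
    for before, after in [(100, 200), (5, 10)]:
        print(f"{before} -> {after}: "
              f"method (1) flags={flags_by_increase(before, after)}, "
              f"method (2) flags={flags_by_ratio(before, after)}")
```

Under these assumed thresholds, method (1) flags neither query, although the 100 to 200 query doubled, while method (2) flags both, including the long-tail query that gained only five searches.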
The existing methods have the following drawbacks:
(1) They rely on search engine logs to determine whether a query shows a burst trend. Search engine logs are a costly resource that usually only a few large search engine companies possess, which greatly limits the applicability of these methods.
(2) They usually count query volume over the whole query string, so a query that is similar to a bursting query in the search log, but does not itself appear as a whole in the log, cannot be identified, which reduces recall. For example, around May 27, 2015, "Huang Xiaoming Baby marriage registration" was a trending search in the search logs, but "Baby marriage registration" might not have been; counting by whole query string, "Baby marriage registration" would not be identified.
(3) Methods based on trends identify time-sensitive queries fairly easily in the rising phase of query volume (from trough to peak), but easily miss them in the falling phase (coming down from the peak), even though the query usually still counts as time-sensitive at that point, because a hot topic always persists for some time. For example, with the method proposed in Patent No. CN201410211458.1, the query volume grows relatively quickly in the first time period, while in the second time period it grows slowly or even declines, so the judgment condition is not met.
Summary of the invention
In view of this, the present invention provides a method for identifying time-sensitive query content. The method has low resource requirements and applies to both common and long-tail queries; it increases recall; it still identifies time-sensitive queries in the declining phase after a burst; it reports the time-sensitivity strength of a query, so that downstream modules can apply different strategies according to that strength; and it ensures comprehensive, accurate and reliable identification.
The object of the present invention is achieved by the following technical solution:
A method for identifying time-sensitive query content, the method comprising:
Step 1. Build an index of time-sensitive document resources;
Step 2. Count the number of times the query content occurs in the time-sensitive document resources, together with the mean and variance metrics of the occurrence counts;
Step 3. Judge the time sensitivity of the query content, thereby identifying time-sensitive query content.
Preferably, the time-sensitive document resources in step 1 are a set of time-sensitive documents;
the time-sensitive documents are search engine query logs or news documents.
Preferably, step 1 comprises:
1-1. Add new time-sensitive documents to the time-sensitive document resources in real time, while recording the time at which each time-sensitive document was added to the resources;
1-2. Perform Chinese word segmentation on the current time-sensitive document to obtain a segmentation result;
1-3. Add the time-sensitive document to the index of the time-sensitive document resources in real time, according to the segmentation result.
Preferably, step 2 comprises:
2-1. Perform Chinese word segmentation on the query content to obtain query segments;
2-2. Search the time-sensitive document resources through the index to obtain the time-sensitive documents that contain all of the query segments;
2-3. Count the number of times the query content occurs in the time-sensitive document resources, and the mean and variance metrics of the occurrence counts.
Preferably, 2-3 comprises:
A. Taking the current period as the end point, cut one cycle backward and divide the cycle into multiple periods of equal length; the total number of periods, including the current period, is N+1;
B. For each period $T_i$ ($-N \le i \le 0$), including the current period, count the number of times $C_i$ that the query content occurs in the time-sensitive documents;
C. Over the history cycle excluding the current period ($-N \le i \le -1$), calculate the mean $\bar{C}$ and standard deviation $\mathrm{SD}$ of the number of times the query content occurs in the time-sensitive documents:
$$\bar{C} = \frac{1}{N}\sum_{i=-N}^{-1} C_i$$
$$\mathrm{SD} = \sqrt{\frac{1}{N}\sum_{i=-N}^{-1}\left(C_i - \bar{C}\right)^2}$$
Preferably, step 3 comprises:
3-1. Judge whether the number of occurrences $C_0$ of the query content in the current period exceeds a threshold, where the threshold is determined by the scale of the resource library;
If so, go to 3-2;
If not, identify the query content as non-time-sensitive;
3-2. Judge whether the relation between the number of occurrences $C_0$ of the query content in the current period and the mean $\bar{C}$ and standard deviation $\mathrm{SD}$ satisfies [formula missing in source], where α is an empirical coefficient greater than 1;
If so, identify the query content as time-sensitive, and determine its time-sensitivity strength from the ratio of $C_0$ to $\bar{C}$;
If not, go to 3-3;
3-3. Within a gap period that precedes the current period and lies within the cycle, count separately the number of occurrences $C_j$ of the query content in each period of the gap period, where the gap period contains M periods, $M < N$, and $-M \le j \le -1$;
3-4. Judge whether [formula missing in source] holds for any period in the gap period;
If so, identify the query content as time-sensitive, and determine its time-sensitivity strength from the ratio of $C_j$ to $\bar{C}$;
If not, identify the query content as non-time-sensitive.
As the above technical solution shows, the present invention provides a method for identifying time-sensitive query content: it builds an index of time-sensitive document resources, counts the number of times the query content occurs in those resources, and judges the time sensitivity of the query content, thereby identifying time-sensitive query content. The proposed method identifies time-sensitive queries quickly and comprehensively; it has low resource requirements and applies to both common and long-tail queries; it increases recall; it still identifies time-sensitive queries in the declining phase after a burst; it reports the time-sensitivity strength of a query, so that downstream modules can apply different strategies according to that strength; and it ensures accurate and reliable identification.
Compared with the closest prior art, the technical solution provided by the present invention has the following advantages:
1. The technical solution builds an index of time-sensitive document resources, counts the number of times the query content occurs in those resources, and judges the time sensitivity of the query content, thereby identifying time-sensitive query content. It identifies time-sensitive queries quickly and comprehensively; it has low resource requirements and applies to both common and long-tail queries; it increases recall; it still identifies time-sensitive queries in the declining phase after a burst; it reports the time-sensitivity strength of a query, so that downstream modules can apply different strategies according to that strength; and it ensures accurate and reliable identification.
2. The technical solution has modest resource requirements: the resources can be search engine logs or a collection of news documents, and the latter are easier to obtain than the former.
3. The technical solution counts query occurrences by retrieval rather than by whole-string matching, which increases recall.
4. The technical solution is insensitive to a query's absolute volume and applies to both common and long-tail queries.
5. The technical solution still identifies time-sensitive queries in the declining phase after a burst.
6. The technical solution reports the time-sensitivity strength of a query, making it easy for downstream modules to apply different strategies according to that strength.
7. The technical solution is widely applicable and has significant social and economic benefits.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method for identifying time-sensitive query content according to the present invention;
Fig. 2 is a schematic flow chart of step 1 of the method;
Fig. 3 is a schematic flow chart of step 2 of the method;
Fig. 4 is a schematic flow chart of step 3 of the method.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the present invention provides a method for identifying time-sensitive query content, comprising:
Step 1. Build an index of time-sensitive document resources;
Step 2. Count the number of times the query content occurs in the time-sensitive document resources, together with the mean and variance metrics of the occurrence counts;
Step 3. Judge the time sensitivity of the query content, thereby identifying time-sensitive query content.
Preferably, the time-sensitive document resources in step 1 are a set of time-sensitive documents;
the time-sensitive documents are search engine query logs or news documents.
As shown in Fig. 2, step 1 comprises:
1-1. Add new time-sensitive documents to the time-sensitive document resources in real time, while recording the time at which each document was added to the resources;
1-2. Perform Chinese word segmentation on the current time-sensitive document to obtain a segmentation result;
1-3. Add the time-sensitive document to the index of the time-sensitive document resources in real time, according to the segmentation result.
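Step 1 can be sketched as a small in-memory inverted index that also keeps each document's add time. This is a minimal sketch, assuming documents arrive already segmented into tokens (the Chinese word segmentation of step 1-2 is not shown) and that timestamps are plain numbers; `TimelyIndex` and `add_document` are hypothetical names, and a production system would more likely use a persistent search engine.

```python
from collections import defaultdict


class TimelyIndex:
    """Hypothetical in-memory inverted index over time-stamped documents (step 1)."""

    def __init__(self) -> None:
        self.added_at: list[float] = []  # doc id -> time the doc was added (step 1-1)
        self.postings: defaultdict[str, set[int]] = defaultdict(set)  # token -> doc ids

    def add_document(self, tokens: list[str], timestamp: float) -> int:
        """Record the add time and index the doc's segmented tokens (steps 1-1 to 1-3)."""
        doc_id = len(self.added_at)
        self.added_at.append(timestamp)
        for token in set(tokens):  # post each distinct token once per document
            self.postings[token].add(doc_id)
        return doc_id
```

Keeping `added_at` alongside the postings is what later lets each retrieved document be placed into a time period.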
As shown in Fig. 3, step 2 comprises:
2-1. Perform Chinese word segmentation on the query content to obtain query segments;
2-2. Search the time-sensitive document resources through the index to obtain the time-sensitive documents that contain all of the query segments;
2-3. Count the number of times the query content occurs in the time-sensitive documents.
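Step 2-2 amounts to a conjunctive (AND) lookup: a document counts as a hit only if it contains every segment of the query, which is why a query such as "Baby marriage registration" can still be matched inside longer headlines. A minimal sketch, assuming the postings are a plain dict from token to a set of document ids (a hypothetical layout) and that the query has already been segmented; `retrieve_all` is a hypothetical name.

```python
def retrieve_all(postings: dict[str, set[int]], query_tokens: list[str]) -> set[int]:
    """Return ids of documents that contain every query token (step 2-2)."""
    unique = set(query_tokens)
    if not unique:
        return set()
    # Intersect the shortest posting lists first so intermediate sets stay small.
    lists = sorted((postings.get(token, set()) for token in unique), key=len)
    hits = set(lists[0])  # copy, so the index itself is never modified
    for plist in lists[1:]:
        hits &= plist
        if not hits:  # no document can contain all tokens; stop early
            break
    return hits
```

For example, with postings `{"baby": {0, 1, 2}, "marriage": {0, 1}, "registration": {0, 1, 3}}`, the segmented query `["baby", "marriage", "registration"]` matches documents 0 and 1, whether or not either document contains that exact query string.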
2-3 comprises:
A. Taking the current period as the end point, cut one cycle backward and divide the cycle into multiple periods of equal length; the total number of periods, including the current period, is N+1; depending on application needs, a period is on the scale of hours or days;
B. For each period $T_i$ ($-N \le i \le 0$), including the current period, count the number of times $C_i$ that the query content occurs in the time-sensitive document resources;
C. Over the history cycle excluding the current period ($-N \le i \le -1$), calculate the mean $\bar{C}$ and standard deviation $\mathrm{SD}$ of the number of times the query content occurs in the time-sensitive document resources:
$$\bar{C} = \frac{1}{N}\sum_{i=-N}^{-1} C_i$$
$$\mathrm{SD} = \sqrt{\frac{1}{N}\sum_{i=-N}^{-1}\left(C_i - \bar{C}\right)^2}$$
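Steps A to C can be sketched as follows, assuming each hit returned by step 2-2 is represented by the time its document was added (recorded in step 1-1), that timestamps and the period length share one unit (for example days), and that each matching document counts once. `period_counts` and `history_stats` are hypothetical names; the standard deviation is the population form that divides by N.

```python
import math


def period_counts(hit_times: list[float], now: float, period: float, n: int) -> dict[int, int]:
    """Count hits C_i in each period T_i for i = -n..0; T_0 covers (now - period, now]."""
    counts = {i: 0 for i in range(-n, 1)}
    for t in hit_times:
        offset = now - t
        if offset < 0:
            continue  # document added after `now`; ignore it
        i = -int(offset // period)
        if i >= -n:  # skip documents older than the cycle
            counts[i] += 1
    return counts


def history_stats(counts: dict[int, int], n: int) -> tuple[float, float]:
    """Mean and standard deviation of C_i over the history periods i = -n..-1 (T_0 excluded)."""
    history = [counts[i] for i in range(-n, 0)]
    mean = sum(history) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in history) / n)
    return mean, sd
```

With daily periods and N = 30, as in the example given under step 3-3, one would call `period_counts(times, now, period=1.0, n=30)` and pass the result to `history_stats(counts, 30)`.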
As shown in Fig. 4, step 3 comprises:
3-1. Judge whether the number of occurrences $C_0$ of the query content in the current period exceeds a threshold, where the threshold is determined by the scale of the resource library, for example 10, 20 or 50; this prevents queries with too few occurrences from being misidentified;
If so, go to 3-2;
If not, identify the query content as non-time-sensitive, because it occurs too few times;
3-2. Judge whether the relation between the number of occurrences $C_0$ of the query content in the current period and the mean $\bar{C}$ and standard deviation $\mathrm{SD}$ satisfies [formula missing in source], where α is an empirical coefficient greater than 1, for example 1.5, 2 or 2.5;
If so, identify the query content as time-sensitive, and determine its time-sensitivity strength from the ratio of $C_0$ to $\bar{C}$. This condition indicates that the frequency in the current period is far above the average frequency; it is mainly used to identify time-sensitive queries that are just breaking out;
If not, go to 3-3;
3-3. Within a gap period that precedes the current period and lies within the cycle, count separately the number of occurrences $C_j$ of the query content in each period of the gap period, where the gap period contains M periods, $M < N$, and $-M \le j \le -1$;
For example, the cycle is the month before the current period and each period is one day, so N = 30; the gap period is a stretch of time within that month before the current period, here containing M = 10 periods, i.e. a gap period of 10 days;
3-4. Judge whether [formula missing in source] holds for any period in the gap period;
If so, identify the query content as time-sensitive, and determine its time-sensitivity strength from the ratio of $C_j$ to $\bar{C}$. This condition indicates that the query broke out during the gap period; since time-sensitive queries generally persist for some time, the query is considered still time-sensitive in the current period;
If not, identify the query content as non-time-sensitive.
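The decision logic of steps 3-1 to 3-4 can be sketched in Python as below. The burst test itself is missing from this copy of the patent, so the sketch ASSUMES a z-score style condition $C > \bar{C} + \alpha \cdot \mathrm{SD}$. The defaults `threshold=10`, `alpha=2.0` and `m=10` come from the example values in the text; reporting the largest gap-period burst as the strength is also an assumption, and `judge_timeliness`, `bursts`, `strength` and `Verdict` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Verdict:
    time_sensitive: bool
    intensity: float = 0.0  # ratio C / mean; 0.0 when not time-sensitive
    reason: str = ""


def bursts(count: float, mean: float, sd: float, alpha: float) -> bool:
    # ASSUMPTION: stand-in for the burst formula that is missing from the source text.
    return count > mean + alpha * sd


def strength(count: float, mean: float) -> float:
    # Time-sensitivity strength as the ratio count / mean (infinite if the history is empty).
    return count / mean if mean > 0 else math.inf


def judge_timeliness(counts: dict[int, int], mean: float, sd: float, *,
                     threshold: int = 10, alpha: float = 2.0, m: int = 10) -> Verdict:
    c0 = counts.get(0, 0)
    # 3-1: too few occurrences in the current period -> not time-sensitive.
    if c0 <= threshold:
        return Verdict(False, reason="below threshold")
    # 3-2: the query is bursting in the current period.
    if bursts(c0, mean, sd, alpha):
        return Verdict(True, strength(c0, mean), "current burst")
    # 3-3 / 3-4: did it burst in any of the m gap periods j = -m..-1?
    recent = [counts.get(j, 0) for j in range(-m, 0)
              if bursts(counts.get(j, 0), mean, sd, alpha)]
    if recent:
        # ASSUMPTION: report the largest gap-period burst as the strength.
        return Verdict(True, strength(max(recent), mean), "recent burst")
    return Verdict(False, reason="no burst")
```

For instance, with a history mean of 5 and SD of 4, a query seen 12 times in the current period fails the assumed test (12 is not above 5 + 2 × 4 = 13), but a spike of 60 three periods earlier makes it time-sensitive with strength 60 / 5 = 12. This is the declining-phase case the method is designed to catch; the caller is responsible for keeping m < N.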
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art may still modify the specific embodiments of the present invention or make equivalent substitutions, and any modification or equivalent substitution that does not depart from the spirit and scope of the present invention falls within the scope of the claims of the present application.

Claims (6)

1. A method for identifying time-sensitive query content, characterized in that the method comprises:
Step 1. Build an index of time-sensitive document resources;
Step 2. Count the number of times the query content occurs in the time-sensitive document resources, together with the mean and variance metrics of the occurrence counts;
Step 3. Judge the time sensitivity of the query content, thereby identifying time-sensitive query content.
2. The method according to claim 1, characterized in that the time-sensitive document resources in step 1 are a set of time-sensitive documents;
the time-sensitive documents are search engine query logs or news documents.
3. The method according to claim 2, characterized in that step 1 comprises:
1-1. Add new time-sensitive documents to the time-sensitive document resources in real time, while recording the time at which each time-sensitive document was added to the resources;
1-2. Perform Chinese word segmentation on the current time-sensitive document to obtain a segmentation result;
1-3. Add the time-sensitive document to the index of the time-sensitive document resources in real time, according to the segmentation result.
4. The method according to claim 3, characterized in that step 2 comprises:
2-1. Perform Chinese word segmentation on the query content to obtain query segments;
2-2. Search the time-sensitive document resources through the index to obtain the time-sensitive documents that contain all of the query segments;
2-3. Count the number of times the query content occurs in the time-sensitive document resources, and the mean and variance metrics of the occurrence counts.
5. The method according to claim 4, characterized in that 2-3 comprises:
A. Taking the current period as the end point, cut one cycle backward and divide the cycle into multiple periods of equal length; the total number of periods, including the current period, is N+1;
B. For each period $T_i$ ($-N \le i \le 0$), including the current period, count the number of times $C_i$ that the query content occurs in the time-sensitive documents;
C. Over the history cycle excluding the current period ($-N \le i \le -1$), calculate the mean $\bar{C}$ and standard deviation $\mathrm{SD}$ of the number of times the query content occurs in the time-sensitive documents:
$$\bar{C} = \frac{1}{N}\sum_{i=-N}^{-1} C_i;$$
$$\mathrm{SD} = \sqrt{\frac{1}{N}\sum_{i=-N}^{-1}\left(C_i - \bar{C}\right)^2}.$$
6. The method according to claim 5, characterized in that step 3 comprises:
3-1. Judge whether the number of occurrences $C_0$ of the query content in the current period exceeds a threshold, where the threshold is determined by the scale of the resource library;
If so, go to 3-2;
If not, identify the query content as non-time-sensitive;
3-2. Judge whether the relation between the number of occurrences $C_0$ of the query content in the current period and the mean $\bar{C}$ and standard deviation $\mathrm{SD}$ satisfies [formula missing in source], where α is an empirical coefficient greater than 1;
If so, identify the query content as time-sensitive, and determine its time-sensitivity strength from the ratio of $C_0$ to $\bar{C}$;
If not, go to 3-3;
3-3. Within a gap period that precedes the current period and lies within the cycle, count separately the number of occurrences $C_j$ of the query content in each period of the gap period, where the gap period contains M periods, $M < N$, and $-M \le j \le -1$;
3-4. Judge whether [formula missing in source] holds for any period in the gap period;
If so, identify the query content as time-sensitive, and determine its time-sensitivity strength from the ratio of $C_j$ to $\bar{C}$;
If not, identify the query content as non-time-sensitive.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510526945.1A CN106484671B (en) 2015-08-25 2015-08-25 A kind of recognition methods of timeliness inquiry content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510526945.1A CN106484671B (en) 2015-08-25 2015-08-25 A kind of recognition methods of timeliness inquiry content

Publications (2)

Publication Number Publication Date
CN106484671A 2017-03-08
CN106484671B 2019-05-28

Family

Family ID: 58233171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510526945.1A Expired - Fee Related CN106484671B (en) 2015-08-25 2015-08-25 A kind of recognition methods of timeliness inquiry content

Country Status (1)

Country Link
CN (1) CN106484671B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609445A (en) * 2009-07-16 2009-12-23 复旦大学 Crucial sub-method for extracting topic based on temporal information
CN101645066A (en) * 2008-08-05 2010-02-10 北京大学 Method for monitoring novel words on Internet
CN103049443A (en) * 2011-10-12 2013-04-17 腾讯科技(深圳)有限公司 Method and device for mining hot-spot words
CN103942265A (en) * 2014-03-26 2014-07-23 北京奇虎科技有限公司 Method and device for pushing webpages containing news information

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885251A (en) * 2017-03-27 2019-06-14 三角兽(北京)科技有限公司 Information processing unit, information processing method and storage medium
CN107180093A (en) * 2017-05-15 2017-09-19 北京奇艺世纪科技有限公司 Information search method and device and ageing inquiry word recognition method and device
CN111324805A (en) * 2018-12-13 2020-06-23 北京搜狗科技发展有限公司 Query intention determining method and device, searching method and searching engine
CN111324805B (en) * 2018-12-13 2024-02-13 北京搜狗科技发展有限公司 Query intention determining method and device, searching method and searching engine
CN113010817A (en) * 2019-12-18 2021-06-22 腾讯科技(深圳)有限公司 Method and device for adjusting validity period of content, server and storage medium
CN113010817B (en) * 2019-12-18 2024-05-10 深圳市雅阅科技有限公司 Content validity period adjusting method, device, server and storage medium

Also Published As

Publication number Publication date
CN106484671B (en) 2019-05-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20170426

Address after: 100086 Beijing, Haidian District, North Third Ring Road West, No. 43, building 5, floor 08-09, No. 2

Applicant after: BEIJING ZHONGSOU CLOUD BUSINESS NETWORK TECHNOLOGY Co.,Ltd.

Address before: Shou Heng Technology Building No. 51 Beijing 100191 Haidian District Xueyuan Road room 0902

Applicant before: BEIJING ZHONGSOU NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190528