CN102194015B - Retrieval information heat statistical method - Google Patents

Retrieval information heat statistical method

Info

Publication number
CN102194015B
Authority
CN
China
Prior art keywords
dimension
statistical item
information
temperature value
retrieving information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2011101820470A
Other languages
Chinese (zh)
Other versions
CN102194015A (en)
Inventor
史寿伟
李龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Tai Yue Xiang Sheng Software Co., Ltd.
Original Assignee
CHONGQING XINMEI AGRICULTURAL INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHONGQING XINMEI AGRICULTURAL INFORMATION TECHNOLOGY CO LTD filed Critical CHONGQING XINMEI AGRICULTURAL INFORMATION TECHNOLOGY CO LTD
Priority to CN2011101820470A priority Critical patent/CN102194015B/en
Publication of CN102194015A publication Critical patent/CN102194015A/en
Application granted granted Critical
Publication of CN102194015B publication Critical patent/CN102194015B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a retrieval information heat statistical method. Based on the characteristics of hot topics, the method performs heat statistics separately on the search information as a whole, on the words contained in the search information, and on the set of words contained in the search information, treating each as a different statistical item. Keywords and phrases relating to the same topic are thereby accumulated separately, so that the hot factors contained in the search information are reflected from several dimensions and the error rate of the heat statistics is reduced. In addition, a "time cooling" calculation is applied to the heat value of each statistical item so that the latest hot information has a higher heat value, allowing a more timely and useful hot-information service to be provided to users.

Description

Method for implementing retrieval based on heat statistics of retrieval information
Technical field
The present invention relates to Internet technology and search-engine-related technology, and in particular to a method for implementing retrieval based on heat statistics of retrieval information.
Background art
A search engine is a system that uses specific computer programs to gather information from the Internet according to a certain strategy, organizes and processes that information, provides a retrieval service for users, and displays the information relevant to a user's search. At present, many search engine websites offer users a hot-information reference service: when a user enters retrieval information, the site lists hot information related to the keywords the user has entered, for the user's reference. Hot information is usually obtained by performing heat statistics on the retrieval information entered by users during a certain historical period and sorting the results by heat value. A larger heat value indicates that the corresponding information is retrieved and discussed more on the network, so the hot-information reference service aims to give users the most valuable Internet information possible.
In current heat statistics, the entire piece of retrieval information entered by a user is usually treated as a single statistical item whose retrieval count is tallied. For the same topic, however, different users tend to enter different retrieval information according to their own habits of expression. These different pieces of retrieval information about one topic are counted separately and never accumulate, so each of them ends up with a low heat value. A topic that is in fact widely discussed therefore fails to show its true heat after the statistics, which is a statistical error and deprives users of genuinely useful reference data.
Summary of the invention
In view of the above shortcomings of the prior art, and to solve the problem that the authenticity of the heat reflected by existing heat statistics is hard to guarantee, the present invention proposes a method for implementing retrieval based on heat statistics of retrieval information, which helps to provide users with more accurate hot information.
To achieve the above object, the present invention adopts the following technical means:
A method for implementing retrieval based on heat statistics of retrieval information, in which the heat information obtained by the statistics is used, when a user enters retrieval information, to list hot information related to the content entered by the user according to the keywords entered, comprises the following steps:
Step 1: collect the retrieval information that users enter in a search engine;
Step 2: perform heat statistics separately on the retrieval information as a whole, on the words contained in the retrieval information, and on the set of words contained in the retrieval information, each as a different statistical item, wherein:
performing heat statistics on the retrieval information as a whole comprises taking one piece of retrieval information as one first-dimension statistical item, which is used to reflect the attention heat of the complete information searched by users;
performing heat statistics on the words contained in the retrieval information comprises applying word segmentation to the retrieval information and taking each word obtained as one second-dimension statistical item, which is used to reflect the attention heat of the individual words contained in the retrieval information;
performing heat statistics on the set of words contained in the retrieval information comprises taking the set of all words obtained by segmenting one piece of retrieval information as one third-dimension statistical item, which is used to reflect the attention heat of several words receiving attention at the same time;
Step 3: calculate, for each first-dimension, second-dimension or third-dimension statistical item, a heat value reflecting its level of attention heat, so as to provide users with an accurate hot-information reference, comprising:
when calculating the heat value, first setting a statistics start time and dividing the duration from the statistics start time to the heat value calculation time into several time periods;
weighting the first-dimension, second-dimension or third-dimension statistical items so that the farther a time period is from the current time, the lower its contribution to the heat value; then summing the weighted results for each first-dimension, second-dimension or third-dimension statistical item to obtain its heat value F, so that, through the calculation of F, new hot information has a high heat value; wherein the heat value of each first-dimension, second-dimension and third-dimension statistical item is calculated as follows:
$$F = \sum_{i=1}^{N} (\lambda_i \cdot S_i);$$
where F denotes the heat value of the first-dimension, second-dimension or third-dimension statistical item, a larger heat value indicating higher attention heat; N denotes the number of time periods into which the duration from the statistics start time to the heat value calculation time is divided; $S_i$ denotes the number of times the first-dimension, second-dimension or third-dimension statistical item was collected within the i-th time period; and $\lambda_i$ denotes the weight of the i-th time period, the weight being larger the closer the time period is to the heat value calculation time;
Step 4: according to the heat values reflecting the level of attention heat, obtain the hot information related to the content of the retrieval information entered by the user and display that hot information to the user, thereby providing the user with a retrieval service.
The weight $\lambda_i$ of the i-th time period may be obtained by the following formula:
$$\lambda_i = \frac{i}{N}, \quad i = 1, 2, \ldots, N;$$
where a larger value of i indicates that the i-th time period is closer to the heat value calculation time, and the N-th time period is the one containing the heat value calculation time.
Alternatively, the weight $\lambda_i$ of the i-th time period may be obtained by the following formula:
$$\lambda_i = \frac{1}{N - i + 1}, \quad i = 1, 2, \ldots, N;$$
where a larger value of i indicates that the i-th time period is closer to the heat value calculation time, and the N-th time period is the one containing the heat value calculation time.
Compared with the prior art, the present invention has the following beneficial effects:
1. By applying word segmentation to users' retrieval information, the method of the invention counts not only the retrieval information itself as a first-dimension statistical item but also the words and word sets it contains as second-dimension and third-dimension statistical items respectively. This multi-dimensional counting lets the key words and phrases contained in users' retrieval information all accumulate, so that the heat value of the topic concerned is truly reflected and users receive a more accurate hot-information reference.
2. The method of the invention also applies a "time cooling" calculation to the heat values of the first-dimension, second-dimension and third-dimension statistical items, so that the latest hot information has a higher heat value, which helps to provide users with a more timely and useful hot-information service.
Description of drawings
Fig. 1 is a flow block diagram of the method of the invention.
Detailed description of the embodiments
Because of a defect in the method itself, the heat statistics of the prior art cannot accumulate the different pieces of retrieval information that concern the same topic, so the true heat of the topic is not reflected and errors in the heat statistics are unavoidable. However, pieces of retrieval information about the same topic inevitably share key words or phrases. Based on this characteristic of hot topics, when the present invention performs heat statistics on the retrieval information that users enter in a search engine, it counts the retrieval information as a whole, the words it contains, and the set of words it contains as different statistical items. Key words and phrases relating to the same topic can thus each be accumulated, and the hot factors contained in the retrieval information are reflected from several dimensions, which reduces the error rate of the heat statistics.
The technical scheme of the present invention is further described below with reference to the drawing and an embodiment.
Embodiment:
This embodiment is an information service system that collects the retrieval information entered by users on several large search engine websites on the Internet and provides a search-record information service and a heat-statistics information service for that retrieval information. The search-record information service mainly provides information such as the collection time, content and historical collection count of the retrieval information; obtaining this information is a routine information-collection technique in the art. The heat-statistics information service uses the method of the invention: after collecting the retrieval information that users enter in a search engine, it performs heat statistics separately on the retrieval information as a whole, on the words contained in the retrieval information, and on the set of words contained in the retrieval information, each as a different statistical item. In the heat-statistics information service of this embodiment, attention heat is expressed by the heat value, a larger heat value indicating higher attention heat; the top 20 statistical items in descending order of heat value, within the heat query range selected by the user, are then listed and offered to the user for reference. The flow of heat statistics using the method of the invention is shown in Fig. 1, and the specific steps are as follows:
a) Collect the retrieval information that users enter in a search engine, and record the collection time.
b) Divide each piece of collected retrieval information into statistical items, specifically:
b1) Take one piece of retrieval information as one first-dimension statistical item. For example, if a piece of retrieval information is "what is heat statistics?", then "what is heat statistics?" as a whole is taken as one first-dimension statistical item, and each time this retrieval information is collected, the collection count of this first-dimension statistical item is increased by 1. If another piece of retrieval information is "heat statistics is what?", its character data is not identical, so the two are treated as two different first-dimension statistical items whose collection counts are accumulated separately. The first-dimension statistical item is mainly used to reflect the attention heat of the complete information searched by users.
b2) Apply word segmentation to the retrieval information and take each word obtained as one second-dimension statistical item. For example, segmenting the retrieval information "what is heat statistics?" yields the three words "what", "heat" and "statistics". Each of "what", "heat" and "statistics" is taken as one second-dimension statistical item, and the collection counts of these three second-dimension statistical items are accumulated separately. The second-dimension statistical item is mainly used to reflect the attention heat of the individual words contained in the retrieval information, so that the key words involved in topics of high public attention can be accumulated.
Word segmentation is a common technique in the field of computer networks. Commonly used approaches include segmentation based on string matching, segmentation based on word meaning, and statistical segmentation. Word segmentation is described extensively in the existing literature and in online material, so its processing is not detailed here.
b3) Take the set of all words obtained by segmenting one piece of retrieval information as one third-dimension statistical item. For example, segmenting the retrieval information "what is heat statistics?" yields the three words "what", "heat" and "statistics", and the set of these three words, {what, heat, statistics}, is taken as one third-dimension statistical item. Segmenting the retrieval information "heat statistics is what?" yields the three words "heat", "statistics" and "what", whose word set is likewise {what, heat, statistics}; it is therefore treated as the same third-dimension statistical item as the word set obtained from "what is heat statistics?", and the collection counts are accumulated together. The third-dimension statistical item is mainly used to reflect the attention heat of several words receiving attention at the same time. For hot topics whose attention heat would be reflected too one-sidedly by single-word statistics, the third-dimension statistical item is better suited to reflecting the true attention heat.
Dividing each piece of retrieval information into statistical items along these three dimensions makes the statistics more complete, so that both the words and the word sets contained in users' retrieval information are accumulated.
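The division into three dimensions in steps b1) to b3) can be sketched in Python as follows. This is only an illustrative sketch, not the patent's implementation: `toy_segment` is a hypothetical stand-in for a real word segmenter, the labels `dim1`, `dim2` and `dim3` are invented for the example, and counting a repeated word only once per query is an assumption the text does not settle.

```python
from collections import Counter


def toy_segment(query):
    """Hypothetical stand-in for a real word segmenter (an assumption).

    Lowercases, strips the question mark, splits on whitespace and drops
    "is" so the result matches the three-word example in the text.
    """
    words = query.lower().replace("?", " ").split()
    return [w for w in words if w != "is"]


def statistical_items(query):
    """Return the statistical items that one piece of retrieval information yields."""
    words = toy_segment(query)
    items = [("dim1", query)]                     # b1) the whole query
    # b2) each word; counting a repeated word once per query is an assumption
    items += [("dim2", w) for w in dict.fromkeys(words)]
    items.append(("dim3", frozenset(words)))      # b3) the unordered word set
    return items


# Accumulate collection counts over the two example queries from the text.
counts = Counter()
for q in ["what is heat statistics?", "heat statistics is what?"]:
    counts.update(statistical_items(q))
```

With these two queries, each whole query is counted once as its own first-dimension item, while the word "heat" and the word set {what, heat, statistics} are each counted twice, which is the accumulation effect the text describes.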
c) Calculate the heat value reflecting the level of attention heat. This scheme introduces the concept of "time cooling" into the heat value calculation: the farther a collection is from the current time, the lower its contribution to the heat value. Network information today is highly dense, and once its hot period has passed, a piece of hot information may quickly be replaced by other hot information, so hot information closer to the current time is likely to be more helpful to people. For this reason, when calculating the heat value, a statistics start time is first set, the duration from the statistics start time to the heat value calculation time is divided into several time periods, and the heat value of each first-dimension, second-dimension and third-dimension statistical item is calculated as follows:
$$F = \sum_{i=1}^{N} (\lambda_i \cdot S_i);$$
where F denotes the heat value of the first-dimension, second-dimension or third-dimension statistical item; N denotes the number of time periods into which the duration from the statistics start time to the heat value calculation time is divided; $S_i$ denotes the number of times the first-dimension, second-dimension or third-dimension statistical item was collected within the i-th time period; and $\lambda_i$ denotes the weight of the i-th time period, the weight being larger the closer the time period is to the heat value calculation time.
Applying this "time cooling" calculation to the heat values of the first-dimension, second-dimension and third-dimension statistical items gives the latest hot information a higher heat value, which helps to provide users with a more timely and useful hot-information service.
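A minimal Python sketch of this calculation follows. It assumes that collection times are plain numbers, that the time periods have equal width, and that the weights are supplied by the caller; the function names and the example timestamps are invented for illustration and are not part of the patent.

```python
def segment_counts(timestamps, start, now, n_segments):
    """Count collections per time period; period 1 is the oldest, period N the newest.

    Equal-width periods are an assumption; the text only says the duration
    is divided into several time periods.
    """
    width = (now - start) / n_segments
    counts = [0] * n_segments
    for t in timestamps:
        if start <= t <= now:
            # A collection exactly at `now` belongs to the last period.
            idx = min(int((t - start) / width), n_segments - 1)
            counts[idx] += 1
    return counts


def heat_value(counts, weights):
    """F = sum over i of lambda_i * S_i."""
    return sum(w * s for w, s in zip(weights, counts))


weights = [0.25, 0.5, 0.75, 1.0]               # example weights, larger for newer periods
old_item = segment_counts([1, 2, 35], start=0, now=40, n_segments=4)
new_item = segment_counts([31, 35, 39], start=0, now=40, n_segments=4)
old_heat = heat_value(old_item, weights)
new_heat = heat_value(new_item, weights)
```

Both hypothetical items were collected three times, but the item whose collections all fall in the newest period receives the higher heat value, which is the intended "time cooling" effect.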
Several different schemes can be used to choose the weight $\lambda_i$ of each time period.
For example, the weights of the time periods may form an arithmetic progression, the weight $\lambda_i$ of the i-th time period being obtained by the following formula:
$$\lambda_i = \frac{i}{N}, \quad i = 1, 2, \ldots, N;$$
Alternatively, the weights of the time periods may be taken in inverse proportion to each period's distance from the heat value calculation time, the weight $\lambda_i$ of the i-th time period being obtained by the following formula:
$$\lambda_i = \frac{1}{N - i + 1}, \quad i = 1, 2, \ldots, N;$$
In both formulas for the weight $\lambda_i$, a larger value of i indicates that the i-th time period is closer to the heat value calculation time, and the N-th time period is the one containing the heat value calculation time.
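The two weighting schemes can be compared with a short Python sketch. The function names are invented for illustration, and exact fractions are used only to make the values easy to read.

```python
from fractions import Fraction


def linear_weights(n):
    """lambda_i = i / N for i = 1..N (arithmetic progression)."""
    return [Fraction(i, n) for i in range(1, n + 1)]


def reciprocal_weights(n):
    """lambda_i = 1 / (N - i + 1) for i = 1..N."""
    return [Fraction(1, n - i + 1) for i in range(1, n + 1)]


lin = linear_weights(4)        # 1/4, 1/2, 3/4, 1
rec = reciprocal_weights(4)    # 1/4, 1/3, 1/2, 1
```

For N = 4 both schemes give the newest period a weight of 1 and the oldest a weight of 1/4, but the second scheme discounts the intermediate periods more strongly, so it suits fields in which information goes out of date faster.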
Besides these, other "time cooling" weighting schemes can also be used; how the weight $\lambda_i$ is obtained should be determined by how quickly information becomes obsolete in the field concerned.
In addition, heat statistics of retrieval information may be carried out independently for each field or across several fields together; the scope of the heat statistics can be determined according to the user's needs.
Finally, it should be noted that the above embodiment is intended only to illustrate the technical scheme of the present invention and not to limit it. Although the present invention has been described in detail with reference to the embodiment, those of ordinary skill in the art should understand that the technical scheme of the present invention may be modified or replaced with equivalents without departing from its spirit and scope, and all such modifications and replacements shall fall within the scope of the claims of the present invention.

Claims (3)

1. A method for implementing retrieval based on heat statistics of retrieval information, in which the heat information obtained by the statistics is used, when a user enters retrieval information, to list hot information related to the content entered by the user according to the keywords entered, characterized in that it comprises:
Step 1: collecting the retrieval information that users enter in a search engine;
Step 2: performing heat statistics separately on the retrieval information as a whole, on the words contained in the retrieval information, and on the set of words contained in the retrieval information, each as a different statistical item, wherein:
performing heat statistics on the retrieval information as a whole comprises taking one piece of retrieval information as one first-dimension statistical item, which is used to reflect the attention heat of the complete information searched by users;
performing heat statistics on the words contained in the retrieval information comprises applying word segmentation to the retrieval information and taking each word obtained as one second-dimension statistical item, which is used to reflect the attention heat of the individual words contained in the retrieval information;
performing heat statistics on the set of words contained in the retrieval information comprises taking the set of all words obtained by segmenting one piece of retrieval information as one third-dimension statistical item, which is used to reflect the attention heat of several words receiving attention at the same time;
Step 3: calculating, for each first-dimension, second-dimension or third-dimension statistical item, a heat value reflecting its level of attention heat, so as to provide users with an accurate hot-information reference, comprising:
when calculating the heat value, first setting a statistics start time and dividing the duration from the statistics start time to the heat value calculation time into several time periods;
weighting the first-dimension, second-dimension or third-dimension statistical items so that the farther a time period is from the current time, the lower its contribution to the heat value; then summing the weighted results for each first-dimension, second-dimension or third-dimension statistical item to obtain its heat value F; wherein the heat value of each first-dimension, second-dimension and third-dimension statistical item is calculated as follows:
$$F = \sum_{i=1}^{N} (\lambda_i \cdot S_i);$$
where F denotes the heat value of the first-dimension, second-dimension or third-dimension statistical item, a larger heat value indicating higher attention heat; N denotes the number of time periods into which the duration from the statistics start time to the heat value calculation time is divided; $S_i$ denotes the number of times the first-dimension, second-dimension or third-dimension statistical item was collected within the i-th time period; and $\lambda_i$ denotes the weight of the i-th time period, the weight being larger the closer the time period is to the heat value calculation time;
Step 4: according to the heat values reflecting the level of attention heat, obtaining the hot information related to the content of the retrieval information entered by the user and displaying that hot information to the user, thereby providing the user with a retrieval service.
2. The method for implementing retrieval based on heat statistics of retrieval information according to claim 1, characterized in that the weight $\lambda_i$ of the i-th time period is obtained by the following formula:
$$\lambda_i = \frac{i}{N}, \quad i = 1, 2, \ldots, N;$$
where a larger value of i indicates that the i-th time period is closer to the heat value calculation time, and the N-th time period is the one containing the heat value calculation time.
3. The method for implementing retrieval based on heat statistics of retrieval information according to claim 1, characterized in that the weight $\lambda_i$ of the i-th time period is obtained by the following formula:
$$\lambda_i = \frac{1}{N - i + 1}, \quad i = 1, 2, \ldots, N;$$
where a larger value of i indicates that the i-th time period is closer to the heat value calculation time, and the N-th time period is the one containing the heat value calculation time.
CN2011101820470A 2011-06-30 2011-06-30 Retrieval information heat statistical method Active CN102194015B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011101820470A CN102194015B (en) 2011-06-30 2011-06-30 Retrieval information heat statistical method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011101820470A CN102194015B (en) 2011-06-30 2011-06-30 Retrieval information heat statistical method

Publications (2)

Publication Number Publication Date
CN102194015A CN102194015A (en) 2011-09-21
CN102194015B true CN102194015B (en) 2013-11-13

Family

ID=44602083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011101820470A Active CN102194015B (en) 2011-06-30 2011-06-30 Retrieval information heat statistical method

Country Status (1)

Country Link
CN (1) CN102194015B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793439B (en) * 2012-11-05 2019-01-15 腾讯科技(深圳)有限公司 A kind of real-time retrieval information acquisition method, device and server
CN103324718B (en) * 2013-06-25 2016-08-10 百度在线网络技术(北京)有限公司 Method and system based on humongous search Web log mining topic venation
CN104252470B (en) * 2013-06-26 2018-02-09 重庆新媒农信科技有限公司 A kind of hot word recommends method and system
CN103500163B (en) * 2013-07-24 2016-12-28 百度在线网络技术(北京)有限公司 The method and apparatus of identification event key development
CN103646040A (en) * 2013-11-15 2014-03-19 天脉聚源(北京)传媒科技有限公司 Information display method and device
CN107346494A (en) * 2016-05-05 2017-11-14 滴滴(中国)科技有限公司 A kind of method and system for law mining of going on a journey
CN105205048B (en) * 2015-10-21 2018-05-04 迪爱斯信息技术股份有限公司 A kind of hot word analytic statistics system and method
CN108170693B (en) * 2016-12-07 2020-07-31 北京国双科技有限公司 Hot word pushing method and device
CN108733706B (en) * 2017-04-20 2022-12-20 腾讯科技(深圳)有限公司 Method and device for generating heat information
CN110968691B (en) * 2018-09-30 2023-07-04 北京国双科技有限公司 Judicial hotspot determination method and device
CN109753600A (en) * 2018-12-20 2019-05-14 航天信息股份有限公司 Handle the method, apparatus asked questions and storage medium
CN112445973A (en) * 2020-11-13 2021-03-05 北京创业光荣信息科技有限责任公司 Method and device for searching project, storage medium and computer equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246499B (en) * 2008-03-27 2010-10-13 腾讯科技(深圳)有限公司 Network information search method and system
CN101923544B (en) * 2009-06-15 2012-08-08 北京百分通联传媒技术有限公司 Method for monitoring and displaying Internet hot spots

Also Published As

Publication number Publication date
CN102194015A (en) 2011-09-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180731

Address after: 230088 room 405-5, R & D center of China (Hefei) International Intelligent Speech Industrial Park, 3333, hi tech Road, Hefei, Anhui.

Patentee after: Anhui Tai Yue Xiang Sheng Software Co., Ltd.

Address before: 401121 3, 1 floor, office building, south wing of mercury science and technology building, 5 new Mount Huangshan Road, North New District, Chongqing.

Patentee before: Chongqing Xinmei Agricultural Information Technology Co.,Ltd.
