CN103324637B - Hot information mining method and system - Google Patents

Publication number
CN103324637B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210079091.3A
Other languages
Chinese (zh)
Other versions
CN103324637A (en)
Inventor
姚磊
何军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Shenzhen Shiji Guangsu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shiji Guangsu Information Technology Co Ltd
Priority to CN201210079091.3A (CN103324637B)
Priority to PCT/CN2013/073011 (WO2013139290A1)
Publication of CN103324637A
Application granted
Publication of CN103324637B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

Embodiments of the present invention provide a hot information mining method and system. The method includes: calculating relative hotness values among information web page sources according to the access counts of those sources; calculating, according to the relative hotness value of each information web page source, the reprint weight of each reprinted information item in each source that has reprinted it; and summing the reprint weights of each reprinted information item across the sources to obtain its information hotness value, then determining hot information from the reprinted information items in order of information hotness value. Because hot information is generated automatically from the whole Internet on the basis of these hotness values, the embodiments can improve information push efficiency, save manual effort, and reduce cost. They can also dynamically filter out low-quality websites and reinforce high-quality ones, so that mining quality improves continuously.

Description

Hot information mining method and system
Technical field
Embodiments of the present invention relate to the technical field of Internet applications, and more particularly to a hot information mining method and system.
Background
With the rapid development of computer and network technology, the Internet plays an increasingly large role in daily life, study, and work. People have become accustomed to learning about online news through channels such as portal websites and news search websites.
Online news is news carried over the network. It is fast, multifaceted, multi-channel, multimedia, and interactive. It breaks with the traditional concept of news dissemination and gives audiences a new experience in what they see, hear, and feel. It integrates disordered news into an ordered form and greatly compresses the volume of information, so that people can obtain the most useful news in the shortest time. Moreover, future online news will no longer be limited to traditional news publishers: audiences can publish their own news and have it spread quickly within a short time, and news will become a platform for interaction among people. As people's understanding grows, online news will develop at deeper levels, which will completely overturn the traditional concept of online news.
At present, most portal websites and news search websites select some hot information to place on the home page to guide users' reading. For example, some portal websites divide news into categories such as domestic, international, and entertainment, and then provide hot news within each category.
However, such hot information is usually selected manually by editors, or generated by aggregating the home page articles of several portal websites. Pushing hot information this way is inefficient, wastes manpower, and carries a large subjective element.
Meanwhile, in the prior art, news can be selected only from a limited set of authoritative websites. The data selection range is therefore small, and the hit rate for hot information cannot be guaranteed.
Summary of the invention
Embodiments of the present invention provide a hot information mining method that generates hot information automatically and thereby improves information push efficiency.
Embodiments of the present invention also provide a hot information mining system that generates hot information automatically and thereby improves information push efficiency.
The specific solutions of the embodiments are as follows:
A hot information mining method, comprising:
calculating relative hotness values among information web page sources according to the access counts of the information web page sources;
calculating, according to the relative hotness values of the information web page sources, the reprint weight of each reprinted information item in each information web page source that has reprinted that item;
summing the reprint weights of each reprinted information item across the information web page sources to calculate the information hotness value of each reprinted information item, and determining hot information from the reprinted information items in order of information hotness value.
A hot information mining system, comprising:
a relative hotness value calculation unit, configured to calculate relative hotness values among information web page sources according to the access counts of the information web page sources;
a reprint weight calculation unit, configured to calculate, according to the relative hotness values of the information web page sources, the reprint weight of each reprinted information item in each information web page source that has reprinted that item;
a hot information determination unit, configured to sum the reprint weights of each reprinted information item across the information web page sources to calculate the information hotness value of each reprinted information item, and to determine hot information from the reprinted information items in order of information hotness value.
As the above technical solutions show, embodiments of the present invention first calculate relative hotness values among information web page sources according to their access counts; then calculate, according to those relative hotness values, the reprint weight of each reprinted information item in each source that has reprinted it; then sum the reprint weights of each reprinted information item to obtain its information hotness value; and finally determine hot information from the reprinted information items in order of information hotness value. Hot information can thus be generated automatically from the whole Internet on the basis of the information hotness values of reprinted information items, which improves information push efficiency.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the hot information mining method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the hot information mining system according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an exemplary hot information mining process according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a reprinted information recognition result according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a hot information display according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.
In embodiments of the present invention, each information web page source is treated as a voter, each reprinted information item as a candidate being voted on, and the popularity of each information web page source as the weight of its vote. The voting score of every reprinted information item is computed, and items with high scores are regarded as hot information and ranked first. In addition, because news takes time to spread, the publication time of a reprinted information item can be used as a correction factor to adjust the voting score and obtain the final hotness ranking.
Fig. 1 is a schematic flowchart of the hot information mining method according to an embodiment of the present invention.
As shown in Fig. 1, the method includes:
Step 101: Calculate relative hotness values among information web page sources according to the access counts of the information web page sources.
Here, the access hotness of each information web page source can be calculated from the source's access log for hot information and its access log for other news. For example, the access count recorded in the hot information access log and the access count recorded in the access log for other news are added together to give the access count of the information web page source.
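The addition of the two access counts described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each log is simply a sequence of source identifiers with one entry per access, and the function name `total_access_counts` is hypothetical.

```python
from collections import Counter


def total_access_counts(hot_news_log, other_news_log):
    """Return the access count per source, adding the counts from both logs.

    Assumption: each log is an iterable of source identifiers,
    one entry per recorded access.
    """
    counts = Counter(hot_news_log)
    counts.update(other_news_log)  # Counter.update adds counts, it does not overwrite
    return dict(counts)


# Hypothetical logs for three sources A, B, and C.
hot_news_log = ["A", "A", "C"]
other_news_log = ["A", "B", "C", "C"]
counts = total_access_counts(hot_news_log, other_news_log)
```

Real access logs would carry timestamps and URLs; the sketch keeps only the field needed for the count.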
Preferably, an information web page source can be a news website of any type.
The access hotness of an information web page source can be calculated in many ways. The principle is that the more often a source is accessed, the higher its relative hotness value should be. For example:
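For the k-th information web page source, its relative hotness value SiteHotness_k is calculated as:
$$\mathrm{SiteHotness}_k = \mathrm{norm} \cdot \frac{\log(\mathrm{AccessCount}_k)}{\sum_{K} \log(\mathrm{AccessCount}_k)}$$
where norm is a normalization coefficient, AccessCount_k is the access count of the k-th information web page source, and K is the set of all information web page sources.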
For example, suppose a search engine has collected news from three information web page sources A, B, and C, and that their access counts (AccessCount) in the search engine are 50, 20, and 30 respectively.
Then the hotness of website C is SiteHotness_C = norm * (log(30) / log(50+20+30));
the hotness of website B is SiteHotness_B = norm * (log(20) / log(50+20+30));
the hotness of website A is SiteHotness_A = norm * (log(50) / log(50+20+30)).
The base of the logarithm can be 10 or e. In either case, the hotness SiteHotness_A of website A is greater than the hotness SiteHotness_C of website C, and SiteHotness_C is greater than the hotness SiteHotness_B of website B.
In practice, the specific value of norm can be changed or adjusted according to experience.
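The relative hotness calculation can be sketched in a few lines. The sketch below follows the formula as stated in claim 4, whose denominator is the sum of the logarithms of the access counts; the worked example in the text writes the denominator as log(50+20+30) instead. Both choices divide every source by the same constant, so the ordering A > C > B is the same. The function name `site_hotness`, the default `norm=1.0`, and the input validation are assumptions made for illustration.

```python
import math


def site_hotness(access_counts, norm=1.0):
    """Relative hotness per source: norm * log(count) / sum of log(count).

    Assumptions: every count is at least 1 and at least one count exceeds 1,
    so that the logarithms are defined and the denominator is positive.
    """
    if any(count < 1 for count in access_counts.values()):
        raise ValueError("access counts must be >= 1")
    denominator = sum(math.log(count) for count in access_counts.values())
    if denominator == 0:
        raise ValueError("at least one access count must exceed 1")
    return {
        source: norm * math.log(count) / denominator
        for source, count in access_counts.items()
    }


# Access counts from the example in the text.
hotness = site_hotness({"A": 50, "B": 20, "C": 30})
ranking = sorted(hotness, key=hotness.get, reverse=True)
```

With `norm=1.0` the values sum to 1, which makes them easy to compare across runs; a different `norm` only rescales them.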
Step 102: Calculate, according to the relative hotness values of the information web page sources, the reprint weight of each reprinted information item in each information web page source that has reprinted that item.
Here, the reprinted information items can be identified from the information web page sources using a similarity algorithm based on text features. That is, the similarity algorithm identifies the reprint relationships among news items: which news items are reprints of the same news item.
Preferably, a time factor can further be determined from the publication time of each reprinted information item, and each information hotness value can be corrected using the time factor. By way of example, the time at which an item was reprinted can also be used as the time factor.
For example, for the i-th reprinted information item, its information hotness value NewsHotness_i is calculated as:
$$\mathrm{NewsHotness}_i = f(\mathrm{PublishTime}) \cdot \sum_{1}^{K} \mathrm{CitationHotness}_k$$
where:
CitationHotness_k = g(SiteHotness_k);
Here K is the set of all information web page sources that have reprinted the i-th reprinted information item; PublishTime is the publication time of the i-th reprinted information item; f(PublishTime) is a time weighting function of PublishTime; CitationHotness_k is the reprint weight of the i-th reprinted information item in the k-th information web page source that has reprinted it; and g(SiteHotness_k) is a hotness weighting function of SiteHotness_k.
The time weighting function f(PublishTime) ensures the timeliness of the information hotness value NewsHotness_i. In general, the closer the publication time PublishTime is to the current time, the larger the value of f(PublishTime) should be.
The specific form of f(PublishTime) can be implemented in many ways and can be linear or nonlinear. Embodiments of the present invention place no limit on the specific form of f(PublishTime), provided it satisfies the basic principle that the closer PublishTime is to the current time, the larger the value of f(PublishTime), and hence the larger the value of NewsHotness_i.
g(SiteHotness_k) is the hotness weighting function, which makes the reprint weight CitationHotness_k reflect the quality of the reprinting source. In general, the higher the relative hotness value SiteHotness_k of a website, the larger its reprint weight CitationHotness_k should be.
Similarly, the specific form of g(SiteHotness_k) can be implemented in many ways and can be linear or nonlinear. Embodiments of the present invention place no limit on the specific form of g(SiteHotness_k), provided it satisfies the basic principle that the higher the relative hotness value SiteHotness_k of a website, the larger the reprint weight CitationHotness_k.
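Because the text deliberately leaves f and g open, any concrete choice is an assumption. The sketch below picks one possible pair to make the formula NewsHotness_i = f(PublishTime) * sum of CitationHotness_k concrete: f is an exponential decay with a hypothetical 24-hour half-life (nonlinear, larger for more recent items), and g is the identity (linear, larger for hotter sites). The function names and the sample hotness values are illustrative only.

```python
from datetime import datetime, timedelta


def time_weight(publish_time, now, half_life_hours=24.0):
    """f(PublishTime): assumed exponential decay, 1.0 for an item published now."""
    age_hours = max((now - publish_time).total_seconds(), 0.0) / 3600.0
    return 0.5 ** (age_hours / half_life_hours)


def citation_hotness(site_hotness_k):
    """g(SiteHotness_k): assumed identity, so hotter sites give larger weights."""
    return site_hotness_k


def news_hotness(publish_time, reprinting_sources, site_hotness, now):
    """Time weight multiplied by the summed reprint weights of all reprinting sources."""
    total = sum(citation_hotness(site_hotness[s]) for s in reprinting_sources)
    return time_weight(publish_time, now) * total


# Hypothetical relative hotness values for three sources.
site_hotness = {"A": 0.38, "B": 0.29, "C": 0.33}
now = datetime(2012, 3, 23, 12, 0)
fresh = news_hotness(now - timedelta(hours=1), ["A", "B"], site_hotness, now)
stale = news_hotness(now - timedelta(hours=48), ["A", "B"], site_hotness, now)
```

An item reprinted by the same sources scores lower as it ages, which is the timeliness property the text asks of f. A linear decay or a step function would satisfy the same principle.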
Step 103: Sum the reprint weights of each reprinted information item across the information web page sources to calculate the information hotness value of each reprinted information item, and determine hot information from the reprinted information items in order of information hotness value.
Here, the reprint weights of each reprinted information item are summed to give that item's information hotness score. The items can then be sorted from high to low and a suitable number of news items selected for display.
For example, the system can be preset to display 10 hot information items. After the reprinted information items are sorted from high to low by information hotness value, the top 10 news items are selected and displayed as hot information.
Preferably, in embodiments of the present invention, all news can first be classified, for example into domestic, international, and entertainment, and the embodiments can then be applied within each category to mine the hot information of that category.
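Selecting a preset number of items per category can be sketched as a sort followed by a slice. The tuple layout `(title, category, hotness)` and the function name `top_hot_items` are assumptions for illustration; the hotness values are taken as already computed.

```python
from collections import defaultdict


def top_hot_items(items, n=10):
    """Return, per category, the titles of the n items with the highest hotness.

    Assumption: items is a list of (title, category, hotness) tuples.
    """
    by_category = defaultdict(list)
    for title, category, hotness in items:
        by_category[category].append((hotness, title))
    return {
        category: [
            title
            for _, title in sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]
        ]
        for category, scored in by_category.items()
    }


# Hypothetical scored items in two categories.
items = [
    ("n1", "domestic", 0.9),
    ("n2", "domestic", 1.4),
    ("n3", "international", 0.2),
    ("n4", "domestic", 0.5),
]
top = top_hot_items(items, n=2)
```

Passing `n=10` reproduces the preset of 10 hot information items mentioned above; a category with fewer items simply returns all of them.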
Based on the above analysis, embodiments of the present invention also provide a hot information mining system.
Fig. 2 is a schematic structural diagram of the hot information mining system according to an embodiment of the present invention.
As shown in Fig. 2, the system includes a relative hotness value calculation unit 201, a reprint weight calculation unit 202, and a hot information determination unit 203.
Wherein:
the relative hotness value calculation unit 201 is configured to calculate relative hotness values among information web page sources according to the access counts of the information web page sources;
the reprint weight calculation unit 202 is configured to calculate, according to the relative hotness values of the information web page sources, the reprint weight of each reprinted information item in each information web page source that has reprinted that item;
the hot information determination unit 203 is configured to sum the reprint weights of each reprinted information item across the information web page sources to calculate the information hotness value of each reprinted information item, and to determine hot information from the reprinted information items in order of information hotness value.
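How the three units might fit together can be sketched as three small classes chained by one function. This is an illustrative arrangement only: the class and method names are hypothetical, norm is fixed at 1, the hotness weighting function is taken as the identity, the relative hotness formula is the one given in claim 4, and the optional time factor is omitted.

```python
import math


class RelativeHotnessUnit:
    """Stands in for unit 201: relative hotness from access counts (norm = 1)."""

    def compute(self, access_counts):
        denominator = sum(math.log(c) for c in access_counts.values())
        return {s: math.log(c) / denominator for s, c in access_counts.items()}


class ReprintWeightUnit:
    """Stands in for unit 202: reprint weight of each item in each reprinting source."""

    def compute(self, reprints, site_hotness):
        # reprints maps an item id to the list of sources that reprinted it.
        return {
            item: {source: site_hotness[source] for source in sources}
            for item, sources in reprints.items()
        }


class HotInformationUnit:
    """Stands in for unit 203: sum the weights per item and keep the top items."""

    def compute(self, weights, top_n):
        scores = {item: sum(per_source.values()) for item, per_source in weights.items()}
        return sorted(scores, key=scores.get, reverse=True)[:top_n]


def mine_hot_information(access_counts, reprints, top_n=10):
    site_hotness = RelativeHotnessUnit().compute(access_counts)
    weights = ReprintWeightUnit().compute(reprints, site_hotness)
    return HotInformationUnit().compute(weights, top_n)


result = mine_hot_information(
    {"A": 50, "B": 20, "C": 30},
    {"n1": ["A", "C"], "n2": ["B"], "n3": ["A", "B", "C"]},
    top_n=2,
)
```

The item reprinted by all three sources ranks first, and the item reprinted only by the least-visited source drops out of the top two, which matches the voting picture in the description.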
Preferably, the hot information determination unit 203 is further configured to determine a time factor from the publication time of each reprinted information item and to correct each information hotness value using the time factor.
Preferably, the reprint weight calculation unit 202 is further configured to identify the reprinted information items from the information web page sources using a similarity algorithm based on text features.
In one embodiment, the relative hotness value calculation unit 201 is configured to calculate, for the k-th information web page source, its relative hotness value SiteHotness_k as:
$$\mathrm{SiteHotness}_k = \mathrm{norm} \cdot \frac{\log(\mathrm{AccessCount}_k)}{\sum_{K} \log(\mathrm{AccessCount}_k)}$$
where norm is a normalization coefficient, AccessCount_k is the access count of the k-th information web page source, and K is the set of all information web page sources.
In one embodiment, the reprint weight calculation unit 202 is configured to calculate, for the i-th reprinted information item, its information hotness value NewsHotness_i as:
$$\mathrm{NewsHotness}_i = f(\mathrm{PublishTime}) \cdot \sum_{1}^{K} \mathrm{CitationHotness}_k$$
CitationHotness_k = g(SiteHotness_k);
Here K is the set of all information web page sources that have reprinted the i-th reprinted information item; PublishTime is the publication time of the i-th reprinted information item; f(PublishTime) is a time weighting function of PublishTime; CitationHotness_k is the reprint weight of the i-th reprinted information item in the k-th information web page source that has reprinted it; and g(SiteHotness_k) is a hotness weighting function of SiteHotness_k.
Similarly, the time weighting function f(PublishTime) ensures the timeliness of the information hotness value NewsHotness_i. In general, the closer the publication time PublishTime is to the current time, the larger the value of f(PublishTime) should be.
The specific form of f(PublishTime) can be implemented in many ways and can be linear or nonlinear. Embodiments of the present invention place no limit on the specific form of f(PublishTime), provided it satisfies the basic principle that the closer PublishTime is to the current time, the larger the value of f(PublishTime).
g(SiteHotness_k) is the hotness weighting function, which makes the reprint weight CitationHotness_k reflect the quality of the reprinting source. In general, the higher the relative hotness value SiteHotness_k of a website, the larger its reprint weight CitationHotness_k should be.
Similarly, the specific form of g(SiteHotness_k) can be implemented in many ways and can be linear or nonlinear. Embodiments of the present invention place no limit on the specific form of g(SiteHotness_k), provided it satisfies the basic principle that the higher the relative hotness value SiteHotness_k of a website, the larger the reprint weight CitationHotness_k.
In one embodiment, the system further includes a hot information display unit 204, configured to display the hot information determined from the reprinted information items. For example, the hot information display unit 204 can be preset to display 10 hot information items: after the reprinted information items are sorted from high to low by information hotness value, the top 10 news items are selected and displayed as hot information.
According to embodiments of the present invention, hot news can be mined from the many news sources on the Internet. Based on the detailed analysis above, Fig. 3 is a schematic diagram of an exemplary hot news mining process according to an embodiment of the present invention.
As shown in Fig. 3, in processing block 1, a large volume of news is crawled from the many news sources on the Internet (such as news websites), and the reprint relationships among the news items are identified, that is, which news items are reprints of the same news item.
For example, the identification here can use similarity computation based on text features.
By way of example, Fig. 4 is a schematic diagram of a reprinted news recognition result according to an embodiment of the present invention.
Fig. 4 shows news items titled "China's software market expected to reach 71.5 billion yuan in 2015" from different news sources; they are in fact reprints of the same news item.
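The text does not say which text-feature similarity measure is used, so the following is only one plausible stand-in: Jaccard similarity over word 3-grams (shingles), with a greedy grouping that compares each article against the first article of every existing group. The threshold of 0.5, the shingle size, the function names, and the sample article texts are all assumptions.

```python
def shingles(text, size=3):
    """Return the set of word n-grams of the text (the whole text if it is short)."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    set_a, set_b = shingles(text_a), shingles(text_b)
    return len(set_a & set_b) / len(set_a | set_b)


def group_reprints(articles, threshold=0.5):
    """Greedily group (source, text) pairs whose text resembles a group's first article."""
    groups = []
    for source, text in articles:
        for group in groups:
            if jaccard(text, group[0][1]) >= threshold:
                group.append((source, text))
                break
        else:
            groups.append([(source, text)])
    return groups


# Hypothetical articles from three sources.
articles = [
    ("A", "China software market in 2015 expected to reach 71.5 billion yuan"),
    ("B", "China software market in 2015 expected to reach 71.5 billion yuan report says"),
    ("C", "Local team wins the national football championship final"),
]
groups = group_reprints(articles)
```

Each resulting group plays the role of one reprinted news item, and the sources in the group are the websites that reprinted it. At crawl scale, pairwise comparison would be too slow, and a hashing scheme over the same shingles would be the usual substitute.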
In processing block 2, the relative hotness value (access hotness) of each news website is calculated from the hot news access logs and the access logs for other news of the many news websites.
The relative hotness value of each website is calculated as follows:
$$\mathrm{SiteHotness}_k = \mathrm{norm} \cdot \frac{\log(\mathrm{AccessCount}_k)}{\sum_{K} \log(\mathrm{AccessCount}_k)}$$
where K is the set of all websites, norm is a normalization coefficient, and AccessCount is the access count of each news website.
In processing block 3, the reprint recognition result of processing block 1, the publication time of the reprinted news, and the relative hotness value of each news website calculated in processing block 2 are combined to calculate news hotness values.
For example, for the i-th reprinted news item, its news hotness value NewsHotness_i is calculated as:
$$\mathrm{NewsHotness}_i = f(\mathrm{PublishTime}) \cdot \sum_{1}^{K} \mathrm{CitationHotness}_k$$
where:
CitationHotness_k = g(SiteHotness_k);
Here K is the set of all news websites that have reprinted the i-th reprinted news item; PublishTime is the publication time of the i-th reprinted news item; f(PublishTime) is a time weighting function of PublishTime; CitationHotness_k is the reprint weight of the i-th reprinted news item in the k-th news website that has reprinted it; and g(SiteHotness_k) is a hotness weighting function of SiteHotness_k.
The time weighting function f(PublishTime) ensures the timeliness of the news hotness value NewsHotness_i. In general, the closer the publication time PublishTime is to the current time, the larger the value of f(PublishTime) should be.
The specific form of f(PublishTime) can be implemented in many ways and can be linear or nonlinear. Embodiments of the present invention place no limit on the specific form of f(PublishTime), provided it satisfies the basic principle that the closer PublishTime is to the current time, the larger the value of f(PublishTime).
In processing block 4, hot news is determined from the calculation results of processing block 3 and pushed to users through channels such as microblog, web page, and e-mail. Once determined, hot news can be stored in the hot news access log so that users can go back and access it at any time.
For example, Fig. 5 is a schematic diagram of a hot information display according to an embodiment of the present invention. Embodiments of the present invention preferably also show the specific source of each hot information item in the push result.
In embodiments of the present invention, relative hotness values among information web page sources are first calculated according to their access counts; then the reprint weight of each reprinted information item in each source that has reprinted it is calculated according to those relative hotness values; then the reprint weights of each reprinted information item are summed to obtain its information hotness value; and finally hot information is determined from the reprinted information items in order of information hotness value. Hot information can thus be generated automatically from the whole Internet on the basis of the information hotness values of reprinted information items, which saves manual effort and reduces cost.
Moreover, embodiments of the present invention can support requirements to display any number of hot news items and can support computation over news from the whole Internet. Through automatic mining of site authority, low-quality websites can be dynamically filtered out and high-quality websites reinforced, so that mining quality improves continuously.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (12)

1. A hot information mining method, characterized in that the method comprises:
calculating relative hotness values among information web page sources according to the access counts of the information web page sources;
calculating, according to the relative hotness values of the information web page sources, the reprint weight of each reprinted information item in each information web page source that has reprinted that item; wherein the higher the relative hotness value of an information web page source, the larger the reprint weight of a reprinted information item in that information web page source;
summing the reprint weights of each reprinted information item across the information web page sources to calculate the information hotness value of each reprinted information item, and determining hot information from the reprinted information items in order of information hotness value.
2. The hot information mining method according to claim 1, characterized in that the method further comprises: determining a time factor according to the publication time of each reprinted information item, and correcting each information hotness value using the time factor.
3. The hot information mining method according to claim 1, characterized in that the method further comprises: identifying the reprinted information items from the information web page sources using a similarity algorithm based on text features.
4. The hot information mining method according to claim 1, characterized in that
calculating relative hotness values among information web page sources according to the access counts of the information web page sources comprises:
for the k-th information web page source, calculating its relative hotness value SiteHotness_k, wherein:
$$\mathrm{SiteHotness}_k = \mathrm{norm} \cdot \frac{\log(\mathrm{AccessCount}_k)}{\sum_{K} \log(\mathrm{AccessCount}_k)}$$
where norm is a normalization coefficient, AccessCount_k is the access count of the k-th information web page source, and K is the set of all information web page sources.
5. The hot information mining method according to claim 1, characterized in that calculating the information hotness value comprises:
for the i-th reprinted information item, calculating its information hotness value NewsHotness_i:
$$\mathrm{NewsHotness}_i = f(\mathrm{PublishTime}) \cdot \sum_{1}^{K} \mathrm{CitationHotness}_k$$
CitationHotness_k = g(SiteHotness_k);
where K is the set of all information web page sources that have reprinted the i-th reprinted information item; PublishTime is the publication time of the i-th reprinted information item; f(PublishTime) is a time weighting function of PublishTime; CitationHotness_k is the reprint weight of the i-th reprinted information item in the k-th information web page source that has reprinted it; and g(SiteHotness_k) is a hotness weighting function of SiteHotness_k.
6. The hot information mining method according to any one of claims 1-5, characterized in that the method further comprises:
displaying the hot information determined from the reprinted information items.
7. A hot information mining system, characterized in that the system comprises:
a relative hotness value calculation unit, configured to calculate relative hotness values among information web page sources according to the access counts of the information web page sources;
a reprint weight calculation unit, configured to calculate, according to the relative hotness values of the information web page sources, the reprint weight of each reprinted information item in each information web page source that has reprinted that item; wherein the higher the relative hotness value of an information web page source, the larger the reprint weight of a reprinted information item in that information web page source;
a hot information determination unit, configured to sum the reprint weights of each reprinted information item across the information web page sources to calculate the information hotness value of each reprinted information item, and to determine hot information from the reprinted information items in order of information hotness value.
8. The hot information mining system according to claim 7, characterized in that the hot information determination unit is further configured to determine a time factor according to the publication time of each reprinted information item and to correct each information hotness value using the time factor.
9. The hot information mining system according to claim 7, characterized in that the reprint weight calculation unit is further configured to identify the reprinted information items from the information web page sources using a similarity algorithm based on text features.
10. The hot information mining system according to claim 7, characterized in that
The relative hotness value calculation unit is configured to calculate, for the k-th information web page source, its relative hotness value SiteHotness_k, where:
$$\mathrm{SiteHotness}_k = \mathrm{norm} \cdot \frac{\log(\mathrm{AccessCount}_k)}{\sum_{j \in K} \log(\mathrm{AccessCount}_j)};$$
where norm is a normalization coefficient, AccessCount_k is the access count of the k-th information web page source, and K is the set of all information web page sources.
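The relative hotness formula of claim 10 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `site_hotness`, the dictionary input, the natural logarithm, and the default `norm = 1.0` are choices made for illustration, and every access count is assumed to be at least 1 with at least one count above 1.

```python
import math
from typing import Dict


def site_hotness(access_counts: Dict[str, int], norm: float = 1.0) -> Dict[str, float]:
    """Relative hotness: norm * log(AccessCount_k) / sum of log(AccessCount_j).

    access_counts maps a source id to its access count (hypothetical layout).
    """
    # math.log raises ValueError for counts <= 0, so counts must be positive.
    logs = {site: math.log(count) for site, count in access_counts.items()}
    total = sum(logs.values())
    if total <= 0:
        raise ValueError("sum of log access counts must be positive")
    return {site: norm * value / total for site, value in logs.items()}
```

Taking the logarithm compresses the gap between sources: a source with 1000 accesses is weighted three times, not one hundred times, as heavily as a source with 10 accesses.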
11. The hot information mining system according to claim 7, characterized in that
The reprint weight calculation unit is configured to calculate, for the i-th piece of reprinted information, its information hotness value NewsHotness_i:
$$\mathrm{NewsHotness}_i = f(\mathrm{PublishTime}) \cdot \sum_{k \in K} \mathrm{CitationHotness}_k;$$
$$\mathrm{CitationHotness}_k = g(\mathrm{SiteHotness}_k);$$
where K is the set of all information web page sources that have reprinted the i-th piece of reprinted information; PublishTime is the publication time of the i-th piece of reprinted information; f(PublishTime) is a time weighting function of PublishTime; CitationHotness_k is the reprint weight of the i-th piece of reprinted information in the k-th information web page source that has reprinted it; and g(SiteHotness_k) is a hotness weighting function of SiteHotness_k.
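The claim leaves the weighting functions f and g unspecified. The sketch below shows how the NewsHotness_i formula could be evaluated once they are chosen. The exponential half-life decay used for f, the identity function used as the default g, and all function and parameter names are assumptions introduced for illustration only.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable


def time_weight(
    publish_time: datetime, now: datetime, half_life_hours: float = 24.0
) -> float:
    """Hypothetical f(PublishTime): halves for every half_life_hours of age."""
    age_hours = max((now - publish_time).total_seconds() / 3600.0, 0.0)
    return 0.5 ** (age_hours / half_life_hours)


def news_hotness(
    publish_time: datetime,
    now: datetime,
    site_hotness_values: Iterable[float],
    g: Callable[[float], float] = lambda s: s,
) -> float:
    """f(PublishTime) times the sum of g(SiteHotness_k) over reprinting sources."""
    # CitationHotness_k = g(SiteHotness_k); g defaults to the identity here.
    citation_hotness = [g(s) for s in site_hotness_values]
    return time_weight(publish_time, now) * sum(citation_hotness)
```

With these choices an item loses half of its hotness every 24 hours, so an older item needs reprints from more, or hotter, sources to stay ahead of a fresh one.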
12. The hot information mining system according to any one of claims 7-10, characterized in that the system further comprises a hot information display unit;
The hot information display unit is configured to display the hot information determined from the reprinted information.
CN201210079091.3A 2012-03-23 2012-03-23 A kind of hot information method for digging and system Active CN103324637B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201210079091.3A CN103324637B (en) 2012-03-23 2012-03-23 A kind of hot information method for digging and system
PCT/CN2013/073011 WO2013139290A1 (en) 2012-03-23 2013-03-21 Hot information mining method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210079091.3A CN103324637B (en) 2012-03-23 2012-03-23 A kind of hot information method for digging and system

Publications (2)

Publication Number Publication Date
CN103324637A CN103324637A (en) 2013-09-25
CN103324637B CN103324637B (en) 2017-12-12

Family

ID=49193384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210079091.3A Active CN103324637B (en) 2012-03-23 2012-03-23 A kind of hot information method for digging and system

Country Status (2)

Country Link
CN (1) CN103324637B (en)
WO (1) WO2013139290A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103714132B (en) * 2013-12-17 2017-12-26 北京本果信息技术有限公司 A kind of method and apparatus for being used to carry out focus incident excavation based on region and industry
CN105450608A (en) * 2014-08-28 2016-03-30 华为技术有限公司 Digital media content pushing method and digital media content pushing device
CN104408175B (en) * 2014-12-12 2017-11-10 北京奇虎科技有限公司 The method and apparatus for identifying type of webpage
CN104504059B (en) * 2014-12-22 2018-03-27 合一网络技术(北京)有限公司 Multimedia resource recommends method
CN105045890A (en) * 2015-07-29 2015-11-11 百度在线网络技术(北京)有限公司 Method and device for determining hot news in target news source
CN105630929B (en) * 2015-12-22 2019-08-30 北京奇虎科技有限公司 Based on the method and device for commenting on determining news recommendation weight
KR102580820B1 (en) * 2016-03-10 2023-09-20 에스케이하이닉스 주식회사 Data storage device and operating method thereof
CN105843963A (en) * 2016-04-19 2016-08-10 北京金山安全软件有限公司 Website selection method and server
CN106383919A (en) * 2016-11-21 2017-02-08 青岛农业大学 Method and system for determining news transmission effect
CN108255900A (en) 2017-03-22 2018-07-06 广州市动景计算机科技有限公司 Recommend news rendering method, equipment, browser and electronic equipment
CN108205589B (en) * 2017-12-29 2022-02-15 成都优易数据有限公司 Heat iterative calculation method
CN109145246A (en) * 2018-07-31 2019-01-04 成都华栖云科技有限公司 A kind of news virtual click amount implementation method based on paas media cloud multi-tenant platform
CN112202889B (en) * 2020-09-30 2023-05-23 深圳前海微众银行股份有限公司 Information pushing method, device and storage medium
CN114221988A (en) * 2021-11-03 2022-03-22 新浪网技术(中国)有限公司 Content distribution network hotspot analysis method and system
CN113987372B (en) * 2021-12-27 2022-03-18 昆仑智汇数据科技(北京)有限公司 Hot data acquisition method, device and equipment of domain business object model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127046A (en) * 2007-09-25 2008-02-20 腾讯科技(深圳)有限公司 Method and system for sequencing to blog article
CN101140587A (en) * 2007-10-15 2008-03-12 深圳市迅雷网络技术有限公司 Searching method and apparatus
CN101246498A (en) * 2008-03-27 2008-08-20 腾讯科技(深圳)有限公司 News web page searching method
CN101814171A (en) * 2009-02-24 2010-08-25 李晓萌 Media-oriented network influence index calculation method
US9418114B1 (en) * 2013-06-19 2016-08-16 Google Inc. Augmenting a content item using search results content
US9460198B1 (en) * 2012-07-26 2016-10-04 Google Inc. Process for serializing and deserializing data described by a schema

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101409634B (en) * 2007-10-10 2011-04-13 中国科学院自动化研究所 Quantitative analysis tools and method for internet news influence based on information retrieval
CN101625693A (en) * 2009-08-10 2010-01-13 北京精讯云顿数据软件有限公司 Method and system of online article statistics

Also Published As

Publication number Publication date
CN103324637A (en) 2013-09-25
WO2013139290A1 (en) 2013-09-26

Similar Documents

Publication Publication Date Title
CN103324637B (en) A kind of hot information method for digging and system
US9898554B2 (en) Implicit question query identification
US8352455B2 (en) Processing a content item with regard to an event and a location
CN103294781B (en) A kind of method and apparatus for processing page data
CN101313330A (en) Selecting high quality reviews for display
CN109508373A (en) Calculation method, equipment and the computer readable storage medium of enterprise's public opinion index
CN106484829A (en) A kind of foundation of microblogging order models and microblogging diversity search method
US11789946B2 (en) Answer facts from structured content
CN106651312A (en) Intellectual property (IP) service management system
CN108920479B (en) Cross-information-source account recommendation method for two micro terminals
CN109978020A (en) A kind of social networks account vest identity identification method based on multidimensional characteristic
CN106874356A (en) Geographical location information management method and device
CN107862039A (en) Web data acquisition methods, system and Data Matching method for pushing
CN109885656A (en) Microblogging forwarding prediction technique and device based on quantization temperature
Zhang et al. An ensemble method for job recommender systems
CN106649749A (en) Chinese voice bit characteristic-based text duplication checking method
Lien A note on the relationship between the variability of the hedge ratio and hedging performance
de Moura et al. Using structural information to improve search in Web collections
CN110502680A (en) A kind of abstracting method and device of acceptance of the bid bulletin relevant field
CN106997340A (en) The generation of dictionary and the Document Classification Method and device using dictionary
CN106780199A (en) A kind of intellectual property evaluation system
Liang et al. Detecting novel business blogs
CN104391982B (en) Information recommendation method and information recommendation system
Kaiser Possibly vast greenhouse gas sponge ignites controversy
CN110598960B (en) Entity-level emotion assessment method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131021

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518044 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20131021

Address after: 518057 Tencent Building, 16, Nanshan District hi tech park, Guangdong, Shenzhen

Applicant after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518044 Zhenxing Road, SEG Science Park 2 East Room 403

Applicant before: Tencent Technology (Shenzhen) Co., Ltd.

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant