CN102955804B - Method and device for determining the popularity of network words - Google Patents


Info

Publication number
CN102955804B
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN201110247837.2A
Other languages
Chinese (zh)
Other versions
CN102955804A (en)
Inventor
田冬
张远
吴淑燕
Current Assignee
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
Application filed by China Mobile Communications Group Co Ltd
Priority to CN201110247837.2A
Publication of CN102955804A
Application granted
Publication of CN102955804B
Expired - Fee Related (current legal status)

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for determining the popularity of network words: a network word X input by a user is received, and the page address and release time of each page containing the network word X are acquired; a regional distribution parameter of the network word X is calculated from the acquired page addresses, and a time distribution parameter is calculated from the acquired release times; the heat value of the network word X is calculated from the regional distribution parameter and the time distribution parameter and displayed to the user. Applying the scheme of the invention can improve the accuracy of network word popularity determination results.

Description

Method and device for determining the popularity of network words
Technical Field
The invention relates to internet technology, and in particular to a method and a device for determining the popularity of network words.
Background
Determining the popularity of network words has important reference value for the research, decision-making, management and services of the relevant departments. The existing determination method mainly queries the interface of a search engine and calculates, in a certain way, a parameter representing the popularity of a network word from its number of searches and its number of search results, both of which are taken to be directly proportional to its popularity.
However, this method depends on users' input behavior when using the search engine, so it is somewhat subjective and one-sided, and not accurate enough. Moreover, the number of search results only reflects how often a network word occurs, not information such as how it is distributed, which also makes the result inaccurate. For example, if a network word occurs with high frequency on one or a few pages but rarely or never on other pages, the existing method will still determine its popularity to be high.
Disclosure of Invention
In view of this, the present invention provides a method and a device for determining the popularity of network words, which can improve the accuracy of the popularity determination result.
To achieve this object, the technical solution of the invention is implemented as follows:
A method for determining the popularity of network words comprises the following steps:
receiving a network word X input by a user, and acquiring a page address and release time of a page comprising the network word X;
calculating the region distribution parameter of the network word X according to the acquired page address, calculating the time distribution parameter of the network word X according to the acquired release time, calculating the heat value of the network word X according to the region distribution parameter and the time distribution parameter, and displaying the heat value to a user.
A network word popularity determination apparatus, comprising:
an application programming interface (API), configured to receive a network word X input by a user through a user interface, and to acquire the page address and release time of each page containing the network word X;
a popularity calculation module, configured to calculate a regional distribution parameter of the network word X from the acquired page addresses, calculate a time distribution parameter of the network word X from the acquired release times, calculate the popularity value of the network word X from the regional distribution parameter and the time distribution parameter, and display the popularity value of the network word X to the user through the user interface.
Therefore, with the scheme of the invention, determining the popularity of network words no longer relies on users' input behavior in search engines, and both the regional distribution and the time distribution of the network words are fully taken into account, so the determination result is more objective and comprehensive, and therefore more accurate.
Drawings
FIG. 1 is a flow chart of an embodiment of the method of the present invention.
FIG. 2 is a schematic diagram of the structure of the device according to the present invention.
Detailed Description
To address the problems in the prior art, the invention provides an improved scheme for determining the popularity of network words, which can improve the accuracy of the determination result.
To make the technical solution of the present invention clearer, it is described in further detail below with reference to the drawings and embodiments.
FIG. 1 is a flow chart of an embodiment of the method of the present invention. As shown in fig. 1, the method comprises the following steps:
Step 11: Receive a network word X input by a user (X denotes any network word the user inputs), and acquire the page address and release time of each page containing the network word X.
The page address refers to a Uniform Resource Locator (URL) of the page.
In the invention, a word bank and a web page text index library need to be established. The word bank stores a series of network words; in the initial stage, these network words can be entered manually. The web page text index library stores the text content of each page crawled from each website in a certain manner, together with the page address and release time of each text content. How to crawl pages is prior art, and which websites, and which pages on those websites, are crawled can be determined according to actual needs.
Each text content is then segmented using the network words stored in the word bank: if a network word stored in the word bank appears in a text content, it is marked in that text content with a special symbol. Any marking symbol may be used, as long as the marked words can be identified. The segmented text content then replaces the corresponding text content before segmentation.
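The segmentation step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes greedy forward maximum matching against the word bank (a common dictionary-based technique that the patent does not name), and the function `mark_network_words` and the `[`/`]` markers are hypothetical choices, since the text leaves the marking symbol open.

```python
def mark_network_words(text: str, lexicon: set, open_mark: str = "[",
                       close_mark: str = "]") -> str:
    """Wrap each word-bank entry found in `text` with marker symbols.

    Greedy forward maximum matching (an assumed technique): at each position
    the longest lexicon word starting there is marked; otherwise a single
    character is copied unchanged.
    """
    max_len = max((len(w) for w in lexicon), default=0)
    out = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so longer network words win.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                match = text[i:i + length]
                break
        if match is not None:
            out.append(open_mark + match + close_mark)
            i += len(match)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# The marked text would replace the original text content in the index library.
print(mark_network_words("我去打酱油了", {"打酱油", "酱油"}))  # 我去[打酱油]了
```

Longest-match-first means "酱油" is not marked separately inside "打酱油"; other segmentation strategies would also satisfy the text.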
The contents of the word bank and the web page text index library can be updated in real time. For example, after the text contents have been segmented, sequences made up of single characters are selected, and if the frequency of occurrence of a sequence exceeds a preset threshold, that sequence is added to the word bank as a new network word.
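One possible reading of the new-word rule above, sketched under assumptions: the input is text already split into tokens, a "sequence made up of single characters" is taken to mean a maximal run of two or more consecutive one-character tokens, and `discover_new_words` and `threshold` are hypothetical names.

```python
from collections import Counter


def discover_new_words(segmented_docs, threshold):
    """Return runs of single-character tokens seen more than `threshold` times.

    segmented_docs: list of token lists produced by word segmentation.
    """
    counts = Counter()
    for tokens in segmented_docs:
        run = []
        for tok in list(tokens) + [""]:  # "" sentinel flushes the last run
            if len(tok) == 1:
                run.append(tok)
            else:
                if len(run) >= 2:  # a lone character is not a candidate
                    counts["".join(run)] += 1
                run = []
    return {seq for seq, c in counts.items() if c > threshold}


docs = [["我们", "打", "酱", "油"],
        ["他们", "打", "酱", "油", "吧啊"],
        ["打", "酱", "油", "真的"]]
print(discover_new_words(docs, threshold=2))  # {'打酱油'}
```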
In addition, the word bank can also store the time at which each network word was added to it; its use is described later.
After the network word X input by the user is received, the page addresses and release times of the pages containing the network word X are queried from the web page text index library.
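The lookup can be sketched with an in-memory inverted index built from the marked text. This is an illustrative assumption about the storage layout, not the patent's design: `PageRecord`, `build_index`, `query`, and the `[word]` marker format are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Matches a marked network word such as "[打酱油]" (hypothetical marker format).
MARK = re.compile(r"\[([^\[\]]+)\]")


@dataclass
class PageRecord:
    url: str                # page address
    release_time: datetime  # release time of the page
    marked_text: str        # text content after word segmentation


def build_index(pages):
    """Map each marked network word to the (url, release_time) pairs containing it."""
    index = defaultdict(list)
    for page in pages:
        for word in set(MARK.findall(page.marked_text)):
            index[word].append((page.url, page.release_time))
    return index


def query(index, word):
    """Return the page addresses and release times for network word `word`."""
    return list(index.get(word, []))


pages = [
    PageRecord("http://labs.chinamobile.com/news/12345.htm",
               datetime(2011, 8, 1, 9), "我去[打酱油]了"),
    PageRecord("http://example.com/a.htm", datetime(2011, 8, 2, 9), "没有网络词"),
]
index = build_index(pages)
print(query(index, "打酱油"))
```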
Step 12: Calculate the regional distribution parameter of the network word X from the acquired page addresses, calculate the time distribution parameter of the network word X from the acquired release times, calculate the heat value of the network word X from the regional distribution parameter and the time distribution parameter, and display the heat value to the user.
In the invention, the heat value of the network word X is calculated mainly from two factors, its regional distribution and its time distribution: the regional distribution parameter and the time distribution parameter of the network word X are calculated, and the heat value is finally obtained by combining them according to their respective contribution weights to the heat value.
In practical applications, either only the heat value for the current time may be displayed to the user, or the trend of the heat value over a period of time may be displayed.
1) Mode one
Set the current time as the reference time T. Calculate the sum of the distances between every two of the designated page addresses and take the result as the regional distribution parameter, where the designated page addresses are those among the page addresses acquired in step 11 whose release times fall within [T - t1, T], and t1 is a preset duration. Calculate the sum of the absolute differences between T and each of the designated release times and take the result as the time distribution parameter, where the designated release times are those among the release times acquired in step 11 that fall within [T - t1, T]. Then calculate the heat value of the network word X from the regional distribution parameter and the time distribution parameter, and display it to the user.
The distance between any two page addresses may be calculated as follows. For the two page addresses, acquire the level-1 to level-k domain names of each, where k is a positive integer greater than 1; if a page address has fewer than k levels, the missing levels are padded with 0, and if it has more than k levels, the extra levels are discarded. Then, starting from level 1, compare the domain names of the two page addresses level by level: the weight of the first level at which they differ is taken as the distance between the two page addresses, and if all levels are the same, the distance is 0. The higher the level, the smaller its weight.
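A minimal sketch of this distance rule, assuming each page address has already been split into its level names (level 1 first) and that the caller supplies one weight per level in decreasing order; `page_distance` and the sample weights are hypothetical. Padding with the string "0" follows the text, though it would treat a real level literally named "0" as missing.

```python
def page_distance(levels_a, levels_b, weights):
    """Distance between two page addresses given as level lists (level 1 first).

    weights[q] is the weight of level q + 1, and k = len(weights). Lists shorter
    than k are padded with "0"; longer lists are truncated to k levels.
    """
    k = len(weights)

    def normalize(levels):
        levels = list(levels)[:k]
        return levels + ["0"] * (k - len(levels))

    a, b = normalize(levels_a), normalize(levels_b)
    for q in range(k):
        if a[q] != b[q]:
            return weights[q]  # weight of the first differing level
    return 0.0                 # all k levels identical


# Same primary domain, different secondary domain -> level-2 weight.
print(page_distance(["chinamobile.com", "labs", "news"],
                    ["chinamobile.com", "www"], [3.0, 2.0, 1.0]))  # 2.0
```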
The first mode is further explained below with a specific example.
Suppose that m page addresses and m release times are obtained for the network word X. For page addresses, a general model is predefined (assuming k = 9): http://$p_{n2}$.$p_{n1}$/$p_{n3}$/$p_{n4}$/$p_{n5}$/$p_{n6}$/$p_{n7}$/$p_{n8}$/$p_{n9}$, where $p_{n1}$ denotes the level-1 (primary) domain name, $p_{n2}$ the level-2 (secondary) domain name, and so on. For example, in the page address http://labs.chinamobile.com/news/12345.htm, "labs" is the secondary domain name and "chinamobile.com" is the primary domain name. A weight is set for each level, and the higher the level, the smaller the weight.
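The general model can be approximated in code under stated assumptions: the primary domain name $p_{n1}$ is taken to be the last two labels of the host name (which would mishandle suffixes such as .com.cn), all remaining host labels together form $p_{n2}$, and path segments fill levels 3 to k. `url_to_levels` is a hypothetical helper built on Python's standard `urllib.parse.urlsplit`.

```python
from urllib.parse import urlsplit


def url_to_levels(url, k=9):
    """Split a page address into [p_n1, p_n2, ..., p_nk] per the general model.

    Assumes the primary domain is the last two host labels; "0" pads missing levels.
    """
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    primary = ".".join(labels[-2:])            # p_n1, e.g. "chinamobile.com"
    secondary = ".".join(labels[:-2]) or "0"   # p_n2, e.g. "labs"
    path = [seg for seg in parts.path.split("/") if seg]  # p_n3 ... p_nk
    levels = ([primary, secondary] + path)[:k]
    return levels + ["0"] * (k - len(levels))


print(url_to_levels("http://labs.chinamobile.com/news/12345.htm"))
# ['chinamobile.com', 'labs', 'news', '12345.htm', '0', '0', '0', '0', '0']
```

Each call produces one row of the region distribution matrix described next.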
Taking the current time as the reference time T, the n page addresses whose release times fall within [T - t1, T] are selected, together with those n release times, where m and n are positive integers and n ≤ m.
First, the regional distribution parameter of the network word X is calculated as follows:
a. Obtain a region distribution matrix from the n page addresses and the general model:
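$$\begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} & p_{15} & p_{16} & p_{17} & p_{18} & p_{19} \\ p_{21} & p_{22} & p_{23} & p_{24} & p_{25} & p_{26} & p_{27} & p_{28} & p_{29} \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ p_{n1} & p_{n2} & p_{n3} & p_{n4} & p_{n5} & p_{n6} & p_{n7} & p_{n8} & p_{n9} \end{bmatrix} \qquad (1)$$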
b. Calculate the distance between every two rows of elements in the region distribution matrix. Taking rows 1 and 2 as an example: compare $p_{11}$ and $p_{21}$; if they differ, the weight of the level-1 domain name is taken as the distance between the two rows. If they are the same, compare $p_{12}$ and $p_{22}$; if they differ, the weight of the level-2 domain name is taken as the distance. If they are the same, compare $p_{13}$ and $p_{23}$; if they differ, the weight of the level-3 domain name is taken as the distance, and so on. If the two rows are the same at every level up to $p_{19}$ and $p_{29}$, 0 is taken as the distance between them. That is:
$$f(R_{ij}) = \begin{cases} W_q, & q = \min\{\, l : p_{il} \neq p_{jl} \,\} \\ 0, & p_{il} = p_{jl} \text{ for all } l = 1, \dots, 9 \end{cases} \qquad (2)$$
where $W_q$ is the weight of the level-q domain name, and $f(R_{ij})$ is the distance between row i and row j, with i ≠ j.
c. Sum the distances between every two rows calculated in this way:
$$H_d = \sum_{i=1}^{n} \sum_{j=i+1}^{n} f(R_{ij}) \qquad (3)$$
$H_d$ is the regional distribution parameter.
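The row-distance rule in step b and the sum in equation (3) can be sketched as follows, assuming each row of the region distribution matrix is already a list of exactly k level names; `regional_distribution` and the sample data are hypothetical. `itertools.combinations` yields each pair of rows (i, j) with i < j exactly once, which matches the double sum in (3).

```python
from itertools import combinations


def regional_distribution(rows, weights):
    """H_d: sum of f(R_ij) over all row pairs i < j of the region matrix."""
    def f(row_i, row_j):
        # Weight of the first level at which the two rows differ.
        for q, (x, y) in enumerate(zip(row_i, row_j)):
            if x != y:
                return weights[q]
        return 0.0                      # rows identical at every level

    return sum(f(ri, rj) for ri, rj in combinations(rows, 2))


rows = [["a.com", "www"], ["a.com", "blog"], ["b.com", "www"]]
print(regional_distribution(rows, weights=[2.0, 1.0]))  # 5.0
```

Here the pair of "a.com" rows differs only at level 2 (weight 1.0), while each pair involving "b.com" differs at level 1 (weight 2.0).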
Next, the time distribution parameter of the network word X is calculated as follows:
a. Obtain a time distribution matrix from the n release times:
$$[\,T_1 \;\; T_2 \;\; T_3 \;\; \cdots \;\; T_n\,] \qquad (4)$$
b. Sum the absolute differences between each element of the time distribution matrix and the reference time T:
$$H_t = \sum_{i=1}^{n} \lvert T_i - T \rvert \qquad (5)$$
$H_t$ is the time distribution parameter.
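A sketch of the window selection and equation (5), assuming release times are `datetime` values and that $H_t$ is expressed in hours. The patent does not fix a unit; whichever unit is chosen must also be used for B in equation (6). `time_distribution` is a hypothetical name.

```python
from datetime import datetime, timedelta


def time_distribution(release_times, T, t1):
    """Return (H_t in hours, n) for the release times inside [T - t1, T]."""
    window = [ti for ti in release_times if T - t1 <= ti <= T]
    h_t = sum((abs(ti - T) for ti in window), timedelta(0))
    return h_t / timedelta(hours=1), len(window)


T = datetime(2011, 8, 1, 12)
times = [T - timedelta(hours=1), T - timedelta(hours=3), T - timedelta(hours=30)]
print(time_distribution(times, T, timedelta(hours=24)))  # (4.0, 2)
```

The release time 30 hours before T falls outside the 24-hour window, so only the other two contribute.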
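Once $H_d$ and $H_t$ are known, the heat value $M_t$ of the network word X can be calculated as:
$$M_t = r_1 H_d + r_2\left(1 - \frac{H_t}{A \cdot B}\right) \qquad (6)$$
where $r_1$ and $r_2$ are weights, $H_d$ is the regional distribution parameter, $H_t$ is the time distribution parameter, $A$ is the number of page addresses participating in this heat value calculation (here $A = n$), and $B = t_1$. Because every selected release time lies within [T - t1, T], each $\lvert T_i - T \rvert \le t_1$, so $H_t \le A \cdot B$ and the bracketed term lies between 0 and 1.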
It can be seen that $M_t$ increases with $H_d$ and decreases with $H_t$. A larger $H_d$ means the network word X is distributed over a wider range, so its heat value should be larger; a larger $H_t$ means the release times of the network word X are more dispersed, so its heat value should be smaller.
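Equation (6) as a small function: a sketch assuming $H_t$ and B share the same time unit, with the weights $r_1$ and $r_2$ left to the caller because the patent leaves their values to actual needs. `heat_value` is a hypothetical name, and raising an error for an empty window is an assumption, since the text does not say what happens when no page falls within [T - t1, T].

```python
def heat_value(h_d, h_t, a, b, r1, r2):
    """M_t = r1 * H_d + r2 * (1 - H_t / (A * B)), equation (6)."""
    if a <= 0 or b <= 0:
        raise ValueError("A and B must be positive (empty window or bad t1)")
    return r1 * h_d + r2 * (1 - h_t / (a * b))


# H_d = 5.0 and H_t = 4.0 hours over A = 2 pages, with B = t1 = 24 hours.
print(heat_value(5.0, 4.0, a=2, b=24.0, r1=0.5, r2=10.0))
```

With these inputs the time term contributes $10 \cdot (1 - 4/48)$, close to its maximum of $r_2$, because both release times sit near T.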
2) Mode two
In this mode, two or more reference times are set and a heat value is calculated for each reference time; every reference time is no later than the current time, and adjacent reference times are equally spaced. A trend graph of the heat value is drawn from the calculated heat values and their reference times, and displayed to the user.
The specific implementation may be as follows:
B1. Set the time at which the network word X was stored in the word bank as the initial reference time T.
B2. Calculate the sum of the distances between every two of the designated page addresses and take the result as the regional distribution parameter, where the designated page addresses are those among the page addresses acquired in step 11 whose release times fall within [T - t1, T], and t1 is a preset duration;
calculate the sum of the absolute differences between T and each of the designated release times and take the result as the time distribution parameter, where the designated release times are those among the release times acquired in step 11 that fall within [T - t1, T];
calculate the heat value of the network word X from the regional distribution parameter and the time distribution parameter, and store the calculated heat value together with its reference time.
B3. Set T = T + t2, where t2 is a preset duration, and determine whether the new T is later than the current time; if so, go to step B4; otherwise, repeat step B2 with the new T.
B4. Draw a trend graph of the heat value from the stored heat values and their reference times, and display it to the user.
Here the heat value is
$$M_t = r_1 H_d + r_2\left(1 - \frac{H_t}{A \cdot B}\right) \qquad (6)$$
where $r_1$ and $r_2$ are weights, $H_d$ is the regional distribution parameter, $H_t$ is the time distribution parameter, $A$ is the number of page addresses participating in the current heat value calculation, and $B = \max(t_1, t_2)$.
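Steps B1 to B4 can be sketched as a loop over reference times, under the same assumptions as the earlier sketches: level lists are already normalized to k entries, times are `datetime` values, and $H_t$ and B are measured in hours. `heat_trend` is a hypothetical name, and recording `None` when no page falls in the window is an assumption the patent does not address. The returned (T, M_t) pairs are the points that step B4 would plot.

```python
from datetime import datetime, timedelta
from itertools import combinations


def heat_trend(pages, t0, now, t1, t2, weights, r1, r2):
    """Return [(T, M_t or None)] for T = t0, t0 + t2, ... while T <= now.

    pages: list of (levels, release_time); levels already padded to k entries.
    t1, t2: timedelta window length and step; H_t and B are in hours.
    """
    def distance(a, b):
        for q, (x, y) in enumerate(zip(a, b)):
            if x != y:
                return weights[q]
        return 0.0

    hour = timedelta(hours=1)
    B = max(t1, t2) / hour           # B = max(t1, t2) in mode two
    trend = []
    T = t0                           # B1: time X entered the word bank
    while T <= now:                  # B3: stop once T passes the current time
        window = [(lv, ti) for lv, ti in pages if T - t1 <= ti <= T]
        if window:                   # B2: H_d, H_t and M_t for this T
            h_d = sum(distance(u[0], v[0]) for u, v in combinations(window, 2))
            h_t = sum(abs(ti - T) / hour for _, ti in window)
            A = len(window)
            trend.append((T, r1 * h_d + r2 * (1 - h_t / (A * B))))
        else:
            trend.append((T, None))  # assumption: no pages, no value
        T += t2
    return trend                     # B4: plot these (T, M_t) points


pages = [(["a.com", "www"], datetime(2011, 8, 1, 10)),
         (["b.com", "www"], datetime(2011, 8, 1, 11))]
trend = heat_trend(pages, datetime(2011, 8, 1, 0), datetime(2011, 8, 1, 12),
                   timedelta(hours=6), timedelta(hours=6), [2.0, 1.0], 1.0, 1.0)
print([m for _, m in trend])  # [None, None, 2.75]
```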
It should be noted that steps B1-B4 are only one possible implementation; other implementations may be adopted in practical applications, for example, setting the initial reference time according to other principles, or using the range [T - t1, T + t1] in each calculation.
The way of calculating the distance between any two page addresses in the second mode is the same as that in the first mode, and is not described again.
It can be seen that, compared with the first mode, the second mode calculates multiple heat values, one for each reference time, whereas the first mode calculates only one, namely the heat value for the reference time equal to the current time.
After each heat value and its reference time have been obtained, a two-dimensional heat value trend graph can be drawn: the abscissa is time and the ordinate is the heat value. One point is plotted in this coordinate system for each calculated heat value, and adjacent points can be joined by straight lines, so that the user can see at a glance how the heat value of the network word X has changed over the past period and, to some extent, predict how it will change in the future.
In practical applications, different weights, namely different PageRank (PR) values, can be set for different websites: a higher PR value for more important websites and a lower PR value for less important ones. In the first or second mode, after the distance between two page addresses has been calculated, it may further be multiplied by the sum of the PR values corresponding to the two page addresses, and the product is used as the final distance between them, where the PR value of a page address is the PR value of the website on which the page is located. Combined with the heat value calculation above, this means that if the network word X appears on two more important websites, the resulting heat value will be larger than if it appears on two less important websites.
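The PR adjustment can be sketched as below. Assumptions: the website of a page is identified by its level-1 (primary) domain name, PR values come from a caller-supplied mapping `site_pr`, and unknown sites default to a PR of 1.0; `pr_weighted_distance` and these defaults are hypothetical.

```python
def pr_weighted_distance(levels_a, levels_b, weights, site_pr, default_pr=1.0):
    """Level-based distance multiplied by the sum of the two sites' PR values.

    levels_a, levels_b: level lists of equal length k (level 1 first).
    site_pr: mapping from primary domain name to that website's PR value.
    """
    base = 0.0
    for q, (x, y) in enumerate(zip(levels_a, levels_b)):
        if x != y:
            base = weights[q]  # weight of the first differing level
            break
    pr_sum = site_pr.get(levels_a[0], default_pr) + site_pr.get(levels_b[0], default_pr)
    return base * pr_sum


pr = {"a.com": 3.0, "b.com": 1.0}
print(pr_weighted_distance(["a.com", "www"], ["b.com", "www"], [2.0, 1.0], pr))  # 8.0
```

Substituting this distance for $f(R_{ij})$ in equation (3) makes pairs of pages on high-PR websites contribute more to $H_d$, and therefore to $M_t$.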
This completes the description of the method embodiment of the present invention.
Based on the above description, FIG. 2 is a schematic structural diagram of an embodiment of the apparatus of the present invention. As shown in FIG. 2, the apparatus comprises:
an application programming interface (API), configured to receive a network word X input by a user through a user interface, and to acquire the page address and release time of each page containing the network word X;
a heat calculation module, configured to calculate the regional distribution parameter of the network word X from the acquired page addresses, calculate the time distribution parameter of the network word X from the acquired release times, calculate the heat value of the network word X from the regional distribution parameter and the time distribution parameter, and display the heat value to the user through the user interface.
The device shown in fig. 2 may further include:
a word bank, configured to store a series of network words;
an aggregation module, configured to crawl the text content of pages on each website, store it in the web page text index library, and store the page address and release time of each text content accordingly;
a word segmentation module, configured to segment each text content stored in the web page text index library using the network words stored in the word bank, and to replace each text content before segmentation with the corresponding segmented text content;
the API queries the web page text index library for the page address and release time of each page containing the network word X.
Both the word bank and the web page text index library support real-time updating of their stored contents.
The heat calculation module may specifically include (not shown in the drawings for simplicity):
a calculation unit, configured to set the current time as the reference time T; calculate the sum of the distances between every two of the designated page addresses and take the result as the regional distribution parameter, where the designated page addresses are those among the acquired page addresses whose release times fall within [T - t1, T], and t1 is a preset duration; calculate the sum of the absolute differences between T and each of the designated release times and take the result as the time distribution parameter, where the designated release times are those among the acquired release times that fall within [T - t1, T]; and calculate the heat value of the network word X from the regional distribution parameter and the time distribution parameter;
a processing unit, configured to display the heat value calculated by the calculation unit to the user through the user interface.
Here the heat value is
$$M_t = r_1 H_d + r_2\left(1 - \frac{H_t}{A \cdot B}\right) \qquad (6)$$
where $r_1$ and $r_2$ are weights, $H_d$ is the regional distribution parameter, $H_t$ is the time distribution parameter, $A$ is the number of page addresses participating in the current heat value calculation, and $B = t_1$.
Or,
the calculation unit is configured to set two or more reference times and calculate a heat value for each, wherein each reference time is no later than the current time and adjacent reference times are equally spaced;
the processing unit is configured to draw a heat value trend graph from the calculated heat values and their reference times, and display it to the user through the user interface.
The word bank may further store the time at which each network word was stored in it. Correspondingly, the calculation unit calculates the sum of the distances between every two of the designated page addresses and takes the result as the regional distribution parameter, where the designated page addresses are those among the acquired page addresses whose release times fall within [T - t1, T], T is the reference time, and t1 is a preset duration; calculates the sum of the absolute differences between T and each of the designated release times and takes the result as the time distribution parameter, where the designated release times are those among the acquired release times that fall within [T - t1, T]; calculates the heat value of the network word X from the regional distribution parameter and the time distribution parameter, and stores the calculated heat value together with its reference time; then sets T = T + t2, where t2 is a preset duration, and determines whether the new T is later than the current time: if so, it notifies the processing unit to perform its function; otherwise, it repeats the above calculation with the new T. The initial reference time is the time at which the network word X was stored in the word bank.
Here the heat value is
$$M_t = r_1 H_d + r_2\left(1 - \frac{H_t}{A \cdot B}\right) \qquad (6)$$
where $r_1$ and $r_2$ are weights, $H_d$ is the regional distribution parameter, $H_t$ is the time distribution parameter, $A$ is the number of page addresses participating in the current heat value calculation, and $B = \max(t_1, t_2)$.
In addition, for any two page addresses, the calculation unit acquires the level-1 to level-k domain names of each page address, where k is a positive integer greater than 1; if a page address has fewer than k levels, the missing levels are padded with 0, and if it has more than k levels, the extra levels are discarded. Starting from level 1, it compares the domain names of the two page addresses level by level, takes the weight of the first level at which they differ as the distance between the two page addresses, and takes 0 as the distance if all levels are the same. The higher the level, the smaller its weight.
Moreover, the calculation unit may further be configured to acquire the PR value corresponding to each page address, multiply the calculated distance between two page addresses by the sum of their PR values, and use the product as the final distance between the two page addresses.
For a specific work flow of the apparatus embodiment shown in fig. 2, please refer to the corresponding description in the method embodiment shown in fig. 1, which is not repeated herein.
The specific values of the weight, the threshold, the duration, and the like, which are related to the above embodiments, can be determined according to actual needs, and the present invention is not limited thereto.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (20)

1. A method for determining the popularity of network words, characterized by comprising the following steps:
receiving a network word X input by a user, and acquiring a page address and release time of a page comprising the network word X;
calculating a region distribution parameter of the network word X according to the acquired page address, calculating a time distribution parameter of the network word X according to the acquired release time, calculating a heat value of the network word X according to the region distribution parameter and the time distribution parameter, and displaying the heat value to a user;
wherein calculating the heat value of the network word X from the regional distribution parameter and the time distribution parameter and displaying it to the user comprises:
setting a reference time T;
calculating the sum of the distances between every two of the designated page addresses, and taking the result as the regional distribution parameter, wherein the designated page addresses are those among the acquired page addresses whose release times fall within [T - t1, T], and t1 is a preset duration;
calculating the sum of the absolute differences between T and each of the designated release times, and taking the result as the time distribution parameter, wherein the designated release times are those among the acquired release times that fall within [T - t1, T];
and calculating the heat value of the network word X according to the region distribution parameter and the time distribution parameter, and displaying the heat value to a user.
2. The method of claim 1,
before receiving the network word X input by the user, the method further comprises: establishing a word bank and a web page text index library, wherein a series of network words are stored in the word bank; crawling the text content of pages on each website, storing it in the web page text index library together with the page address and release time of each text content; segmenting each text content using the network words stored in the word bank; and replacing each text content before segmentation with the corresponding segmented text content;
acquiring the page address and release time of each page containing the network word X comprises: querying the web page text index library for the page address and release time of each page containing the network word X.
3. The method of claim 2, further comprising: updating the contents stored in the word bank and the web page text index library in real time.
4. The method of claim 1, wherein:
the current time is set as the reference time T.
5. The method according to claim 4, wherein the calculating the heat value of the network word X according to the regional distribution parameter and the time distribution parameter comprises:
calculating the heat value
$$M_t = r_1 H_d + r_2\left(1 - \frac{H_t}{A \cdot B}\right);$$
wherein $r_1$ and $r_2$ are weights, $H_d$ is the regional distribution parameter, $H_t$ is the time distribution parameter, $A$ is the number of page addresses participating in the current heat value calculation, and $B = t_1$.
6. The method of claim 3, wherein:
setting two or more reference times and calculating a heat value for each reference time, wherein each reference time is no later than the current time and adjacent reference times are equally spaced;
and drawing a heat value change trend graph according to the calculated heat values and the corresponding reference time of the heat values, and displaying the heat value change trend graph to a user.
7. The method of claim 6, wherein the thesaurus further stores, for each network word, the time at which the word was added to the thesaurus; and setting two or more reference times, calculating a heat value for each reference time, plotting a heat-value trend graph from the calculated heat values and their corresponding reference times, and displaying it to the user comprises:
B1: setting the time at which the network word X was added to the thesaurus as the initial reference time T;
B2: calculating the sum of the distances between every pair of page addresses among the designated page addresses and taking the result as the regional distribution parameter, wherein the designated page addresses are those acquired page addresses whose release time falls within the range from T - T1 to T, and T1 is a preset time length;
calculating the sum of the absolute differences between each designated release time and T and taking the result as the time distribution parameter, wherein the designated release times are those acquired release times that fall within the range from T - T1 to T;
calculating the heat value of the network word X from the regional distribution parameter and the time distribution parameter, and storing the calculated heat value together with its reference time;
B3: setting T = T + T2, where T2 is a predetermined time length, and determining whether the new T is later than the current time; if so, executing step B4; otherwise, repeating step B2 with the new T;
B4: plotting a heat-value trend graph from the stored heat values and their corresponding reference times, and displaying the trend graph to the user.
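Steps B1 to B4 amount to sliding a window of length T1 forward in steps of T2. The sketch below is a hypothetical illustration: the names `heat_trend` and `distance`, the numeric time unit, the 0.5 / 0.5 weights, and scoring an empty window as 0 are assumptions (the claims do not say what happens when no page falls in the window). The normaliser B follows claim 8.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Page = Tuple[str, float]  # (page address, release time); time unit assumed, e.g. hours


def heat_trend(pages: List[Page], t_added: float, now: float, t1: float, t2: float,
               distance: Callable[[str, str], float],
               r1: float = 0.5, r2: float = 0.5) -> List[Tuple[float, float]]:
    """Steps B1-B4: return (reference time T, heat value) pairs for the trend graph."""
    if t1 <= 0 or t2 <= 0:
        raise ValueError("T1 and T2 must be positive")
    b = max(t1, t2)                  # B as defined in claim 8
    series: List[Tuple[float, float]] = []
    t = t_added                      # B1: initial T = time X was added to the thesaurus
    while t <= now:                  # B3: stop once T is later than the current time
        # B2: keep only pages released within [T - T1, T]
        window = [(addr, rt) for addr, rt in pages if t - t1 <= rt <= t]
        a = len(window)
        h_d = sum(distance(p, q) for (p, _), (q, _) in combinations(window, 2))
        h_t = sum(abs(rt - t) for _, rt in window)
        # Assumption: an empty window scores 0 (the claims leave this case open).
        m = r1 * h_d + r2 * (1 - h_t / (a * b)) if a else 0.0
        series.append((t, m))        # store the heat value with its reference time
        t += t2                      # B3: T = T + T2
    return series                    # B4: these points form the trend graph
```

The returned list can be handed to any plotting library to draw the trend graph of step B4.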
8. The method according to claim 7, wherein the calculating the heat value of the network word X according to the regional distribution parameter and the time distribution parameter comprises:
calculating the heat value $M_t = r_1 H_d + r_2\left(1 - \dfrac{H_t}{A \cdot B}\right)$;
where $r_1$ and $r_2$ are weights, $H_d$ is the regional distribution parameter, $H_t$ is the time distribution parameter, $A$ is the number of page addresses participating in the current heat value calculation, and $B$ is the larger of T1 and T2.
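A short derivation (not stated in the claims, but following from their definitions) shows why the second term stays in a fixed range. Writing $t_1, \dots, t_A$ for the designated release times, each of which lies in the window from T - T1 to T:

$$
0 \le H_t = \sum_{i=1}^{A} \lvert t_i - T \rvert \le A\,T_1 \le A \max(T_1, T_2) = A \cdot B
\quad\Longrightarrow\quad
0 \le 1 - \frac{H_t}{A \cdot B} \le 1
$$

The same bound holds with $B = T_1$ in claim 5. The term approaches 1 when the pages are released close to the reference time T, so recent activity raises the heat value.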
9. The method of claim 4 or 7, wherein calculating the distance between any two page addresses comprises:
for the two page addresses, acquiring the level-1 to level-k domain names of each page address, where k is a positive integer greater than 1; if a page address has fewer than k domain levels, the missing levels are padded with 0, and if it has more than k levels, the extra levels are discarded;
comparing the two page addresses level by level, starting from the level-1 domain name, and taking the weight assigned to the first level at which they differ as the distance between the two page addresses, or 0 if all levels are identical; the higher the level number, the smaller its weight.
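A sketch of the claim 9 distance follows. It assumes that page addresses are URLs whose host name supplies the domain levels, and that level 1 is the rightmost label (for example `cn` in `news.sina.com.cn`). The function names and the per-level weights are hypothetical; the claim only requires that the weights decrease as the level number grows.

```python
from typing import List, Sequence
from urllib.parse import urlsplit


def domain_levels(address: str, k: int) -> List[str]:
    """Return the level-1..k domain names of a page address (level 1 = rightmost label)."""
    # Assumption: a bare host such as "news.sina.com.cn/x" is treated as a scheme-less URL.
    host = urlsplit(address if "//" in address else "//" + address).hostname or ""
    labels = host.split(".")[::-1] if host else []
    # Pad missing levels with "0" and drop levels beyond k, as claim 9 specifies.
    return (labels + ["0"] * k)[:k]


def page_distance(addr1: str, addr2: str, weights: Sequence[float]) -> float:
    """Weight of the first differing domain level, or 0 if all k levels match.

    weights[i] is the (hypothetical) weight of level i + 1; k = len(weights) > 1.
    """
    k = len(weights)
    for w, l1, l2 in zip(weights, domain_levels(addr1, k), domain_levels(addr2, k)):
        if l1 != l2:
            return w
    return 0.0
```

With weights `[8, 4, 2, 1]`, pages on `news.sina.com.cn` and `sports.sina.com.cn` differ only at level 4 and get distance 1, whereas `www.sina.com.cn` and `www.sohu.com` already differ at level 1 and get 8.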
10. The method of claim 9, further comprising:
acquiring the PageRank (PR) value of each page address;
and multiplying the calculated distance between the two page addresses by the sum of their PR values, and taking the product as the final distance between the two page addresses.
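Combined with claim 10, the regional distribution parameter becomes a PR-weighted sum over address pairs. In this sketch the PR values are assumed to be supplied in a dictionary, and `base_distance` stands for the claim 9 distance (any callable with that signature works); the function and parameter names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List


def pr_weighted_regional_parameter(addresses: List[str], pr: Dict[str, float],
                                   base_distance: Callable[[str, str], float]) -> float:
    """Sum over all address pairs of base distance * (PR(p) + PR(q)), per claim 10."""
    total = 0.0
    for p, q in combinations(addresses, 2):
        # Claim 10: final distance = claim-9 distance x (sum of the two PR values)
        total += base_distance(p, q) * (pr[p] + pr[q])
    return total
```

Weighting by PR means that a word spreading across distant but highly ranked sites raises $H_d$ more than the same spread across obscure pages.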
11. A network word heat determination apparatus, comprising:
an application programming interface (API), configured to receive a network word X input by a user through a user interface and to acquire the page address and release time of each page containing the network word X;
a heat calculation module, configured to calculate a regional distribution parameter of the network word X from the acquired page addresses, calculate a time distribution parameter of the network word X from the acquired release times, calculate a heat value of the network word X from the regional distribution parameter and the time distribution parameter, and display the heat value to the user through the user interface;
wherein calculating the heat value of the network word X from the regional distribution parameter and the time distribution parameter and displaying it to the user comprises:
setting a reference time T;
calculating the sum of the distances between every pair of page addresses among the designated page addresses and taking the result as the regional distribution parameter, wherein the designated page addresses are those acquired page addresses whose release time falls within the range from T - T1 to T, and T1 is a preset time length;
calculating the sum of the absolute differences between each designated release time and T and taking the result as the time distribution parameter, wherein the designated release times are those acquired release times that fall within the range from T - T1 to T;
and calculating the heat value of the network word X from the regional distribution parameter and the time distribution parameter, and displaying the heat value to the user.
12. The apparatus of claim 11, further comprising:
a thesaurus, configured to store a set of network words;
an aggregation module, configured to crawl the text content of pages on each website, store the text content in the webpage text index library, and store the page address and release time of each text content alongside it;
a word segmentation module, configured to segment each text content stored in the webpage text index library using the network words stored in the thesaurus, and to replace each original text content with its segmented version;
wherein the API queries the webpage text index library for the page address and release time of each page containing the network word X.
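The claim 12 pipeline can be pictured as a small in-memory index. The claims do not say which segmentation algorithm is used; forward maximum matching against the thesaurus is swapped in here purely as an assumption, and `PageIndex`, `add_page`, `lookup`, and `fmm_segment` are hypothetical names. The sample network words 给力 and 神马 are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


def fmm_segment(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching: take the longest lexicon word at each position."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if n == 1 or piece in lexicon:   # fall back to a single character
                tokens.append(piece)
                i += n
                break
    return tokens


@dataclass
class PageIndex:
    lexicon: Set[str]  # the thesaurus of network words
    # address -> (segmented text, release time)
    pages: Dict[str, Tuple[List[str], float]] = field(default_factory=dict)

    def add_page(self, address: str, text: str, release_time: float) -> None:
        """Aggregation + segmentation: store the segmented text in place of the raw text."""
        max_len = max((len(w) for w in self.lexicon), default=1)
        self.pages[address] = (fmm_segment(text, self.lexicon, max_len), release_time)

    def lookup(self, word: str) -> List[Tuple[str, float]]:
        """API query: page address and release time of every page containing `word`."""
        return [(addr, t) for addr, (tokens, t) in self.pages.items() if word in tokens]
```

In this sketch a word absent from the thesaurus is split into single characters and cannot be looked up, so X must already be in the thesaurus before pages containing it can be found.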
13. The apparatus of claim 12, wherein both the thesaurus and the webpage text index library support real-time updating of their stored contents.
14. The apparatus of claim 11, wherein the heat calculation module comprises:
a calculation unit, configured to: set the current time as the reference time T; calculate the sum of the distances between every pair of page addresses among the designated page addresses and take the result as the regional distribution parameter, wherein the designated page addresses are those acquired page addresses whose release time falls within the range from T - T1 to T, and T1 is a preset time length; calculate the sum of the absolute differences between each designated release time and T and take the result as the time distribution parameter, wherein the designated release times are those acquired release times that fall within the range from T - T1 to T; and calculate the heat value of the network word X from the regional distribution parameter and the time distribution parameter;
and a processing unit, configured to display the heat value calculated by the calculation unit to the user through the user interface.
15. The apparatus of claim 14,
wherein the heat value $M_t = r_1 H_d + r_2\left(1 - \dfrac{H_t}{A \cdot B}\right)$;
where $r_1$ and $r_2$ are weights, $H_d$ is the regional distribution parameter, $H_t$ is the time distribution parameter, $A$ is the number of page addresses participating in the current heat value calculation, and $B$ equals T1.
16. The apparatus of claim 13, wherein the heat calculation module comprises:
a calculation unit, configured to set two or more reference times and calculate a heat value for each reference time, wherein each reference time is no later than the current time and adjacent reference times are separated by equal intervals;
and a processing unit, configured to plot a heat-value trend graph from the calculated heat values and their corresponding reference times and to display the trend graph to the user through the user interface.
17. The apparatus of claim 16, wherein the thesaurus further stores, for each network word, the time at which the word was added to the thesaurus;
the calculation unit calculates the sum of the distances between every pair of page addresses among the designated page addresses and takes the result as the regional distribution parameter, wherein the designated page addresses are those acquired page addresses whose release time falls within the range from T - T1 to T, T is the reference time, and T1 is a preset time length; calculates the sum of the absolute differences between each designated release time and T and takes the result as the time distribution parameter, wherein the designated release times are those acquired release times that fall within the range from T - T1 to T; calculates the heat value of the network word X from the regional distribution parameter and the time distribution parameter, and stores the calculated heat value together with its reference time; then sets T = T + T2, where T2 is a predetermined time length, and determines whether the new T is later than the current time: if so, it notifies the processing unit to perform its function; otherwise, it repeats the above calculation with the new T; the initial reference time is the time at which the network word X was added to the thesaurus.
18. The apparatus of claim 17,
wherein the heat value $M_t = r_1 H_d + r_2\left(1 - \dfrac{H_t}{A \cdot B}\right)$;
where $r_1$ and $r_2$ are weights, $H_d$ is the regional distribution parameter, $H_t$ is the time distribution parameter, $A$ is the number of page addresses participating in the current heat value calculation, and $B$ is the larger of T1 and T2.
19. The apparatus of claim 14 or 17, wherein, for any two page addresses, the calculation unit acquires the level-1 to level-k domain names of each page address, where k is a positive integer greater than 1, padding the missing levels with 0 when a page address has fewer than k domain levels and discarding the extra levels when it has more than k; starting from the level-1 domain name, it compares the two page addresses level by level and takes the weight assigned to the first level at which they differ as the distance between the two page addresses, or 0 if all levels are identical; the higher the level number, the smaller its weight.
20. The apparatus of claim 19, wherein the calculation unit is further configured to acquire the PageRank (PR) value of each page address, multiply the calculated distance between the two page addresses by the sum of their PR values, and take the product as the final distance between the two page addresses.
CN201110247837.2A 2011-08-25 2011-08-25 A kind of network word temperature defining method and device Expired - Fee Related CN102955804B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110247837.2A CN102955804B (en) 2011-08-25 2011-08-25 A kind of network word temperature defining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110247837.2A CN102955804B (en) 2011-08-25 2011-08-25 A kind of network word temperature defining method and device

Publications (2)

Publication Number Publication Date
CN102955804A CN102955804A (en) 2013-03-06
CN102955804B true CN102955804B (en) 2016-03-02

Family

ID=47764616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110247837.2A Expired - Fee Related CN102955804B (en) 2011-08-25 2011-08-25 A kind of network word temperature defining method and device

Country Status (1)

Country Link
CN (1) CN102955804B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309189B (en) * 2018-03-13 2023-04-18 深圳市腾讯计算机系统有限公司 Method and device for acquiring heat of entity words

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101393566A (en) * 2008-11-17 2009-03-25 北京交通大学 Information tracking and detecting method and system based on network structure user pattern of behavior
CN101477556A (en) * 2009-01-22 2009-07-08 苏州智讯科技有限公司 Method for discovering hot spots in internet mass information
EP2211282A2 (en) * 2009-01-27 2010-07-28 Palo Alto Research Center Incorporated System and method for managing user attention by detecting hot and cold topics in social indexes
CN101923544A (en) * 2009-06-15 2010-12-22 北京百分通联传媒技术有限公司 Method for monitoring and displaying Internet hot spots

Non-Patent Citations (3)

Title
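Topic Extraction from News Archive Using TF*PDF Algorithm; Khoo Khyou Bun et al.; Web Information Systems Engineering, International Conference on 2002; 2002-12-31; full text *
Design of a network hot-event discovery system (网络热点事件发现系统的设计); Liu Xingxing et al.; Journal of Chinese Information Processing (中文信息学报); 2008-11-15; Vol. 22, No. 6; full text *
Research on methods for identifying hot network information (网络热点信息识别方法研究); Deng Aiping; Microcomputer Information (微计算机信息); 2010-07-05; Vol. 26, No. 19; full text *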

Also Published As

Publication number Publication date
CN102955804A (en) 2013-03-06

Similar Documents

Publication Publication Date Title
US10204145B2 (en) Systems and methods for re-ranking ranked search results
JP5436665B2 (en) Classification of simultaneously selected images
US9569499B2 (en) Method and apparatus for recommending content on the internet by evaluating users having similar preference tendencies
CN105247507A (en) Influence score of a brand
KR101330273B1 (en) Context based resource relevance
CN107122467B (en) Search engine retrieval result evaluation method and device and computer readable medium
JP5916959B2 (en) Dynamic data acquisition method and system
TW201327233A (en) Personalized information pushing method and device
WO2014107682A1 (en) Method and apparatus for generating webpage content
RU2015142105A (en) CLASSIFICATION OF DOCUMENTS USING MULTILEVEL TEXT SIGNATURES
CN106462583A (en) Systems and methods for rapid data analysis
WO2015039165A1 (en) Improvements in website traffic optimization
CN103617213B (en) Method and system for identifying newspage attributive characters
CN104899236B (en) A kind of comment information display methods, apparatus and system
JP2011227721A (en) Interest extraction device, interest extraction method, and interest extraction program
CN106776609A (en) Reprint the statistical method and device of quantity in website
CN108664605B (en) Model evaluation method and system
CN112232933A (en) House source information recommendation method, device, equipment and readable storage medium
CN103617146B (en) A kind of machine learning method and device based on hardware resource consumption
US10296924B2 (en) Document performance indicators based on referral context
WO2015149550A1 (en) Method and apparatus for determining grades of links within website
CN111259274A (en) Information processing method, device, equipment and information display device
CN102955804B (en) A kind of network word temperature defining method and device
CN107729344B (en) Website data crawling method and device, computer equipment and readable storage medium
CN113553477B (en) Graph splitting method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160302