CN108710664A - Hot word analysis method, computer-readable storage medium and terminal device - Google Patents
- Publication number: CN108710664A
- Application number: CN201810456973.4A
- Authority
- CN
- China
- Prior art keywords
- exposure
- hot word
- text message
- sequence
- threshold
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of computer technology, and in particular relates to a hot word analysis method, a computer-readable storage medium and a terminal device. The method crawls, through a search engine, the web pages published on target websites within the current statistical period; performs word segmentation on the text information in the web pages to obtain the segmented words that make up the text information; counts the exposure frequency of each segmented word in the text information; determines the segmented words whose exposure frequency in the text information exceeds a preset first exposure threshold as hot words; counts the exposure frequency of each enterprise name in preferred text information; and calculates the degree of association between each enterprise name and the hot word from the exposure frequency of each enterprise name in the preferred text information. The invention provides an objective evaluation standard for determining hot words and, once hot words are obtained, takes the relationship between enterprises and hot words into account, so the analysis results offer stronger guidance for enterprises.
Description
Technical field
The invention belongs to the field of computer technology, and in particular relates to a hot word analysis method, a computer-readable storage medium and a terminal device.
Background
A hot word, i.e. a popular network term, is a lexical phenomenon that reflects the issues and things of common concern to the people of a country or region during a given period. Hot words bear the mark of their time and can represent the hot topics and livelihood issues of a period.
At present, hot words are mainly determined by network analysts who process the information they have browsed on the Internet. This approach depends on the personal judgment of the analysts, is highly subjective, and can hardly reflect the true hot word situation objectively. Moreover, once hot words are obtained, the analysis is usually one-sided and limited to the hot words themselves; it has a single dimension, and its results offer little guidance for enterprises.
Summary of the invention
In view of this, embodiments of the present invention provide a hot word analysis method, a computer-readable storage medium and a terminal device, to solve the problems in the prior art that the process of determining hot words is highly subjective and that the analysis results offer little guidance for enterprises.
A first aspect of the embodiments of the present invention provides a hot word analysis method, which may include:
crawling, through a search engine, the web pages published on target websites within the current statistical period, a target website being a website whose pageview exceeds a preset pageview threshold;
performing word segmentation on the text information in the web pages to obtain the segmented words that make up the text information;
counting the exposure frequency of each segmented word in the text information;
determining the segmented words whose exposure frequency in the text information exceeds a preset first exposure threshold as hot words;
counting the exposure frequency of each enterprise name in preferred text information, the preferred text information being the text information that contains the hot word;
calculating the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information.
A second aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps:
crawling, through a search engine, the web pages published on target websites within the current statistical period, a target website being a website whose pageview exceeds a preset pageview threshold;
performing word segmentation on the text information in the web pages to obtain the segmented words that make up the text information;
counting the exposure frequency of each segmented word in the text information;
determining the segmented words whose exposure frequency in the text information exceeds a preset first exposure threshold as hot words;
counting the exposure frequency of each enterprise name in preferred text information, the preferred text information being the text information that contains the hot word;
calculating the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information.
A third aspect of the embodiments of the present invention provides a hot word analysis terminal device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer-readable instructions:
crawling, through a search engine, the web pages published on target websites within the current statistical period, a target website being a website whose pageview exceeds a preset pageview threshold;
performing word segmentation on the text information in the web pages to obtain the segmented words that make up the text information;
counting the exposure frequency of each segmented word in the text information;
determining the segmented words whose exposure frequency in the text information exceeds a preset first exposure threshold as hot words;
counting the exposure frequency of each enterprise name in preferred text information, the preferred text information being the text information that contains the hot word;
calculating the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. The embodiments first crawl, through a search engine, the web pages published on target websites within the current statistical period and perform word segmentation on the text information in the web pages to obtain the segmented words that make up the text information. They then count the exposure frequency of each segmented word in the text information and determine the segmented words whose exposure frequency exceeds a preset first exposure threshold as hot words. Finally, they count the exposure frequency of each enterprise name in the preferred text information and calculate the degree of association between each enterprise name and the hot word from that exposure frequency. On the one hand, this provides an objective evaluation standard for determining hot words and removes the dependence on the personal judgment of network analysts, so the determined hot words reflect the true situation more objectively. On the other hand, once hot words are obtained, the relationship between enterprises and hot words is taken into account, so the analysis results offer stronger guidance for enterprises.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of an embodiment of a hot word analysis method in the embodiments of the present invention;
Fig. 2 is an exemplary flow chart of the process of setting the first exposure threshold and the second exposure threshold in one specific implementation;
Fig. 3 is an exemplary flow chart of the process of setting the first exposure threshold and the second exposure threshold in another specific implementation;
Fig. 4 is a structural diagram of an embodiment of a hot word analysis apparatus in the embodiments of the present invention;
Fig. 5 is a schematic block diagram of a hot word analysis terminal device in the embodiments of the present invention.
Detailed description
To make the objects, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, an embodiment of a hot word analysis method in the embodiments of the present invention may include:
Step S101: crawl, through a search engine, the web pages published on target websites within the current statistical period.
A target website is a website whose pageview exceeds a preset pageview threshold. The pageview threshold can be set according to actual conditions, for example to 100,000, 500,000 or 1,000,000 views. The target websites may be news websites such as Baidu News (http://news.baidu.com/), NetEase News (http://news.163.com/), Tencent News (http://news.qq.com/) and Phoenix News (http://news.ifeng.com/), or other news websites.
The statistical period can be set according to actual conditions, for example to one day, one week, two weeks or one month.
Step S102: perform word segmentation on the text information in the web pages to obtain the segmented words that make up the text information.
Word segmentation means cutting a sentence into individual words, i.e. the segmented words. In this embodiment, sentences can be segmented according to a general-purpose dictionary so that every resulting word is a normal word; for example, a word that appears in the dictionary is kept whole rather than split into single characters.
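The dictionary-based segmentation of step S102 can be sketched with forward maximum matching. The text does not name a specific algorithm, so this choice is an assumption, and the function name `segment`, the `max_len` parameter and the sample dictionary are hypothetical.

```python
def segment(text, dictionary, max_len=8):
    """Split text into words by forward maximum matching (illustrative only)."""
    tokens = []
    i = 0
    while i < len(text):
        longest = min(max_len, len(text) - i)
        # Try the longest window first; fall back to a single character
        # only when no dictionary word starts at position i.
        for size in range(longest, 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical dictionary: "hotword" is kept whole instead of being split.
sample_dictionary = {"hot", "word", "hotword"}
print(segment("hotwordhot", sample_dictionary))
```

Preferring the longest match is what keeps a dictionary word intact instead of breaking it into shorter pieces or single characters.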
Step S103: count the exposure frequency of each segmented word in the text information.
That is, count the number of times each segmented word occurs in the text information.
Step S104: determine the segmented words whose exposure frequency in the text information exceeds a preset first exposure threshold as hot words.
For example, the first exposure threshold can be set to 10000.
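Steps S103 and S104 amount to counting occurrences and applying a strict threshold. A minimal sketch follows; the function name `find_hot_words` and the toy data are hypothetical, and a small threshold is used only to keep the example short.

```python
from collections import Counter


def find_hot_words(tokens, first_threshold):
    """Return the words whose occurrence count exceeds first_threshold."""
    counts = Counter(tokens)  # exposure frequency of every segmented word
    # "Exceeds" is read as a strict comparison.
    return {word: n for word, n in counts.items() if n > first_threshold}


# Toy data: with a threshold of 3, only "alpha" qualifies.
tokens = ["alpha"] * 5 + ["beta"] * 3 + ["gamma"]
print(find_hot_words(tokens, first_threshold=3))
```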
Optionally, the segmented words whose exposure frequency in the text information is less than or equal to the first exposure threshold but greater than a preset second exposure threshold can be determined as candidate words. The exposure frequency of each candidate word in the T statistical periods preceding the current statistical period is then obtained from the historical statistics, and a candidate word that meets the following condition is determined as a hot word:
for every value of t, the inequality
$$\ln\left(\frac{ExpNum_{n,T+1}}{ExpNum_{n,t}}\right) > ThreshRatio$$
holds,
where n is the serial number of the candidate word, $1 \le n \le N$, N is the total number of candidate words, t is the serial number of each statistical period in chronological order, $1 \le t \le T$, T is a positive integer, $ExpNum_{n,t}$ is the exposure frequency of the n-th candidate word in the t-th statistical period, $ExpNum_{n,T+1}$ is the exposure frequency of the n-th candidate word in the current statistical period, ln is the natural logarithm function, and ThreshRatio is a preset ratio threshold.
For example, the second exposure threshold can be set to 2000, the ratio threshold to 2, and T = 1. Suppose that in the current statistical period the exposure frequency of "Xiong'an New Area" is 9000, which is less than the first exposure threshold but greater than the second exposure threshold. Its exposure frequency in the one statistical period preceding the current one is then obtained. If that exposure frequency is 1000, the inequality $\ln(9000/1000) = \ln 9 \approx 2.197 > 2$ holds, so the word is also determined as a hot word.
If, in the current statistical period, the exposure frequency of "artificial intelligence" is 1500, which is below both the first exposure threshold and the second exposure threshold, it is directly determined to be an ordinary word.
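The two-threshold rule can be sketched as follows, assuming the candidate condition is $\ln(ExpNum_{n,T+1} / ExpNum_{n,t}) > ThreshRatio$ for every earlier period t, which is the reading consistent with the worked example (9000 against 1000 with a ratio threshold of 2). The function name `is_hot_word` and the handling of a zero historical count are assumptions, not part of the described method.

```python
import math


def is_hot_word(current, history, first_threshold, second_threshold, thresh_ratio):
    """Decide whether a word is hot from its current and past exposure counts."""
    if current > first_threshold:
        return True   # hot word by the first threshold alone
    if current <= second_threshold:
        return False  # ordinary word
    # Candidate word: the log growth ratio must exceed thresh_ratio
    # against every one of the T earlier periods.
    # Assumption: a zero or missing historical count fails the test,
    # because the logarithm of the ratio is undefined.
    return bool(history) and all(
        prev > 0 and math.log(current / prev) > thresh_ratio for prev in history
    )


# Numbers from the example: ln(9000 / 1000) = ln 9, about 2.197, which exceeds 2.
print(is_hot_word(9000, [1000], 10000, 2000, 2))
print(is_hot_word(1500, [1000], 10000, 2000, 2))
```

Using the logarithm of the ratio means the rule reacts to relative growth between periods rather than to the absolute increase.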
Further, the determined hot words can also be filtered, that is, words that would cause interference are removed from them. Common interference hot words, such as "we", "everybody" and "this", can be set in advance. The exposure frequency of these interference hot words has nothing to do with the news content: whatever the news is about, their exposure frequency may well stay above the first exposure threshold. If they are not filtered out when hot word statistics are compiled, the accuracy of the analysis results suffers, so the interference hot words need to be removed from the determined hot words to obtain the filtered hot words, i.e. the hot words that are actually needed. Specifically, after the hot words are determined, the preset interference hot words are obtained from a data list, and each determined hot word is compared with all interference hot words one by one. If a hot word matches any interference hot word, it is filtered out; if it matches none of them, it is retained. The hot words that remain are the filtered hot words.
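The filtering step is a membership test against the preset list. A minimal sketch, in which the function name `filter_hot_words` and the sample words are hypothetical:

```python
def filter_hot_words(hot_words, interference_words):
    """Drop every hot word that appears in the preset interference list."""
    blocked = set(interference_words)  # set lookup replaces pairwise comparison
    return [word for word in hot_words if word not in blocked]


preset_interference = ["we", "everybody", "this"]
print(filter_hot_words(["we", "Tencent", "this", "election"], preset_interference))
```

The set lookup gives the same result as comparing each hot word with every interference word one by one, while keeping the original order of the retained hot words.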
Step S105: count the exposure frequency of each enterprise name in the preferred text information.
The preferred text information is the text information that contains the hot word.
Optionally, the current statistical period may include M sub-periods, where M is a positive integer; in that case the exposure frequency of each enterprise name in the preferred text information of each sub-period needs to be counted.
Step S106: calculate the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information.
Specifically, the degree of association between each enterprise name and the hot word can be calculated according to the following formula:
[formula not reproduced in the source text]
where q is the serial number of the enterprise name, $1 \le q \le Q$, Q is the total number of enterprise names, p is the serial number of the hot word, $1 \le p \le P$, P is the total number of hot words, m is the serial number of each sub-period in chronological order, $1 \le m \le M$, $EntExpNum_{q,p,m}$ is the exposure frequency of the q-th enterprise name in the preferred text information that contains the p-th hot word in the m-th sub-period, $k_m$ is a preset weight coefficient with $k_m < k_{m+1}$ and [condition not reproduced in the source text], and $Rel_{q,p}$ is the degree of association between the q-th enterprise name and the p-th hot word.
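Because the formula itself does not survive in this text, the following is only one plausible reading, not the patented formula. It assumes that the degree of association is the weighted sum, over sub-periods, of an enterprise's share of all enterprise-name exposures for one hot word, and that the weights $k_m$ sum to 1. Both assumptions, the function name `association_degrees` and the sample counts are hypothetical.

```python
def association_degrees(counts_by_enterprise, weights):
    """Weighted share of exposures per enterprise for a single hot word.

    counts_by_enterprise maps an enterprise name to its exposure counts in
    each of the M sub-periods; weights holds k_1 .. k_M.
    """
    if any(a >= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be strictly increasing (k_m < k_{m+1})")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")  # assumption
    periods = range(len(weights))
    # Total exposures of all enterprise names in each sub-period.
    totals = [sum(c[m] for c in counts_by_enterprise.values()) for m in periods]
    degrees = {}
    for name, counts in counts_by_enterprise.items():
        degrees[name] = sum(
            weights[m] * counts[m] / totals[m] for m in periods if totals[m] > 0
        )
    return degrees


# Hypothetical counts over M = 2 sub-periods; the later sub-period weighs more.
sample = {"Enterprise A": [9, 98], "Enterprise B": [1, 2]}
print(association_degrees(sample, weights=[0.25, 0.75]))
```

Under these assumptions each degree lies between 0 and 1, which matches the percentage-style thresholds used for the association threshold, and the increasing weights give recent sub-periods more influence.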
In particular, if the degree of association between an enterprise name A and a hot word B exceeds a preset association threshold, the two can be regarded as uniquely matched. The association threshold can be set according to actual conditions, for example to 80%, 90% or 95%.
For example, to analyze the enterprises associated with the hot word "Honor of Kings", enterprise names are searched for in the text information that contains this hot word, the exposure frequency of each enterprise name is counted, and the degree of association between each enterprise name and the hot word is calculated according to the above formula. If Tencent ranks first by degree of association and its degree of association with the hot word is 98%, which exceeds the association threshold, Tencent is determined to be the enterprise name that uniquely matches the hot word "Honor of Kings".
The above associates a hot word with the corresponding enterprises; the other direction is to associate an enterprise with the corresponding hot words. Specifically, the text information in which the enterprise name appears is searched for among the web pages, hot words are then searched for in that text information, the frequency with which each hot word occurs is counted, and the hot words are sorted in descending order of frequency. The higher a hot word ranks, the higher its degree of association with the enterprise; the lower it ranks, the lower its degree of association.
For example, to analyze the hot words associated with Tencent, the text information in which Tencent appears is searched for among the web pages, hot words are then searched for in that text information, the frequency with which each hot word occurs is counted, and the hot words are sorted in descending order of frequency. If the sorted result, from high to low, is "Honor of Kings", "seeking survival danger spot", "Missions", ..., then the hot words currently most associated with Tencent are, in order, "Honor of Kings", "seeking survival danger spot", "Missions", ....
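The enterprise-to-hot-word direction can be sketched as a filter followed by a frequency ranking. The function name `rank_hot_words`, the plain substring matching and the sample texts are assumptions made for illustration.

```python
from collections import Counter


def rank_hot_words(texts, enterprise_names, hot_words):
    """Rank hot words by how often they occur in texts mentioning the enterprise."""
    counts = Counter()
    for text in texts:
        # Keep only the texts in which the enterprise name appears.
        if any(name in text for name in enterprise_names):
            for word in hot_words:
                counts[word] += text.count(word)
    # Highest frequency first; hot words that never co-occur are dropped.
    return [word for word, n in counts.most_common() if n > 0]


sample_texts = [
    "Tencent released GameA and updated GameA again",
    "Tencent also announced GameB",
    "Another studio shipped GameB, GameB and more GameB",
]
print(rank_hot_words(sample_texts, ["Tencent"], ["GameA", "GameB"]))
```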
Further, when enterprise association analysis is carried out, an enterprise name should also cover the enterprise's nicknames. For example, when analyzing Tencent, not only "Tencent" but also nicknames such as "goose factory" need to be searched for; when analyzing Alibaba, not only "Alibaba" but also "Ali" needs to be searched for. Specifically, a nickname list for enterprise names can be set in advance to record the correspondence between the formal name of each enterprise and its nicknames. During enterprise association analysis, the nicknames of the enterprise are obtained from this list and included in the statistics for that enterprise.
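A nickname list can be folded into the counting step as sketched below. The table layout, the function name `count_mentions` and the sample sentence are hypothetical; matching the longest names first is a design choice made here so that a short nickname such as "Ali" is not counted again inside "Alibaba".

```python
import re


def count_mentions(text, nickname_table):
    """Count mentions of each enterprise under its formal name or any nickname."""
    counts = {}
    for formal_name, nicknames in nickname_table.items():
        # Longest names first, so the regex prefers "Alibaba" over "Ali".
        names = sorted({formal_name, *nicknames}, key=len, reverse=True)
        pattern = "|".join(re.escape(name) for name in names)
        counts[formal_name] = len(re.findall(pattern, text))
    return counts


nickname_table = {"Tencent": ["goose factory"], "Alibaba": ["Ali"]}
sample = "Ali and Alibaba met the goose factory; Tencent agreed."
print(count_mentions(sample, nickname_table))
```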
In one specific implementation of the embodiments of the present invention, the process of setting the first exposure threshold and the second exposure threshold may include the steps shown in Fig. 2:
Step S201: obtain the exposure frequency of each historical hot word in each statistical period from the historical statistics.
A historical hot word is a hot word that was determined before the current statistical period.
Step S202: construct the first exposure sequence of each historical hot word.
Specifically, the first exposure sequence of each historical hot word can be constructed according to the following formula:
$$ExpSeq1_{nh} = \{HsExpNum_{nh,1}, HsExpNum_{nh,2}, \ldots, HsExpNum_{nh,th}, \ldots, HsExpNum_{nh,TH_{nh}}\}$$
where nh is the serial number of the historical hot word, $1 \le nh \le NH$, NH is the total number of historical hot words, th is the serial number of each statistical period in chronological order, $1 \le th \le TH_{nh}$, $TH_{nh}$ is the total number of statistical periods of the nh-th historical hot word, $HsExpNum_{nh,th}$ is the exposure frequency of the nh-th historical hot word in the th-th statistical period, and $ExpSeq1_{nh}$ is the first exposure sequence of the nh-th historical hot word.
Step S203: calculate the mean of each first exposure sequence.
Specifically, the mean of each first exposure sequence can be calculated according to the following formula:
$$AvExpSeq1_{nh} = \frac{1}{TH_{nh}} \sum_{th=1}^{TH_{nh}} HsExpNum_{nh,th}$$
where $AvExpSeq1_{nh}$ is the mean of the nh-th first exposure sequence.
Step S204: construct the sequence in which the means of the first exposure sequences are arranged in descending order.
Specifically, the sequence in which the means of the first exposure sequences are arranged in descending order can be constructed according to the following formula:
$$\{AvExpSeq1'_{1}, AvExpSeq1'_{2}, \ldots, AvExpSeq1'_{nh1}, \ldots, AvExpSeq1'_{NH}\}$$
where $AvExpSeq1'_{nh1}$ is the mean of the first exposure sequence that occupies the nh1-th position in descending order, $1 \le nh1 \le NH$.
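Steps S201 to S204 can be sketched as follows, assuming the mean of a first exposure sequence is the ordinary arithmetic mean of its per-period counts. The function name `sorted_first_means` and the sample history are hypothetical.

```python
def sorted_first_means(history):
    """Mean exposure per historical hot word, arranged from largest to smallest.

    history maps each historical hot word to its first exposure sequence,
    i.e. its exposure counts in chronological order.
    """
    # Assumption: arithmetic mean over the periods recorded for each word.
    means = [sum(seq) / len(seq) for seq in history.values() if seq]
    return sorted(means, reverse=True)


sample_history = {
    "word_a": [100, 300],
    "word_b": [1000, 3000],
    "word_c": [10],
}
print(sorted_first_means(sample_history))
```

Each historical hot word may have a different number of recorded periods, which is why the divisor is the length of that word's own sequence.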
Step S205: calculate the first exposure threshold and the second exposure threshold.
Specifically, the first exposure threshold can be calculated according to the following formula:
[formula not reproduced in the source text]
where $NMAX = \mathrm{floor}(\xi_{max} \times NH)$, $\xi_{max}$ is a preset coefficient with $0 < \xi_{max} < 1$, floor is the round-down function, and Threshold1 is the first exposure threshold;
the second exposure threshold is calculated according to the following formula:
[formula not reproduced in the source text]
where $NMIN = \mathrm{floor}(\xi_{min} \times NH)$, $\xi_{min}$ is a preset coefficient with $0 < \xi_{min} < 1$, and Threshold2 is the second exposure threshold.
In another specific implementation of the embodiments of the present invention, the process of setting the first exposure threshold and the second exposure threshold may include the steps shown in Fig. 3:
Step S301: obtain the exposure frequency of each historical hot word in each statistical period from the historical statistics.
Step S302: construct the first exposure sequence of each historical hot word.
Step S303: calculate the mean of each first exposure sequence.
Steps S301 to S303 are the same as steps S201 to S203; refer to the description above for details, which are not repeated here.
Step S304: construct the second exposure sequence of each historical hot word.
Specifically, the second exposure sequence of each historical hot word can be constructed according to the following formula:
$$ExpSeq2_{nh} = \{HsExpNum'_{nh,1}, HsExpNum'_{nh,2}, \ldots, HsExpNum'_{nh,th1}, \ldots, HsExpNum'_{nh,TH_{nh}}\}$$
where $HsExpNum'_{nh,th1} \in ExpSeq1_{nh}$, $1 \le th1 \le TH_{nh}$, $HsExpNum'_{nh,th1} \ge HsExpNum'_{nh,th1+1}$, and $ExpSeq2_{nh}$ is the second exposure sequence of the nh-th historical hot word. In other words, the second exposure sequence contains the elements of the first exposure sequence arranged in descending order.
Step S305: calculate the mean of each second exposure sequence.
Specifically, the mean of each second exposure sequence can be calculated according to the following formula:
$$AvExpSeq2_{nh} = \frac{1}{TH1_{nh}} \sum_{th1=1}^{TH1_{nh}} HsExpNum'_{nh,th1}$$
where $AvExpSeq2_{nh}$ is the mean of the nh-th second exposure sequence, and $TH1_{nh}$ meets the following conditions: $HsExpNum'_{nh,TH1_{nh}} \ge AvExpSeq1_{nh}$ and $HsExpNum'_{nh,TH1_{nh}+1} < AvExpSeq1_{nh}$.
Step S306: construct the sequence in which the means of the second exposure sequences are arranged in descending order.
Specifically, the sequence in which the means of the second exposure sequences are arranged in descending order can be constructed according to the following formula:
$$\{AvExpSeq2'_{1}, AvExpSeq2'_{2}, \ldots, AvExpSeq2'_{nh1}, \ldots, AvExpSeq2'_{NH}\}$$
where $AvExpSeq2'_{nh1}$ is the mean of the second exposure sequence that occupies the nh1-th position in descending order.
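Steps S304 to S306 can be sketched under one assumption: that the mean of a second exposure sequence is taken over its first $TH1_{nh}$ entries only, i.e. over the counts that are at least as large as the mean of the first exposure sequence. This is the reading suggested by the conditions on $TH1_{nh}$, but it is not confirmed by the text. The function names `second_mean` and `sorted_second_means` and the sample data are hypothetical.

```python
def second_mean(first_sequence):
    """Mean of the counts that are not below the overall mean of the sequence."""
    first_mean = sum(first_sequence) / len(first_sequence)
    # Second exposure sequence: the same counts in descending order.
    second_sequence = sorted(first_sequence, reverse=True)
    # Assumption: average only the leading entries that reach first_mean.
    head = [count for count in second_sequence if count >= first_mean]
    return sum(head) / len(head)  # head is never empty: the maximum reaches the mean


def sorted_second_means(history):
    """Second-sequence means of all historical hot words, largest first."""
    return sorted((second_mean(seq) for seq in history.values() if seq), reverse=True)


sample_history = {"word_a": [100, 300, 200, 1000], "word_b": [10, 10]}
print(sorted_second_means(sample_history))
```

Averaging only the above-mean periods makes the result track a word's peak exposure rather than its long-run average, which would make thresholds derived from it stricter.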
Step S307: calculate the first exposure threshold and the second exposure threshold.
Specifically, the first exposure threshold can be calculated according to the following formula:
[formula not reproduced in the source text]
The second exposure threshold is calculated according to the following formula:
[formula not reproduced in the source text]
In conclusion, the embodiments of the present invention first crawl, through a search engine, the web pages published on target websites within the current statistical period and perform word segmentation on the text information in the web pages to obtain the segmented words that make up the text information. They then count the exposure frequency of each segmented word in the text information and determine the segmented words whose exposure frequency exceeds a preset first exposure threshold as hot words. Finally, they count the exposure frequency of each enterprise name in the preferred text information and calculate the degree of association between each enterprise name and the hot word from that exposure frequency. On the one hand, this provides an objective evaluation standard for determining hot words and removes the dependence on the personal judgment of network analysts, so the determined hot words reflect the true situation more objectively. On the other hand, once hot words are obtained, the relationship between enterprises and hot words is taken into account, so the analysis results offer stronger guidance for enterprises.
It should be understood that the serial numbers of the steps in the above embodiments do not imply an order of execution. The order in which the processes are executed should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.
Corresponding to the hot word analysis method described in the foregoing embodiments, Fig. 4 shows a structural diagram of an embodiment of a hot word analysis apparatus provided by the embodiments of the present invention.
In this embodiment, a hot word analysis apparatus may include:
a web page crawling module 401, configured to crawl, through a search engine, the web pages published on target websites within the current statistical period, a target website being a website whose pageview exceeds a preset pageview threshold;
a word segmentation module 402, configured to perform word segmentation on the text information in the web pages to obtain the segmented words that make up the text information;
a first statistical module 403, configured to count the exposure frequency of each segmented word in the text information;
a first hot word determining module 404, configured to determine the segmented words whose exposure frequency in the text information exceeds a preset first exposure threshold as hot words;
a second statistical module 405, configured to count the exposure frequency of each enterprise name in preferred text information, the preferred text information being the text information that contains the hot word;
an association calculation module 406, configured to calculate the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information.
Further, the hot word analysis apparatus may also include:
a candidate word determining module, configured to determine the segmented words whose exposure frequency in the text information is less than or equal to the first exposure threshold but greater than a preset second exposure threshold as candidate words;
a third statistical module, configured to obtain, from the historical statistics, the exposure frequency of each candidate word in the T statistical periods preceding the current statistical period, where T is a positive integer;
a second hot word determining module, configured to determine a candidate word that meets the following condition as a hot word:
for every value of t, the inequality
$$\ln\left(\frac{ExpNum_{n,T+1}}{ExpNum_{n,t}}\right) > ThreshRatio$$
holds, where n is the serial number of the candidate word, $1 \le n \le N$, N is the total number of candidate words, t is the serial number of each statistical period in chronological order, $1 \le t \le T$, $ExpNum_{n,t}$ is the exposure frequency of the n-th candidate word in the t-th statistical period, $ExpNum_{n,T+1}$ is the exposure frequency of the n-th candidate word in the current statistical period, ln is the natural logarithm function, and ThreshRatio is a preset ratio threshold;
Further, the hot word analysis apparatus may also include:
a fourth statistical module, configured to obtain the exposure frequency of each historical hot word in each statistical period from the historical statistics, a historical hot word being a hot word that was determined before the current statistical period;
a first exposure sequence construction module, configured to construct the first exposure sequence of each historical hot word according to the following formula:
$$ExpSeq1_{nh} = \{HsExpNum_{nh,1}, HsExpNum_{nh,2}, \ldots, HsExpNum_{nh,th}, \ldots, HsExpNum_{nh,TH_{nh}}\}$$
where nh is the serial number of the historical hot word, $1 \le nh \le NH$, NH is the total number of historical hot words, th is the serial number of each statistical period in chronological order, $1 \le th \le TH_{nh}$, $TH_{nh}$ is the total number of statistical periods of the nh-th historical hot word, $HsExpNum_{nh,th}$ is the exposure frequency of the nh-th historical hot word in the th-th statistical period, and $ExpSeq1_{nh}$ is the first exposure sequence of the nh-th historical hot word;
a first exposure sequence mean calculation module, configured to calculate the mean of each first exposure sequence according to the following formula:
$$AvExpSeq1_{nh} = \frac{1}{TH_{nh}} \sum_{th=1}^{TH_{nh}} HsExpNum_{nh,th}$$
where $AvExpSeq1_{nh}$ is the mean of the nh-th first exposure sequence;
a first mean sequence construction module, configured to construct, according to the following formula, the sequence in which the means of the first exposure sequences are arranged in descending order:
$$\{AvExpSeq1'_{1}, AvExpSeq1'_{2}, \ldots, AvExpSeq1'_{nh1}, \ldots, AvExpSeq1'_{NH}\}$$
where $AvExpSeq1'_{nh1}$ is the mean of the first exposure sequence that occupies the nh1-th position in descending order, $1 \le nh1 \le NH$;
a first exposure threshold calculation module, configured to calculate the first exposure threshold according to the following formula:
[formula not reproduced in the source text]
where $NMAX = \mathrm{floor}(\xi_{max} \times NH)$, $\xi_{max}$ is a preset coefficient with $0 < \xi_{max} < 1$, floor is the round-down function, and Threshold1 is the first exposure threshold;
a second exposure threshold calculation module, configured to calculate the second exposure threshold according to the following formula:
[formula not reproduced in the source text]
where $NMIN = \mathrm{floor}(\xi_{min} \times NH)$, $\xi_{min}$ is a preset coefficient with $0 < \xi_{min} < 1$, and Threshold2 is the second exposure threshold.
Further, the hot word analysis apparatus may also include:
A second exposure sequence construction module, configured to construct the second exposure sequence of each historical hot word according to the following formula:
$ExpSeq2_{nh} = \{HsExpNum'_{nh,1}, HsExpNum'_{nh,2}, \ldots, HsExpNum'_{nh,th1}, \ldots, HsExpNum'_{nh,TH_{nh}}\}$
where $HsExpNum'_{nh,th1} \in ExpSeq1_{nh}$, $1 \le th1 \le TH_{nh}$, $HsExpNum'_{nh,th1} \ge HsExpNum'_{nh,th1+1}$, and $ExpSeq2_{nh}$ is the second exposure sequence of the $nh$-th historical hot word;
A second exposure sequence mean calculation module, configured to calculate the mean of each second exposure sequence according to the following formula:
where $AvExpSeq2_{nh}$ is the mean of the $nh$-th second exposure sequence, and $TH1_{nh}$ satisfies the following conditions: $HsExpNum'_{nh,TH1_{nh}} \ge AvExpSeq1_{nh}$ and $HsExpNum'_{nh,TH1_{nh}+1} < AvExpSeq1_{nh}$;
A second mean sequence construction module, configured to construct, according to the following formula, the sequence obtained by arranging the means of the second exposure sequences in descending order:
$\{AvExpSeq2'_{1}, AvExpSeq2'_{2}, \ldots, AvExpSeq2'_{nh1}, \ldots, AvExpSeq2'_{NH}\}$
where $AvExpSeq2'_{nh1}$ is the mean of the second exposure sequence ranked at position $nh1$ in descending order;
A first exposure threshold calculation module, configured to calculate the first exposure threshold according to the following formula:
A second exposure threshold calculation module, configured to calculate the second exposure threshold according to the following formula:
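The second exposure sequence and the condition on $TH1_{nh}$ can be illustrated with a short Python sketch. The formula for the second mean is not reproduced in this text, so the sketch assumes it is the arithmetic mean of the first $TH1_{nh}$ elements of the descending sequence, that is, of the exposure frequencies that are not below the first mean. The function name `second_exposure_mean` is hypothetical.

```python
from statistics import mean


def second_exposure_mean(first_sequence):
    """Sketch for one historical hot word; first_sequence must be non-empty."""
    av1 = mean(first_sequence)  # AvExpSeq1_nh
    # Second exposure sequence: the same values in descending order.
    second_sequence = sorted(first_sequence, reverse=True)
    # TH1_nh: the element at position TH1 is >= av1 and the next one is < av1,
    # so TH1 is the number of values that are not below the first mean.
    th1 = sum(1 for value in second_sequence if value >= av1)
    # ASSUMPTION (not from the source): the second mean averages only
    # the first TH1 elements of the descending sequence.
    return mean(second_sequence[:th1])
```

Because the largest value of a sequence is never below its mean, `th1` is always at least 1 for a non-empty input, so the final average is well defined.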
Further, the current statistical period includes M sub-periods, where M is a positive integer, and the second statistics module may include:
A sub-period statistics unit, configured to count the exposure frequency of each enterprise name in the preferred text information of each sub-period;
The association degree calculation module may include:
A first calculation unit, configured to calculate the degree of association between each enterprise name and the hot word according to the following formula:
where $q$ is the index of an enterprise name, $1 \le q \le Q$, and $Q$ is the total number of enterprise names; $p$ is the index of a hot word, $1 \le p \le P$, and $P$ is the total number of hot words; $m$ is the index of a sub-period, with the sub-periods numbered in chronological order, $1 \le m \le M$; $EntExpNum_{q,p,m}$ is the exposure frequency of the $q$-th enterprise name in the preferred text information containing the $p$-th hot word in the $m$-th sub-period; $k_m$ is a preset weight coefficient with $k_m < k_{m+1}$; and $Rel_{q,p}$ is the degree of association between the $q$-th enterprise name and the $p$-th hot word.
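A minimal Python sketch of the first calculation unit follows. The association formula itself is not reproduced in this text, so the sketch assumes a plain weighted sum of the per-sub-period exposure frequencies; only the requirement that the weights increase with the sub-period index ($k_m < k_{m+1}$) comes from the description. The function name `association_degree` is hypothetical.

```python
def association_degree(sub_period_counts, weights):
    """Sketch of Rel_{q,p} for one enterprise name and one hot word.

    sub_period_counts[m] is EntExpNum_{q,p,m+1}, in chronological order;
    weights[m] is the preset coefficient k_{m+1}.
    """
    if len(sub_period_counts) != len(weights):
        raise ValueError("one weight is required per sub-period")
    # The description requires k_m < k_{m+1}: later sub-periods weigh more.
    if any(a >= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be strictly increasing")
    # ASSUMPTION (not from the source): Rel is the weighted sum of the counts.
    return sum(k * count for k, count in zip(weights, sub_period_counts))
```

With increasing weights, an enterprise name mentioned mostly in the latest sub-periods scores higher than one with the same total count concentrated in earlier sub-periods.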
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus, modules and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
Fig. 5 shows a schematic block diagram of a hot word analysis terminal device provided by an embodiment of the present invention. For ease of description, only the parts relevant to the embodiment of the present invention are shown.
In this embodiment, the hot word analysis terminal device 5 may be a computing device such as a mobile phone, a tablet computer, a desktop computer, a notebook computer or a cloud server. The hot word analysis terminal device 5 may include a processor 50, a memory 51, and computer-readable instructions 52 that are stored in the memory 51 and executable on the processor 50, for example computer-readable instructions for performing the hot word analysis method described above. When executing the computer-readable instructions 52, the processor 50 implements the steps in the above hot word analysis method embodiments, for example steps S101 to S106 shown in Fig. 1. Alternatively, when executing the computer-readable instructions 52, the processor 50 implements the functions of the modules/units in the above apparatus embodiments, for example the functions of modules 401 to 406 shown in Fig. 4.
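As a rough illustration of what such instructions might look like, the following Python sketch covers the counting steps of the method (word segmentation, exposure counting, hot word selection, and enterprise name counting in the preferred text information). It is a simplified sketch, not the patented implementation: crawling is omitted, whitespace splitting stands in for a real word segmenter, and the names `analyse_hot_words`, `texts`, `enterprise_names` and `threshold1` are hypothetical.

```python
from collections import Counter


def analyse_hot_words(texts, enterprise_names, threshold1):
    """Return, for each hot word, the exposure count of each enterprise name
    in the texts that contain that hot word (the preferred text information)."""
    # ASSUMPTION: whitespace splitting stands in for real word segmentation.
    tokenized = [text.split() for text in texts]
    # Exposure frequency of every segmented word over all texts.
    exposure = Counter(token for tokens in tokenized for token in tokens)
    # Hot words: exposure frequency strictly above the first exposure threshold.
    hot_words = {word for word, count in exposure.items() if count > threshold1}
    result = {}
    for hot_word in hot_words:
        # Preferred text information: texts containing this hot word.
        preferred = [tokens for tokens in tokenized if hot_word in tokens]
        counts = Counter()
        for tokens in preferred:
            for name in enterprise_names:
                counts[name] += tokens.count(name)
        result[hot_word] = dict(counts)
    return result
```

The per-hot-word counts returned here are the inputs from which a degree of association between each enterprise name and the hot word would then be calculated.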
Illustratively, the computer-readable instructions 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to carry out the present invention. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer-readable instructions 52 in the hot word analysis terminal device 5.
The processor 50 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 51 may be an internal storage unit of the hot word analysis terminal device 5, such as a hard disk or internal memory of the hot word analysis terminal device 5. The memory 51 may also be an external storage device of the hot word analysis terminal device 5, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) fitted to the hot word analysis terminal device 5. Further, the memory 51 may include both an internal storage unit of the hot word analysis terminal device 5 and an external storage device. The memory 51 is used to store the computer-readable instructions and the other instructions and data required by the hot word analysis terminal device 5. The memory 51 may also be used to temporarily store data that has been output or is about to be output.
The functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of computer-readable instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, and other media capable of storing computer-readable instructions.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A hot word analysis method, characterized by comprising:
crawling, by means of a search engine, the web pages published on target websites within the current statistical period, a target website being a website whose page view count exceeds a preset page view threshold;
performing word segmentation on the text information in the web pages to obtain the segmented words that make up the text information;
counting the exposure frequency of each segmented word in the text information;
determining as a hot word each segmented word whose exposure frequency in the text information exceeds a preset first exposure threshold;
counting the exposure frequency of each enterprise name in preferred text information, the preferred text information being text information that contains the hot word;
calculating the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information.
2. The hot word analysis method according to claim 1, characterized in that, after counting the exposure frequency of each segmented word in the text information, the method further comprises:
determining as a candidate word each segmented word whose exposure frequency in the text information is less than or equal to the first exposure threshold and greater than a preset second exposure threshold;
obtaining, from the historical statistics records, the exposure frequency of each candidate word in the T statistical periods preceding the current statistical period, where T is a positive integer;
determining as a hot word each candidate word that satisfies the following condition:
for any value of $t$, the inequality holds, where $n$ is the index of a candidate word, $1 \le n \le N$, and $N$ is the total number of candidate words; $t$ is the index of a statistical period, with the periods numbered in chronological order, $1 \le t \le T$; $ExpNum_{n,t}$ is the exposure frequency of the $n$-th candidate word in the $t$-th statistical period; $ExpNum_{n,T+1}$ is the exposure frequency of the $n$-th candidate word in the current statistical period; ln is the natural logarithm function; and $ThreshRatio$ is a preset ratio threshold.
3. The hot word analysis method according to claim 2, characterized in that the process of setting the first exposure threshold and the second exposure threshold comprises:
obtaining, from the historical statistics records, the exposure frequency of each historical hot word in each statistical period, a historical hot word being a hot word determined before the current statistical period;
constructing the first exposure sequence of each historical hot word according to the following formula:
$ExpSeq1_{nh} = \{HsExpNum_{nh,1}, HsExpNum_{nh,2}, \ldots, HsExpNum_{nh,th}, \ldots, HsExpNum_{nh,TH_{nh}}\}$
where $nh$ is the index of a historical hot word, $1 \le nh \le NH$, and $NH$ is the total number of historical hot words; $th$ is the index of a statistical period, with the periods numbered in chronological order, $1 \le th \le TH_{nh}$, and $TH_{nh}$ is the total number of statistical periods of the $nh$-th historical hot word; $HsExpNum_{nh,th}$ is the exposure frequency of the $nh$-th historical hot word in the $th$-th statistical period; and $ExpSeq1_{nh}$ is the first exposure sequence of the $nh$-th historical hot word;
calculating the mean of each first exposure sequence according to the following formula:
$AvExpSeq1_{nh} = \frac{1}{TH_{nh}} \sum_{th=1}^{TH_{nh}} HsExpNum_{nh,th}$
where $AvExpSeq1_{nh}$ is the mean of the $nh$-th first exposure sequence;
constructing, according to the following formula, the sequence obtained by arranging the means of the first exposure sequences in descending order:
$\{AvExpSeq1'_{1}, AvExpSeq1'_{2}, \ldots, AvExpSeq1'_{nh1}, \ldots, AvExpSeq1'_{NH}\}$
where $AvExpSeq1'_{nh1}$ is the mean of the first exposure sequence ranked at position $nh1$ in descending order, $1 \le nh1 \le NH$;
calculating the first exposure threshold according to the following formula:
where $NMAX = \mathrm{floor}(\xi_{max} \times NH)$, $\xi_{max}$ is a preset coefficient with $0 < \xi_{max} < 1$, floor is the round-down function, and $Threshold1$ is the first exposure threshold;
calculating the second exposure threshold according to the following formula:
where $NMIN = \mathrm{floor}(\xi_{min} \times NH)$, $\xi_{min}$ is a preset coefficient with $0 < \xi_{min} < 1$, and $Threshold2$ is the second exposure threshold.
4. The hot word analysis method according to claim 3, characterized in that, after calculating the mean of each first exposure sequence, the method further comprises:
constructing the second exposure sequence of each historical hot word according to the following formula:
$ExpSeq2_{nh} = \{HsExpNum'_{nh,1}, HsExpNum'_{nh,2}, \ldots, HsExpNum'_{nh,th1}, \ldots, HsExpNum'_{nh,TH_{nh}}\}$
where $HsExpNum'_{nh,th1} \in ExpSeq1_{nh}$, $1 \le th1 \le TH_{nh}$, $HsExpNum'_{nh,th1} \ge HsExpNum'_{nh,th1+1}$, and $ExpSeq2_{nh}$ is the second exposure sequence of the $nh$-th historical hot word;
calculating the mean of each second exposure sequence according to the following formula:
where $AvExpSeq2_{nh}$ is the mean of the $nh$-th second exposure sequence, and $TH1_{nh}$ satisfies the following conditions: $HsExpNum'_{nh,TH1_{nh}} \ge AvExpSeq1_{nh}$ and $HsExpNum'_{nh,TH1_{nh}+1} < AvExpSeq1_{nh}$;
constructing, according to the following formula, the sequence obtained by arranging the means of the second exposure sequences in descending order:
$\{AvExpSeq2'_{1}, AvExpSeq2'_{2}, \ldots, AvExpSeq2'_{nh1}, \ldots, AvExpSeq2'_{NH}\}$
where $AvExpSeq2'_{nh1}$ is the mean of the second exposure sequence ranked at position $nh1$ in descending order;
calculating the first exposure threshold according to the following formula:
calculating the second exposure threshold according to the following formula:
5. The hot word analysis method according to any one of claims 1 to 4, characterized in that the current statistical period includes M sub-periods, where M is a positive integer, and counting the exposure frequency of each enterprise name in the preferred text information comprises:
counting the exposure frequency of each enterprise name in the preferred text information of each sub-period;
and calculating the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information comprises:
calculating the degree of association between each enterprise name and the hot word according to the following formula:
where $q$ is the index of an enterprise name, $1 \le q \le Q$, and $Q$ is the total number of enterprise names; $p$ is the index of a hot word, $1 \le p \le P$, and $P$ is the total number of hot words; $m$ is the index of a sub-period, with the sub-periods numbered in chronological order, $1 \le m \le M$; $EntExpNum_{q,p,m}$ is the exposure frequency of the $q$-th enterprise name in the preferred text information containing the $p$-th hot word in the $m$-th sub-period; $k_m$ is a preset weight coefficient with $k_m < k_{m+1}$; and $Rel_{q,p}$ is the degree of association between the $q$-th enterprise name and the $p$-th hot word.
6. A computer-readable storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by a processor, implement the steps of the hot word analysis method according to any one of claims 1 to 5.
7. A hot word analysis terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:
crawling, by means of a search engine, the web pages published on target websites within the current statistical period, a target website being a website whose page view count exceeds a preset page view threshold;
performing word segmentation on the text information in the web pages to obtain the segmented words that make up the text information;
counting the exposure frequency of each segmented word in the text information;
determining as a hot word each segmented word whose exposure frequency in the text information exceeds a preset first exposure threshold;
counting the exposure frequency of each enterprise name in preferred text information, the preferred text information being text information that contains the hot word;
calculating the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information.
8. The hot word analysis terminal device according to claim 7, characterized in that, after counting the exposure frequency of each segmented word in the text information, the steps further comprise:
determining as a candidate word each segmented word whose exposure frequency in the text information is less than or equal to the first exposure threshold and greater than a preset second exposure threshold;
obtaining, from the historical statistics records, the exposure frequency of each candidate word in the T statistical periods preceding the current statistical period, where T is a positive integer;
determining as a hot word each candidate word that satisfies the following condition:
for any value of $t$, the inequality holds, where $n$ is the index of a candidate word, $1 \le n \le N$, and $N$ is the total number of candidate words; $t$ is the index of a statistical period, with the periods numbered in chronological order, $1 \le t \le T$; $ExpNum_{n,t}$ is the exposure frequency of the $n$-th candidate word in the $t$-th statistical period; $ExpNum_{n,T+1}$ is the exposure frequency of the $n$-th candidate word in the current statistical period; ln is the natural logarithm function; and $ThreshRatio$ is a preset ratio threshold.
9. The hot word analysis terminal device according to claim 8, characterized in that the process of setting the first exposure threshold and the second exposure threshold comprises:
obtaining, from the historical statistics records, the exposure frequency of each historical hot word in each statistical period, a historical hot word being a hot word determined before the current statistical period;
constructing the first exposure sequence of each historical hot word according to the following formula:
$ExpSeq1_{nh} = \{HsExpNum_{nh,1}, HsExpNum_{nh,2}, \ldots, HsExpNum_{nh,th}, \ldots, HsExpNum_{nh,TH_{nh}}\}$
where $nh$ is the index of a historical hot word, $1 \le nh \le NH$, and $NH$ is the total number of historical hot words; $th$ is the index of a statistical period, with the periods numbered in chronological order, $1 \le th \le TH_{nh}$, and $TH_{nh}$ is the total number of statistical periods of the $nh$-th historical hot word; $HsExpNum_{nh,th}$ is the exposure frequency of the $nh$-th historical hot word in the $th$-th statistical period; and $ExpSeq1_{nh}$ is the first exposure sequence of the $nh$-th historical hot word;
calculating the mean of each first exposure sequence according to the following formula:
$AvExpSeq1_{nh} = \frac{1}{TH_{nh}} \sum_{th=1}^{TH_{nh}} HsExpNum_{nh,th}$
where $AvExpSeq1_{nh}$ is the mean of the $nh$-th first exposure sequence;
constructing, according to the following formula, the sequence obtained by arranging the means of the first exposure sequences in descending order:
$\{AvExpSeq1'_{1}, AvExpSeq1'_{2}, \ldots, AvExpSeq1'_{nh1}, \ldots, AvExpSeq1'_{NH}\}$
where $AvExpSeq1'_{nh1}$ is the mean of the first exposure sequence ranked at position $nh1$ in descending order, $1 \le nh1 \le NH$;
calculating the first exposure threshold according to the following formula:
where $NMAX = \mathrm{floor}(\xi_{max} \times NH)$, $\xi_{max}$ is a preset coefficient with $0 < \xi_{max} < 1$, floor is the round-down function, and $Threshold1$ is the first exposure threshold;
calculating the second exposure threshold according to the following formula:
where $NMIN = \mathrm{floor}(\xi_{min} \times NH)$, $\xi_{min}$ is a preset coefficient with $0 < \xi_{min} < 1$, and $Threshold2$ is the second exposure threshold.
10. The hot word analysis terminal device according to any one of claims 7 to 9, characterized in that the current statistical period includes M sub-periods, where M is a positive integer, and counting the exposure frequency of each enterprise name in the preferred text information comprises:
counting the exposure frequency of each enterprise name in the preferred text information of each sub-period;
and calculating the degree of association between each enterprise name and the hot word according to the exposure frequency of each enterprise name in the preferred text information comprises:
calculating the degree of association between each enterprise name and the hot word according to the following formula:
where $q$ is the index of an enterprise name, $1 \le q \le Q$, and $Q$ is the total number of enterprise names; $p$ is the index of a hot word, $1 \le p \le P$, and $P$ is the total number of hot words; $m$ is the index of a sub-period, with the sub-periods numbered in chronological order, $1 \le m \le M$; $EntExpNum_{q,p,m}$ is the exposure frequency of the $q$-th enterprise name in the preferred text information containing the $p$-th hot word in the $m$-th sub-period; $k_m$ is a preset weight coefficient with $k_m < k_{m+1}$; and $Rel_{q,p}$ is the degree of association between the $q$-th enterprise name and the $p$-th hot word.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456973.4A CN108710664B (en) | 2018-05-14 | 2018-05-14 | Hot word analysis method, computer readable storage medium and terminal device |
PCT/CN2018/096267 WO2019218452A1 (en) | 2018-05-14 | 2018-07-19 | Method, computer readable storage medium, terminal apparatus, and device for analyzing trending terms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456973.4A CN108710664B (en) | 2018-05-14 | 2018-05-14 | Hot word analysis method, computer readable storage medium and terminal device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108710664A true CN108710664A (en) | 2018-10-26 |
CN108710664B CN108710664B (en) | 2023-04-18 |
Family
ID=63868099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810456973.4A Active CN108710664B (en) | 2018-05-14 | 2018-05-14 | Hot word analysis method, computer readable storage medium and terminal device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108710664B (en) |
WO (1) | WO2019218452A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427381A (en) * | 2019-08-07 | 2019-11-08 | 北京嘉和海森健康科技有限公司 | A kind of data processing method and relevant device |
CN111310018A (en) * | 2018-12-11 | 2020-06-19 | 阿里巴巴集团控股有限公司 | Determining method of timeliness search vocabulary and search engine |
CN111737553A (en) * | 2020-06-16 | 2020-10-02 | 苏州朗动网络科技有限公司 | Method and device for selecting enterprise associated words and storage medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117034904B (en) * | 2023-10-09 | 2023-12-08 | 北京睿企信息科技有限公司 | Method for obtaining hot words with stable heat, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101727494A (en) * | 2009-12-29 | 2010-06-09 | 华中师范大学 | Network hot word generating system in specific area |
CN103186675A (en) * | 2013-04-03 | 2013-07-03 | 南京安讯科技有限责任公司 | Automatic webpage classification method based on network hot word identification |
US20170169062A1 (en) * | 2015-12-14 | 2017-06-15 | Le Holdings (Beijing) Co., Ltd. | Method and electronic device for recommending video |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101246499B (en) * | 2008-03-27 | 2010-10-13 | 腾讯科技(深圳)有限公司 | Network information search method and system |
CN103106227A (en) * | 2012-08-03 | 2013-05-15 | 人民搜索网络股份公司 | System and method of looking up new word based on webpage text |
CN105045882B (en) * | 2015-07-21 | 2018-09-25 | 无锡天脉聚源传媒科技有限公司 | A kind of hot word processing method and processing device |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310018A (en) * | 2018-12-11 | 2020-06-19 | 阿里巴巴集团控股有限公司 | Determining method of timeliness search vocabulary and search engine |
CN111310018B (en) * | 2018-12-11 | 2024-03-01 | 阿里巴巴集团控股有限公司 | Method for determining timeliness search vocabulary and search engine |
CN110427381A (en) * | 2019-08-07 | 2019-11-08 | 北京嘉和海森健康科技有限公司 | A kind of data processing method and relevant device |
CN111737553A (en) * | 2020-06-16 | 2020-10-02 | 苏州朗动网络科技有限公司 | Method and device for selecting enterprise associated words and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108710664B (en) | 2023-04-18 |
WO2019218452A1 (en) | 2019-11-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||