CN108228808A - Method and apparatus for determining hot events, storage medium, and electronic device - Google Patents
- Publication number
- CN108228808A (application number CN201711484349.7A)
- Authority
- CN
- China
- Prior art keywords
- determined
- word
- weight
- text
- theme
- Legal status
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
- G06F16/338—Presentation of query results
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
This disclosure relates to a method and apparatus for determining hot events, a storage medium, and an electronic device. The method includes: obtaining multiple candidate texts within a preset time period; obtaining a topic model corresponding to all of the candidate texts in the preset time period, the topic model including multiple topics, and determining, according to the topic model, a first topic conditional probability that each candidate text belongs to each of the different topics; determining, according to the first topic conditional probability, a popularity weight of each segmented word in all of the candidate texts; and determining a hot event from the multiple candidate texts according to the popularity weight of each segmented word.
Description
Technical field
This disclosure relates to the field of information technology, and in particular to a method and apparatus for determining hot events, a storage medium, and an electronic device.
Background
With the rapid spread of the Internet, the social influence of the network keeps growing. Users can obtain news through portal websites, social software, microblogs, forums, and other channels, and can express their own views on that news. Such frequent interaction produces topics of common concern among different users, and such a common topic is referred to as a hot event.
At present, when hot events are determined, all news can be treated as one set: news of the same type is aggregated by clustering, the news within each type is ranked by popularity, and the top-ranked news events are taken as the hot events of that type. The factors that determine the popularity ranking may include the number of visits, the number of comments, and the like. Because the popularity ranking can be manipulated by human intervention (for example, by boosting the ranking through software or manually), determining hot events solely from the popularity ranking is inaccurate.
Summary
To solve the above problem, the present disclosure provides a method and apparatus for determining hot events, a storage medium, and an electronic device.
According to a first aspect of the embodiments of the present disclosure, a method for determining hot events is provided. The method includes:
Obtaining multiple candidate texts within a preset time period;
Obtaining a topic model corresponding to all of the candidate texts in the preset time period, and determining, according to the topic model, a first topic conditional probability that each candidate text belongs to each of the different topics, where the topic model includes multiple topics;
Determining, according to the first topic conditional probability, a popularity weight of each segmented word in all of the candidate texts;
Determining a hot event from the multiple candidate texts according to the popularity weight of each segmented word.
Optionally, obtaining the topic model corresponding to all of the candidate texts in the preset time period includes:
Performing word segmentation on each candidate text in the preset time period to obtain at least one segmented word;
Training a preset topic model with the at least one segmented word to obtain the topic model.
Optionally, determining, according to the first topic conditional probability, the popularity weight of each segmented word in all of the candidate texts includes:
Obtaining a second topic conditional probability that the at least one segmented word in each candidate text belongs to each of the different topics;
Determining a topic weight of the at least one segmented word in each candidate text according to the first topic conditional probability and the second topic conditional probability;
Determining the popularity weight of each segmented word according to the topic weights.
Optionally, obtaining the second topic conditional probability that the at least one segmented word in each candidate text belongs to each of the different topics includes:
Determining an occurrence probability of the at least one segmented word in the corresponding candidate text;
Calculating the average of the first topic conditional probabilities corresponding to the same topic to obtain a topic probability corresponding to that topic;
Obtaining, according to the topic model, a word conditional probability of the at least one segmented word in each candidate text under each of the different topics;
Determining the second topic conditional probability according to the topic probability, the occurrence probability, and the word conditional probability.
Optionally, when the preset time period includes one period, determining the popularity weight of each segmented word according to the topic weights includes:
Obtaining, through a weight acquisition step, a first weight of each segmented word in all of the candidate texts, and determining the first weight to be the popularity weight.
When the preset time period includes multiple periods, determining the popularity weight of each segmented word according to the topic weights includes:
Obtaining, through the weight acquisition step, the first weight of each segmented word in all of the candidate texts of each period separately;
Obtaining the popularity weight of each segmented word according to the first weights.
Optionally, the weight acquisition step includes:
Obtaining position information of each segmented word in each candidate text, where the position information is either a text title position or a text body position;
When the position information of the segmented word is the text title position, determining the product of the topic weight of the segmented word and a preset parameter to be a second weight of the segmented word in that candidate text;
When the position information of the segmented word is the text body position, determining the topic weight of the segmented word to be the second weight of the segmented word in that candidate text;
Calculating, for each segmented word, the average of its second weights over all of the candidate texts to obtain the first weight of that segmented word.
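The weight acquisition step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the record layout and the function name `first_weights` are hypothetical, `title_factor` stands in for the unspecified preset parameter, each (text, word) pair is assumed to carry a single position, and the average is taken over the texts in which the word occurs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input layout: one record per (candidate text, segmented word),
# giving the word's topic weight in that text and whether it sits in the title.
Record = Tuple[int, str, float, bool]  # (text_id, word, topic_weight, in_title)


def first_weights(records: List[Record], title_factor: float) -> Dict[str, float]:
    """Weight acquisition step: a second weight per text, then a mean per word."""
    second: Dict[str, List[float]] = defaultdict(list)
    for _text_id, word, topic_weight, in_title in records:
        # Title position: topic weight times the preset parameter.
        # Body position: the topic weight itself.
        w2 = topic_weight * title_factor if in_title else topic_weight
        second[word].append(w2)
    # First weight: mean of the word's second weights over the texts containing it.
    return {word: sum(ws) / len(ws) for word, ws in second.items()}
```

With `title_factor=1.5`, a word with topic weight 0.4 in one title and 0.2 in another text's body gets a first weight of about (0.6 + 0.2) / 2 = 0.4.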
Optionally, obtaining the popularity weight of each segmented word according to the first weights includes:
Determining a third weight of a segmented word according to the first weights corresponding to that same segmented word in each period;
Determining the popularity weight of each segmented word according to its third weight and its first weight.
Optionally, the hot event includes hot words and hot clauses, and determining the hot event from the multiple candidate texts according to the popularity weight of each segmented word includes:
Obtaining a preset number of hot words according to the popularity weight of each segmented word;
Obtaining, from all of the candidate texts, candidate clauses that contain a hot word;
Sorting the multiple clause words contained in a candidate clause in descending order of topic weight to obtain a sorting result;
When the rank of the hot word in the sorting result is less than or equal to a preset rank, determining the candidate clause to be a target clause, and obtaining the hot clauses from the target clauses;
Determining the hot words and the hot clauses to be the hot event.
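A minimal Python sketch of the hot-word and target-clause selection described above, assuming clauses are already segmented into word lists and that topic weights and popularity weights are given as dictionaries. The function names are hypothetical. Ties in topic weight keep the clause's word order, and a clause is kept if any hot word it contains meets the rank threshold; how hot clauses are then chosen from the target clauses is not specified in this section, so that step is left out.

```python
from typing import Dict, List


def hot_words(popularity: Dict[str, float], top_n: int) -> List[str]:
    """Pick the preset number of words with the largest popularity weights."""
    return sorted(popularity, key=popularity.get, reverse=True)[:top_n]


def target_clauses(
    clauses: List[List[str]],
    topic_weight: Dict[str, float],
    hot: List[str],
    max_rank: int,
) -> List[List[str]]:
    """Keep clauses in which a hot word ranks within max_rank by topic weight."""
    hot_set = set(hot)
    kept = []
    for clause in clauses:
        present = hot_set.intersection(clause)
        if not present:
            continue  # only clauses containing a hot word are candidates
        # Deduplicate while keeping order, then sort by topic weight, highest first.
        ranked = sorted(dict.fromkeys(clause), key=lambda w: topic_weight.get(w, 0.0), reverse=True)
        # Rank 1 is the word with the largest topic weight in the clause.
        if any(ranked.index(w) + 1 <= max_rank for w in present):
            kept.append(clause)
    return kept
```

For example, with a preset rank of 2, a clause whose words rank city, flood, the by topic weight is kept for the hot word flood, while a clause where the hot word comes third is dropped.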
According to a second aspect of the embodiments of the present disclosure, an apparatus for determining hot events is provided. The apparatus includes:
An acquisition module, configured to obtain multiple candidate texts within a preset time period;
A processing module, configured to obtain a topic model corresponding to all of the candidate texts in the preset time period, and to determine, according to the topic model, a first topic conditional probability that each candidate text belongs to each of the different topics, where the topic model includes multiple topics;
A first determining module, configured to determine, according to the first topic conditional probability, a popularity weight of each segmented word in all of the candidate texts;
A second determining module, configured to determine a hot event from the multiple candidate texts according to the popularity weight of each segmented word.
Optionally, the processing module includes:
A processing submodule, configured to perform word segmentation on each candidate text in the preset time period to obtain at least one segmented word;
A training submodule, configured to train a preset topic model with the at least one segmented word to obtain the topic model.
Optionally, the first determining module includes:
A first acquisition submodule, configured to obtain a second topic conditional probability that the at least one segmented word in each candidate text belongs to each of the different topics;
A first determination submodule, configured to determine a topic weight of the at least one segmented word in each candidate text according to the first topic conditional probability and the second topic conditional probability;
A second determination submodule, configured to determine the popularity weight of each segmented word according to the topic weights.
Optionally, the first acquisition submodule is configured to: determine an occurrence probability of the at least one segmented word in the corresponding candidate text; calculate the average of the first topic conditional probabilities corresponding to the same topic to obtain a topic probability corresponding to that topic; obtain, according to the topic model, a word conditional probability of the at least one segmented word in each candidate text under each of the different topics; and determine the second topic conditional probability according to the topic probability, the occurrence probability, and the word conditional probability.
Optionally, when the preset time period includes one period, the second determination submodule is configured to obtain, through a weight acquisition step, a first weight of each segmented word in all of the candidate texts, and to determine the first weight to be the popularity weight.
When the preset time period includes multiple periods, the second determination submodule is configured to obtain, through the weight acquisition step, the first weight of each segmented word in all of the candidate texts of each period separately, and to obtain the popularity weight of each segmented word according to the first weights.
Optionally, the weight acquisition step includes: obtaining position information of each segmented word in each candidate text, where the position information is either a text title position or a text body position;
When the position information of the segmented word is the text title position, determining the product of the topic weight of the segmented word and a preset parameter to be a second weight of the segmented word in that candidate text;
When the position information of the segmented word is the text body position, determining the topic weight of the segmented word to be the second weight of the segmented word in that candidate text;
Calculating, for each segmented word, the average of its second weights over all of the candidate texts to obtain the first weight of that segmented word.
Optionally, the second determination submodule is configured to determine a third weight of a segmented word according to the first weights corresponding to that same segmented word in each period, and to determine the popularity weight of each segmented word according to its third weight and its first weight.
Optionally, the hot event includes hot words and hot clauses, and the second determining module includes:
A second acquisition submodule, configured to obtain a preset number of hot words according to the popularity weight of each segmented word;
A third acquisition submodule, configured to obtain, from all of the candidate texts, candidate clauses that contain a hot word;
A sorting submodule, configured to sort the multiple clause words contained in a candidate clause in descending order of topic weight to obtain a sorting result;
A third determination submodule, configured to determine the candidate clause to be a target clause when the rank of the hot word in the sorting result is less than or equal to a preset rank, and to obtain the hot clauses from the target clauses;
A fourth determination submodule, configured to determine the hot words and the hot clauses to be the hot event.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the method according to the first aspect are implemented.
According to a fourth aspect of the embodiments of the present disclosure, an electronic device is provided, including:
The computer-readable storage medium according to the third aspect; and
One or more processors, configured to execute the program in the computer-readable storage medium.
With the above technical solutions, multiple candidate texts within a preset time period are obtained; a topic model corresponding to all of the candidate texts in the preset time period is obtained, the topic model including multiple topics, and a first topic conditional probability that each candidate text belongs to each of the different topics is determined according to the topic model; the popularity weight of each segmented word in all of the candidate texts is determined according to the first topic conditional probability; and a hot event is determined from the multiple candidate texts according to the popularity weight of each segmented word. Because the topic model captures the associations among candidate texts, topics, and words, the first topic conditional probability of each candidate text under the different topics can be determined from it, and the popularity weight of each segmented word can then be derived from those probabilities. Hot events are thus mined from the popularity weights of the segmented words, which improves the accuracy of hot-event determination.
Other features and advantages of the present disclosure will be described in detail in the detailed description that follows.
Description of the drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
Fig. 1 is a flowchart of a method for determining hot events according to an exemplary embodiment of the present disclosure;
Fig. 2 is a flowchart of another method for determining hot events according to an exemplary embodiment of the present disclosure;
Fig. 3 is a block diagram of a first apparatus for determining hot events according to an exemplary embodiment of the present disclosure;
Fig. 4 is a block diagram of a second apparatus for determining hot events according to an exemplary embodiment of the present disclosure;
Fig. 5 is a block diagram of a third apparatus for determining hot events according to an exemplary embodiment of the present disclosure;
Fig. 6 is a block diagram of a fourth apparatus for determining hot events according to an exemplary embodiment of the present disclosure;
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to describe and explain the present disclosure and are not intended to limit it.
First, the application scenario of the present disclosure is described. As the network has become part of people's lives, news can be propagated over the network, enabling interaction between users and news. If users interact with a certain piece of news frequently, that news becomes a hot event. For example, take the news during a certain conference in 2017, which includes related items such as item 1, item 2, item 3, item 4, item 5, item 6, and item 7. Obtaining hot events from large amounts of news is highly valuable to government workers, corporate public relations staff, financial researchers, and other personnel engaged in public opinion research: hot events help them grasp the development trend of events in time and take corresponding measures promptly. However, hot events are currently determined mainly by popularity ranking, and because the popularity ranking can be manipulated through human intervention (for example, by boosting the ranking through software or manually), determining hot events solely from the popularity ranking is inaccurate.
The present disclosure provides a method and apparatus for determining hot events, a storage medium, and an electronic device. Candidate texts of multiple news items within a preset time period are obtained, and a topic model corresponding to all of the candidate texts in the preset time period is obtained. The topic model yields a first topic conditional probability that each candidate text belongs to each of the different topics, and the popularity weight of each segmented word in all of the candidate texts is determined according to the first topic conditional probability, so that a hot event can be determined from the candidate texts according to the popularity weights. Because the topic model captures the associations among candidate texts, topics, and words, mining hot events from these popularity weights improves the accuracy of hot-event determination.
The present disclosure is described in detail below with reference to specific embodiments.
Fig. 1 is a flowchart of a method for determining hot events according to an exemplary embodiment of the present disclosure. As shown in Fig. 1, the method includes:
S101: Obtain multiple candidate texts within a preset time period.
The candidate texts may be texts on a target topic collected by a web crawler from portal websites, social software, microblogs, forums, and the like. For example, if the target topic is the news during a certain conference in 2017, and that news includes related items such as item 1, item 2, item 3, item 4, item 5, item 6, and item 7, the candidate texts may include the texts corresponding to those items. This example is merely illustrative and does not limit the present disclosure.
S102: Obtain a topic model corresponding to all of the candidate texts in the preset time period, and determine, according to the topic model, a first topic conditional probability that each candidate text belongs to each of the different topics.
In this step, if the preset time period is a single period, the topic model is a model generated from all of the candidate texts in that period; if the preset time period includes multiple periods, there are multiple topic models, each generated from all of the candidate texts in the corresponding period.
In one possible implementation, word segmentation may be performed on each candidate text in the same period to obtain at least one segmented word, and a preset topic model may be trained with the at least one word and a preset number of topics to obtain the topic model. The preset topic model may be a model generated by the LDA (Latent Dirichlet Allocation) algorithm, which is equivalent to a three-layer Bayesian probability model, that is, a three-layer structure of words, topics, and candidate texts. In this way, a topic model containing the candidate texts, words, and topics can be generated from the preset topic model, and from it one can determine the first topic conditional probability that each candidate text belongs to each of the different topics, as well as the word conditional probability of the at least one segmented word in each candidate text under each of the different topics.
S103: Determine, according to the first topic conditional probability, the popularity weight of each segmented word in all of the candidate texts.
In the present disclosure, a larger popularity weight means that the segmented word is more popular, that is, users pay more attention to it; conversely, a smaller popularity weight means that the segmented word is less popular and receives less user attention.
S104: Determine a hot event from the multiple candidate texts according to the popularity weight of each segmented word.
With the above method, because the topic model captures the associations among candidate texts, topics, and words, the first topic conditional probability of each candidate text under the different topics is determined from the topic model, and the popularity weight of each segmented word is determined according to those probabilities. Hot events are then mined from the popularity weights of the segmented words, which improves the accuracy of hot-event determination.
Fig. 2 is a flowchart of another method for determining hot events according to an exemplary embodiment of the present disclosure. As shown in Fig. 2, the method includes:
S201: Obtain multiple candidate texts within a preset time period.
The candidate texts may be texts on a target topic collected by a web crawler from portal websites, social software, microblogs, forums, and the like; for example, the texts corresponding to items 1 through 7 of the news during a certain conference in 2017. In one possible implementation, if the preset time period consists of multiple periods, the candidate texts in each period can be represented as a text collection $D=\{d_1, d_2, \dots, d_i, \dots, d_n\}$, where $d_i$ denotes the $i$-th candidate text and $n$ denotes the total number of candidate texts in that period. Each $d_i$ includes the title ($title_i$) and the body ($body_i$) of the $i$-th candidate text, that is, $d_i=\{title_i, body_i\}$, so that in a subsequent step the first weight of each segmented word in all of the candidate texts can be determined from the position of that word in each candidate text. The above example is merely illustrative and does not limit the present disclosure.
S202: Obtain a topic model corresponding to all of the candidate texts in the preset time period.
It should be noted that if the preset time period is a single period, the topic model is one model generated from all of the candidate texts in that period; if the preset time period includes multiple periods, there are multiple topic models, each generated from all of the candidate texts in the corresponding period.
In this step, the topic model corresponding to all of the candidate texts in the preset time period can be obtained through the following steps:
S11: Perform word segmentation on each candidate text in the preset time period to obtain at least one segmented word.
Word segmentation can use a variety of methods, such as string matching (that is, mechanical segmentation). Specifically, each candidate text is matched in turn against the entries of a preset dictionary; if an entry corresponding to part of the candidate text is found in the preset dictionary, the match succeeds and a word is identified. It should be noted that some words, such as modal particles, are stop words: they carry no real meaning and only express tone. If these words were also fed into the subsequent training of the preset topic model, the computational complexity would become excessive and more data processing resources would be consumed. Therefore, in another embodiment of the present disclosure, stop words may be removed after each candidate text is segmented into at least one segmented word. This removes words without practical meaning and reduces the computational complexity of the subsequent topic-model training while preserving the accuracy of hot-event determination.
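Dictionary-based (mechanical) segmentation with stop-word removal can be sketched as below. Forward maximum matching is assumed here as one common variant; the source does not state the matching direction, and the function name, dictionary, and stop-word list are hypothetical.

```python
from typing import Iterable, List, Set


def segment(text: str, dictionary: Set[str], stop_words: Iterable[str] = ()) -> List[str]:
    """Forward maximum matching against a preset dictionary, then drop stop words."""
    max_len = max((len(w) for w in dictionary), default=1)
    stops = set(stop_words)
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible dictionary entry starting at position i first.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in dictionary or size == 1:
                # A single character with no dictionary match becomes its own token.
                words.append(piece)
                i += size
                break
    # Remove stop words and whitespace-only tokens.
    return [w for w in words if w not in stops and not w.isspace()]
```

For instance, `segment("我的北京", {"我", "北京"}, stop_words={"的"})` first produces 我, 的, 北京 and then drops the stop word 的.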
S12: Train a preset topic model with the at least one segmented word to obtain the topic model.
In this step, the preset topic model includes a preset number of topics, which can usually be determined according to the number of candidate texts; typically, the preset number of topics can be set to 50 to 200. The preset topic model may be a model generated by the LDA (Latent Dirichlet Allocation) algorithm and is equivalent to a three-layer Bayesian probability model, that is, a three-layer structure of segmented words, topics, and candidate texts. In this way, a topic model containing the candidate texts, words, and topics can be generated from the preset topic model, and from it one can determine the first topic conditional probability that each candidate text belongs to each of the different topics, as well as the word conditional probability of the at least one segmented word in each candidate text under each of the different topics.
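The source names LDA but not a training procedure. The sketch below swaps in collapsed Gibbs sampling, one standard way to fit LDA, written in plain Python so it runs without external packages; the function name and the hyperparameters `alpha`, `beta`, and `iters` are assumptions, and a production system would more likely rely on an optimized library. The returned `theta[i][p]` estimates the first topic conditional probability p(t_p | d_i), and `phi[w][p]` estimates the word conditional probability p(w | t_p).

```python
import random
from typing import Dict, List, Tuple


def train_lda(
    docs: List[List[str]], k: int, iters: int = 200,
    alpha: float = 0.1, beta: float = 0.01, seed: int = 0,
) -> Tuple[List[List[float]], Dict[str, List[float]]]:
    """Collapsed Gibbs sampling for LDA; returns (theta, phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    v = len(vocab)
    ndk = [[0] * k for _ in docs]                      # topic counts per document
    nkw = [dict.fromkeys(vocab, 0) for _ in range(k)]  # word counts per topic
    nk = [0] * k                                       # total tokens per topic
    z: List[List[int]] = []                            # topic of every token
    for i, d in enumerate(docs):
        zi = []
        for w in d:
            t = rng.randrange(k)  # random initial assignment
            zi.append(t)
            ndk[i][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(zi)
    for _ in range(iters):
        for i, d in enumerate(docs):
            for n, w in enumerate(d):
                t = z[i][n]
                # Remove the current token from the counts.
                ndk[i][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # Full conditional p(z = q | rest), up to a normalizing constant.
                weights = [
                    (ndk[i][q] + alpha) * (nkw[q][w] + beta) / (nk[q] + v * beta)
                    for q in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                z[i][n] = t
                ndk[i][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    # Smoothed point estimates from the final counts.
    theta = [
        [(ndk[i][q] + alpha) / (len(d) + k * alpha) for q in range(k)]
        for i, d in enumerate(docs)
    ]
    phi = {
        w: [(nkw[q][w] + beta) / (nk[q] + v * beta) for q in range(k)]
        for w in vocab
    }
    return theta, phi
```

Per the description, `k` would typically be between 50 and 200 for a real collection of candidate texts.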
It should be noted that, after the segmented words are obtained, different candidate texts may contain the same segmented word, so training the preset topic model on the segmented words as they are would train repeatedly on the same word and reduce processing efficiency. To avoid this, the segmented words of all candidate texts can be merged and deduplicated, so that no word is repeated among the resulting segmented words, and the preset topic model can then be trained with the preprocessed segmented words to obtain the topic model. In subsequent steps, the popularity weight of each preprocessed segmented word is obtained once, which avoids computing the popularity weight of the same word repeatedly and improves computational efficiency. For example, merging the segmented words of all candidate texts yields a word set $W=\{w_1, w_2, \dots, w_l, \dots, w_c\}$ of preprocessed segmented words, where $w_l$ denotes the $l$-th preprocessed segmented word and any two preprocessed segmented words are different, so that the popularity weight of each word in the word set can be obtained in turn in subsequent steps.
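Merging and deduplicating the segmented words into the word set W can be as simple as the sketch below (the function name is hypothetical). First-appearance order is an assumption made so that each word gets a stable index; the per-text word lists themselves are left unchanged, since the occurrence probabilities in S204 still need per-text counts.

```python
from typing import List


def build_word_set(segmented_texts: List[List[str]]) -> List[str]:
    """Merge the segmented words of all candidate texts into W without repeats.

    dict.fromkeys keeps the first appearance of each word, so w_1, ..., w_c
    follow the order in which words are first seen.
    """
    return list(dict.fromkeys(w for words in segmented_texts for w in words))
```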
S203: Determine, according to the topic model, the first topic conditional probability that each candidate text belongs to each of the different topics.
For example, the first topic conditional probability that the $i$-th candidate text belongs to topic $t_p$ can be expressed as $p(t_p \mid d_i)$, where $t_p$ denotes the $p$-th topic and $d_i$ denotes the $i$-th candidate text. The set of first topic conditional probabilities of the $i$-th candidate text can then be expressed as $\{p(t_1 \mid d_i), p(t_2 \mid d_i), \dots, p(t_p \mid d_i), \dots, p(t_k \mid d_i)\}$, where $k$ is the number of topics. Once this set is determined for every candidate text, a subsequent step can average, from these sets, the first topic conditional probabilities corresponding to the same topic to obtain the topic probability of that topic.
S204: Determine the probability of occurrence of the at least one segmented word in its corresponding text to be determined.
Here, the word subset formed by the segmented words of the $i$-th text to be determined can be written $w_i = \{w_{1i}, w_{2i}, \ldots, w_{ji}, \ldots, w_{zi}\}$. In this notation, $w_{ji}$ denotes the $j$-th segmented word of the $i$-th text, $z$ is the total number of segmented words in that text, and any two words in the subset are distinct. The probability of occurrence of the $j$-th word $w_{ji}$ in the $i$-th text is
$$p(w_{ji}) = \frac{\mathrm{count}(w_{ji})}{\sum_{j=1}^{z} \mathrm{count}(w_{ji})}$$
where $\mathrm{count}(w_{ji})$ is the number of times $w_{ji}$ occurs in the $i$-th text to be determined.
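The probability of occurrence can be computed per text with a word counter. The sketch below assumes that the denominator is the total number of word tokens in the text, which is the sum of $\mathrm{count}(w_{ji})$ over its $z$ distinct words. That assumption fits the definitions above, but the function name `occurrence_probabilities` is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def occurrence_probabilities(words: List[str]) -> Dict[str, float]:
    """Return p(w_ji) for every distinct word of one segmented text.

    Assumption: the denominator is the total token count of the text,
    i.e. the sum of count(w_ji) over its distinct words.
    """
    counts = Counter(words)        # count(w_ji) for each distinct word
    total = sum(counts.values())   # sum over j = 1..z
    if total == 0:
        return {}
    return {word: c / total for word, c in counts.items()}


print(occurrence_probabilities(["hot", "event", "hot", "topic"]))
# {'hot': 0.5, 'event': 0.25, 'topic': 0.25}
```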
S205: Sum the first topic conditional probabilities of the same topic to obtain the topic probability of that topic.
In this step, the topic probability of topic $t_p$ is computed as
$$P(t_p) = \sum_{i} p(t_p \mid d_i)$$
where $P(t_p)$ is the topic probability of topic $t_p$, $p(t_p \mid d_i)$ is the first topic conditional probability that the $i$-th text to be determined belongs to $t_p$, and the sum runs over all texts to be determined.
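The per-topic sum can be sketched as follows. This minimal example assumes that the first topic conditional probabilities are held as a matrix with one row per text and one column per topic. The function name `topic_probabilities` is hypothetical.

```python
from typing import List


def topic_probabilities(doc_topic: List[List[float]]) -> List[float]:
    """doc_topic[i][p] holds p(t_p | d_i); returns P(t_p) = sum_i p(t_p | d_i)."""
    if not doc_topic:
        return []
    k = len(doc_topic[0])  # number of topics
    return [sum(row[p] for row in doc_topic) for p in range(k)]
```

As described, $P(t_p)$ is a plain sum and is not normalized, so its values can exceed 1. Dividing every entry by the number of texts would rescale all topics equally, but the text does not state that this is done.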
S206: Obtain, according to the topic model, the word conditional probability of the at least one segmented word of each text to be determined under each topic.
The topic model is a three-layer Bayesian probability model that covers words, topics, and texts to be determined. The word conditional probability of each segmented word of each text under each topic can therefore be read from the trained topic model. For example, the word conditional probability that the $j$-th segmented word of the $i$-th text occurs under topic $t_p$ is written $p(w_{ji} \mid t_p)$.
S207: Determine the second topic conditional probability according to the topic probability, the probability of occurrence, and the word conditional probability.
In this step, the second topic conditional probability is computed as
$$p(t_p \mid w_{ji}) = \frac{p(w_{ji} \mid t_p)\, P(t_p)}{p(w_{ji})}$$
where $p(t_p \mid w_{ji})$ is the second topic conditional probability that the $j$-th segmented word of the $i$-th text belongs to topic $t_p$, $p(w_{ji} \mid t_p)$ is the word conditional probability that this word occurs under $t_p$, $P(t_p)$ is the topic probability of $t_p$, and $p(w_{ji})$ is the probability of occurrence of this word in the $i$-th text.
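Step S207 applies Bayes' rule, which can be written as a one-line helper. The sketch below is illustrative only, and the function name is hypothetical. Because $P(t_p)$ is a sum over texts rather than a normalized probability, the result is not guaranteed to be at most 1, and the sketch does not clip it.

```python
def second_topic_conditional(p_word_given_topic: float,
                             p_topic: float,
                             p_word: float) -> float:
    """p(t_p | w_ji) = p(w_ji | t_p) * P(t_p) / p(w_ji), i.e. Bayes' rule."""
    if p_word <= 0.0:
        raise ValueError("p(w_ji) must be positive")
    return p_word_given_topic * p_topic / p_word
```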
S208: Determine the topic weight of the at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability.
The topic weight is computed as
$$tw_{ji} = p(t_p \mid d_i) \cdot p(t_p \mid w_{ji})$$
where $tw_{ji}$ is the topic weight of the $j$-th segmented word in the $i$-th text to be determined, $p(t_p \mid d_i)$ is the first topic conditional probability that the $i$-th text belongs to topic $t_p$, and $p(t_p \mid w_{ji})$ is the second topic conditional probability of topic $t_p$ given the $j$-th segmented word of the $i$-th text.
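The topic weight is an element-wise product over topics. The formula above is written for a single topic $t_p$, and the text does not say how the per-topic values are combined into one weight per word. The hypothetical helper below therefore returns one value per topic and leaves any aggregation to the caller.

```python
from typing import List


def topic_weights(p_topic_given_doc: List[float],
                  p_topic_given_word: List[float]) -> List[float]:
    """Per-topic tw_ji = p(t_p | d_i) * p(t_p | w_ji), one entry per topic t_p."""
    if len(p_topic_given_doc) != len(p_topic_given_word):
        raise ValueError("both vectors must have one entry per topic")
    return [a * b for a, b in zip(p_topic_given_doc, p_topic_given_word)]
```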
S209: Determine the heat weight of each segmented word according to the topic weights.
In this step, step S201 may have obtained the texts to be determined from a single preset time period or from multiple preset time periods. The heat weight is therefore determined in one of the following ways, depending on the number of periods.
If the preset time period includes a single period, the first weight of each segmented word in all texts to be determined is obtained through a weight acquisition step, and this first weight is taken as the heat weight.
In one possible implementation, the weight acquisition step first obtains the position information of each segmented word in each text to be determined. The position information is either a text title position or a text body position. The $i$-th text to be determined $d_i$ consists of its title ($title_i$) and its body ($body_i$), i.e. $d_i = \{title_i, body_i\}$, so the position of a segmented word in the $i$-th text can be determined from $d_i$. When a segmented word is at the title position, the product of its topic weight and a preset parameter (for example, 2) is taken as its second weight in that text. When it is at the body position, its topic weight is taken directly as its second weight in that text. The average of the second weights of the same segmented word over all texts to be determined is then computed as the first weight of that word, and this first weight is the heat weight.
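The weight acquisition step can be sketched as follows. The example rests on several assumptions that are not taken from the source. Each text is given as a mapping from word to a pair (topic weight, appears-in-title flag). A word that appears in both the title and the body of the same text is flagged as a title word. The average is taken only over the texts in which the word occurs. The names `first_weights` and `TITLE_FACTOR` are hypothetical, and the value 2 is the example preset parameter mentioned above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TITLE_FACTOR = 2.0  # example preset parameter for title-position words


def first_weights(texts: List[Dict[str, Tuple[float, bool]]]) -> Dict[str, float]:
    """texts[i] maps word -> (topic weight in text i, appears in title?).

    Returns the first weight of each word: the mean of its second weights
    over the texts in which it occurs (an assumption of this sketch).
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for text in texts:
        for word, (tw, in_title) in text.items():
            second = tw * TITLE_FACTOR if in_title else tw  # second weight
            sums[word] += second
            counts[word] += 1
    return {word: sums[word] / counts[word] for word in sums}
```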
If the preset time period includes multiple periods, the first weight of each segmented word in all texts to be determined in each period is obtained through the weight acquisition step, and the heat weight of each segmented word is obtained from these first weights. The weight acquisition step is the same as described above and is not repeated here.
Specifically, the third weight of a segmented word is determined from the first weights of that word in each period, and the heat weight of each segmented word is then determined from its third weight and its first weight.
Illustratively, the preset time period may include three periods: a first time period, a second time period, and a third time period. The first time period may be the current period. The second time period contains the first time period and is longer than it, and the third time period contains the second time period and is longer than it. For example, the first time period may be the current week, the second time period may be the current week plus the previous week, and the third time period may be the current week plus the two preceding weeks. In one possible implementation, the first weight of each segmented word is obtained separately over all texts to be determined in the first, second, and third time periods. The third weight of a segmented word is then determined from its three first weights as
$$ww_q = a \cdot b1w_q + b \cdot b2w_q + c \cdot b3w_q$$
where $ww_q$ is the third weight of the $q$-th segmented word, $b1w_q$, $b2w_q$, and $b3w_q$ are the first weights of the $q$-th segmented word in the first, second, and third time periods respectively, and $a$, $b$, and $c$ are the first, second, and third preset values (for example, $a = 0.3$, $b = 0.4$, $c = 0.3$).
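The third weight is a weighted sum of the three per-period first weights. The sketch below uses the example preset values as defaults, and the function name is hypothetical.

```python
def third_weight(b1: float, b2: float, b3: float,
                 a: float = 0.3, b: float = 0.4, c: float = 0.3) -> float:
    """ww_q = a*b1w_q + b*b2w_q + c*b3w_q (defaults are the example presets)."""
    return a * b1 + b * b2 + c * b3
```

For instance, a word with first weights 1.0, 0.5, and 0.0 in the three periods gets a third weight of about 0.5.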
After the third weight has been obtained, the heat weight of each segmented word in the first time period (the current period) is computed by combining the third weight, which reflects all three periods, with the first weight in the first time period. In this embodiment, the heat weight is computed as
$$hw_q = \alpha \cdot b1w_q + \beta \cdot ww_q$$
where $hw_q$ is the heat weight of the $q$-th segmented word, $b1w_q$ is its first weight in the first time period, $ww_q$ is its third weight, $\alpha$ is the fourth preset value (for example, 0.25), and $\beta$ is the fifth preset value (for example, 0.75). This formula starts from the first weight of the first time period and adds the third weight, so the resulting heat weight also reflects the first weights of the second and third time periods.
It should be noted that the first, second, third, fourth, and fifth preset values are obtained through repeated experiments, with $a + b + c = 1$ and $\alpha + \beta = 1$.
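The third-weight and heat-weight formulas can be combined over a whole vocabulary. The following is a minimal sketch. It assumes that only words seen in the first (current) time period receive a heat weight, and that a word missing from a longer period's weights contributes 0 for that period. Both assumptions are illustrative and do not come from the source. The function name `heat_weights` is hypothetical.

```python
from typing import Dict


def heat_weights(b1: Dict[str, float], b2: Dict[str, float], b3: Dict[str, float],
                 a: float = 0.3, b: float = 0.4, c: float = 0.3,
                 alpha: float = 0.25, beta: float = 0.75) -> Dict[str, float]:
    """hw_q = alpha*b1w_q + beta*ww_q, with ww_q = a*b1w_q + b*b2w_q + c*b3w_q."""
    if abs(a + b + c - 1.0) > 1e-9 or abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("expected a + b + c = 1 and alpha + beta = 1")
    heat: Dict[str, float] = {}
    for word, w1 in b1.items():  # words seen in the current (first) period
        ww = a * w1 + b * b2.get(word, 0.0) + c * b3.get(word, 0.0)  # third weight
        heat[word] = alpha * w1 + beta * ww
    return heat
```

With first weights of 1.0, 0.5, and 0 for a word, the third weight is 0.5, and the heat weight is 0.25 + 0.75 × 0.5 = 0.625.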
S210: Obtain a preset number of hot words according to the heat weight of each segmented word.
In this step, the segmented words are sorted in descending order of heat weight to obtain a word ranking, and the segmented words whose rank is less than or equal to the preset number of words are taken as hot words.
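Selecting the hot words amounts to a descending sort followed by truncation. In the sketch below, the function name is hypothetical, and ties are broken alphabetically so that the output is deterministic. The source does not say how ties are handled.

```python
from typing import Dict, List


def hot_words(heat: Dict[str, float], n: int) -> List[str]:
    """Return the n words with the highest heat weight (ranks 1..n)."""
    ranked = sorted(heat.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


print(hot_words({"a": 0.1, "b": 0.5, "c": 0.3}, 2))
# ['b', 'c']
```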
S211: Obtain, from all texts to be determined, the clauses to be determined that contain the hot word.
Specifically, each text to be determined is split into multiple initial clauses, using its punctuation marks as split points. Each initial clause of each text is then checked in turn for the hot word. If an initial clause contains the hot word, it is kept as a clause to be determined. If it does not, it is filtered out.
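The splitting and filtering can be sketched with a regular expression. In this example, the delimiter set (common Chinese full-width and ASCII sentence punctuation) and the function name `candidate_clauses` are assumptions. The containment test is a plain substring check, which works naturally for unsegmented Chinese text.

```python
import re
from typing import List

# Assumed split points: full-width and ASCII clause/sentence punctuation.
CLAUSE_DELIMITERS = re.compile(r"[，。！？；,.!?;]+")


def candidate_clauses(texts: List[str], hot_word: str) -> List[str]:
    """Split every text into initial clauses and keep those containing hot_word."""
    clauses: List[str] = []
    for text in texts:
        for clause in CLAUSE_DELIMITERS.split(text):
            clause = clause.strip()
            if clause and hot_word in clause:
                clauses.append(clause)
    return clauses


print(candidate_clauses(["Alpha beta, gamma. beta delta!"], "beta"))
# ['Alpha beta', 'beta delta']
```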
S212: Determine the hot clause according to the clauses to be determined.
In this step, the hot clause can be determined through the following steps:
S21: Sort the clause words contained in a clause to be determined in descending order of topic weight to obtain a ranking result.
S22: Determine whether the rank of the hot word in the ranking result is less than or equal to a preset rank.
If the rank of the hot word in the ranking result is less than or equal to the preset rank, step S23 is performed. If it is greater than the preset rank, the clause to be determined is ignored.
S23: Take the clause to be determined as a target clause, and obtain the hot clause from the target clauses.
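Steps S21 and S22 can be sketched as a rank check on one clause. This example assumes that the topic weights of a clause's words are already available as a dictionary. The function name and the 1-based ranking convention are illustrative.

```python
from typing import Dict


def is_target_clause(clause_weights: Dict[str, float], hot_word: str,
                     preset_rank: int) -> bool:
    """clause_weights maps each word of one clause to its topic weight.

    Returns True when the hot word's 1-based rank (descending topic weight)
    is at most preset_rank, i.e. the clause becomes a target clause (S23).
    """
    if hot_word not in clause_weights:
        return False
    ranked = sorted(clause_weights, key=clause_weights.get, reverse=True)  # S21
    rank = ranked.index(hot_word) + 1
    return rank <= preset_rank  # S22
```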
In this step, suppose the set of target clauses obtained is $\{S_1, S_2, \ldots, S_u\}$. The similarity between each target clause and every other target clause in the set is computed and summed, which gives a similarity sum for each target clause. The target clause with the largest similarity sum is the hot clause:
$$x = \arg\max_{1 \le d \le u} \mathrm{sim}(S_d, S_{-d}), \qquad \mathrm{sim}(S_d, S_{-d}) = \sum_{r=1,\, r \ne d}^{u} \mathrm{sim}(S_d, S_r)$$
Here, $x$ indicates that the hot clause is the $x$-th target clause, $u$ is the total number of target clauses, $S_d$ is the $d$-th target clause, $S_{-d}$ denotes the target clauses other than the $d$-th, and $\mathrm{sim}(S_d, S_{-d})$ is the similarity between $S_d$ and those other clauses. Illustratively, the similarity between the $d$-th and the $r$-th target clauses is
$$\mathrm{sim}(S_d, S_r) = \frac{|S_d \cap S_r|}{|S_d \cup S_r|}$$
where $|S_d \cap S_r|$ is the number of distinct Chinese characters common to both clauses and $|S_d \cup S_r|$ is the number of distinct Chinese characters that appear in either clause.
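The similarity is a Jaccard index over character sets, and the hot clause is the argmax of the summed similarities. In this sketch, the function names are hypothetical. Every character is counted, including punctuation and spaces, and ties go to the earliest clause. Restricting the sets to Han characters would follow the text more strictly.

```python
from typing import List


def char_jaccard(s1: str, s2: str) -> float:
    """|S_d ∩ S_r| / |S_d ∪ S_r| over the distinct characters of two clauses."""
    a, b = set(s1), set(s2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_hot_clause(target_clauses: List[str]) -> str:
    """Return the target clause whose summed similarity to all others is largest."""
    if not target_clauses:
        raise ValueError("need at least one target clause")
    u = len(target_clauses)

    def similarity_sum(d: int) -> float:
        return sum(char_jaccard(target_clauses[d], target_clauses[r])
                   for r in range(u) if r != d)

    best = max(range(u), key=similarity_sum)  # first index wins on ties
    return target_clauses[best]


print(select_hot_clause(["abcd", "abc", "bcd"]))
# abcd
```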
S213: Determine the hot word and the hot clause as the hot event.
In this way, the hot clause corresponding to each hot word can be obtained, and each hot word is merged with its hot clause for display. For example, suppose the obtained hot word is "certain goal task tackling". Steps S211 to S212 may then determine that the corresponding hot clause is "the leadership focuses on a certain goal task, is oriented toward prominent problems, sets requirements for subtask 1, subtask 2, subtask 3, and so on, and issues the call to action to unswervingly complete the certain goal task". The hot event can thus be determined accurately from the hot word and the hot clause, which are merged and presented to the user so that the user obtains an accurate hot event. The above example is merely illustrative and does not limit the present disclosure.
With the above method, the topic model captures the associations among texts to be determined, topics, and words. The first topic conditional probability of each text to be determined under each topic is determined from this model, and the heat weight of each segmented word is determined from the first topic conditional probabilities. The corresponding hot event is then mined through the heat weights of the segmented words, which improves the accuracy of hot event determination.
Fig. 3 is a block diagram of an apparatus for determining a hot event according to an exemplary embodiment of the present disclosure. As shown in Fig. 3, the apparatus includes:
an acquisition module 301, configured to obtain multiple texts to be determined within a preset time period;
a processing module 302, configured to obtain the topic model corresponding to all texts to be determined in the preset time period and to determine, according to the topic model, the first topic conditional probability that each text to be determined belongs to each topic, the topic model including multiple topics;
a first determining module 303, configured to determine, according to the first topic conditional probability, the heat weight of each segmented word in all texts to be determined; and
a second determining module 304, configured to determine a hot event from the multiple texts to be determined according to the heat weight of each segmented word.
Fig. 4 is a block diagram of an apparatus for determining a hot event according to an exemplary embodiment of the present disclosure. As shown in Fig. 4, the processing module 302 includes:
a processing submodule 3021, configured to perform word segmentation on each text to be determined in the preset time period to obtain at least one segmented word; and
a training submodule 3022, configured to train a preset topic model with the at least one segmented word to obtain the topic model.
Fig. 5 is a block diagram of an apparatus for determining a hot event according to an exemplary embodiment of the present disclosure. As shown in Fig. 5, the first determining module 303 includes:
a first acquisition submodule 3031, configured to obtain the second topic conditional probability that the at least one segmented word of each text to be determined belongs to each topic;
a first determining submodule 3032, configured to determine the topic weight of the at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability; and
a second determining submodule 3033, configured to determine the heat weight of each segmented word according to the topic weights.
Optionally, the first acquisition submodule 3031 is configured to: determine the probability of occurrence of the at least one segmented word in its corresponding text to be determined; sum the first topic conditional probabilities of the same topic to obtain the topic probability of that topic; obtain, according to the topic model, the word conditional probability of the at least one segmented word of each text to be determined under each topic; and determine the second topic conditional probability according to the topic probability, the probability of occurrence, and the word conditional probability.
Optionally, when the preset time period includes a single period, the second determining submodule 3033 is configured to obtain, through a weight acquisition step, the first weight of each segmented word in all texts to be determined, and to take this first weight as the heat weight.
When the preset time period includes multiple periods, the second determining submodule 3033 is configured to obtain, through the weight acquisition step, the first weight of each segmented word in all texts to be determined in each period, and to obtain the heat weight of each segmented word according to these first weights.
Optionally, the weight acquisition step includes: obtaining the position information of each segmented word in each text to be determined, the position information being either a text title position or a text body position;
when the position information of a segmented word is the text title position, determining the product of the word's topic weight and a preset parameter as the word's second weight in that text;
when the position information of a segmented word is the text body position, determining the word's topic weight as its second weight in that text; and
computing, for each segmented word, the average of its second weights over all texts to be determined as the first weight of that word.
Optionally, the second determining submodule 3033 is configured to determine the third weight of each segmented word according to the first weights of that word in each period, and to determine the heat weight of each segmented word according to its third weight and its first weight.
Fig. 6 is a block diagram of an apparatus for determining a hot event according to an exemplary embodiment of the present disclosure, in which the hot event includes a hot word and a hot clause. As shown in Fig. 6, the second determining module 304 includes:
a second acquisition submodule 3041, configured to obtain a preset number of hot words according to the heat weight of each segmented word;
a third acquisition submodule 3042, configured to obtain, from all texts to be determined, the clauses to be determined that contain the hot word;
a sorting submodule 3043, configured to sort the clause words contained in a clause to be determined in descending order of topic weight to obtain a ranking result;
a third determining submodule 3044, configured to take the clause to be determined as a target clause when the rank of the hot word in the ranking result is less than or equal to a preset rank, and to obtain the hot clause from the target clauses; and
a fourth determining submodule 3045, configured to determine the hot word and the hot clause as the hot event.
With the above apparatus, the topic model captures the associations among texts to be determined, topics, and words. The first topic conditional probability of each text to be determined under each topic is determined from this model, and the heat weight of each segmented word is determined from the first topic conditional probabilities. The corresponding hot event is then mined through the heat weights of the segmented words, which improves the accuracy of hot event determination.
For the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
Fig. 7 is a block diagram of an electronic device 700 according to an exemplary embodiment of the present disclosure. As shown in Fig. 7, the electronic device 700 may include a processor 701, a memory 702, a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 controls the overall operation of the electronic device 700 to complete all or some of the steps of the method for determining a hot event described above. The memory 702 stores various types of data to support the operation of the electronic device 700. Such data may include, for example, instructions for any application or method operating on the electronic device 700, as well as application-related data. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals, and the received audio signals may be further stored in the memory 702 or sent through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse, or buttons, which may be virtual or physical. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near-field communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the method for determining a hot event described above.
In another exemplary embodiment, a computer-readable storage medium including program instructions is further provided, for example the memory 702 including program instructions. The program instructions can be executed by the processor 701 of the electronic device 700 to complete the method for determining a hot event described above.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of these embodiments. Within the scope of the technical concept of the present disclosure, various simple variations can be made to its technical solution, and these simple variations all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above embodiments can be combined in any suitable manner, provided there is no contradiction. To avoid unnecessary repetition, the various possible combinations are not described separately in the present disclosure.
In addition, the various embodiments of the present disclosure can be combined arbitrarily. As long as such combinations do not depart from the idea of the present disclosure, they should likewise be regarded as content disclosed by the present disclosure.
Claims (10)
- 1. A method for determining a hot event, characterized in that the method comprises: obtaining multiple texts to be determined within a preset time period; obtaining the topic model corresponding to all texts to be determined in the preset time period, and determining, according to the topic model, the first topic conditional probability that each text to be determined belongs to each topic, wherein the topic model includes multiple topics; determining, according to the first topic conditional probability, the heat weight of each segmented word in all texts to be determined; and determining a hot event from the multiple texts to be determined according to the heat weight of each segmented word.
- 2. The method according to claim 1, characterized in that determining, according to the first topic conditional probability, the heat weight of each segmented word in all texts to be determined comprises: obtaining the second topic conditional probability that at least one segmented word of each text to be determined belongs to each topic; determining the topic weight of the at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability; and determining the heat weight of each segmented word according to the topic weights.
- 3. The method according to claim 2, characterized in that obtaining the second topic conditional probability that at least one segmented word of each text to be determined belongs to each topic comprises: determining the probability of occurrence of the at least one segmented word in its corresponding text to be determined; summing the first topic conditional probabilities of the same topic to obtain the topic probability of that topic; obtaining, according to the topic model, the word conditional probability of the at least one segmented word of each text to be determined under each topic; and determining the second topic conditional probability according to the topic probability, the probability of occurrence, and the word conditional probability.
- 4. The method according to claim 2, characterized in that, when the preset time period includes a single period, determining the heat weight of each segmented word according to the topic weights comprises: obtaining, through a weight acquisition step, the first weight of each segmented word in all texts to be determined, and taking the first weight as the heat weight; and when the preset time period includes multiple periods, determining the heat weight of each segmented word according to the topic weights comprises: obtaining, through the weight acquisition step, the first weight of each segmented word in all texts to be determined in each period, and obtaining the heat weight of each segmented word according to the first weights.
- 5. The method according to claim 4, characterized in that the weight acquisition step comprises: obtaining the position information of each segmented word in each text to be determined, the position information being a text title position or a text body position; when the position information of a segmented word is the text title position, determining the product of the topic weight of the segmented word and a preset parameter as the second weight of the segmented word in each text to be determined; when the position information of a segmented word is the text body position, determining the topic weight of the segmented word as the second weight of the segmented word in each text to be determined; and computing, for each segmented word, the average of its second weights over all texts to be determined as the first weight of that segmented word.
- 6. The method according to claim 4 or 5, characterized in that obtaining the heat weight of each segmented word according to the first weights comprises: determining the third weight of each segmented word according to the first weights of that segmented word in each period; and determining the heat weight of each segmented word according to its third weight and its first weight.
- 7. The method according to claim 1, characterized in that the hot event includes a hot word and a hot clause, and determining a hot event from the multiple texts to be determined according to the heat weight of each segmented word comprises: obtaining a preset number of hot words according to the heat weight of each segmented word; obtaining, from all texts to be determined, the clauses to be determined that contain the hot word; sorting the clause words contained in a clause to be determined in descending order of topic weight to obtain a ranking result; when the rank of the hot word in the ranking result is less than or equal to a preset rank, taking the clause to be determined as a target clause and obtaining the hot clause from the target clauses; and determining the hot word and the hot clause as the hot event.
- 8. An apparatus for determining a hot event, characterized in that the apparatus comprises: an acquisition module, configured to obtain multiple texts to be determined within a preset time period; a processing module, configured to obtain the topic model corresponding to all texts to be determined in the preset time period and to determine, according to the topic model, the first topic conditional probability that each text to be determined belongs to each topic, wherein the topic model includes multiple topics; a first determining module, configured to determine, according to the first topic conditional probability, the heat weight of each segmented word in all texts to be determined; and a second determining module, configured to determine a hot event from the multiple texts to be determined according to the heat weight of each segmented word.
- 9. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
- 10. An electronic device, comprising: the computer-readable storage medium according to claim 9; and one or more processors configured to execute the program in the computer-readable storage medium.
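Claim 8 leaves open exactly how the first topic conditional probability is turned into a weight for each segmented word. The sketch below shows one hypothetical way to do it, assuming a topic model (for example LDA, which the claim does not name) has already produced P(topic | text) for every candidate text and P(word | topic) for every topic. The function name `word_weights` and the summation rule are illustrative assumptions, not the patented formula.

```python
from collections import defaultdict


def word_weights(doc_topic, topic_word, docs):
    """Hypothetical sketch of the first determination module in claim 8.

    doc_topic:  one dict per candidate text, {topic: P(topic | text)}
                (the "first topic conditional probability").
    topic_word: {topic: {word: P(word | topic)}} from the fitted topic model.
    docs:       segmented candidate texts (lists of words), aligned with doc_topic.
    Returns {segmented word: weight} summed over all candidate texts.
    """
    weights = defaultdict(float)
    for theta, words in zip(doc_topic, docs):
        for word in set(words):
            # Assumed rule: P(word | text) = sum_k P(k | text) * P(word | k),
            # accumulated over every candidate text that contains the word.
            weights[word] += sum(
                p_topic * topic_word[topic].get(word, 0.0)
                for topic, p_topic in theta.items()
            )
    return dict(weights)


# Usage with two candidate texts and two topics (toy numbers).
doc_topic = [{"t0": 0.5, "t1": 0.5}, {"t0": 1.0}]
topic_word = {"t0": {"flood": 0.6, "rain": 0.4}, "t1": {"stock": 1.0}}
docs = [["flood", "stock"], ["flood", "rain"]]
print(word_weights(doc_topic, topic_word, docs))
```

Here "flood" accumulates weight from both texts, so it outranks words that occur in only one of them.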
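Claim 6 does not say how the third weight or the heat weight is computed. A minimal sketch, assuming (hypothetically) that the third weight is the mean of a word's first weights over all time periods and that the heat weight is the current period's first weight divided by that mean, so that a word whose weight suddenly rises scores higher than one whose weight is stable. The averaging, the ratio, the name `heat_weights` and the smoothing constant `alpha` are all illustrative assumptions.

```python
from statistics import mean


def heat_weights(first_weights_by_period, current_period, alpha=0.1):
    """Hypothetical sketch of claim 6.

    first_weights_by_period: {period: {segmented word: first weight}}
    current_period:          the period whose words are being scored.
    alpha:                   assumed smoothing constant that keeps the ratio finite.
    """
    periods = list(first_weights_by_period.values())
    current = first_weights_by_period[current_period]
    heat = {}
    for word, first in current.items():
        # Third weight (assumed): mean of the same word's first weights over
        # every period, counting 0.0 where the word does not occur.
        third = mean(p.get(word, 0.0) for p in periods)
        # Heat weight (assumed): current first weight relative to that
        # baseline, so a word whose weight jumps outscores a stable one.
        heat[word] = first / (third + alpha)
    return heat


weights = {
    "day1": {"flood": 1.0, "weather": 2.0},
    "day2": {"flood": 1.0, "weather": 2.0},
    "day3": {"flood": 4.0, "weather": 2.0},
}
print(heat_weights(weights, "day3", alpha=0.0))  # {'flood': 2.0, 'weather': 1.0}
```

Dividing by the historical baseline favours bursts over words that are merely frequent; other combinations of the first and third weights would fit the claim wording equally well.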
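The hot-word and hot-clause selection of claim 7 can be sketched as follows. This is an illustration under stated assumptions: clauses arrive already segmented into word lists, the per-word topic weight is supplied by the caller, `top_k` and `max_rank` stand in for the preset word quantity and the preset rank, and the final choice of hot clause among the target clauses (here, the one with the largest total heat weight) is a hypothetical rule, since the claim only says the hot clause is obtained from the target clauses.

```python
def hot_event(heat, clauses, topic_weight, top_k=2, max_rank=1):
    """Hypothetical sketch of claim 7.

    heat:         {segmented word: heat weight}
    clauses:      candidate clauses, each a list of segmented words
    topic_weight: {word: topic weight}
    top_k:        the preset number of hot words
    max_rank:     the preset rank (1 = hot word must rank first in the clause)
    Returns (hot_words, hot_clause); hot_clause is None if no clause qualifies.
    """
    # Preset number of hot words with the largest heat weights.
    hot_words = sorted(heat, key=heat.get, reverse=True)[:top_k]

    targets = []
    for clause in clauses:
        # Only candidate clauses that contain a hot word are considered.
        present = [w for w in hot_words if w in clause]
        if not present:
            continue
        # Clause words (deduplicated, order kept) sorted by topic weight, descending.
        ranking = sorted(dict.fromkeys(clause),
                         key=lambda w: topic_weight.get(w, 0.0), reverse=True)
        # Target clause: some hot word ranks within the preset rank (1-based).
        if any(ranking.index(w) + 1 <= max_rank for w in present):
            targets.append(clause)

    # Assumed rule: the hot clause is the target clause with the largest
    # total heat weight of its words.
    hot_clause = max(targets, key=lambda c: sum(heat.get(w, 0.0) for w in c),
                     default=None)
    return hot_words, hot_clause


heat = {"flood": 2.0, "river": 1.5, "weather": 1.0}
topic_weight = {"flood": 0.9, "weather": 0.8, "river": 0.7, "city": 0.2}
clauses = [["flood", "hits", "city"], ["river", "flood", "city"], ["weather", "river"]]
print(hot_event(heat, clauses, topic_weight))
# (['flood', 'river'], ['river', 'flood', 'city'])
```

The third clause contains the hot word "river" but is rejected, because "weather" outranks it by topic weight and the preset rank is 1.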
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711484349.7A CN108228808B (en) | 2017-12-29 | 2017-12-29 | Method and device for determining hot event, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108228808A (en) | 2018-06-29 |
CN108228808B (en) | 2020-07-31 |
Family ID: 62647311
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711484349.7A Active CN108228808B (en) | 2017-12-29 | 2017-12-29 | Method and device for determining hot event, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108228808B (en) |
- 2017-12-29: CN201711484349.7A filed in CN (CN108228808B, active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2211282A2 (en) * | 2009-01-27 | 2010-07-28 | Palo Alto Research Center Incorporated | System and method for managing user attention by detecting hot and cold topics in social indexes |
CN103970756A (en) * | 2013-01-28 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Hot topic extracting method, device and server |
WO2014127673A1 (en) * | 2013-02-25 | 2014-08-28 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for acquiring hot topics |
CN104281645A (en) * | 2014-08-27 | 2015-01-14 | 北京理工大学 | Method for identifying emotion key sentence on basis of lexical semantics and syntactic dependency |
CN104216875A (en) * | 2014-09-26 | 2014-12-17 | 中国科学院自动化研究所 | Automatic microblog text abstracting method based on unsupervised key bigram extraction |
CN105824959A (en) * | 2016-03-31 | 2016-08-03 | 首都信息发展股份有限公司 | Public opinion monitoring method and system |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109151498A (en) * | 2018-09-03 | 2019-01-04 | 北京达佳互联信息技术有限公司 | Focus incident processing method, device, server and storage medium |
CN109739975A (en) * | 2018-11-15 | 2019-05-10 | 东软集团股份有限公司 | Focus incident abstracting method, device, readable storage medium storing program for executing and electronic equipment |
CN109739975B (en) * | 2018-11-15 | 2021-03-09 | 东软集团股份有限公司 | Hot event extraction method and device, readable storage medium and electronic equipment |
CN109710944A (en) * | 2018-12-29 | 2019-05-03 | 新华网股份有限公司 | Hot word extracting method, device, electronic equipment and computer readable storage medium |
CN112528018A (en) * | 2020-12-01 | 2021-03-19 | 天津中科智能识别产业技术研究院有限公司 | Hot news discovery method based on text mining |
CN113076489A (en) * | 2021-04-14 | 2021-07-06 | 合肥工业大学 | Method for classifying social media user roles in public sentiment event |
CN113076489B (en) * | 2021-04-14 | 2022-09-13 | 合肥工业大学 | Method for classifying social media user roles in public sentiment event |
CN113822069A (en) * | 2021-09-17 | 2021-12-21 | 国家计算机网络与信息安全管理中心 | Emergency early warning method and device based on meta-knowledge and electronic device |
CN113822069B (en) * | 2021-09-17 | 2024-03-12 | 国家计算机网络与信息安全管理中心 | Sudden event early warning method and device based on meta-knowledge and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN108228808B (en) | 2020-07-31 |
Similar Documents
Publication | Title |
---|---|
CN108228808A (en) | Determine the method, apparatus of focus incident and storage medium and electronic equipment |
CN108804512B (en) | Text classification model generation device and method and computer readable storage medium |
US10380236B1 (en) | Machine learning system for annotating unstructured text |
CN111324728B (en) | Text event abstract generation method and device, electronic equipment and storage medium |
CN112435656B (en) | Model training method, voice recognition method, device, equipment and storage medium |
CN110444198B (en) | Retrieval method, retrieval device, computer equipment and storage medium |
US11636341B2 (en) | Processing sequential interaction data |
CN105608200A (en) | Network public opinion tendency prediction analysis method |
CN107066537A (en) | Hot news generation method, equipment, electronic equipment |
CN112084334B (en) | Label classification method and device for corpus, computer equipment and storage medium |
CN110033382B (en) | Insurance service processing method, device and equipment |
CN112860902A (en) | Public opinion emotional heat degree calculation method and device |
Rinke et al. | Expert-informed topic models for document set discovery |
CN107066442A (en) | Detection method, device and the electronic equipment of mood value |
CN113627194B (en) | Information extraction method and device, and communication message classification method and device |
CN110489730A (en) | Text handling method, device, terminal and storage medium |
CN110147482B (en) | Method and device for acquiring burst hotspot theme |
CN110705279A (en) | Vocabulary selection method and device and computer readable storage medium |
CN111859955A (en) | Public opinion data analysis model based on deep learning |
CN110705282A (en) | Keyword extraction method and device, storage medium and electronic equipment |
CN113609833B (en) | Dynamic file generation method and device, computer equipment and storage medium |
CN111753540B (en) | Method and system for collecting text data to perform Natural Language Processing (NLP) |
CN115329173A (en) | Method and device for determining enterprise credit based on public opinion monitoring |
CN110413899B (en) | Storage resource optimization method and system for server storage news |
CN114722832A (en) | Abstract extraction method, device, equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |