CN108228808B

CN108228808B - Method and device for determining hot event, storage medium and electronic equipment

Info

Publication number: CN108228808B
Application number: CN201711484349.7A
Authority: CN
Inventors: 董超; 崔朝辉; 赵立军; 张霞
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2017-12-29
Filing date: 2017-12-29
Publication date: 2020-07-31
Anticipated expiration: 2037-12-29
Also published as: CN108228808A

Abstract

The present disclosure relates to a method, an apparatus, a storage medium, and an electronic device for determining a hotspot event. The method comprises: acquiring a plurality of texts to be determined in a preset time period; obtaining a topic model corresponding to all the texts to be determined in the preset time period, the topic model comprising a plurality of topics, and determining, according to the topic model, a first topic conditional probability that each text to be determined belongs to each of the different topics; determining the heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability; and determining a hotspot event from the plurality of texts to be determined according to the heat weight of each segmented word.

Description

Method and device for determining hot event, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of information technologies, and in particular, to a method and an apparatus for determining a hotspot event, a storage medium, and an electronic device.

Background

With the rapid popularization of the internet, the social influence of the network keeps expanding. Users obtain news through portals, social software, microblogs, forums, and other channels, and express their own views on it. Frequent interaction gives rise to topics shared among different users; these common topics are called hotspot events.

At present, hotspot events are typically determined by treating all news items as one set, aggregating news of the same type by clustering, ranking the news of each type by popularity, and taking the top-ranked news event as the hotspot event of that type. The popularity ranking may depend on factors such as the number of visits and the number of comments, and it can be manipulated artificially (for example, by ranking-boosting carried out with software or by hand). Determining hotspot events solely from the popularity ranking is therefore inaccurate.

Disclosure of Invention

In order to solve the above problem, the present disclosure provides a method, an apparatus, a storage medium, and an electronic device for determining a hotspot event.

According to a first aspect of embodiments of the present disclosure, there is provided a method of determining a hotspot event, the method comprising:

acquiring a plurality of texts to be determined in a preset time period;

obtaining a topic model corresponding to all the texts to be determined in the preset time period, and determining, according to the topic model, a first topic conditional probability that each text to be determined belongs to each of the different topics, wherein the topic model comprises a plurality of topics;

determining the heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability;

and determining a hotspot event from the texts to be determined according to the heat weight of each segmented word.

Optionally, obtaining the topic model corresponding to all the texts to be determined within the preset time period includes:

performing word segmentation on each text to be determined in the preset time period to obtain at least one segmented word;

and training a preset topic model with the at least one segmented word to obtain the topic model.

Optionally, determining the heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability includes:

acquiring a second topic conditional probability that the at least one segmented word in each text to be determined belongs to each of the different topics;

determining the topic weight of the at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability;

and determining the heat weight of each segmented word according to the topic weight.

Optionally, acquiring the second topic conditional probability that the at least one segmented word in each text to be determined belongs to each of the different topics includes:

determining the occurrence probability of the at least one segmented word in the corresponding text to be determined;

summing the first topic conditional probabilities corresponding to the same topic to obtain the topic probability of that topic;

obtaining, according to the topic model, the word conditional probability of the at least one segmented word in each text to be determined under each of the different topics;

and determining the second topic conditional probability according to the topic probability, the occurrence probability, and the word conditional probability.

Optionally, when the preset time period consists of one time period, determining the heat weight of each segmented word according to the topic weight includes:

acquiring a first weight of each segmented word in all the texts to be determined through a weight acquisition step, and determining the first weight as the heat weight.

When the preset time period comprises a plurality of time periods, determining the heat weight of each segmented word according to the topic weight includes:

acquiring, through the weight acquisition step, a first weight of each segmented word in all the texts to be determined in each time period;

and acquiring the heat weight of each segmented word according to the first weights.

Optionally, the weight acquisition step includes:

acquiring position information of each segmented word in each text to be determined, the position information being either a text title position or a text body position;

when the position information of a segmented word is the text title position, determining the product of the topic weight of the segmented word and a preset parameter as the second weight of the segmented word in that text to be determined;

when the position information of a segmented word is the text body position, determining the topic weight of the segmented word as its second weight in that text to be determined;

and calculating, for each segmented word, the average of its second weights over all the texts to be determined as the first weight of that segmented word.

Optionally, acquiring the heat weight of each segmented word according to the first weight includes:

determining a third weight of each segmented word according to the first weights of that segmented word in the individual time periods;

and determining the heat weight of each segmented word according to its third weight and first weight.

Optionally, the hotspot event includes hot words and hot clauses, and determining the hotspot event from the plurality of texts to be determined according to the heat weight of each segmented word includes:

acquiring a preset number of hot words according to the heat weight of each segmented word;

obtaining clauses to be determined that contain the hot words from all the texts to be determined;

sorting the words in each clause to be determined in descending order of topic weight to obtain a sorting result;

when the rank of the hot word in the sorting result is less than or equal to a preset rank, determining the clause to be determined as a target clause, and acquiring a hot clause from the target clauses;

and determining the hot words and the hot clauses as the hotspot event.
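The hot-word and target-clause selection described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: each clause is assumed to be already segmented into a word list, a single topic weight per word is used for ranking (the description computes topic weights per text), and the final choice of hot clauses among the target clauses is omitted because the text does not specify it. All function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple


def select_hot_words(heat_weight: Dict[str, float], top_n: int) -> List[str]:
    """Return the top_n segmented words with the largest heat weight."""
    ranked = sorted(heat_weight.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]


def select_target_clauses(
    clauses: List[List[str]],
    topic_weight: Dict[str, float],
    hot_words: List[str],
    max_rank: int,
) -> List[Tuple[str, List[str]]]:
    """Keep (hot word, clause) pairs where the hot word ranks within max_rank
    after the clause's words are sorted by topic weight, largest first."""
    targets = []
    for hot in hot_words:
        for clause in clauses:
            if hot not in clause:
                continue  # only clauses containing the hot word are candidates
            unique_words = list(dict.fromkeys(clause))  # dedupe, keep order
            ranked = sorted(unique_words,
                            key=lambda w: topic_weight.get(w, 0.0), reverse=True)
            rank = ranked.index(hot) + 1  # 1-based position in the sorting result
            if rank <= max_rank:
                targets.append((hot, clause))
    return targets


heat = {"budget": 0.9, "reform": 0.7, "weather": 0.1}
clauses = [
    ["budget", "deficit", "plan"],
    ["weather", "budget"],
    ["reform", "pension", "budget"],
]
topic_weight = {"budget": 0.5, "deficit": 0.6, "plan": 0.1,
                "weather": 0.2, "reform": 0.3, "pension": 0.4}
hot_words = select_hot_words(heat, top_n=2)
targets = select_target_clauses(clauses, topic_weight, hot_words, max_rank=2)
```

In this toy input, "reform" yields no target clause because it ranks third in its only clause, while every clause containing "budget" ranks it within the top two.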

According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for determining a hotspot event, the apparatus comprising:

an acquisition module, configured to acquire a plurality of texts to be determined in a preset time period;

a processing module, configured to obtain the topic model corresponding to all the texts to be determined in the preset time period and to determine, according to the topic model, the first topic conditional probability that each text to be determined belongs to each of the different topics, wherein the topic model comprises a plurality of topics;

a first determining module, configured to determine the heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability;

and a second determining module, configured to determine a hotspot event from the texts to be determined according to the heat weight of each segmented word.

Optionally, the processing module includes:

a processing submodule, configured to perform word segmentation on each text to be determined in the preset time period to obtain at least one segmented word;

and a training submodule, configured to train a preset topic model with the at least one segmented word to obtain the topic model.

Optionally, the first determining module includes:

a first obtaining submodule, configured to obtain a second topic conditional probability that the at least one segmented word in each text to be determined belongs to each of the different topics;

a first determining submodule, configured to determine the topic weight of the at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability;

and a second determining submodule, configured to determine the heat weight of each segmented word according to the topic weight.

Optionally, the first obtaining submodule is configured to: determine the occurrence probability of the at least one segmented word in the corresponding text to be determined; sum the first topic conditional probabilities of the same topic to obtain the topic probability of that topic; obtain, according to the topic model, the word conditional probability of the at least one segmented word in each text to be determined under each of the different topics; and determine the second topic conditional probability according to the topic probability, the occurrence probability, and the word conditional probability.

Optionally, when the preset time period consists of one time period, the second determining submodule is configured to obtain a first weight of each segmented word in all the texts to be determined through a weight acquisition step, and to determine the first weight as the heat weight.

When the preset time period comprises a plurality of time periods, the second determining submodule is configured to obtain, through the weight acquisition step, the first weight of each segmented word in all the texts to be determined in each time period, and to obtain the heat weight of each segmented word according to the first weights.

Optionally, the weight acquisition step includes: acquiring position information of each segmented word in each text to be determined, the position information being either a text title position or a text body position; when the position is the text title position, determining the product of the topic weight of the segmented word and a preset parameter as its second weight in that text to be determined; when the position is the text body position, determining the topic weight of the segmented word as its second weight in that text to be determined; and calculating, for each segmented word, the average of its second weights over all the texts to be determined as its first weight.

Optionally, the second determining submodule is configured to determine a third weight of each segmented word according to the first weights of that segmented word in the individual time periods, and to determine the heat weight of each segmented word according to its third weight and first weight.

Optionally, the hotspot event includes hot words and hot clauses, and the second determining module includes:

a second obtaining submodule, configured to obtain a preset number of hot words according to the heat weight of each segmented word;

a third obtaining submodule, configured to obtain clauses to be determined that contain the hot words from all the texts to be determined;

a sorting submodule, configured to sort the words in each clause to be determined in descending order of topic weight to obtain a sorting result;

a third determining submodule, configured to determine the clause to be determined as a target clause when the rank of the hot word in the sorting result is less than or equal to a preset rank, and to acquire a hot clause from the target clauses;

and a fourth determining submodule, configured to determine the hot words and the hot clauses as the hotspot event.

According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect described above.

According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

the computer-readable storage medium of the third aspect above; and

one or more processors for executing the program in the computer-readable storage medium.

With the above technical solution, a plurality of texts to be determined in a preset time period are acquired; a topic model comprising a plurality of topics is obtained for all the texts to be determined in the preset time period, and the first topic conditional probability that each text to be determined belongs to each topic is determined according to the topic model; the heat weight of each segmented word in all the texts to be determined is determined according to the first topic conditional probability; and hotspot events are determined from the texts to be determined according to these heat weights. Because the topic model captures the associations among the texts to be determined, the topics, and the words, heat weights derived from the first topic conditional probability reflect those associations, and mining hotspot events from them improves the accuracy of determining hotspot events.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

Fig. 1 is a flowchart illustrating a method of determining a hotspot event according to an exemplary embodiment of the present disclosure;

Fig. 2 is a flowchart illustrating another method of determining a hotspot event according to an exemplary embodiment of the present disclosure;

Fig. 3 is a block diagram illustrating a first apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure;

Fig. 4 is a block diagram illustrating a second apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure;

Fig. 5 is a block diagram illustrating a third apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure;

Fig. 6 is a block diagram illustrating a fourth apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure;

Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.

First, an application scenario of the present disclosure is explained. As the network has become part of people's lives, news spreads through the network and users interact with it; if users interact frequently with a certain news item, that item becomes a hotspot event. Taking news from a conference period in 2017 as an example, the news includes information 1, information 2, information 3, information 4, information 5, information 6, information 7, and other related information. Obtaining hotspot events from a large amount of news is highly valuable to government staff, business managers, financial researchers, and others involved in public opinion research, because it helps them follow how events develop in time and take corresponding measures. At present, however, hotspot events are mainly determined by popularity ranking, and since the ranking can be manipulated artificially (for example, by ranking-boosting carried out with software or by hand), determining hotspot events solely from the popularity ranking is inaccurate.

The present disclosure provides a method, an apparatus, a storage medium, and an electronic device for determining a hotspot event. A plurality of texts to be determined, drawn from news in a preset time period, are acquired; a topic model corresponding to all the texts to be determined in that period is obtained; the first topic conditional probability that each text to be determined belongs to each topic is obtained from the topic model; and the heat weight of each segmented word in all the texts to be determined is determined from the first topic conditional probability, so that hotspot events can be determined from the texts according to the heat weights. Because the topic model captures the associations among the texts to be determined, the topics, and the words, mining hotspot events through heat weights derived from it improves the accuracy of determining hotspot events.

The present disclosure is described in detail below with reference to specific examples.

Fig. 1 is a flowchart illustrating a method for determining a hotspot event according to an exemplary embodiment of the present disclosure. As shown in Fig. 1, the method includes the following steps.

S101, obtaining a plurality of texts to be determined in a preset time period.

The texts to be determined may be obtained from web portals, social software, microblogs, forums, and the like by means of crawler technology. Taking news from a conference in 2017 as an example, the news includes information 1, information 2, information 3, information 4, information 5, information 6, information 7, and other related information, and the texts to be determined may include the texts corresponding to these news items. This is merely an example, and the disclosure is not limited thereto.

S102, obtaining a topic model corresponding to all the texts to be determined in the preset time period, and determining, according to the topic model, the first topic conditional probability that each text to be determined belongs to each of the different topics.

In this step, if the preset time period consists of a single time period, the topic model is generated from all the texts to be determined in that time period; if the preset time period comprises a plurality of time periods, there is one topic model per time period, each generated from all the texts to be determined in its time period.

In a possible implementation, each text to be determined in the same time period may be segmented to obtain at least one segmented word, and a preset topic model is trained with the at least one segmented word and a preset number of topics to obtain the topic model. The preset topic model may be generated by the LDA (Latent Dirichlet Allocation) algorithm and is a three-layer Bayesian probability model whose layers are words, topics, and texts to be determined. A topic model covering the texts to be determined, the words, and the topics can thus be generated from the preset topic model, and from it both the first topic conditional probability that each text to be determined belongs to each topic and the word conditional probability of the at least one segmented word in each text to be determined under each topic can be determined.

S103, determining the heat weight of each segmented word in the texts to be determined according to the first topic conditional probability.

In the present disclosure, a larger heat weight means that the segmented word is more popular, that is, users pay more attention to it; conversely, a smaller heat weight means that the segmented word is less popular and receives less user attention.

S104, determining a hotspot event from the plurality of texts to be determined according to the heat weight of each segmented word.

With this method, the topic model captures the associations among the texts to be determined, the topics, and the words; the first topic conditional probability of each text to be determined under the different topics is determined from the topic model, and the heat weight of each segmented word is determined from that probability. Mining hotspot events according to these heat weights improves the accuracy of determining hotspot events.

Fig. 2 is a flowchart illustrating another method for determining a hotspot event according to an exemplary embodiment of the present disclosure. As shown in Fig. 2, the method includes the following steps.

S201, obtaining a plurality of texts to be determined in a preset time period.

The texts to be determined may be obtained from web portals, social software, microblogs, forums, and the like by means of crawler technology, so as to collect texts on a target topic. Taking news from a conference in 2017 as an example, the news includes information 1, information 2, information 3, information 4, information 5, information 6, and information 7, and the texts to be determined may include the texts corresponding to these news items. In a possible implementation, if the preset time period comprises multiple time periods, the texts to be determined in each time period may be represented as a text set $D = \{d_1, d_2, \dots, d_i, \dots, d_n\}$, where $d_i$ denotes the $i$-th text to be determined and $n$ denotes the total number of texts to be determined in that time period. Each $d_i$ consists of the title $title_i$ and the body $body_i$ of the $i$-th text, i.e. $d_i = \{title_i, body_i\}$, so that in a subsequent step the first weight of each segmented word in all the texts to be determined can be determined according to the position of that word in each text.
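The text set and the title/body representation of each text can be sketched as a small data structure. Assumptions not taken from the source: each crawled text carries a publication date, and the preset time period is split into consecutive fixed-length periods. The class, field, and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class CandidateText:
    """One text to be determined, d_i = {title_i, body_i}.

    `published` is a hypothetical field used only to assign the text to a period.
    """
    title: str
    body: str
    published: date


def group_by_period(texts: List[CandidateText], start: date,
                    period_days: int) -> Dict[int, List[CandidateText]]:
    """Split texts into consecutive periods of period_days days starting at start;
    each value is the text set D = {d_1, ..., d_n} of one period."""
    periods: Dict[int, List[CandidateText]] = defaultdict(list)
    for text in texts:
        index = (text.published - start).days // period_days
        periods[index].append(text)
    return dict(periods)


texts = [
    CandidateText("Budget report released", "The annual budget report was released.",
                  date(2017, 3, 1)),
    CandidateText("Reform plan discussed", "Delegates discussed the reform plan.",
                  date(2017, 3, 2)),
    CandidateText("Pension changes announced", "New pension rules were announced.",
                  date(2017, 3, 9)),
]
periods = group_by_period(texts, start=date(2017, 3, 1), period_days=7)
```

With seven-day periods starting on 1 March, the first two texts fall into period 0 and the third into period 1, so a separate topic model would be trained for each group.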

S202, obtaining the topic models corresponding to all the texts to be determined in the preset time period.

It should be noted that, if the preset time period consists of a single time period, the topic model is generated from all the texts to be determined in that time period; if the preset time period comprises a plurality of time periods, there is one topic model per time period, each generated from all the texts to be determined in its time period.

In this step, the topic models corresponding to all texts to be determined in the preset time period may be obtained through the following steps:

S11, performing word segmentation on each text to be determined in the preset time period to obtain at least one segmented word.

Word segmentation can be performed in various ways, for example by a character matching method (i.e., mechanical word segmentation): each text to be determined is matched in sequence against the entries of a preset dictionary, and when a string in the text is found in the dictionary, the match succeeds and a word is identified. The resulting segmented words may include stop words that carry no practical meaning but add computation. Therefore, in another embodiment of the present disclosure, after each text to be determined is segmented into at least one segmented word, stop words may be removed. This eliminates words without practical meaning and reduces the computational complexity of the subsequent training of the preset topic model while preserving the accuracy of hotspot event determination.
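Dictionary-based (mechanical) segmentation followed by stop-word removal can be sketched as below. The sketch uses forward maximum matching, one common variant of the character matching method; the source does not say which variant it uses. The dictionary, stop-word list, `max_len`, and function names are all illustrative assumptions.

```python
from typing import Iterable, List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Mechanical segmentation: at each position take the longest dictionary
    entry (up to max_len characters); otherwise emit a single character."""
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback when no dictionary entry matches
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


def remove_stop_words(words: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Drop stop words and whitespace-only tokens."""
    return [w for w in words if w not in stop_words and w.strip()]


dictionary = {"政府", "工作", "报告", "要点", "的"}
segmented = forward_max_match("政府工作报告的要点", dictionary)
cleaned = remove_stop_words(segmented, stop_words={"的"})
```

The example phrase means "key points of the government work report": it segments into 政府 (government), 工作 (work), 报告 (report), 的 (a grammatical particle), and 要点 (key points), and removing the stop word 的 leaves the four content words.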

S12, training the preset topic model with the at least one segmented word to obtain the topic model.

In this step, the topic model includes a preset number of topics. The preset number of topics can be chosen according to the number of texts to be determined and is typically set between 50 and 200. The preset topic model may be generated by the LDA (Latent Dirichlet Allocation) algorithm and is a three-layer Bayesian probability model whose layers are segmented words, topics, and texts to be determined. In this way, a topic model covering the texts to be determined, the words, and the topics can be generated from the preset topic model, so that the first topic conditional probability that each text to be determined belongs to each topic and the word conditional probability of the at least one segmented word in each text to be determined under each topic can be determined from the topic model.
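The text states that the preset topic model may be generated by LDA but does not say how it is trained. The following minimal, self-contained sketch assumes collapsed Gibbs sampling with symmetric priors `alpha` and `beta` (illustrative defaults, not values from the source). It shows how the first topic conditional probability $p(t_p \mid d_i)$ (`theta`) and the word conditional probability $p(w \mid t_p)$ (`phi`) are estimated from the trained counts. In practice a library LDA implementation would normally be used instead.

```python
import random
from typing import Dict, List, Tuple


def train_lda(docs: List[List[str]], num_topics: int, alpha: float = 0.1,
              beta: float = 0.01, iterations: int = 200,
              seed: int = 0) -> Tuple[List[List[float]], List[Dict[str, float]]]:
    """Collapsed Gibbs sampling for LDA.

    Returns theta (theta[i][p] estimates p(t_p | d_i)) and
    phi (phi[p][w] estimates p(w | t_p)).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: v for v, w in enumerate(vocab)}
    n_vocab = len(vocab)
    topics = list(range(num_topics))
    n_dk = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    n_kw = [[0] * n_vocab for _ in topics]           # times word v is in topic k
    n_k = [0] * num_topics                           # tokens assigned to topic k
    z = []                                           # topic of every token
    for d, doc in enumerate(docs):                   # random initialisation
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                n_dk[d][k] -= 1                      # remove the current assignment
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # full conditional p(z = t | all other assignments), unnormalised
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + n_vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [{w: (n_kw[k][word_id[w]] + beta) / (n_k[k] + n_vocab * beta) for w in vocab}
           for k in topics]
    return theta, phi


docs = [["budget", "deficit", "budget", "tax"],
        ["reform", "pension", "reform", "welfare"],
        ["budget", "tax", "reform", "pension"]]
theta, phi = train_lda(docs, num_topics=2, iterations=100)
```

Each row of `theta` is the first topic conditional probability set of one text, and each `phi[p]` gives the word conditional probabilities under topic $t_p$; both are smoothed by the priors, so every row sums to 1.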

It should be noted that, because different texts to be determined may contain the same segmented word, training the preset topic model directly on the at least one segmented word would process the same word repeatedly and reduce efficiency. The segmented words may therefore be preprocessed by merging the segmented words of all texts to be determined and removing duplicates, forming a word set $W = \{w_1, w_2, \dots, w_l, \dots, w_c\}$, where $w_l$ denotes the $l$-th preprocessed segmented word and any two preprocessed segmented words are different. The heat weight of each word in the word set can then be obtained in turn in the subsequent steps.
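Merging the segmented words of all texts into a duplicate-free word set $W$ can be sketched as follows. The function name is hypothetical, and keeping the order of first occurrence is an illustrative choice; the text only requires that any two words in $W$ differ.

```python
from typing import List


def build_word_set(segmented_texts: List[List[str]]) -> List[str]:
    """Merge the segmented words of all texts into W = [w_1, ..., w_c],
    keeping the order of first occurrence and dropping duplicates."""
    return list(dict.fromkeys(w for text in segmented_texts for w in text))


word_set = build_word_set([["budget", "reform"], ["reform", "pension", "budget"]])
```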

S203, determining the first topic conditional probability of each text to be determined belonging to different topics according to the topic model.

For example, the first topic conditional probability that the $i$-th text to be determined belongs to topic $t_p$ may be expressed as $p(t_p \mid d_i)$, where $t_p$ denotes the $p$-th topic and $d_i$ denotes the $i$-th text to be determined. The set of first topic conditional probabilities of the $i$-th text can then be expressed as $\{p(t_1 \mid d_i), p(t_2 \mid d_i), \dots, p(t_p \mid d_i), \dots, p(t_k \mid d_i)\}$, where $k$ is the number of topics. Once this set is determined for every text to be determined, the first topic conditional probabilities corresponding to the same topic can be summed in a subsequent step to obtain the topic probability of that topic.

S204, determining the occurrence probability of the at least one segmented word in the corresponding text to be determined.

The word subset of the $i$-th text to be determined may be expressed as $w_i = \{w_{1i}, w_{2i}, \dots, w_{ji}, \dots, w_{zi}\}$, where $w_{ji}$ denotes the $j$-th segmented word in the $i$-th text, $z$ denotes the number of segmented words in the subset, and any two segmented words in the subset are different. The occurrence probability $p(w_{ji})$ of the $j$-th segmented word $w_{ji}$ in the $i$-th text to be determined is calculated from $count(w_{ji})$, the number of times $w_{ji}$ occurs in the $i$-th text to be determined.
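The text defines $count(w_{ji})$ but does not spell out the denominator of the occurrence probability. The following minimal sketch assumes $p(w_{ji})$ is the count divided by the total number of word tokens in the $i$-th text. That is a plain relative frequency and an assumption, not the disclosed formula. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def occurrence_probability(tokens: List[str]) -> Dict[str, float]:
    """Occurrence probability of each distinct word in one text.

    ASSUMPTION: p(w_ji) = count(w_ji) / (number of word tokens in text i).
    """
    if not tokens:
        return {}
    counts = Counter(tokens)  # count(w_ji) for every distinct word
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


probs = occurrence_probability(["budget", "reform", "budget", "plan"])
```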

S205, summing the first topic conditional probabilities of the same topic to obtain the topic probability of that topic.

In this step, the topic probability of topic $t_p$ is calculated as:

$$P(t_p) = \sum_{i=1}^{n} p(t_p \mid d_i)$$

where $P(t_p)$ denotes the topic probability of topic $t_p$, $p(t_p \mid d_i)$ denotes the first topic conditional probability that the $i$-th text to be determined belongs to topic $t_p$, and $n$ is the number of texts to be determined.
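Summing the first topic conditional probabilities of the same topic over all texts, as this step describes, can be sketched as follows. The input layout is an assumption: `theta[i][p]` holds $p(t_p \mid d_i)$, for example as produced by a trained LDA model. The function name is hypothetical.

```python
from typing import List


def topic_probability(theta: List[List[float]]) -> List[float]:
    """P(t_p) = sum over all texts i of p(t_p | d_i).

    theta[i][p] holds the first topic conditional probability p(t_p | d_i).
    """
    num_topics = len(theta[0])
    return [sum(row[p] for row in theta) for p in range(num_topics)]


p_topic = topic_probability([[0.7, 0.3], [0.2, 0.8]])
```

Because the step is a plain sum, the values add up to the number of texts $n$ (here 2) rather than 1.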

S206, obtaining, according to the topic model, the word conditional probability of the at least one segmented word in each text to be determined under each of the different topics.

Because the topic model is a three-layer Bayesian probability model comprising words, topics, and texts to be determined, the word conditional probability of the at least one segmented word in each text to be determined under each topic can be obtained from the constructed topic model. For example, the word conditional probability of the $j$-th segmented word $w_{ji}$ of the $i$-th text under topic $t_p$ can be expressed as $p(w_{ji} \mid t_p)$.

S207, determining the second topic conditional probability according to the topic probability, the occurrence probability, and the word conditional probability.

In this step, the second topic conditional probability is calculated as:

$$p(t_p \mid w_{ji}) = \frac{p(w_{ji} \mid t_p)\, P(t_p)}{p(w_{ji})}$$

where $p(t_p \mid w_{ji})$ denotes the second topic conditional probability that the $j$-th segmented word in the $i$-th text to be determined belongs to topic $t_p$, $p(w_{ji} \mid t_p)$ denotes the word conditional probability of the $j$-th segmented word in the $i$-th text under topic $t_p$, $P(t_p)$ denotes the topic probability of topic $t_p$, and $p(w_{ji})$ denotes the occurrence probability of the $j$-th segmented word in the $i$-th text to be determined.
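Reading the second topic conditional probability as Bayes' rule applied to the three quantities listed in this step (an interpretation consistent with the terms the text defines), the computation for one segmented word across all topics can be sketched as follows. The function name and list layout are hypothetical.

```python
from typing import List


def second_topic_probability(p_w_given_t: List[float], p_t: List[float],
                             p_w: float) -> List[float]:
    """p(t_p | w_ji) = p(w_ji | t_p) * P(t_p) / p(w_ji), for every topic p.

    p_w_given_t[p] is the word conditional probability p(w_ji | t_p),
    p_t[p] is the topic probability P(t_p), and p_w is p(w_ji).
    """
    if p_w <= 0:
        raise ValueError("the occurrence probability p(w_ji) must be positive")
    return [pw_t * pt / p_w for pw_t, pt in zip(p_w_given_t, p_t)]


result = second_topic_probability([0.02, 0.01], [0.9, 1.1], 0.5)
```

Since $P(t_p)$ is defined as an unnormalised sum and $p(w_{ji})$ is a within-text frequency, these values act as scores and need not sum to 1 across topics.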

S208, determining the topic weight of the at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability.

The topic weight is calculated as:

$$t_{w_{ji}} = p(t_p \mid d_i) \cdot p(t_p \mid w_{ji})$$

where $t_{w_{ji}}$ denotes the topic weight of the $j$-th segmented word in the $i$-th text to be determined, $p(t_p \mid d_i)$ denotes the first topic conditional probability that the $i$-th text to be determined belongs to topic $t_p$, and $p(t_p \mid w_{ji})$ denotes the second topic conditional probability that the $j$-th segmented word in the $i$-th text belongs to topic $t_p$.
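The topic weight formula yields one value per topic $t_p$. The text does not say how values for different topics are combined into a single weight per word, so this minimal sketch just returns the per-topic products. The function name is hypothetical.

```python
from typing import List


def topic_weights(p_t_given_d: List[float], p_t_given_w: List[float]) -> List[float]:
    """t_w_ji = p(t_p | d_i) * p(t_p | w_ji), computed for each topic p."""
    return [pd * pw for pd, pw in zip(p_t_given_d, p_t_given_w)]


weights = topic_weights([0.7, 0.3], [0.036, 0.022])
```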

S209, determining the heat weight of each segmented word according to the topic weight.

Because step S201 may acquire the texts to be determined either in a single preset time period or in a plurality of preset time periods, this step determines the heat weight in one of the following ways, depending on the number of time periods.

If the preset time period consists of a single time period, a first weight of each segmented word across all the texts to be determined is obtained through a weight obtaining step, and the first weight is used as the heat weight.

In one possible implementation, the weight obtaining step proceeds as follows. First, the position information of each segmented word in each text to be determined is acquired, where the position information is either a text title position or a text body position. Because the ith text to be determined $d_i$ consists of a title $title_i$ and a body $body_i$, i.e. $d_i = \{title_i, body_i\}$, the position of a segmented word in the ith text can be determined from $d_i$. When a segmented word is at the text title position, the product of its topic weight and a preset parameter (for example, 2) is taken as the second weight of the segmented word in that text; when it is at the text body position, its topic weight is taken as its second weight in that text. Finally, the average of the second weights of the same segmented word over all the texts to be determined is calculated as the first weight of that segmented word, and this first weight is used as the heat weight.
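The weight obtaining step can be sketched as follows, assuming each text has already been reduced to a mapping from segmented word to a single topic weight, separately for its title and its body. Two details are assumptions not fixed by the text: a word found in both the title and the body of the same text is given the title weighting, and the average is taken over the texts that actually contain the word. `first_weights` and the input layout are hypothetical.

```python
from collections import defaultdict


def first_weights(texts, title_factor=2.0):
    """First weight of each segmented word over all texts to be determined.

    texts: hypothetical input, one dict per text d_i with keys "title"
    and "body", each mapping word -> topic weight in that part.
    title_factor: the preset parameter (2 is the example value in the text).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for text in texts:
        second = {}
        # Body position: the second weight is the topic weight itself.
        for word, tw in text.get("body", {}).items():
            second[word] = tw
        # Title position: topic weight times the preset parameter.
        # Assumption: a title occurrence overrides a body occurrence.
        for word, tw in text.get("title", {}).items():
            second[word] = tw * title_factor
        for word, w in second.items():
            sums[word] += w
            counts[word] += 1
    # Assumption: average over the texts that contain the word.
    return {word: sums[word] / counts[word] for word in sums}
```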

If the preset time period comprises a plurality of time periods, a first weight of each segmented word across all the texts to be determined in each time period is obtained through the weight obtaining step described above (not repeated here), and the heat weight of each segmented word is then obtained from these first weights.

Specifically, a third weight of each segmented word is determined from the first weights of that word in the individual time periods, and the heat weight of each segmented word is then determined from its third weight and its first weight.

For example, the preset time period may include three time periods: a first time period, a second time period and a third time period. The first time period may be the current time period; the second time period contains the first time period and is longer than it; and the third time period contains the second time period and is longer than it. For instance, the first time period may be the current week, the second time period the current week plus the preceding week, and the third time period the current week plus the two preceding weeks.

Each segmented word in the texts to be determined in the first time period then has a first weight for the first time period, and likewise for the second and third time periods. The third weight of a segmented word can therefore be determined from its three first weights in the three time periods, for example as:

$$ww_q = a \cdot b_1w_q + b \cdot b_2w_q + c \cdot b_3w_q$$

where $ww_q$ is the third weight of the qth segmented word; $b_1w_q$, $b_2w_q$ and $b_3w_q$ are the first weights of the qth segmented word for the first, second and third time periods, respectively; and $a$, $b$ and $c$ are a first, a second and a third preset value, for example $a = 0.3$, $b = 0.4$ and $c = 0.3$.

After the third weight is obtained, the heat weight of each segmented word for the first (i.e. current) time period is calculated by combining its third weight, which draws on all three time periods, with its first weight for the first time period. In this embodiment, the heat weight is calculated as $hw_q = \alpha \cdot b_1w_q + \beta \cdot ww_q$, where $hw_q$ is the heat weight of the qth segmented word, $b_1w_q$ is its first weight for the first time period, $ww_q$ is its third weight, $\alpha$ is a fourth preset value (for example, $\alpha = 0.25$), and $\beta$ is a fifth preset value (for example, $\beta = 0.75$). In this way, the heat weight combines the first weight for the first time period with the third weight, and thus also takes into account the first weights for the second and third time periods.

It should be noted that the first to fifth preset values are determined through repeated experiments, subject to $a + b + c = 1$ and $\alpha + \beta = 1$.
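The multi-period combination of the third weight and the heat weight can be sketched as below, using the example preset values from the text as defaults and rejecting values that violate $a + b + c = 1$ or $\alpha + \beta = 1$. The function name is hypothetical, and treating a word absent from a longer period as having first weight 0 is an assumption (by construction, words of the first period should also occur in the longer periods).

```python
def heat_weights(b1, b2, b3, a=0.3, b=0.4, c=0.3, alpha=0.25, beta=0.75):
    """Heat weight hw_q for every word of the first (current) time period.

    b1, b2, b3 map word -> first weight for the first, second and third
    time periods. Defaults are the example preset values from the text.
    """
    if abs(a + b + c - 1.0) > 1e-9 or abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("preset values must satisfy a+b+c=1 and alpha+beta=1")
    heat = {}
    for word, w1 in b1.items():
        # Third weight: ww_q = a*b1w_q + b*b2w_q + c*b3w_q.
        # A word missing from a longer period is treated as weight 0 (assumption).
        ww = a * w1 + b * b2.get(word, 0.0) + c * b3.get(word, 0.0)
        # Heat weight: hw_q = alpha*b1w_q + beta*ww_q.
        heat[word] = alpha * w1 + beta * ww
    return heat
```

For instance, a word with first weights 1.0, 0.0 and 0.0 in the three periods gets $ww_q = 0.3$ and $hw_q = 0.25 + 0.75 \times 0.3 = 0.475$.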

S210, obtaining a preset number of hot words according to the heat weight of each segmented word.

In this step, the segmented words are sorted in descending order of heat weight to obtain a word ranking, and the segmented words whose rank is less than or equal to the preset number of words are taken as the hot words.
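A minimal sketch of S210, assuming the heat weights are held in a dict from word to weight; ties are broken by insertion order because Python's sort is stable. `hot_words` and the parameter `n` (the preset number of words) are hypothetical names.

```python
def hot_words(heat, n):
    """Return the n words with the largest heat weight, highest first."""
    ranked = sorted(heat.items(), key=lambda kv: kv[1], reverse=True)
    # Keep words whose 1-based rank is <= n.
    return [word for word, _ in ranked[:n]]
```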

S211, obtaining, from all the texts to be determined, the clauses to be determined that contain the hot words.

Each text to be determined can be split into a plurality of initial clauses using its punctuation marks as split points. Each initial clause is then checked in turn: if it contains a hot word, it is kept as a clause to be determined; otherwise, it is filtered out.
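Clause splitting and hot-word filtering might look like the sketch below. The set of punctuation marks used as split points is an assumption (common Chinese and ASCII marks), and a clause is kept if any hot word appears in it as a substring, which suits unsegmented Chinese text. The names are hypothetical.

```python
import re

# Assumed split points: common Chinese and ASCII punctuation marks.
PUNCTUATION = r"[，。！？；：、,.!?;:]"


def candidate_clauses(texts, hot_words):
    """Split each text at punctuation and keep clauses containing a hot word."""
    clauses = []
    for text in texts:
        for clause in re.split(PUNCTUATION, text):
            clause = clause.strip()
            # Filter out empty pieces and clauses with no hot word.
            if clause and any(w in clause for w in hot_words):
                clauses.append(clause)
    return clauses
```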

S212, determining a hot clause according to the clauses to be determined.

In this step, the hot clause may be determined by:

S21, sorting the clause words included in the clause to be determined in descending order of topic weight to obtain a sorting result.

S22, determining whether the weight rank of the hot word in the sorting result is less than or equal to a preset rank.

When the weight rank of the hot word in the sorting result is less than or equal to the preset rank, step S23 is executed;

when the weight rank of the hot word in the sorting result is greater than the preset rank, the clause to be determined is ignored.

S23, determining the clause to be determined as a target clause, and acquiring a hot clause from the target clauses.
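Steps S21 to S23 reduce to a rank test, sketched here under the assumption that each clause to be determined is represented as a mapping from its words to their topic weights; ranks are 1-based, so a preset rank of 1 keeps only clauses in which the hot word has the highest topic weight. The function name is hypothetical.

```python
def is_target_clause(clause_weights, hot_word, preset_rank):
    """True if hot_word ranks within preset_rank by topic weight in the clause.

    clause_weights: {word: topic_weight} for the words of one clause.
    """
    if hot_word not in clause_weights:
        return False
    # S21: sort clause words by topic weight, descending.
    ranking = sorted(clause_weights, key=clause_weights.get, reverse=True)
    rank = ranking.index(hot_word) + 1  # 1-based rank
    # S22: keep the clause (S23) only if the rank is within the preset rank.
    return rank <= preset_rank
```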

In this step, suppose the obtained target clauses form the clause set $\{S_1, S_2, \ldots, S_u\}$. For each target clause in the set, the similarities between that clause and every other target clause are calculated and summed to obtain its similarity sum, and the target clause with the largest similarity sum is the hot clause. The specific calculation formula is:

$$x = \arg\max_{1 \le d \le u} \sum_{S_{-d}} sim(S_d, S_{-d})$$

where $x$ indicates that the hot clause is the xth target clause, $u$ is the total number of target clauses, $sim(S_d, S_{-d})$ is the similarity between two target clauses, $S_d$ is the dth target clause, and $S_{-d}$ ranges over the target clauses other than the dth one. For example, the similarity between the dth target clause and the rth target clause is calculated as:

$$sim(S_d, S_r) = \frac{|S_d \cap S_r|}{|S_d \cup S_r|}$$

where $|S_d \cap S_r|$ is the number of identical Chinese characters shared by the dth and rth target clauses, and $|S_d \cup S_r|$ is the number of distinct Chinese characters across the dth and rth target clauses.
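The hot clause selection can be sketched with the character-level similarity defined above, which is the Jaccard index over distinct characters. The function names are hypothetical, and breaking ties in favour of the earliest clause is an assumption the disclosure does not address.

```python
def char_jaccard(s1, s2):
    """sim(S_d, S_r): shared distinct characters over distinct characters in either."""
    a, b = set(s1), set(s2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hot_clause(target_clauses):
    """Return the target clause whose summed similarity to all others is largest."""
    if not target_clauses:
        raise ValueError("need at least one target clause")
    best_index, best_score = 0, float("-inf")
    for d, s_d in enumerate(target_clauses):
        # Similarity sum of clause d against every other target clause.
        score = sum(
            char_jaccard(s_d, s_r)
            for r, s_r in enumerate(target_clauses)
            if r != d
        )
        if score > best_score:  # ties keep the earliest clause
            best_index, best_score = d, score
    return target_clauses[best_index]
```

For example, for "市区堵车" and "堵车严重" the shared characters are 堵 and 车 out of six distinct characters, giving a similarity of 1/3.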

S213, determining the hot word and the hot clause as the hot event.

In this way, a hot clause corresponding to each hot word can be obtained, and the hot word and the hot clause are combined and displayed. For example, if the obtained hot word is "tackling a certain target task", the corresponding hot clause determined through steps S211 to S212 may be "focusing on a certain target task, highlighting a problem-oriented approach, setting requirements for subtask 1, subtask 2, subtask 3 and so on, and sounding the call to unwaveringly complete the certain target task". The hot event is thus determined accurately from the hot word and the hot clause, and presenting them together gives the user an accurate view of the hot event. This example is merely illustrative and does not limit the present disclosure.

Fig. 3 is a block diagram illustrating an apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure. As shown in Fig. 3, the apparatus includes:

an obtaining module 301, configured to obtain multiple texts to be determined within a preset time period;

a processing module 302, configured to obtain the topic model corresponding to all the texts to be determined within the preset time period, and to determine, according to the topic model, a first topic conditional probability that each text to be determined belongs to each topic, where the topic model comprises a plurality of topics;

a first determining module 303, configured to determine the heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability;

a second determining module 304, configured to determine a hot event from the plurality of texts to be determined according to the heat weight of each segmented word.

Fig. 4 is a block diagram of an apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure. As shown in Fig. 4, the processing module 302 includes:

a processing submodule 3021, configured to perform word segmentation on each text to be determined within the preset time period to obtain at least one segmented word;

a training submodule 3022, configured to train a preset topic model using the at least one segmented word to obtain the topic model.

Fig. 5 is a block diagram of an apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure. As shown in Fig. 5, the first determining module 303 includes:

a first obtaining submodule 3031, configured to obtain a second topic conditional probability that at least one segmented word in each text to be determined belongs to each topic;

a first determining submodule 3032, configured to determine the topic weight of at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability;

a second determining submodule 3033, configured to determine the heat weight of each segmented word according to the topic weight.

Optionally, the first obtaining submodule 3031 is configured to: determine the occurrence probability of at least one segmented word in the corresponding text to be determined; sum the first topic conditional probabilities corresponding to the same topic to obtain the topic probability of that topic; obtain, according to the topic model, the word conditional probability of at least one segmented word in each text to be determined under each topic; and determine the second topic conditional probability according to the topic probability, the occurrence probability and the word conditional probability.

Optionally, when the preset time period consists of a single time period, the second determining submodule 3033 is configured to obtain, through the weight obtaining step, a first weight of each segmented word across all the texts to be determined, and to use the first weight as the heat weight.

When the preset time period includes a plurality of time periods, the second determining submodule 3033 is configured to obtain, through the weight obtaining step, a first weight of each segmented word across all the texts to be determined in each time period, and to obtain the heat weight of each segmented word according to the first weights.

Optionally, the weight obtaining step includes: acquiring the position information of each segmented word in each text to be determined, where the position information is a text title position or a text body position;

when the position information of the segmented word is the text title position, determining the product of the topic weight of the segmented word and the preset parameter as the second weight of the segmented word in that text to be determined;

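when the position information of the segmented word is the text body position, determining the topic weight of the segmented word as the second weight of the segmented word in that text to be determined; and

calculating the average of the second weights of the same segmented word over all the texts to be determined as the first weight of that segmented word.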

Optionally, the second determining submodule 3033 is configured to determine a third weight of each segmented word according to the first weights of that word in the individual time periods,

and to determine the heat weight of each segmented word according to its third weight and its first weight.

Fig. 6 is a block diagram of an apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure, in which the hotspot event includes a hot word and a hot clause. As shown in Fig. 6, the second determining module 304 includes:

a second obtaining submodule 3041, configured to obtain a preset number of hot words according to the heat weight of each segmented word;

a third obtaining submodule 3042, configured to obtain to-be-determined clauses including the hot word from all the to-be-determined texts;

a sorting submodule 3043, configured to sort the clause words included in the clause to be determined in descending order of topic weight to obtain a sorting result;

a third determining submodule 3044, configured to determine the clause to be determined as a target clause when the weight rank of the hot word in the sorting result is less than or equal to the preset rank, and to obtain a hot clause from the target clauses;

a fourth determining submodule 3045 configured to determine that the hot word and the hot clause are the hot event.

With the above apparatus, the topic model captures the associations among the texts to be determined, the topics and the words. The first topic conditional probability of each text to be determined under each topic is determined from the topic model, and the heat weight of each segmented word is determined from the first topic conditional probability, so that the corresponding hot events are mined according to the heat weights of the segmented words, which improves the accuracy of hot event determination.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 7 is a block diagram of an electronic device 700 shown in an exemplary embodiment of the present disclosure. As shown in fig. 7, the electronic device 700 may include: a processor 701, a memory 702, multimedia components 703, input/output (I/O) interfaces 704, and communication components 705.

The processor 701 is configured to control the overall operation of the electronic device 700 so as to complete all or part of the steps of the above method for determining a hotspot event. The memory 702 is configured to store various types of data to support the operation of the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data. The memory 702 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia components 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is configured to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may further be stored in the memory 702 or transmitted through the communication component 705. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse or buttons. These buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them; accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module and an NFC module.

In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components for performing the above-described method of determining a hotspot event.

In another exemplary embodiment, a computer-readable storage medium comprising program instructions is also provided, for example the memory 702 comprising program instructions, which are executable by the processor 701 of the electronic device 700 to perform the above-described method of determining a hotspot event.

The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.

It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.

In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims

1. A method of determining a hotspot event, the method comprising:

acquiring a plurality of texts to be determined in a preset time period;

determining a hot event from the texts to be determined according to the heat weight of each word segmentation word, wherein the hot event comprises a hot word and a hot clause;

wherein the determining the heat weight of each participle word in all the texts to be determined according to the first topic conditional probability comprises:

determining the heat weight of each word segmentation word according to the theme weight;

the determining a hot event from the plurality of texts to be determined according to the heat weight of each word segmentation word comprises:

when the weight ranking of the hot words in the ranking result is less than or equal to a preset ranking, determining the clause to be determined as a target clause, and acquiring a hot clause from the target clause;

and determining the hot words and the hot clauses as the hot events.

2. The method according to claim 1, wherein the obtaining a second topic conditional probability that at least one of the participle words in each of the texts to be determined belongs to a different topic comprises:

3. The method of claim 2, wherein when the preset time period comprises a time period, the determining the heat weight of each word segmentation word according to the theme weight comprises: acquiring a first weight of each word segmentation word in all the texts to be determined through a weight acquisition step, and determining the first weight as the heat weight;

when the preset time period comprises a plurality of time periods, the determining the heat weight of each word segmentation word according to the theme weight comprises the following steps: respectively obtaining a first weight of each word segmentation word in all the texts to be determined in each time period through a weight obtaining step, and obtaining the heat weight of each word segmentation word according to the first weight.

4. The method of claim 3, wherein the weight obtaining step comprises:

5. The method according to claim 3 or 4, wherein the obtaining the heat weight of each word segmentation word according to the first weight comprises:

6. An apparatus for determining a hotspot event, the apparatus comprising:

the processing module is used for acquiring topic models corresponding to all the texts to be determined in the preset time period and determining a first topic conditional probability that each text to be determined belongs to different topics according to the topic models; the theme model comprises a plurality of themes;

the first determining module is used for determining the heat weight of each word segmentation word in all the texts to be determined according to the first subject conditional probability;

the second determining module is used for determining a hot event from the texts to be determined according to the heat weight of each word segmentation word, wherein the hot event comprises a hot word and a hot clause;

wherein the first determining module comprises:

the second determining submodule is used for determining the heat weight of each word segmentation word according to the theme weight;

the second determining module includes:

a third determining submodule, configured to determine that the clause to be determined is a target clause when the weight rank of the hot word in the sorting result is less than or equal to a preset rank, and obtain a hot clause from the target clause;

7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.

8. An electronic device, comprising:

the computer-readable storage medium recited in claim 7; and

one or more processors to execute the program in the computer-readable storage medium.