CN108228808A

CN108228808A - Method and apparatus for determining a focus incident, storage medium, and electronic equipment

Info

Publication number: CN108228808A
Application number: CN201711484349.7A
Authority: CN
Inventors: 董超; 崔朝辉; 赵立军; 张霞
Original assignee: Neusoft Corp
Current assignee: Neusoft Corp
Priority date: 2017-12-29
Filing date: 2017-12-29
Publication date: 2018-06-29
Anticipated expiration: 2037-12-29
Also published as: CN108228808B

Abstract

This disclosure relates to a method and apparatus for determining a focus incident, a storage medium, and electronic equipment. The method includes: obtaining multiple texts to be determined within a preset time period; obtaining the topic model corresponding to all of the texts to be determined in the preset time period, and determining, according to the topic model, a first theme conditional probability that each text to be determined belongs to each of the different themes, where the topic model includes multiple themes; determining, according to the first theme conditional probability, the temperature weight of each participle word in all of the texts to be determined; and determining the focus incident from the multiple texts to be determined according to the temperature weight of each participle word.

Description

Method and apparatus for determining a focus incident, storage medium, and electronic equipment

Technical field

This disclosure relates to the field of information technology, and in particular to a method and apparatus for determining a focus incident, a storage medium, and electronic equipment.

Background technology

With the rapid spread of the Internet, the social influence of the network keeps growing. Users can obtain news information through portal websites, social software, microblogs, forums, and other channels, and express their own views on that news information. This frequent interaction produces topics that are common to different users; such a common topic is referred to as a focus incident.

At present, when a focus incident is determined, all news information can be regarded as one set. News information of the same type is aggregated by clustering, a temperature ranking is computed within each type, and the top-ranked news events are taken as the focus incidents of that type. The factors that determine the temperature ranking can be, for example, the number of visits and the number of comment texts. Because the temperature ranking can be manipulated (for example, inflated by software or by hand), a focus incident determined from the temperature ranking alone is inaccurate.

Invention content

To solve the above problem, the present disclosure proposes a method and apparatus for determining a focus incident, a storage medium, and electronic equipment.

According to a first aspect of the embodiments of the present disclosure, a method of determining a focus incident is provided. The method includes:

obtaining multiple texts to be determined within a preset time period;

obtaining the topic model corresponding to all of the texts to be determined in the preset time period, and determining, according to the topic model, a first theme conditional probability that each text to be determined belongs to each of the different themes, where the topic model includes multiple themes;

determining, according to the first theme conditional probability, the temperature weight of each participle word in all of the texts to be determined; and

determining the focus incident from the multiple texts to be determined according to the temperature weight of each participle word.

Optionally, obtaining the topic model corresponding to all of the texts to be determined in the preset time period includes:

performing word segmentation on each text to be determined in the preset time period to obtain at least one participle word; and

training a preset theme model with the at least one participle word to obtain the topic model.

Optionally, determining, according to the first theme conditional probability, the temperature weight of each participle word in all of the texts to be determined includes:

obtaining a second theme conditional probability that the at least one participle word in each text to be determined belongs to each of the different themes;

determining, according to the first theme conditional probability and the second theme conditional probability, the topic weights of the at least one participle word in each text to be determined; and

determining the temperature weight of each participle word according to the topic weights.

Optionally, obtaining the second theme conditional probability that the at least one participle word in each text to be determined belongs to each of the different themes includes:

determining the probability of occurrence of the at least one participle word in the corresponding text to be determined;

calculating the mean of the first theme conditional probabilities corresponding to the same theme to obtain the theme probability of that theme;

obtaining, according to the topic model, the word conditional probability of the at least one participle word in each text to be determined under each of the different themes; and

determining the second theme conditional probability according to the theme probability, the probability of occurrence, and the word conditional probability.

Optionally, when the preset time period includes one period, determining the temperature weight of each participle word according to the topic weights includes:

obtaining, through a weight acquisition step, the first weight of each participle word in all of the texts to be determined, and determining that the first weight is the temperature weight.

When the preset time period includes multiple periods, determining the temperature weight of each participle word according to the topic weights includes:

obtaining, through the weight acquisition step, the first weight of each participle word in all of the texts to be determined in each period; and

obtaining the temperature weight of each participle word according to the first weights.

Optionally, the weight acquisition step includes:

obtaining the location information of each participle word in each text to be determined, where the location information is either the text title position or the text body position;

when the location information of the participle word is the text title position, determining that the product of the topic weight of the participle word and a preset parameter is the second weight of the participle word in that text to be determined;

when the location information of the participle word is the text body position, determining that the topic weight of the participle word is the second weight of the participle word in that text to be determined; and

calculating, for each participle word, the average of its second weights across all of the texts to be determined as the first weight of that participle word.

Optionally, obtaining the temperature weight of each participle word according to the first weights includes:

determining a third weight of each participle word according to the first weights of that participle word in the individual periods; and

determining the temperature weight of each participle word according to its third weight and its first weight.

Optionally, the focus incident includes hot spot words and a hot spot subordinate sentence, and determining the focus incident from the multiple texts to be determined according to the temperature weight of each participle word includes:

obtaining a preset number of hot spot words according to the temperature weight of each participle word;

obtaining, from all of the texts to be determined, the subordinate sentences to be determined that contain a hot spot word;

sorting the words of each subordinate sentence to be determined in descending order of topic weight to obtain a ranking result;

when the rank of the hot spot word in the ranking result is less than or equal to a preset rank, determining that the subordinate sentence to be determined is a target subordinate sentence, and obtaining the hot spot subordinate sentence from the target subordinate sentences; and

determining that the hot spot words and the hot spot subordinate sentence are the focus incident.
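The selection of hot spot words and target subordinate sentences described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_targets`, the parameters `num_hot_words` and `max_rank`, and the use of a single topic-weight dictionary (rather than one per text to be determined) are hypothetical simplifications, and the final choice of the hot spot subordinate sentence among the targets is left out because the text above does not specify it.

```python
def select_targets(sentences, temperature_weights, topic_weights,
                   num_hot_words=2, max_rank=2):
    """Pick hot spot words, then keep sentences in which a hot spot word ranks high.

    sentences: list of word lists (one list per subordinate sentence to be determined).
    temperature_weights / topic_weights: dicts mapping word -> weight (hypothetical inputs).
    """
    # Preset number of hot spot words: the words with the largest temperature weights.
    hot_words = sorted(temperature_weights, key=temperature_weights.get,
                       reverse=True)[:num_hot_words]
    targets = []
    for sentence in sentences:
        # Distinct words of the sentence, sorted by descending topic weight.
        distinct = list(dict.fromkeys(sentence))
        ranked = sorted(distinct, key=lambda w: topic_weights.get(w, 0.0), reverse=True)
        for hot in hot_words:
            # Rank is 1-based; the sentence is a target if the rank is within max_rank.
            if hot in ranked and ranked.index(hot) + 1 <= max_rank:
                targets.append(sentence)
                break
    return hot_words, targets


temperature = {"a": 0.9, "b": 0.5, "c": 0.1}
topic = {"a": 0.8, "b": 0.3, "c": 0.6, "d": 0.7}
hot, targets = select_targets([["a", "c"], ["d", "c", "b"], ["c", "d"]], temperature, topic)
```

In this toy input only the first sentence qualifies: it contains hot spot word "a" at rank 1, while "b" ranks third in the second sentence and the third sentence contains no hot spot word.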

According to a second aspect of the embodiments of the present disclosure, an apparatus for determining a focus incident is provided. The apparatus includes:

an acquisition module, configured to obtain multiple texts to be determined within a preset time period;

a processing module, configured to obtain the topic model corresponding to all of the texts to be determined in the preset time period, and to determine, according to the topic model, a first theme conditional probability that each text to be determined belongs to each of the different themes, where the topic model includes multiple themes;

a first determining module, configured to determine, according to the first theme conditional probability, the temperature weight of each participle word in all of the texts to be determined; and

a second determining module, configured to determine the focus incident from the multiple texts to be determined according to the temperature weight of each participle word.

Optionally, the processing module includes:

a processing submodule, configured to perform word segmentation on each text to be determined in the preset time period to obtain at least one participle word; and

a training submodule, configured to train a preset theme model with the at least one participle word to obtain the topic model.

Optionally, the first determining module includes:

a first acquisition submodule, configured to obtain a second theme conditional probability that the at least one participle word in each text to be determined belongs to each of the different themes;

a first determining submodule, configured to determine, according to the first theme conditional probability and the second theme conditional probability, the topic weights of the at least one participle word in each text to be determined; and

a second determining submodule, configured to determine the temperature weight of each participle word according to the topic weights.

Optionally, the first acquisition submodule is configured to: determine the probability of occurrence of the at least one participle word in the corresponding text to be determined; calculate the mean of the first theme conditional probabilities corresponding to the same theme to obtain the theme probability of that theme; obtain, according to the topic model, the word conditional probability of the at least one participle word in each text to be determined under each of the different themes; and determine the second theme conditional probability according to the theme probability, the probability of occurrence, and the word conditional probability.

Optionally, when the preset time period includes one period, the second determining submodule is configured to obtain, through a weight acquisition step, the first weight of each participle word in all of the texts to be determined, and to determine that the first weight is the temperature weight.

When the preset time period includes multiple periods, the second determining submodule is configured to obtain, through the weight acquisition step, the first weight of each participle word in all of the texts to be determined in each period, and to obtain the temperature weight of each participle word according to the first weights.

Optionally, the weight acquisition step includes: obtaining the location information of each participle word in each text to be determined, where the location information is either the text title position or the text body position.

Optionally, the second determining submodule is configured to determine a third weight of each participle word according to the first weights of that participle word in the individual periods.

Optionally, the focus incident includes hot spot words and a hot spot subordinate sentence, and the second determining module includes:

a second acquisition submodule, configured to obtain a preset number of hot spot words according to the temperature weight of each participle word;

a third acquisition submodule, configured to obtain, from all of the texts to be determined, the subordinate sentences to be determined that contain a hot spot word;

a sorting submodule, configured to sort the words of each subordinate sentence to be determined in descending order of topic weight to obtain a ranking result;

a third determining submodule, configured to determine, when the rank of the hot spot word in the ranking result is less than or equal to a preset rank, that the subordinate sentence to be determined is a target subordinate sentence, and to obtain the hot spot subordinate sentence from the target subordinate sentences; and

a fourth determining submodule, configured to determine that the hot spot words and the hot spot subordinate sentence are the focus incident.

According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the program implements the steps of the method of the first aspect.

According to a fourth aspect of the embodiments of the present disclosure, electronic equipment is provided, including:

the computer-readable storage medium of the third aspect; and

one or more processors, configured to execute the program in the computer-readable storage medium.

With the above technical solution, multiple texts to be determined within a preset time period are obtained; the topic model corresponding to all of the texts to be determined in the preset time period is obtained, and the first theme conditional probability that each text to be determined belongs to each of the different themes is determined according to the topic model, where the topic model includes multiple themes; the temperature weight of each participle word in all of the texts to be determined is determined according to the first theme conditional probability; and the focus incident is determined from the multiple texts to be determined according to the temperature weight of each participle word. Because the topic model captures the association between texts to be determined, themes, and words, determining the first theme conditional probability of each text to be determined under the different themes from the topic model, deriving the temperature weight of each participle word from that probability, and mining the focus incident from the temperature weights improves the accuracy of determining the focus incident.

Other features and advantages of the disclosure are described in detail in the specific embodiment section that follows.

Description of the drawings

The accompanying drawings provide a further understanding of the disclosure and form part of the specification. Together with the specific embodiments below, they serve to explain the disclosure but do not limit it. In the drawings:

Fig. 1 is a flow chart of a method of determining a focus incident according to an exemplary embodiment of the disclosure;

Fig. 2 is a flow chart of another method of determining a focus incident according to an exemplary embodiment of the disclosure;

Fig. 3 is a block diagram of a first apparatus for determining a focus incident according to an exemplary embodiment of the disclosure;

Fig. 4 is a block diagram of a second apparatus for determining a focus incident according to an exemplary embodiment of the disclosure;

Fig. 5 is a block diagram of a third apparatus for determining a focus incident according to an exemplary embodiment of the disclosure;

Fig. 6 is a block diagram of a fourth apparatus for determining a focus incident according to an exemplary embodiment of the disclosure;

Fig. 7 is a block diagram of electronic equipment according to an exemplary embodiment of the disclosure.

Specific embodiment

The specific embodiments of the disclosure are described in detail below with reference to the drawings. It should be understood that the specific embodiments described here only illustrate and explain the disclosure and do not limit it.

First, the application scenario of the disclosure is described. As the network has become part of people's lives, news information can spread over the network, which enables interaction between users and news information. If users interact with a piece of news information frequently, that news information becomes a focus incident. As an example, take the news information during a certain meeting in 2017, which includes related items such as information 1, information 2, information 3, information 4, information 5, information 6, and information 7. Extracting focus incidents from a large volume of news information is of great value to government staff, corporate public relations personnel, financial researchers, and others whose work involves public opinion: obtaining focus incidents helps them follow how events develop and take corresponding measures in time. At present, however, focus incidents are mainly determined by temperature ranking. Because the temperature ranking can be manipulated (for example, inflated by software or by hand), a focus incident determined from the temperature ranking alone is inaccurate.

The present disclosure proposes a method and apparatus for determining a focus incident, a storage medium, and electronic equipment. The texts to be determined of multiple pieces of news information within a preset time period are obtained, together with the topic model corresponding to all of the texts to be determined in the preset time period. The topic model yields the first theme conditional probability that each text to be determined belongs to each of the different themes, and the temperature weight of each participle word in all of the texts to be determined is determined according to the first theme conditional probability, so that the focus incident can be determined from the texts to be determined according to the temperature weights. Because the topic model captures the association between texts to be determined, themes, and words, deriving the temperature weights in this way and mining the focus incident from them improves the accuracy of determining the focus incident.

The disclosure is described in detail below with reference to specific embodiments.

Fig. 1 is a flow diagram of a method of determining a focus incident according to an exemplary embodiment of the disclosure. As shown in Fig. 1, the method includes:

S101: Obtain multiple texts to be determined within a preset time period.

The texts to be determined can be texts on a target topic collected by crawler technology from portal websites, social software, microblogs, forums, and the like. As an example, take the case where the target topic is the news information during a certain meeting in 2017. That news information includes related items such as information 1, information 2, information 3, information 4, information 5, information 6, and information 7, so the texts to be determined can include the texts corresponding to those items. The above example is merely illustrative, and the disclosure is not limited to it.

S102: Obtain the topic model corresponding to all of the texts to be determined in the preset time period, and determine, according to the topic model, the first theme conditional probability that each text to be determined belongs to each of the different themes.

In this step, if the preset time period is one period, the topic model is a single model generated from all of the texts to be determined in that period. If the preset time period includes multiple periods, there are multiple topic models, each generated from all of the texts to be determined in one of the periods.

In one possible implementation, word segmentation can be performed on each text to be determined in the same period to obtain at least one participle word, and a preset theme model can be trained with the at least one participle word and a preset number of themes to obtain the topic model. The preset theme model can be a model generated by the LDA (Latent Dirichlet Allocation) algorithm. It is equivalent to a three-layer Bayesian probability model, that is, a three-layer structure of words, themes, and texts to be determined. A topic model covering the texts to be determined, the words, and the themes can therefore be generated from the preset theme model, and from it one can determine the first theme conditional probability that each text to be determined belongs to each of the different themes, as well as the word conditional probability of the at least one participle word in each text to be determined under each of the different themes.

S103: Determine, according to the first theme conditional probability, the temperature weight of each participle word in all of the texts to be determined.

In the disclosure, the larger the temperature weight, the hotter the participle word, that is, the more attention users pay to it; conversely, the smaller the temperature weight, the less attention users pay to it.

S104: Determine the focus incident from the multiple texts to be determined according to the temperature weight of each participle word.

With the above method, because the topic model captures the association between texts to be determined, themes, and words, the first theme conditional probability of each text to be determined under the different themes is determined from the topic model, and the temperature weight of each participle word is determined according to the first theme conditional probability. Mining the focus incident from the temperature weights of the participle words improves the accuracy of determining the focus incident.

Fig. 2 is a flow diagram of another method of determining a focus incident according to an exemplary embodiment of the disclosure. As shown in Fig. 2, the method includes:

S201: Obtain multiple texts to be determined within a preset time period.

The texts to be determined can be texts on a target topic collected by crawler technology from portal websites, social software, microblogs, forums, and the like. As an example, take the case where the target topic is the news information during a certain meeting in 2017. That news information includes related items such as information 1, information 2, information 3, information 4, information 5, information 6, and information 7, so the texts to be determined can include the texts corresponding to those items. In one possible implementation, if the preset time period consists of multiple periods, the texts to be determined in each period can be represented as a text collection D = {d_1, d_2, ..., d_i, ..., d_n}, where d_i represents the i-th text to be determined and n represents the total number of texts to be determined in the same period. Each d_i includes the title (title_i) and the body (body_i) of the i-th text to be determined, that is, d_i = {title_i, body_i}, so that later steps can determine the first weight of each participle word in all of the texts to be determined according to the location information of each participle word in each text to be determined. The above example is merely illustrative, and the disclosure is not limited to it.

S202: Obtain the topic model corresponding to all of the texts to be determined in the preset time period.

It should be noted that if the preset time period is one period, the topic model is a single model generated from all of the texts to be determined in that period; if the preset time period includes multiple periods, there are multiple topic models, each generated from all of the texts to be determined in one of the periods.

In this step, the topic model corresponding to all of the texts to be determined in the preset time period can be obtained through the following steps:

S11: Perform word segmentation on each text to be determined in the preset time period to obtain at least one participle word.

Word segmentation can be performed by a variety of methods, such as character matching (that is, mechanical segmentation). Specifically, each text to be determined is matched in turn against the entries in a preset dictionary; if an entry corresponding to part of the text to be determined is found in the preset dictionary, the match succeeds and a word is identified. It should be noted that some words are stop words: they carry no practical meaning and only express tone. If these words were also fed into the subsequent training of the preset theme model, the computational complexity would become excessive and more data processing resources would be occupied. To solve this problem, in another embodiment of the disclosure, stop words can be removed after word segmentation has been performed on each text to be determined to obtain the at least one participle word. Words without practical meaning are thereby removed, which reduces the computational complexity of the subsequent training of the preset theme model while preserving the accuracy of determining the focus incident.
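The dictionary matching and stop-word removal of S11 can be sketched as follows. This is a minimal illustration that assumes forward maximum matching as the character-matching strategy; the function name `segment`, the `max_len` parameter, the toy dictionary, and the stop-word list are all hypothetical and are not taken from the disclosure.

```python
def segment(text, dictionary, stop_words, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary entry."""
    words = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first and shrink until a dictionary entry is found.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            # Assumption: an unmatched character becomes a single-character token.
            match = text[i]
        i += len(match)
        # Stop words are dropped so they never reach theme model training.
        if match not in stop_words:
            words.append(match)
    return words


# Toy, hypothetical dictionary and stop-word list.
toy_dictionary = {"新闻", "热点", "事件", "的"}
toy_stop_words = {"的"}
participle_words = segment("新闻的热点事件", toy_dictionary, toy_stop_words)
```

A production system would more likely rely on an existing segmentation library; the loop above only makes the matching rule in the text concrete.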

S12: Train the preset theme model with the at least one participle word to obtain the topic model.

In this step, the topic model includes a preset number of themes. The preset number of themes can usually be determined from the number of texts to be determined and is typically set to 50~200. The preset theme model can be a model generated by the LDA (Latent Dirichlet Allocation) algorithm. It is equivalent to a three-layer Bayesian probability model, that is, a three-layer structure of participle words, themes, and texts to be determined. A topic model covering the texts to be determined, the words, and the themes can therefore be generated from the preset theme model, and from it one can determine the first theme conditional probability that each text to be determined belongs to each of the different themes, as well as the word conditional probability of the at least one participle word in each text to be determined under each of the different themes.
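To make the two outputs of the topic model concrete, the following is a small sketch of LDA training by collapsed Gibbs sampling in pure Python. The disclosure only names the LDA algorithm; the choice of Gibbs sampling, the function name `train_lda`, and the hyperparameters `alpha`, `beta`, `iterations`, and `seed` are assumptions made for illustration, not the method of the disclosure.

```python
import random


def train_lda(docs, num_themes, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA.

    docs: list of word lists. Returns (doc_theme, theme_word, vocab), where
    doc_theme[i][p] estimates p(t_p/d_i) and theme_word[p][v] estimates p(w_v/t_p).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: v for v, w in enumerate(vocab)}
    V, K = len(vocab), num_themes
    n_dk = [[0] * K for _ in docs]      # words of text i assigned to theme p
    n_kw = [[0] * V for _ in range(K)]  # times word v is assigned to theme p
    n_k = [0] * K                       # total words assigned to theme p
    assignments = []
    for i, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)        # random initial theme
            z_doc.append(k)
            n_dk[i][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for i, doc in enumerate(docs):
            for j, w in enumerate(doc):
                v, k = index[w], assignments[i][j]
                # Remove the current assignment before resampling it.
                n_dk[i][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [(n_dk[i][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[i][j] = k
                n_dk[i][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    doc_theme = [[(n_dk[i][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
                 for i, doc in enumerate(docs)]
    theme_word = [[(n_kw[t][v] + beta) / (n_k[t] + V * beta) for v in range(V)]
                  for t in range(K)]
    return doc_theme, theme_word, vocab
```

Each row of `doc_theme` is a distribution over themes for one text, and each row of `theme_word` is a distribution over the vocabulary for one theme. With the 50~200 themes mentioned above and a real corpus, an optimized library implementation would be the practical choice.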

It should be noted that, after the at least one participle word has been obtained, different texts to be determined may contain identical participle words. When the preset theme model is trained with the at least one participle word, the same participle word would then be trained on repeatedly, which reduces processing efficiency. To avoid this problem, the participle words of the individual texts to be determined can be merged and deduplicated in a preprocessing step, so that the participle words of all of the texts to be determined contain no duplicates. The preset theme model can then be trained with the preprocessed participle words to obtain the topic model, and the temperature weights of the preprocessed participle words can be obtained in later steps; this avoids computing the temperature weight of the same participle word repeatedly and improves computational efficiency. For example, merging the participle words of each text to be determined forms the word set W of preprocessed participle words, W = {w_1, w_2, ..., w_l, ..., w_c}, where w_l represents the l-th preprocessed participle word and any two preprocessed participle words are different, so that the temperature weight of each word in the word set can be obtained in turn in later steps.
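The merge-and-deduplicate preprocessing that produces the word set W can be sketched in a few lines. The function name `build_word_set` is hypothetical, and keeping the words in first-seen order is an assumption made only so that the result is reproducible.

```python
def build_word_set(segmented_texts):
    """Merge the participle words of all texts and drop duplicates (first-seen order)."""
    seen = set()
    word_set = []
    for words in segmented_texts:
        for word in words:
            if word not in seen:
                seen.add(word)
                word_set.append(word)
    return word_set


W = build_word_set([["meeting", "policy"], ["policy", "economy", "meeting"]])
# W == ["meeting", "policy", "economy"]
```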

S203: Determine, according to the topic model, the first theme conditional probability that each text to be determined belongs to each of the different themes.

As an example, the first theme conditional probability that the i-th text to be determined belongs to theme t_p can be expressed as p(t_p/d_i), where t_p represents the p-th theme and d_i represents the i-th text to be determined. The set of first theme conditional probabilities of the i-th text to be determined can then be expressed as {p(t_1/d_i), p(t_2/d_i), ..., p(t_p/d_i), ..., p(t_k/d_i)}. Once this set has been determined for every text to be determined, a later step can calculate the mean of the first theme conditional probabilities corresponding to the same theme across the texts to be determined to obtain the theme probability of that theme.

S204: Determine the probability of occurrence of the at least one participle word in the corresponding text to be determined.

The word subset of the participle words in the i-th text to be determined can be expressed as w_i = {w_1i, w_2i, ..., w_ji, ..., w_zi}, where w_ji represents the j-th participle word in the i-th text to be determined, z represents the total number of participle words in the i-th text to be determined, and any two participle words in the word subset are different. The probability of occurrence p(w_ji) of the j-th participle word w_ji in the i-th text to be determined is calculated from count(w_ji), where count(w_ji) represents the number of occurrences of the j-th participle word w_ji in the i-th text to be determined.
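A minimal sketch of the probability of occurrence follows. It assumes that p(w_ji) is count(w_ji) divided by the total number of occurrences of all participle words in the same text; that normalization is an assumption, as is the function name `occurrence_probabilities`.

```python
from collections import Counter


def occurrence_probabilities(words):
    """Map each distinct word of one text to count(word) / total occurrences in that text."""
    counts = Counter(words)
    total = sum(counts.values())
    # Assumed normalization: divide by the number of word occurrences in the text.
    return {word: count / total for word, count in counts.items()}


p_w = occurrence_probabilities(["meeting", "policy", "meeting", "economy"])
# p_w == {"meeting": 0.5, "policy": 0.25, "economy": 0.25}
```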

S205: Calculate the mean of the first theme conditional probabilities corresponding to the same theme to obtain the theme probability of that theme.

In this step, the theme probability of theme t_p is calculated as:

$$P(t_p) = \frac{1}{n}\sum_{i=1}^{n} p(t_p/d_i)$$

where P(t_p) represents the theme probability of theme t_p, p(t_p/d_i) represents the first theme conditional probability that the i-th text to be determined belongs to theme t_p, and n is the total number of texts to be determined.
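The mean described in S205 can be sketched as follows, assuming the first theme conditional probabilities are stored as a list of rows with `doc_theme[i][p]` holding p(t_p/d_i). The function name and that layout are hypothetical.

```python
def theme_probabilities(doc_theme):
    """Return P(t_p) for every theme p as the mean of p(t_p/d_i) over all n texts."""
    n = len(doc_theme)
    k = len(doc_theme[0])
    return [sum(row[p] for row in doc_theme) / n for p in range(k)]


P_t = theme_probabilities([[0.5, 0.5], [1.0, 0.0]])
# P_t == [0.75, 0.25]
```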

S206: Obtain, according to the topic model, the word conditional probability of the at least one participle word in each text to be determined under each of the different themes.

Because the topic model is a three-layer Bayesian probability model of words, themes, and texts to be determined, the word conditional probability of the at least one participle word in each text to be determined under each of the different themes can be obtained from the constructed topic model. As an example, the word conditional probability that the j-th participle word w_ji in the i-th text to be determined appears under theme t_p can be expressed as p(w_ji/t_p).

S207: Determine the second theme conditional probability according to the theme probability, the probability of occurrence, and the word conditional probability.

In this step, the second theme conditional probability is calculated as:

$$p(t_p/w_{ji}) = \frac{p(w_{ji}/t_p)\,P(t_p)}{p(w_{ji})}$$

where p(t_p/w_ji) represents the second theme conditional probability that the j-th participle word in the i-th text to be determined belongs to theme t_p, p(w_ji/t_p) represents the word conditional probability that the j-th participle word in the i-th text to be determined appears under theme t_p, P(t_p) represents the theme probability of theme t_p, and p(w_ji) represents the probability of occurrence of the j-th participle word in the i-th text to be determined.
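Combining the three quantities of S207 in the manner of Bayes' rule can be sketched as below. The function name and argument layout are hypothetical. Because p(w_ji) is the probability of occurrence within one text rather than a marginal over themes, the returned values are not guaranteed to sum to 1 across themes.

```python
def second_theme_conditional(word_given_theme, theme_prob, word_prob):
    """Return p(t_p/w_ji) = p(w_ji/t_p) * P(t_p) / p(w_ji) for every theme p.

    word_given_theme[p] is p(w_ji/t_p), theme_prob[p] is P(t_p),
    and word_prob is p(w_ji) for the same word and text.
    """
    if word_prob <= 0:
        raise ValueError("the probability of occurrence must be positive")
    return [wgt * tp / word_prob for wgt, tp in zip(word_given_theme, theme_prob)]


p_t_given_w = second_theme_conditional([0.2, 0.1], [0.5, 0.5], 0.25)
# approximately [0.4, 0.2]
```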

S208, determine the topic weight of the at least one participle word in each text to be determined according to the first theme conditional probability and the second theme conditional probability.

Wherein, the calculation formula of the topic weight is: t_wji = p(t_p/d_i) * p(t_p/w_ji), where t_wji represents the topic weight of the j-th participle word in the i-th text to be determined, p(t_p/d_i) represents the first theme conditional probability that the i-th text to be determined belongs to theme t_p, and p(t_p/w_ji) represents the second theme conditional probability that the j-th participle word in the i-th text to be determined belongs to theme t_p.
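The product in step S208 is written for a theme t_p, while the topic weight is indexed only by word and text. How a single theme is selected, or how themes are combined, is not stated in this passage, so the sketch below returns the product for every theme and leaves that choice open. The function name is hypothetical.

```python
def topic_weights_per_theme(doc_theme_row, theme_given_word):
    """Return p(t_p/d_i) * p(t_p/w_ji) for every theme p.

    doc_theme_row[p]    is assumed to hold p(t_p/d_i) for one text i.
    theme_given_word[p] is assumed to hold p(t_p/w_ji) for one word j of that text.
    """
    if len(doc_theme_row) != len(theme_given_word):
        raise ValueError("both inputs must cover the same themes")
    return [d * w for d, w in zip(doc_theme_row, theme_given_word)]


per_theme = topic_weights_per_theme([0.7, 0.3], [0.36, 0.10])
```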

S209, determine the temperature weight of each participle word according to the topic weights.

In this step, since step S201 may obtain the texts to be determined in one time period or in multiple time periods, the temperature weight can be determined in the following different ways depending on the preset time period.

If the preset time period includes one time period, the first weight of each participle word in all the texts to be determined is obtained through a weight acquisition step, and the first weight is determined as the temperature weight.

In a possible implementation, the weight acquisition step includes: obtaining the location information of each participle word in each text to be determined, the location information including a text title position or a text body position. Since the i-th text to be determined d_i includes a title (title_i) and a body (body_i), that is, d_i = {title_i, body_i}, the location information of a participle word in the i-th text to be determined can be determined according to d_i. When the location information of the participle word is the text title position, the product of the topic weight of the participle word and a preset parameter (for example, the preset parameter is 2) is determined as the second weight of the participle word in that text to be determined. When the location information of the participle word is the text body position, the topic weight of the participle word is determined as the second weight of the participle word in that text to be determined. The average value of the second weights of the same participle word in all the texts to be determined is then calculated as the first weight of that participle word, and the first weight is determined as the temperature weight.
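The weight acquisition step can be sketched as follows. The input layout (a dict per text with `title_words` and `topic_weights`) and the function names are assumptions made for illustration. The sketch also assumes that a word found in the title is treated as a title word even if it also appears in the body, and that the average is taken over the texts in which the word occurs; the text does not settle either point.

```python
from collections import defaultdict


def second_weight(topic_weight, in_title, preset_parameter=2.0):
    """Title words get topic_weight * preset parameter; body words keep topic_weight."""
    return topic_weight * preset_parameter if in_title else topic_weight


def first_weights(texts, preset_parameter=2.0):
    """Average each word's second weight over the texts that contain it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for text in texts:
        for word, weight in text["topic_weights"].items():
            in_title = word in text["title_words"]
            sums[word] += second_weight(weight, in_title, preset_parameter)
            counts[word] += 1
    return {word: sums[word] / counts[word] for word in sums}


texts = [
    {"title_words": {"flood"}, "topic_weights": {"flood": 0.3, "city": 0.1}},
    {"title_words": set(), "topic_weights": {"flood": 0.2}},
]
weights = first_weights(texts)
```

Here "flood" scores 0.3 * 2 = 0.6 in the first text (title) and 0.2 in the second (body), giving a first weight of 0.4.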

If the preset time period includes multiple time periods, the first weight of each participle word in all the texts to be determined in each time period is obtained respectively through the weight acquisition step, and the temperature weight of each participle word is obtained according to the first weights. The weight acquisition step is the same as that described above and is not repeated here.

Wherein, the third weight of the same participle word is determined according to the first weights corresponding to the same participle word in each time period, and the temperature weight of each participle word is determined according to the third weight and the first weight of each participle word.

Illustratively, the preset time period can include three time periods, namely a first time period, a second time period and a third time period. The first time period can be the current time period; the second time period can be a time period that includes the first time period and is longer than the first time period; and the third time period can be a time period that includes the second time period and is longer than the second time period. For example, the first time period can be this week, the second time period can be this week and the week before it, and the third time period can be this week and the two weeks before it. In a possible implementation, the first weight of each participle word in all the texts to be determined in the first time period, the first weight of each participle word in all the texts to be determined in the second time period, and the first weight of each participle word in all the texts to be determined in the third time period can be obtained. In this way, the third weight of the same participle word can be determined according to the three first weights corresponding to the same participle word in the three time periods, and the calculation formula of the third weight of the participle word can be: ww_q = a * b₁w_q + b * b₂w_q + c * b₃w_q, where ww_q is the third weight corresponding to the q-th participle word; b₁w_q is the first weight of the q-th participle word corresponding to the first time period; b₂w_q is the first weight of the q-th participle word corresponding to the second time period; b₃w_q is the first weight of the q-th participle word corresponding to the third time period; a is the first preset value; b is the second preset value; and c is the third preset value. For example, a is 0.3, b is 0.4 and c is 0.3.

After the third weight is obtained, in order to calculate the temperature weight of each participle word in the first time period (that is, the current time period), the third weight calculated above for the same participle word from the three time periods needs to be combined with the first weight in the first time period. In this embodiment, the temperature weight of each participle word in the first time period can be calculated by the following formula: hw_q = α * b₁w_q + β * ww_q, where hw_q represents the temperature weight of the q-th participle word; b₁w_q represents the first weight of the q-th participle word corresponding to the first time period; ww_q represents the third weight of the q-th participle word; α represents the fourth preset value (for example, α is 0.25); and β represents the fifth preset value (for example, β is 0.75). In this way, on the basis of the first weight corresponding to the first time period, the temperature weight is obtained in combination with the third weight, which takes into account the first weight corresponding to the second time period and the first weight corresponding to the third time period.

It should be noted that the first preset value, the second preset value, the third preset value, the fourth preset value and the fifth preset value are obtained through repeated testing, and that a + b + c = 1 and α + β = 1.
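The two weighted sums for the multi-period case can be sketched as below, using the example preset values given in the text as defaults. The function names and the explicit checks on a + b + c = 1 and α + β = 1 are illustrative additions.

```python
import math


def third_weight(b1, b2, b3, a=0.3, b=0.4, c=0.3):
    """ww_q = a * b1w_q + b * b2w_q + c * b3w_q, with a + b + c = 1."""
    if not math.isclose(a + b + c, 1.0):
        raise ValueError("a + b + c must equal 1")
    return a * b1 + b * b2 + c * b3


def temperature_weight(b1, ww, alpha=0.25, beta=0.75):
    """hw_q = alpha * b1w_q + beta * ww_q, with alpha + beta = 1."""
    if not math.isclose(alpha + beta, 1.0):
        raise ValueError("alpha + beta must equal 1")
    return alpha * b1 + beta * ww


# First weights of one word for this week, the last two weeks, the last three weeks.
ww = third_weight(0.4, 0.2, 0.1)
hw = temperature_weight(0.4, ww)
```

For these inputs the third weight is 0.3 * 0.4 + 0.4 * 0.2 + 0.3 * 0.1 = 0.23, and the temperature weight is 0.25 * 0.4 + 0.75 * 0.23 = 0.2725.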

S210, obtain a preset word quantity of hot spot words according to the temperature weight of each participle word.

In this step, the participle words are sorted in descending order of temperature weight to obtain a word ranking, and the participle words whose word ranking is less than or equal to the preset word quantity are taken as the hot spot words.
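Step S210 amounts to a top-N selection. A minimal sketch, assuming the temperature weights are held in a dict keyed by participle word (the function name is hypothetical):

```python
def hot_spot_words(temperature_weights, preset_quantity):
    """Return the words ranked 1..preset_quantity by descending temperature weight."""
    ranked = sorted(temperature_weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:preset_quantity]]


top_words = hot_spot_words({"city": 0.1, "flood": 0.5, "relief": 0.3}, 2)
```

Because Python's sort is stable, words with equal temperature weight keep their original order; the text does not say how ties should be broken.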

S211, obtain, from all the texts to be determined, the subordinate sentences to be determined that contain the hot spot word.

Wherein, each text to be determined can be split into multiple initial subordinate sentences by using the punctuation marks in the text as division points. It is then determined, one by one, whether each initial subordinate sentence in each text to be determined contains the hot spot word. If the hot spot word is present in an initial subordinate sentence, the initial subordinate sentence is determined to be a subordinate sentence to be determined and is retained; if the hot spot word is not present in an initial subordinate sentence, the initial subordinate sentence is filtered out.
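The splitting and filtering in step S211 can be sketched with a regular expression. The punctuation set and the plain substring test are assumptions for illustration; an implementation working on segmented Chinese text would more likely compare participle words.

```python
import re

# Assumed set of division points: common full-width and ASCII punctuation marks.
PUNCTUATION = r"[，。！？；：,.!?;:]"


def sentences_to_be_determined(text, hot_spot_word):
    """Split a text at punctuation marks and keep the pieces containing the hot spot word."""
    initial = [piece.strip() for piece in re.split(PUNCTUATION, text) if piece.strip()]
    return [piece for piece in initial if hot_spot_word in piece]


kept = sentences_to_be_determined(
    "The flood hit the city. Relief teams arrived, and the flood receded!", "flood"
)
```

The example text splits into three pieces, of which the two that mention "flood" are retained.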

S212, determine the hot spot subordinate sentence according to the subordinate sentences to be determined.

In this step, the hot spot subordinate sentence can be determined through the following steps:

S21, sort the multiple subordinate sentence words included in the subordinate sentence to be determined in descending order of topic weight to obtain a ranking result.

S22, determine whether the weight ranking of the hot spot word in the ranking result is less than or equal to a preset ranking.

When the weight ranking of the hot spot word in the ranking result is less than or equal to the preset ranking, step S23 is performed;

When the weight ranking of the hot spot word in the ranking result is greater than the preset ranking, the subordinate sentence to be determined is ignored.

S23, determine the subordinate sentence to be determined as a target subordinate sentence, and obtain the hot spot subordinate sentence from the target subordinate sentences.

In this step, if the set of obtained target subordinate sentences is {S_1, S_2, …, S_u}, the similarity between each target subordinate sentence and every other target subordinate sentence in the set can be calculated, so as to obtain the sum of similarities of each target subordinate sentence. The target subordinate sentence corresponding to the maximum sum of similarities is the hot spot subordinate sentence. The specific calculation formula is as follows: x = argmax_{1≤d≤u} Σ sim(S_d, S_-d), where the sum runs over all the target subordinate sentences S_-d other than S_d.

Wherein, x indicates that the hot spot subordinate sentence is the x-th target subordinate sentence, u represents the total number of target subordinate sentences, sim(S_d, S_-d) represents the similarity between two target subordinate sentences, S_d represents the d-th target subordinate sentence, and S_-d represents a target subordinate sentence other than the d-th target subordinate sentence. Illustratively, the formula for calculating the similarity between the d-th target subordinate sentence and the r-th target subordinate sentence is sim(S_d, S_r) = (S_d ∩ S_r) / (S_d ∪ S_r), where S_d ∩ S_r represents the number of identical Chinese characters present in both the d-th target subordinate sentence and the r-th target subordinate sentence, and S_d ∪ S_r represents the number of distinct Chinese characters present in the d-th target subordinate sentence and the r-th target subordinate sentence.
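Steps S21 to S23 can be sketched as a ranking check followed by a character-overlap comparison. The function names are hypothetical, and treating each sentence as a set of distinct characters is an assumed reading of "identical" and "distinct" Chinese characters.

```python
def is_target(sentence_words, topic_weights, hot_spot_word, preset_ranking):
    """S21/S22: rank the sentence's words by topic weight and test the hot spot word's rank."""
    unique_words = list(dict.fromkeys(sentence_words))  # drop repeats, keep order
    ranked = sorted(unique_words, key=lambda w: topic_weights.get(w, 0.0), reverse=True)
    return hot_spot_word in ranked[:preset_ranking]


def similarity(sentence_d, sentence_r):
    """Shared distinct characters divided by all distinct characters of the two sentences."""
    chars_d, chars_r = set(sentence_d), set(sentence_r)
    union = chars_d | chars_r
    return len(chars_d & chars_r) / len(union) if union else 0.0


def hot_spot_sentence(targets):
    """S23: return the target whose summed similarity to all other targets is largest."""
    if not targets:
        return None

    def similarity_sum(d):
        return sum(similarity(targets[d], targets[r]) for r in range(len(targets)) if r != d)

    return targets[max(range(len(targets)), key=similarity_sum)]


best = hot_spot_sentence(["abcd", "abcde", "abce"])
```

In the toy example the sums of similarities are 1.4, 1.6 and 1.4, so the second sentence is selected. Latin letters stand in for Chinese characters here.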

S213, determine the hot spot word and the hot spot subordinate sentence as the focus incident.

In this way, the hot spot subordinate sentence corresponding to each hot spot word can be obtained, and the hot spot word and the hot spot subordinate sentence are merged for display. For example, if the obtained hot spot word is "tackling a certain target task", it can be determined through the above steps S211 to S212 that the hot spot subordinate sentence corresponding to "tackling a certain target task" is "the leader focused on a certain target task, emphasized a problem-oriented approach, made statements on subtask 1, subtask 2, subtask 3 and so on, and issued the call to action to unswervingly complete the certain target task". The focus incident can thus be accurately determined from the hot spot word and the hot spot subordinate sentence, and the two are merged and shown to the user so that the user obtains an accurate focus incident. The above example is merely illustrative, and the disclosure is not limited thereto.

Fig. 3 is a block diagram of a device for determining a focus incident according to an exemplary embodiment of the disclosure. As shown in Fig. 3, the device includes:

An acquisition module 301, for obtaining multiple texts to be determined in a preset time period;

A processing module 302, for obtaining the topic model corresponding to all the texts to be determined in the preset time period, and determining, according to the topic model, the first theme conditional probability that each text to be determined belongs to different themes; the topic model includes multiple themes;

A first determining module 303, for determining the temperature weight of each participle word in all the texts to be determined according to the first theme conditional probability;

A second determining module 304, for determining the focus incident from the multiple texts to be determined according to the temperature weight of each participle word.
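The data flow between the four modules can be sketched as a small pipeline. Everything here is an illustrative assumption: the class name, the field names and the callable signatures are hypothetical, and each module is injected as a function so that the sketch stays independent of any particular topic model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class FocusIncidentDevice:
    acquire: Callable[[str], List[str]]  # acquisition module 301
    theme_conditionals: Callable[[Sequence[str]], List[List[float]]]  # processing module 302
    temperature_weights: Callable[
        [Sequence[str], List[List[float]]], Dict[str, float]
    ]  # first determining module 303
    select_incidents: Callable[
        [Sequence[str], Dict[str, float]], List[str]
    ]  # second determining module 304

    def run(self, preset_time_period: str) -> List[str]:
        """Pass the output of each module to the next, in the order 301 to 304."""
        texts = self.acquire(preset_time_period)
        doc_theme = self.theme_conditionals(texts)
        weights = self.temperature_weights(texts, doc_theme)
        return self.select_incidents(texts, weights)


# Stub modules with fixed outputs, only to show the wiring.
device = FocusIncidentDevice(
    acquire=lambda period: ["flood hits city", "city sends relief"],
    theme_conditionals=lambda texts: [[0.5, 0.5] for _ in texts],
    temperature_weights=lambda texts, doc_theme: {"flood": 0.4, "city": 0.6},
    select_incidents=lambda texts, weights: [max(weights, key=weights.get)],
)
result = device.run("this week")
```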

Fig. 4 is a block diagram of a device for determining a focus incident according to an exemplary embodiment of the disclosure. As shown in Fig. 4, the processing module 302 includes:

A processing submodule 3021, for performing word segmentation on each text to be determined in the preset time period to obtain at least one participle word;

A training submodule 3022, for training a preset topic model with the at least one participle word to obtain the topic model.

Fig. 5 is a block diagram of a device for determining a focus incident according to an exemplary embodiment of the disclosure. As shown in Fig. 5, the first determining module 303 includes:

A first acquisition submodule 3031, for obtaining the second theme conditional probability that the at least one participle word in each text to be determined belongs to different themes;

A first determination submodule 3032, for determining the topic weight of the at least one participle word in each text to be determined according to the first theme conditional probability and the second theme conditional probability;

A second determination submodule 3033, for determining the temperature weight of each participle word according to the topic weights.

Optionally, the first acquisition submodule 3031 is used for determining the probability of occurrence of the at least one participle word in the corresponding text to be determined; calculating the sum of the first theme conditional probabilities corresponding to the same theme to obtain the theme probability corresponding to that theme; obtaining, according to the topic model, the word conditional probability of the at least one participle word in each text to be determined under different themes; and determining the second theme conditional probability according to the theme probability, the probability of occurrence and the word conditional probability.

Optionally, when the preset time period includes one time period, the second determination submodule 3033 is used for obtaining the first weight of each participle word in all the texts to be determined through a weight acquisition step, and determining the first weight as the temperature weight.

When the preset time period includes multiple time periods, the second determination submodule 3033 is used for obtaining, through the weight acquisition step, the first weight of each participle word in all the texts to be determined in each time period respectively, and obtaining the temperature weight of each participle word according to the first weights.

Optionally, the weight acquisition step includes: obtaining the location information of each participle word in each text to be determined; the location information includes a text title position or a text body position;

When the location information of the participle word is the text title position, determining the product of the topic weight of the participle word and a preset parameter as the second weight of the participle word in each text to be determined;

When the location information of the participle word is the text body position, determining the topic weight of the participle word as the second weight of the participle word in each text to be determined;

Calculating, respectively, the average value of the second weights of the same participle word in all the texts to be determined as the first weight of that participle word.

Optionally, the second determination submodule 3033 is used for determining the third weight of the same participle word according to the first weights corresponding to the same participle word in each time period;

And determining the temperature weight of each participle word according to the third weight and the first weight of each participle word.

Fig. 6 is a block diagram of a device for determining a focus incident according to an exemplary embodiment of the disclosure. The focus incident includes a hot spot word and a hot spot subordinate sentence. As shown in Fig. 6, the second determining module 304 includes:

A second acquisition submodule 3041, for obtaining a preset word quantity of hot spot words according to the temperature weight of each participle word;

A third acquisition submodule 3042, for obtaining, from all the texts to be determined, the subordinate sentences to be determined that contain the hot spot word;

A sorting submodule 3043, for sorting the multiple subordinate sentence words included in the subordinate sentence to be determined in descending order of topic weight to obtain a ranking result;

A third determination submodule 3044, for determining the subordinate sentence to be determined as a target subordinate sentence when the weight ranking of the hot spot word in the ranking result is less than or equal to a preset ranking, and obtaining the hot spot subordinate sentence from the target subordinate sentences;

A fourth determination submodule 3045, for determining the hot spot word and the hot spot subordinate sentence as the focus incident.

With the above device, since the topic model captures the association among the texts to be determined, the themes and the words, the first theme conditional probability of each text to be determined under different themes is determined based on the topic model, and the temperature weight of each participle word is determined according to the first theme conditional probability. The corresponding focus incident is then mined through the temperature weight of each participle word, which improves the accuracy of determining the focus incident.

Regarding the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.

Fig. 7 is a block diagram of an electronic device 700 according to an exemplary embodiment of the disclosure. As shown in Fig. 7, the electronic device 700 can include: a processor 701, a memory 702, a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.

Wherein, the processor 701 is used to control the overall operation of the electronic device 700, so as to complete all or part of the steps of the above method for determining a focus incident. The memory 702 is used to store various types of data to support operation on the electronic device 700; such data can include, for example, instructions of any application program or method operated on the electronic device 700, as well as data related to the application programs. The memory 702 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (Static Random Access Memory, SRAM for short), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM for short), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM for short), programmable read-only memory (Programmable Read-Only Memory, PROM for short), read-only memory (Read-Only Memory, ROM for short), magnetic memory, flash memory, magnetic disk or optical disc. The multimedia component 703 can include a screen and an audio component. The screen can be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component can include a microphone for receiving external audio signals. The received audio signals can be further stored in the memory 702 or sent through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, which can be a keyboard, a mouse, buttons and the like. These buttons can be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication can be, for example, Wi-Fi, Bluetooth, near field communication (Near Field Communication, NFC for short), 2G, 3G or 4G, or a combination of one or more of them; accordingly, the communication component 705 can include a Wi-Fi module, a Bluetooth module and an NFC module.

In an exemplary embodiment, the electronic device 700 can be implemented by one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), digital signal processors (Digital Signal Processor, DSP for short), digital signal processing devices (Digital Signal Processing Device, DSPD for short), programmable logic devices (Programmable Logic Device, PLD for short), field programmable gate arrays (Field Programmable Gate Array, FPGA for short), controllers, microcontrollers, microprocessors or other electronic components, for performing the above method for determining a focus incident.

In another exemplary embodiment, a computer readable storage medium including program instructions is also provided, for example the memory 702 including program instructions, and the program instructions can be executed by the processor 701 of the electronic device 700 to complete the above method for determining a focus incident.

The preferred embodiments of the disclosure are described in detail above with reference to the accompanying drawings. However, the disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the disclosure, a variety of simple variations can be made to the technical solution of the disclosure, and these simple variations all belong to the protection scope of the disclosure.

It should further be noted that the specific technical features described in the above specific embodiments can be combined in any suitable manner, provided there is no contradiction. In order to avoid unnecessary repetition, the disclosure does not separately describe the various possible combinations.

In addition, the various different embodiments of the disclosure can also be combined arbitrarily, and such combinations should likewise be regarded as content disclosed by the disclosure as long as they do not depart from the idea of the disclosure.

Claims

1. A method for determining a focus incident, characterized in that the method includes:

Obtaining multiple texts to be determined in a preset time period;

Obtaining the topic model corresponding to all the texts to be determined in the preset time period, and determining, according to the topic model, the first theme conditional probability that each text to be determined belongs to different themes; the topic model includes multiple themes;

Determining the temperature weight of each participle word in all the texts to be determined according to the first theme conditional probability;

Determining the focus incident from the multiple texts to be determined according to the temperature weight of each participle word.
2. The method according to claim 1, characterized in that determining the temperature weight of each participle word in all the texts to be determined according to the first theme conditional probability includes:

Obtaining the second theme conditional probability that the at least one participle word in each text to be determined belongs to different themes;

Determining the topic weight of the at least one participle word in each text to be determined according to the first theme conditional probability and the second theme conditional probability;

Determining the temperature weight of each participle word according to the topic weights.
3. The method according to claim 2, characterized in that obtaining the second theme conditional probability that the at least one participle word in each text to be determined belongs to different themes includes:

Determining the probability of occurrence of the at least one participle word in the corresponding text to be determined;

Calculating the sum of the first theme conditional probabilities corresponding to the same theme to obtain the theme probability corresponding to that theme;

Obtaining, according to the topic model, the word conditional probability of the at least one participle word in each text to be determined under different themes;

Determining the second theme conditional probability according to the theme probability, the probability of occurrence and the word conditional probability.
4. The method according to claim 2, characterized in that, when the preset time period includes one time period, determining the temperature weight of each participle word according to the topic weights includes: obtaining the first weight of each participle word in all the texts to be determined through a weight acquisition step, and determining the first weight as the temperature weight.

When the preset time period includes multiple time periods, determining the temperature weight of each participle word according to the topic weights includes: obtaining, through the weight acquisition step, the first weight of each participle word in all the texts to be determined in each time period respectively, and obtaining the temperature weight of each participle word according to the first weights.
5. The method according to claim 4, characterized in that the weight acquisition step includes:

Obtaining the location information of each participle word in each text to be determined; the location information includes a text title position or a text body position;

When the location information of the participle word is the text title position, determining the product of the topic weight of the participle word and a preset parameter as the second weight of the participle word in each text to be determined;

When the location information of the participle word is the text body position, determining the topic weight of the participle word as the second weight of the participle word in each text to be determined;

Calculating, respectively, the average value of the second weights of the same participle word in all the texts to be determined as the first weight of that participle word.
6. The method according to claim 4 or 5, characterized in that obtaining the temperature weight of each participle word according to the first weights includes:

Determining the third weight of the same participle word according to the first weights corresponding to the same participle word in each time period;

Determining the temperature weight of each participle word according to the third weight and the first weight of each participle word.
7. The method according to claim 1, characterized in that the focus incident includes a hot spot word and a hot spot subordinate sentence, and determining the focus incident from the multiple texts to be determined according to the temperature weight of each participle word includes:

Obtaining a preset word quantity of hot spot words according to the temperature weight of each participle word;

Obtaining, from all the texts to be determined, the subordinate sentences to be determined that contain the hot spot word;

Sorting the multiple subordinate sentence words included in the subordinate sentence to be determined in descending order of topic weight to obtain a ranking result;

When the weight ranking of the hot spot word in the ranking result is less than or equal to a preset ranking, determining the subordinate sentence to be determined as a target subordinate sentence, and obtaining the hot spot subordinate sentence from the target subordinate sentences;

Determining the hot spot word and the hot spot subordinate sentence as the focus incident.
8. A device for determining a focus incident, characterized in that the device includes:

An acquisition module, for obtaining multiple texts to be determined in a preset time period;

A processing module, for obtaining the topic model corresponding to all the texts to be determined in the preset time period, and determining, according to the topic model, the first theme conditional probability that each text to be determined belongs to different themes; the topic model includes multiple themes;

A first determining module, for determining the temperature weight of each participle word in all the texts to be determined according to the first theme conditional probability;

A second determining module, for determining the focus incident from the multiple texts to be determined according to the temperature weight of each participle word.
9. A computer readable storage medium on which a computer program is stored, characterized in that the steps of the method according to any one of claims 1 to 7 are implemented when the program is executed by a processor.
10. An electronic device, characterized in that it includes:

The computer readable storage medium according to claim 9; and

One or more processors, for executing the program in the computer readable storage medium.