CN108228808B - Method and device for determining hot event, storage medium and electronic equipment - Google Patents

Method and device for determining hot event, storage medium and electronic equipment

Info

Publication number
CN108228808B
Authority
CN
China
Prior art keywords
word
weight
determined
determining
hot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711484349.7A
Other languages
Chinese (zh)
Other versions
CN108228808A (en
Inventor
董超
崔朝辉
赵立军
张霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201711484349.7A
Publication of CN108228808A
Application granted
Publication of CN108228808B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present disclosure relates to a method, an apparatus, a storage medium, and an electronic device for determining a hotspot event. The method comprises: acquiring a plurality of texts to be determined within a preset time period; obtaining a topic model corresponding to all the texts to be determined within the preset time period, the topic model comprising a plurality of topics, and determining, according to the topic model, a first topic conditional probability that each text to be determined belongs to each of the different topics; determining a heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability; and determining a hotspot event from the plurality of texts to be determined according to the heat weight of each segmented word.

Description

Method and device for determining hot event, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to a method and an apparatus for determining a hotspot event, a storage medium, and an electronic device.
Background
With the rapid popularization of the Internet, the social influence of the network has gradually expanded. Users can obtain news information through various channels such as portals, social software, microblogs, and forums, and express their own views on it. This frequent interaction produces topics that are common to different users, and these common topics are called hot events.
At present, when a hot event is determined, all news information is treated as a set, and news information of the same type is aggregated by clustering. The news information of each type is then ranked by popularity, and the top-ranked news event is taken as the hot event of that type. The factors that determine the popularity ranking may be the number of visits, the number of comment texts, and the like. Because the popularity ranking can be manipulated (for example, inflated by software or by manual effort), a hot event determined from the popularity ranking alone is not accurate.
Disclosure of Invention
In order to solve the above problem, the present disclosure provides a method, an apparatus, a storage medium, and an electronic device for determining a hotspot event.
According to a first aspect of embodiments of the present disclosure, there is provided a method of determining a hotspot event, the method comprising:
acquiring a plurality of texts to be determined in a preset time period;
obtaining a topic model corresponding to all the texts to be determined within the preset time period, and determining, according to the topic model, a first topic conditional probability that each text to be determined belongs to each of the different topics; the topic model comprises a plurality of topics;
determining the heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability;
and determining a hot event from the plurality of texts to be determined according to the heat weight of each segmented word.
Optionally, obtaining the topic model corresponding to all the texts to be determined within the preset time period includes:
performing word segmentation on each text to be determined within the preset time period to obtain at least one segmented word;
and training a preset topic model with the at least one segmented word to obtain the topic model.
Optionally, determining the heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability includes:
acquiring a second topic conditional probability that at least one segmented word in each text to be determined belongs to each of the different topics;
determining a topic weight of the at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability;
and determining the heat weight of each segmented word according to the topic weight.
Optionally, acquiring the second topic conditional probability that at least one segmented word in each text to be determined belongs to each of the different topics includes:
determining an occurrence probability of the at least one segmented word in the corresponding text to be determined;
calculating the sum of the first topic conditional probabilities corresponding to the same topic to obtain a topic probability for that topic;
obtaining, according to the topic model, a word conditional probability of the at least one segmented word in each text to be determined under each of the different topics;
and determining the second topic conditional probability according to the topic probability, the occurrence probability, and the word conditional probability.
Optionally, when the preset time period includes one time period, determining the heat weight of each segmented word according to the topic weight includes:
acquiring a first weight of each segmented word in all the texts to be determined through a weight acquisition step, and determining the first weight as the heat weight.
When the preset time period includes a plurality of time periods, determining the heat weight of each segmented word according to the topic weight includes:
acquiring, through the weight acquisition step, a first weight of each segmented word in all the texts to be determined in each time period;
and acquiring the heat weight of each segmented word according to the first weight.
Optionally, the weight acquisition step includes:
acquiring position information of each segmented word in each text to be determined, the position information being either a text title position or a text body position;
when the position information of the segmented word is the text title position, determining the product of the topic weight of the segmented word and a preset parameter as a second weight of the segmented word in that text to be determined;
when the position information of the segmented word is the text body position, determining the topic weight of the segmented word as the second weight of the segmented word in that text to be determined;
and calculating, for each segmented word, the average of its second weights over all the texts to be determined as the first weight of that segmented word.
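The weight acquisition step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `first_weights`, the input layout, and the default `title_factor` of 2.0 (standing in for the preset parameter) are hypothetical. The sketch also assumes that a word appearing in both title and body is treated as a title word, and that the average is taken over the texts in which the word occurs. The source specifies neither point.

```python
from collections import defaultdict


def first_weights(texts, topic_weight, title_factor=2.0):
    """Compute the first weight of every segmented word.

    texts: list of dicts with "title" and "body" token lists (assumed layout).
    topic_weight: dict mapping (text_index, word) to the word's topic weight.
    title_factor: hypothetical value for the preset parameter.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i, text in enumerate(texts):
        title_words = set(text["title"])
        for word in title_words | set(text["body"]):
            weight = topic_weight[(i, word)]
            # Second weight: boosted in the title, unchanged in the body.
            second = weight * title_factor if word in title_words else weight
            sums[word] += second
            counts[word] += 1
    # First weight: average of the second weights of the same word
    # (assumption: averaged over the texts that contain the word).
    return {word: sums[word] / counts[word] for word in sums}
```

With a factor greater than 1, a word that appears in a title contributes more to its first weight than the same word in a body.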
Optionally, acquiring the heat weight of each segmented word according to the first weight includes:
determining a third weight of each segmented word according to the first weights corresponding to that segmented word in the respective time periods;
and determining the heat weight of each segmented word according to the third weight and the first weight of that segmented word.
Optionally, the hot event includes a hot word and a hot clause, and determining the hot event from the plurality of texts to be determined according to the heat weight of each segmented word includes:
acquiring a preset number of hot words according to the heat weight of each segmented word;
obtaining, from all the texts to be determined, clauses to be determined that contain the hot words;
sorting the clause words included in each clause to be determined in descending order of topic weight to obtain a sorting result;
when the rank of the hot word in the sorting result is less than or equal to a preset rank, determining the clause to be determined as a target clause, and acquiring a hot clause from the target clause;
and determining the hot words and the hot clauses as the hot event.
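The selection of hot words and target clauses can be illustrated as follows. This is a minimal sketch, not the claimed implementation: the function names and the flat `topic_weight` dictionary (word to weight) are assumptions made for brevity. The final step of picking a hot clause from the target clauses is omitted because this passage does not say how it is done.

```python
def select_hot_words(heat_weight, preset_count):
    """Return the preset number of words with the largest heat weights."""
    ranked = sorted(heat_weight.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:preset_count]]


def target_clauses(clauses, hot_words, topic_weight, preset_rank):
    """Keep clauses in which a hot word ranks within preset_rank.

    clauses: list of token lists (one list per clause to be determined).
    topic_weight: dict mapping a word to its topic weight (assumed layout).
    """
    hot = set(hot_words)
    targets = []
    for clause in clauses:
        distinct = list(dict.fromkeys(clause))  # drop repeats, keep order
        ranked = sorted(distinct, key=lambda w: topic_weight.get(w, 0.0),
                        reverse=True)
        # Rank 1 is the clause word with the largest topic weight.
        for rank, word in enumerate(ranked, start=1):
            if word in hot and rank <= preset_rank:
                targets.append(clause)
                break
    return targets
```

A small preset rank keeps only clauses in which the hot word is among the most heavily weighted words of the clause.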
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for determining a hotspot event, the apparatus comprising:
the acquisition module is used for acquiring a plurality of texts to be determined in a preset time period;
the processing module is used for obtaining a topic model corresponding to all the texts to be determined within the preset time period, and determining, according to the topic model, a first topic conditional probability that each text to be determined belongs to each of the different topics; the topic model comprises a plurality of topics;
the first determining module is used for determining the heat weight of each segmented word in all the texts to be determined according to the first topic conditional probability;
and the second determining module is used for determining a hot event from the plurality of texts to be determined according to the heat weight of each segmented word.
Optionally, the processing module includes:
the processing submodule is used for performing word segmentation on each text to be determined within the preset time period to obtain at least one segmented word;
and the training submodule is used for training a preset topic model with the at least one segmented word to obtain the topic model.
Optionally, the first determining module includes:
the first obtaining submodule is used for acquiring a second topic conditional probability that at least one segmented word in each text to be determined belongs to each of the different topics;
the first determining submodule is used for determining a topic weight of the at least one segmented word in each text to be determined according to the first topic conditional probability and the second topic conditional probability;
and the second determining submodule is used for determining the heat weight of each segmented word according to the topic weight.
Optionally, the first obtaining submodule is used for: determining an occurrence probability of the at least one segmented word in the corresponding text to be determined; calculating the sum of the first topic conditional probabilities of the same topic to obtain a topic probability for that topic; obtaining, according to the topic model, a word conditional probability of the at least one segmented word in each text to be determined under each of the different topics; and determining the second topic conditional probability according to the topic probability, the occurrence probability, and the word conditional probability.
Optionally, when the preset time period includes one time period, the second determining submodule is used for acquiring a first weight of each segmented word in all the texts to be determined through a weight acquisition step, and determining the first weight as the heat weight.
When the preset time period includes a plurality of time periods, the second determining submodule is used for acquiring, through the weight acquisition step, a first weight of each segmented word in all the texts to be determined in each time period, and acquiring the heat weight of each segmented word according to the first weight.
Optionally, the weight acquisition step includes: acquiring position information of each segmented word in each text to be determined, the position information being either a text title position or a text body position;
when the position information of the segmented word is the text title position, determining the product of the topic weight of the segmented word and a preset parameter as a second weight of the segmented word in that text to be determined;
when the position information of the segmented word is the text body position, determining the topic weight of the segmented word as the second weight of the segmented word in that text to be determined;
and calculating, for each segmented word, the average of its second weights over all the texts to be determined as the first weight of that segmented word.
Optionally, the second determining submodule is used for determining a third weight of each segmented word according to the first weights corresponding to that segmented word in the respective time periods;
and determining the heat weight of each segmented word according to the third weight and the first weight of that segmented word.
Optionally, the hot event includes a hot word and a hot clause, and the second determining module includes:
the second obtaining submodule is used for acquiring a preset number of hot words according to the heat weight of each segmented word;
the third obtaining submodule is used for obtaining, from all the texts to be determined, clauses to be determined that contain the hot words;
the sorting submodule is used for sorting the clause words included in each clause to be determined in descending order of topic weight to obtain a sorting result;
the third determining submodule is used for determining the clause to be determined as a target clause when the rank of the hot word in the sorting result is less than or equal to a preset rank, and acquiring a hot clause from the target clause;
and the fourth determining submodule is used for determining the hot words and the hot clauses as the hot event.
According to a third aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect described above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
the computer-readable storage medium of the third aspect above; and
one or more processors for executing the program in the computer-readable storage medium.
According to the above technical solution, a plurality of texts to be determined within a preset time period are acquired; a topic model corresponding to all the texts to be determined within the preset time period is obtained, the topic model comprising a plurality of topics, and a first topic conditional probability that each text to be determined belongs to each of the different topics is determined according to the topic model; the heat weight of each segmented word in all the texts to be determined is determined according to the first topic conditional probability; and a hot event is determined from the plurality of texts to be determined according to the heat weight of each segmented word. Because the topic model captures the association among the texts to be determined, the topics, and the words, the first topic conditional probability of each text to be determined under each topic can be determined from the topic model, and the heat weight of each segmented word can be determined from that probability. The corresponding hot events are then mined from the heat weights of the segmented words, which improves the accuracy of determining hot events.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
Fig. 1 is a flowchart illustrating a method of determining a hotspot event according to an exemplary embodiment of the present disclosure;
Fig. 2 is a flowchart illustrating another method of determining a hotspot event according to an exemplary embodiment of the present disclosure;
Fig. 3 is a block diagram illustrating a first apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure;
Fig. 4 is a block diagram illustrating a second apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure;
Fig. 5 is a block diagram illustrating a third apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure;
Fig. 6 is a block diagram illustrating a fourth apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure;
Fig. 7 is a block diagram of an electronic device shown in an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
First, an application scenario of the present disclosure is described. As the network has become a part of people's lives, news information can be spread through the network, allowing users to interact with it. If users interact frequently with a certain piece of news information, that news information becomes a hot event. Take news information from a meeting period in 2017 as an example: the news information includes information 1, information 2, information 3, information 4, information 5, information 6, information 7, and other related information. Obtaining hot events from a large amount of news information is of great value to government workers, business officers, financial researchers, and other personnel involved in public opinion research, because it helps them grasp the development of events in time and take corresponding measures. At present, however, hot events are mainly determined by popularity ranking. Because the popularity ranking can be manipulated (for example, inflated by software or by manual effort), a hot event determined from the popularity ranking alone is not accurate.
The present disclosure provides a method, an apparatus, a storage medium, and an electronic device for determining a hot event. A plurality of texts to be determined, drawn from news information within a preset time period, are acquired; a topic model corresponding to all the texts to be determined within the preset time period is obtained; a first topic conditional probability that each text to be determined belongs to each of the different topics is obtained through the topic model; and the heat weight of each segmented word in all the texts to be determined is determined according to the first topic conditional probability, so that the hot event can be determined from the texts to be determined according to the heat weights. Because the topic model captures the association among the texts to be determined, the topics, and the words, determining the first topic conditional probability from the topic model and the heat weight of each segmented word from that probability allows the corresponding hot events to be mined from the heat weights, which improves the accuracy of determining hot events.
The present disclosure is described in detail below with reference to specific examples.
Fig. 1 is a flowchart illustrating a method for determining a hotspot event according to an exemplary embodiment of the present disclosure, where as shown in fig. 1, the method includes:
S101, obtaining a plurality of texts to be determined in a preset time period.
The texts to be determined may be obtained from web portals, social software, microblogs, forums, and the like through crawler technology. For example, taking news information of a meeting in 2017 as an example, the news information includes information 1, information 2, information 3, information 4, information 5, information 6, information 7, and other related information, and the texts to be determined may include the texts corresponding to the related news information such as information 1 through information 7. This is merely an example, and the present disclosure is not limited thereto.
S102, obtaining a topic model corresponding to all the texts to be determined within the preset time period, and determining, according to the topic model, a first topic conditional probability that each text to be determined belongs to each of the different topics.
In this step, if the preset time period is one time period, the topic model is a model generated from all the texts to be determined in that time period; if the preset time period includes a plurality of time periods, there are a plurality of topic models, each generated from all the texts to be determined in the corresponding time period.
In a possible implementation, word segmentation may be performed on each text to be determined in the same time period to obtain at least one segmented word, and a preset topic model is trained with the at least one segmented word and a preset number of topics to obtain the topic model. The preset topic model may be a model generated by the LDA (Latent Dirichlet Allocation) algorithm and is equivalent to a three-layer Bayesian probability model, that is, a three-layer structure of words, topics, and texts to be determined. A topic model covering the texts to be determined, the words, and the topics can thus be generated from the preset topic model, and the first topic conditional probability that each text to be determined belongs to each of the different topics, as well as the word conditional probability of at least one segmented word in each text to be determined under each of the different topics, can be determined according to the topic model.
S103, determining the heat weight of each segmented word in the texts to be determined according to the first topic conditional probability.
In the present disclosure, a larger heat weight indicates that the segmented word is more popular, that is, users pay more attention to it; conversely, a smaller heat weight indicates that the segmented word is less popular, that is, users pay less attention to it.
S104, determining a hot event from the plurality of texts to be determined according to the heat weight of each segmented word.
With the above method, because the topic model captures the association among the texts to be determined, the topics, and the words, the first topic conditional probability of each text to be determined under each topic is determined from the topic model, and the heat weight of each segmented word is determined from that probability. The corresponding hot events are then mined from the heat weights of the segmented words, which improves the accuracy of determining hot events.
Fig. 2 is a flowchart illustrating a method for determining a hotspot event according to an exemplary embodiment of the present disclosure, where as shown in fig. 2, the method includes:
S201, obtaining a plurality of texts to be determined in a preset time period.
The texts to be determined may be obtained from web portals, social software, microblogs, forums, and the like through crawler technology, so as to obtain texts on a target topic. For example, taking news information of a meeting in 2017 as an example, the news information includes information 1, information 2, information 3, information 4, information 5, information 6, and information 7, and the texts to be determined may include the texts corresponding to the related news information such as information 1 through information 7. In a possible implementation, if the preset time period includes multiple time periods, the texts to be determined in each time period may be represented as a text set $D = \{d_1, d_2, \ldots, d_i, \ldots, d_n\}$, where $d_i$ represents the i-th text to be determined and n represents the total number of texts to be determined in the same time period. Each $d_i$ includes the title ($title_i$) and the body ($body_i$) of the i-th text to be determined, that is, $d_i = \{title_i, body_i\}$, so that in a subsequent step the first weight of each segmented word in all the texts to be determined can be determined according to the position information of each segmented word in each text to be determined.
S202, obtaining the topic model corresponding to all the texts to be determined within the preset time period.
It should be noted that, if the preset time period is one time period, the topic model is a model generated from all the texts to be determined in that time period; if the preset time period includes a plurality of time periods, there are a plurality of topic models, each generated from all the texts to be determined in the corresponding time period.
In this step, the topic model corresponding to all the texts to be determined within the preset time period may be obtained through the following steps:
S11, performing word segmentation on each text to be determined within the preset time period to obtain at least one segmented word.
Word segmentation may be performed by a variety of methods, such as a character matching method (that is, a mechanical word segmentation method). Specifically, each text to be determined is matched in sequence against the entries in a preset dictionary; if an entry corresponding to part of the text is found in the preset dictionary, the match succeeds and a word is identified. The segmented words obtained in this way may include stop words, which have no practical meaning and add to the computational complexity of training. To address this, in another embodiment of the present disclosure, after word segmentation is performed on each text to be determined to obtain at least one segmented word, the stop words may be removed. Words without practical meaning are thereby eliminated, which reduces the computational complexity of the subsequent training of the preset topic model while preserving the accuracy of determining the hot event.
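Dictionary-based segmentation followed by stop-word removal can be sketched as below. The text does not name a specific matching strategy, so forward maximum matching is used here as one common mechanical method. The function name `segment`, the `max_len` limit, and the single-character fallback for unmatched text are all assumptions of this sketch.

```python
def segment(text, dictionary, stop_words=frozenset(), max_len=4):
    """Forward maximum matching against a preset dictionary.

    At each position the longest dictionary entry (up to max_len
    characters) is taken as a word; if nothing matches, a single
    character is emitted. Stop words are dropped at the end.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return [w for w in words if w not in stop_words]


# Example with a tiny hypothetical dictionary and stop-word list.
dictionary = {"确定", "热点", "事件", "热点事件", "方法"}
print(segment("确定热点事件的方法", dictionary, stop_words={"的"}))
```

Because the longest match wins, "热点事件" is kept as one word rather than split into "热点" and "事件".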
S12, training a preset topic model according to the at least one segmented word to obtain the topic model.
In this step, the topic model includes a preset number of topics. The preset number of topics may generally be determined according to the number of texts to be determined, and may typically be set to between 50 and 200. The preset topic model may be a model generated by the LDA (Latent Dirichlet Allocation) algorithm and is equivalent to a three-layer Bayesian probability model, that is, a three-layer structure of segmented words, topics, and texts to be determined. A topic model covering the texts to be determined, the words, and the topics can thus be generated from the preset topic model, and the first topic conditional probability that each text to be determined belongs to each of the different topics, as well as the word conditional probability of at least one segmented word in each text to be determined under each of the different topics, can be determined according to the topic model.
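To make the two outputs of the topic model concrete, the following is a small pure-Python LDA trained by collapsed Gibbs sampling. It is a sketch under assumptions, not the training procedure of the disclosure: the sampler, the hyperparameters `alpha` and `beta`, the iteration count, and all names are illustrative choices. It returns a text-by-topic matrix (the first topic conditional probabilities) and a topic-by-word matrix (the word conditional probabilities).

```python
import random


def train_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA on tokenised texts; returns (doc_topic, topic_word, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: v for v, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # topic counts per text
    n_kw = [[0] * V for _ in range(K)]  # word counts per topic
    n_k = [0] * K                       # total tokens per topic
    z = []                              # topic assignment of every token
    for d, doc in enumerate(docs):      # random initial assignment
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][index[w]] += 1
            n_k[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for pos, w in enumerate(doc):
                v, k = index[w], z[d][pos]
                # Remove the token, then resample its topic.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][pos] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed estimates: p(topic | text) and p(word | topic).
    doc_topic = [[(n_dk[d][t] + alpha) / (len(doc) + K * alpha)
                  for t in range(K)] for d, doc in enumerate(docs)]
    topic_word = [[(n_kw[t][v] + beta) / (n_k[t] + V * beta)
                   for v in range(V)] for t in range(K)]
    return doc_topic, topic_word, vocab
```

Each row of `doc_topic` sums to 1 over the topics, and each row of `topic_word` sums to 1 over the vocabulary. For corpora of realistic size, an established LDA library would normally be used instead of this toy sampler.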
It should be noted that, after the at least one segmented word is obtained, different texts to be determined may include the same segmented word, so that when the preset topic model is trained according to the at least one segmented word, the same segmented word is trained repeatedly and processing efficiency is reduced. The segmented words may therefore be preprocessed to remove duplicates. For example, the at least one segmented word of each text to be determined may be combined into a word set W of preprocessed segmented words, $W = \{w_1, w_2, \ldots, w_l, \ldots, w_c\}$, where $w_l$ represents the l-th preprocessed segmented word and any two preprocessed segmented words are different, so that the heat weight of each word in the word set can be obtained in turn in the subsequent steps.
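Building the duplicate-free word set W is a short operation. The sketch below assumes the segmented words arrive as one token list per text; the function name `build_word_set` is hypothetical. It keeps the order of first occurrence so that the index l is stable between runs.

```python
def build_word_set(segmented_texts):
    """Merge per-text segmented words into one list without duplicates."""
    # dict.fromkeys keeps the first occurrence of each word, in order.
    return list(dict.fromkeys(
        word for words in segmented_texts for word in words))
```

For example, `build_word_set([["a", "b"], ["b", "c", "a"]])` returns `["a", "b", "c"]`.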
S203, determining the first topic conditional probability of each text to be determined belonging to different topics according to the topic model.
Exemplarily, the first topic conditional probability that the i-th text to be determined belongs to topic $t_p$ may be expressed as $p(t_p \mid d_i)$, where $t_p$ denotes the p-th topic and $d_i$ denotes the i-th text to be determined. The first topic conditional probability set of the i-th text to be determined can therefore be expressed as $\{p(t_1 \mid d_i), p(t_2 \mid d_i), \ldots, p(t_p \mid d_i), \ldots, p(t_k \mid d_i)\}$. The first topic conditional probability set of each text to be determined is determined in this way, so that in the subsequent step the sum of the first topic conditional probabilities corresponding to the same topic can be calculated from these sets, thereby obtaining the topic probability corresponding to that topic.
S204, determining the occurrence probability of at least one word segmentation word in the corresponding text to be determined.
The word subset of the at least one segmented word in the i-th text to be determined can be expressed as $w_i = \{w_{1i}, w_{2i}, \ldots, w_{ji}, \ldots, w_{zi}\}$, where $w_{ji}$ represents the j-th segmented word in the i-th text to be determined and $z$ represents the total number of segmented words in the i-th text to be determined. Any two segmented words in the word subset are different. The occurrence probability of the j-th segmented word $w_{ji}$ of the word subset in the i-th text to be determined is calculated as:
$$p(w_{ji}) = \frac{\mathrm{count}(w_{ji})}{\sum_{j'=1}^{z} \mathrm{count}(w_{j'i})}$$
where $\mathrm{count}(w_{ji})$ represents the number of occurrences of the j-th segmented word $w_{ji}$ in the i-th text to be determined.
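A minimal sketch of step S204, under the assumption that the occurrence probability of a word is its number of occurrences in the text divided by the total number of word occurrences in that text. The function name `occurrence_probabilities` is hypothetical.

```python
from collections import Counter


def occurrence_probabilities(tokens):
    """Map each distinct word of one text to count(word) / total occurrences.

    The normalisation by the total count is an assumption.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


probs = occurrence_probabilities(["flood", "rescue", "flood", "city"])
```

With this definition the probabilities of the distinct words in one text sum to 1.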
S205, calculating the sum value of the first theme conditional probabilities of the same theme to obtain the theme probability corresponding to the same theme.
In this step, the topic probability of topic $t_p$ is calculated as:
$$P(t_p) = \sum_{i} p(t_p \mid d_i)$$
where $P(t_p)$ represents the topic probability of topic $t_p$, $p(t_p \mid d_i)$ represents the first topic conditional probability that the i-th text to be determined belongs to topic $t_p$, and the sum runs over all the texts to be determined.
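Step S205 can be sketched as a column sum over a text-by-topic matrix. The matrix layout (one row per text, one column per topic) and the name `topic_probabilities` are assumptions made for illustration.

```python
def topic_probabilities(doc_topic):
    """Sum the first topic conditional probabilities p(t_p | d_i) over all texts.

    doc_topic[i][p] is assumed to hold p(t_p | d_i).
    """
    num_topics = len(doc_topic[0])
    return [sum(row[p] for row in doc_topic) for p in range(num_topics)]


doc_topic = [
    [0.7, 0.3],  # text d_1
    [0.2, 0.8],  # text d_2
    [0.5, 0.5],  # text d_3
]
topic_prob = topic_probabilities(doc_topic)  # approximately [1.4, 1.6]
```

Because the values are summed rather than averaged, a topic probability obtained this way can exceed 1 when several texts are processed.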
S206, obtaining word conditional probability of at least one word segmentation word in each text to be determined under different subjects according to the subject model.
The topic model is based on a three-layer Bayesian probability model comprising words, topics and texts to be determined. Therefore, the word conditional probability of at least one segmented word in each text to be determined under different topics can be obtained from the constructed topic model. Exemplarily, the word conditional probability that the j-th segmented word $w_{ji}$ in the i-th text to be determined occurs under topic $t_p$ can be expressed as $p(w_{ji} \mid t_p)$.
S207, determining a second topic conditional probability according to the topic probability, the occurrence probability and the word conditional probability.
In this step, the second topic conditional probability is calculated as:
$$p(t_p \mid w_{ji}) = \frac{p(w_{ji} \mid t_p)\, P(t_p)}{p(w_{ji})}$$
where $p(t_p \mid w_{ji})$ represents the second topic conditional probability that the j-th segmented word in the i-th text to be determined belongs to topic $t_p$, $p(w_{ji} \mid t_p)$ represents the word conditional probability that the j-th segmented word in the i-th text to be determined occurs under topic $t_p$, $P(t_p)$ represents the topic probability of topic $t_p$, and $p(w_{ji})$ represents the occurrence probability of the j-th segmented word in the i-th text to be determined.
S208, determining the theme weight of at least one word segmentation word in each text to be determined according to the first theme conditional probability and the second theme conditional probability.
The topic weight is calculated as $tw_{ji} = p(t_p \mid d_i) \cdot p(t_p \mid w_{ji})$, where $tw_{ji}$ represents the topic weight of the j-th segmented word in the i-th text to be determined, $p(t_p \mid d_i)$ represents the first topic conditional probability that the i-th text to be determined belongs to topic $t_p$, and $p(t_p \mid w_{ji})$ represents the second topic conditional probability that the j-th segmented word in the i-th text to be determined belongs to topic $t_p$.
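Steps S207 and S208 can be sketched together. The second topic conditional probability is assumed here to follow Bayes' rule from the three quantities named in step S207; the function names and the sample numbers are hypothetical.

```python
def second_topic_conditional(word_given_topic, topic_prob, word_prob):
    """Assumed Bayes form: p(t_p | w_ji) = p(w_ji | t_p) * P(t_p) / p(w_ji)."""
    return word_given_topic * topic_prob / word_prob


def topic_weight(doc_topic_prob, word_topic_prob):
    """tw_ji = p(t_p | d_i) * p(t_p | w_ji), as stated in step S208."""
    return doc_topic_prob * word_topic_prob


# Illustrative inputs only: p(w|t) = 0.1, P(t) = 1.4, p(w) = 0.5, p(t|d) = 0.7.
p_t_given_w = second_topic_conditional(word_given_topic=0.1, topic_prob=1.4, word_prob=0.5)
tw = topic_weight(doc_topic_prob=0.7, word_topic_prob=p_t_given_w)
```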
S209, determining the heat weight of each word segmentation word according to the theme weight.
In this step, since step S201 may acquire the texts to be determined in one preset time period or in a plurality of preset time periods, the heat weight may be determined in the following different manners depending on the number of preset time periods.
If the preset time period includes one time period, the first weight of each segmented word in all the texts to be determined is obtained through a weight obtaining step, and the first weight is determined as the heat weight.
In one possible implementation, the weight obtaining step includes: acquiring position information of each segmented word in each text to be determined, where the position information comprises a text title position or a text body position. Because the i-th text to be determined $d_i$ includes a title ($title_i$) and a body ($body_i$), i.e., $d_i = \{title_i, body_i\}$, the position information of a segmented word in the i-th text to be determined can be determined according to $d_i$. When the position information of the segmented word is the text title position, the product of the topic weight of the segmented word and a preset parameter (for example, the preset parameter may be 2) is determined as the second weight of the segmented word in each text to be determined; when the position information of the segmented word is the text body position, the topic weight of the segmented word is determined as the second weight of the segmented word in each text to be determined. The average value of the second weights of the same segmented word in all the texts to be determined is then calculated as the first weight of that segmented word, and the obtained first weight is determined as the heat weight.
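The weight obtaining step can be sketched as follows. The input layout, the names `first_weights` and `TITLE_FACTOR`, and two interpretive choices are assumptions: a word that appears in the title is treated as a title word, and the average is taken over the texts in which the word occurs.

```python
from collections import defaultdict

TITLE_FACTOR = 2.0  # preset parameter; 2 is the example value given in the text


def first_weights(texts):
    """Average the position-adjusted topic weights of each word over the texts.

    Each text is a dict with 'title_words' (set of words found in the title)
    and 'topic_weights' (word -> topic weight in that text). Hypothetical layout.
    """
    second = defaultdict(list)
    for text in texts:
        for word, tw in text["topic_weights"].items():
            # Title position: multiply by the preset parameter; body position: keep as is.
            factor = TITLE_FACTOR if word in text["title_words"] else 1.0
            second[word].append(tw * factor)
    return {word: sum(values) / len(values) for word, values in second.items()}


texts = [
    {"title_words": {"flood"}, "topic_weights": {"flood": 0.2, "rescue": 0.1}},
    {"title_words": set(), "topic_weights": {"flood": 0.4}},
]
weights = first_weights(texts)
```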
If the preset time period includes a plurality of time periods, the first weight of each segmented word in all the texts to be determined in each time period is obtained through the weight obtaining step, and the heat weight of each segmented word is obtained according to the first weights. The weight obtaining step is the same as described above and is not repeated here.
Specifically, a third weight of the same segmented word is determined according to the first weights corresponding to that segmented word in the respective time periods, and the heat weight of each segmented word is determined according to the third weight and the first weight of that segmented word.
For example, the preset time period may include three time periods, namely a first time period, a second time period and a third time period. The first time period may be the current time period, the second time period may be a time period that includes the first time period and is longer than the first time period, and the third time period may be a time period that includes the second time period and is longer than the second time period. For example, the first time period may be the present week, the second time period may be the present week and the week before it, and the third time period may be the present week and the two weeks before it. Each segmented word in all the texts to be determined in the first time period corresponds to a first weight for the first time period, each segmented word in all the texts to be determined in the second time period corresponds to a first weight for the second time period, and each segmented word in all the texts to be determined in the third time period corresponds to a first weight for the third time period. The third weight of the same segmented word can therefore be determined according to the three first weights corresponding to that segmented word in the three time periods, and the third weight of a segmented word may be calculated as:
$$ww_q = a \cdot b1w_q + b \cdot b2w_q + c \cdot b3w_q$$
where $ww_q$ is the third weight corresponding to the q-th segmented word; $b1w_q$ is the first weight of the q-th segmented word for the first time period; $b2w_q$ is the first weight of the q-th segmented word for the second time period; $b3w_q$ is the first weight of the q-th segmented word for the third time period; $a$ is a first preset value; $b$ is a second preset value; and $c$ is a third preset value; for example, $a = 0.3$, $b = 0.4$ and $c = 0.3$.
After the third weight is obtained, in order to calculate the heat weight of each segmented word in the first time period (i.e., the current time period), the third weight calculated from the same segmented word in the three time periods needs to be combined with the first weight in the first time period. In this embodiment, the heat weight of each segmented word in the first time period may be calculated by the following formula: $hw_q = \alpha \cdot b1w_q + \beta \cdot ww_q$, where $hw_q$ represents the heat weight of the q-th segmented word; $b1w_q$ represents the first weight of the q-th segmented word for the first time period; $ww_q$ represents the third weight of the q-th segmented word; $\alpha$ is a fourth preset value (for example, $\alpha = 0.25$); and $\beta$ is a fifth preset value (for example, $\beta = 0.75$). The heat weight obtained by combining the third weight with the first weight for the first time period through this formula thus also takes into account the first weight for the second time period and the first weight for the third time period.
It should be noted that the first preset value, the second preset value, the third preset value, the fourth preset value and the fifth preset value are obtained through repeated experiments, with $a + b + c = 1$ and $\alpha + \beta = 1$.
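A minimal sketch of the multi-period case, assuming the third weight is the weighted sum of the three first weights with coefficients a, b and c. The heat weight formula and the default values come from the text; the function names and the sample first weights are hypothetical.

```python
def third_weight(b1, b2, b3, a=0.3, b=0.4, c=0.3):
    """Assumed weighted sum of the first weights of the three time periods."""
    return a * b1 + b * b2 + c * b3


def heat_weight(b1, ww, alpha=0.25, beta=0.75):
    """hw_q = alpha * b1w_q + beta * ww_q, with alpha + beta = 1."""
    return alpha * b1 + beta * ww


# Illustrative first weights of one word for the three time periods.
ww = third_weight(b1=0.6, b2=0.4, b3=0.2)  # approximately 0.40
hw = heat_weight(b1=0.6, ww=ww)            # approximately 0.45
```

A word whose weight is high only in the current period scores lower than one that stays high across all three periods, which is how the longer periods influence the result.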
S210, obtaining hot words with preset word quantity according to the heat weight of each word segmentation word.
In the step, the word-segmentation words are sorted in a descending order according to the heat weight to obtain word ranking, and word-segmentation words with the word ranking less than or equal to the preset word number are obtained according to word ranking results and serve as hot words.
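Step S210 amounts to a descending sort followed by a cut at the preset word number. A minimal sketch, with the function name `hot_words` and the sample weights as assumptions:

```python
def hot_words(heat_weights, preset_count):
    """Return the words whose rank by heat weight is within preset_count."""
    ranked = sorted(heat_weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:preset_count]]


top = hot_words({"flood": 0.45, "city": 0.10, "rescue": 0.30}, preset_count=2)
```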
S211, obtaining the clauses to be determined containing the hot words from all the texts to be determined.
The punctuation marks in each text to be determined can be used as dividing points to perform clause processing on the text to be determined to obtain a plurality of initial clauses, so that whether the hot words exist in the initial clauses in each text to be determined is determined in sequence, if the hot words exist in the initial clauses, the initial clauses are determined to be the clauses to be determined, the clauses to be determined are reserved, and if the hot words do not exist in the initial clauses, the initial clauses are filtered.
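The clause splitting and filtering of step S211 can be sketched as follows. The set of punctuation marks in `CLAUSE_DELIMITERS` and the function name `candidate_clauses` are assumptions; the disclosure only says that punctuation marks serve as dividing points.

```python
import re

# Assumed delimiter set: common full-width and half-width punctuation marks.
CLAUSE_DELIMITERS = r"[，。！？；,.!?;]"


def candidate_clauses(text, hot_word):
    """Split a text into initial clauses and keep those containing the hot word."""
    clauses = [clause.strip() for clause in re.split(CLAUSE_DELIMITERS, text)]
    return [clause for clause in clauses if clause and hot_word in clause]


kept = candidate_clauses(
    "the flood hit the city. rescue teams arrived, and the flood receded!",
    "flood",
)
```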
S212, determining hot clauses according to the clauses to be determined.
In this step, the hot clause may be determined by:
S21, sorting the plurality of clause words included in the clause to be determined in descending order of topic weight to obtain a sorting result.
S22, determining whether the weight rank of the hot word in the sorting result is less than or equal to a preset rank.
When the weight rank of the hot word in the sorting result is less than or equal to the preset rank, step S23 is executed;
when the weight rank of the hot word in the sorting result is greater than the preset rank, the clause to be determined is ignored.
S23, determining the clause to be determined as a target clause, and acquiring a hot clause from the target clauses.
In this step, if the clause set of the obtained target clauses is $\{S_1, S_2, \ldots, S_n\}$, the similarity between each target clause in the clause set and the other target clauses is calculated to obtain the similarity sum of each target clause, and the target clause corresponding to the maximum similarity sum is the hot clause. The specific calculation formula is as follows:
$$x = \arg\max_{1 \le d \le u} \mathrm{sim}(S_d, S_{-d})$$
where $x$ indicates that the hot clause is the x-th target clause, $u$ represents the total number of target clauses, $\mathrm{sim}(S_d, S_{-d})$ represents the similarity between the d-th target clause and the other target clauses, $S_d$ represents the d-th target clause, and $S_{-d}$ represents the target clauses other than the d-th target clause. Exemplarily, the similarity between the d-th target clause and the r-th target clause is calculated as:
$$\mathrm{sim}(S_d, S_r) = \frac{|S_d \cap S_r|}{|S_d \cup S_r|}$$
where $|S_d \cap S_r|$ represents the number of Chinese characters that the d-th target clause and the r-th target clause have in common, and $|S_d \cup S_r|$ represents the number of distinct Chinese characters in the d-th target clause and the r-th target clause.
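Steps S21 to S23 can be sketched as a rank check followed by a character-level Jaccard comparison. The function names are hypothetical, and the reading that the hot clause maximises the sum of its similarities to all other target clauses is an assumption based on the description of the similarity sum.

```python
def is_target_clause(clause_word_weights, hot_word, preset_rank):
    """Check whether the hot word ranks within preset_rank by topic weight.

    clause_word_weights maps each word of one clause to its topic weight.
    """
    ranked = sorted(clause_word_weights, key=clause_word_weights.get, reverse=True)
    return hot_word in ranked and ranked.index(hot_word) + 1 <= preset_rank


def jaccard(clause_a, clause_b):
    """Shared characters divided by distinct characters of the two clauses."""
    set_a, set_b = set(clause_a), set(clause_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def hot_clause(target_clauses):
    """Return the clause whose summed similarity to the other clauses is largest."""
    def similarity_sum(d):
        return sum(
            jaccard(target_clauses[d], target_clauses[r])
            for r in range(len(target_clauses))
            if r != d
        )
    best = max(range(len(target_clauses)), key=similarity_sum)
    return target_clauses[best]


in_rank = is_target_clause({"flood": 0.5, "city": 0.3, "the": 0.1}, "city", preset_rank=2)
chosen = hot_clause(["abcd", "abc", "abx", "xyz"])
```

Choosing the clause with the largest similarity sum favours the clause that is most representative of the others, rather than the longest or the first one found.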
S213, determining the hot word and the hot clause as the hot event.
Thus, a hot clause corresponding to each hot word can be obtained, and the hot word and the hot clause are combined and displayed. For example, if the obtained hot word is "tackling a certain target task", the hot clause corresponding to that hot word, determined through steps S211 to S212, may be "the leadership focused on a certain target task, highlighted a problem-oriented approach, put forward requirements for subtask 1, subtask 2, subtask 3 and so on, and issued a call to action to complete the target task unswervingly". The hot event is thereby accurately determined through the hot word and the hot clause, and the hot word and the hot clause are combined and displayed to the user so that the user obtains an accurate hot event. The above is only an example and does not limit the present disclosure.
By adopting the method, the topic model captures the association among the texts to be determined, the topics and the words; the first topic conditional probability of each text to be determined under different topics is determined based on the topic model, and the heat weight of each segmented word is determined according to the first topic conditional probability, so that the corresponding hot events are mined according to the heat weight of each segmented word and the accuracy of determining hot events is improved.
Fig. 3 is a block diagram illustrating an apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure, as shown in fig. 3, the apparatus includes:
an obtaining module 301, configured to obtain multiple texts to be determined within a preset time period;
the processing module 302 is configured to obtain topic models corresponding to all the texts to be determined within the preset time period, and determine a first topic conditional probability that each text to be determined belongs to a different topic according to the topic model; the theme model comprises a plurality of themes;
a first determining module 303, configured to determine a heat weight of each word segmentation word in all the texts to be determined according to the first topic conditional probability;
a second determining module 304, configured to determine a hot event from the plurality of texts to be determined according to the heat weight of each word-segmentation word.
Fig. 4 is a block diagram of an apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure, and as shown in fig. 4, the processing module 302 includes:
the processing submodule 3021 is configured to perform word segmentation processing on each text to be determined within the preset time period to obtain at least one word segmentation word;
the training submodule 3022 is configured to train a preset topic model through at least one word-segmentation word to obtain a topic model.
Fig. 5 is a block diagram of an apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure, and as shown in fig. 5, the first determining module 303 includes:
the first obtaining submodule 3031 is configured to obtain a second topic conditional probability that at least one word-segmentation word in each text to be determined belongs to a different topic;
a first determining submodule 3032, configured to determine a topic weight of at least one word segmentation word in each text to be determined according to the first topic conditional probability and the second topic conditional probability;
the second determining submodule 3033 is configured to determine a popularity weight of each of the participle terms according to the topic weight.
Optionally, the first obtaining sub-module 3031 is configured to determine an occurrence probability of at least one word segmentation word in the corresponding text to be determined; calculating the sum value of the conditional probabilities of the first subjects corresponding to the same subject to obtain the subject probability corresponding to the same subject; obtaining word conditional probability of at least one word segmentation word in each text to be determined under different topics according to the topic model; and determining a second topic conditional probability according to the topic probability, the occurrence probability and the word conditional probability.
Optionally, when the preset time period includes a time period, the second determining submodule 3033 is configured to obtain, through the weight obtaining step, a first weight of each word segmentation word in all the texts to be determined, and determine that the first weight is the heat weight.
When the preset time period includes a plurality of time periods, the second determining submodule 3033 is configured to respectively obtain, through the weight obtaining step, a first weight of each word segmentation word in all the texts to be determined in each time period; and acquiring the heat weight of each word segmentation word according to the first weight.
Optionally, the weight obtaining step includes: acquiring the position information of each word segmentation word in each text to be determined; the position information comprises a text title position or a text body position;
when the position information of the word-dividing word is the text title position, determining the product of the theme weight and the preset parameter of the word-dividing word as a second weight of the word-dividing word in each text to be determined;
when the position information of the word-dividing word is the text body position, determining the subject weight of the word-dividing word as a second weight of the word-dividing word in each text to be determined;
and respectively calculating the average value of the second weights of the same word-segmentation word in all the texts to be determined as the first weight of the same word-segmentation word.
Optionally, the second determining submodule 3033 is configured to determine a third weight of the same participle word according to the first weight corresponding to the same participle word in each time period;
and determining the heat weight of each participle word according to the third weight and the first weight of each participle word.
Fig. 6 is a block diagram of an apparatus for determining a hotspot event according to an exemplary embodiment of the present disclosure, where the hotspot event includes a hotspot word and a hotspot clause, and as shown in fig. 6, the second determining module 304 includes:
a second obtaining sub-module 3041, configured to obtain hot words with a preset number of words according to the heat weight of each word-dividing word;
a third obtaining submodule 3042, configured to obtain to-be-determined clauses including the hot word from all the to-be-determined texts;
a sorting submodule 3043, configured to sort, in a descending order, the multiple clause terms included in the clause to be determined according to the theme weight to obtain a sorting result;
a third determining submodule 3044, configured to determine, when the weight rank of the hotness word in the sorting result is less than or equal to the preset rank, that the clause to be determined is a target clause, and obtain a hot clause from the target clause;
a fourth determining submodule 3045 configured to determine that the hot word and the hot clause are the hot event.
By adopting the device, the topic model captures the association among the texts to be determined, the topics and the words; the first topic conditional probability of each text to be determined under different topics is determined based on the topic model, and the heat weight of each segmented word is determined according to the first topic conditional probability, so that the corresponding hot events are mined according to the heat weight of each segmented word and the accuracy of determining hot events is improved.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 7 is a block diagram of an electronic device 700 shown in an exemplary embodiment of the present disclosure. As shown in fig. 7, the electronic device 700 may include: a processor 701, a memory 702, multimedia components 703, input/output (I/O) interfaces 704, and communication components 705.
The processor 701 is configured to control the overall operation of the electronic device 700, so as to complete all or part of the steps in the above-described method for determining a hot spot event. The memory 702 is used to store various types of data to support operation of the electronic device 700, such as instructions for any application or method operating on the electronic device 700 and application-related data. The memory 702 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia components 703 may include screen and audio components. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 702 or transmitted through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, such as a keyboard, a mouse or buttons. These buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component 705 may include a Wi-Fi module, a Bluetooth module and an NFC module.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components for performing the above-described method of determining hotspot events.
In another exemplary embodiment, a computer readable storage medium comprising program instructions, such as the memory 702 comprising program instructions, which are executable by the processor 701 of the electronic device 700 to perform the above-described method of determining a hotspot event is also provided.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (8)

1. A method of determining a hotspot event, the method comprising:
acquiring a plurality of texts to be determined in a preset time period;
obtaining topic models corresponding to all texts to be determined in the preset time period, and determining a first topic conditional probability that each text to be determined belongs to different topics according to the topic models; the theme model comprises a plurality of themes;
determining the heat weight of each word segmentation word in all the texts to be determined according to the first subject conditional probability;
determining a hot event from the texts to be determined according to the heat weight of each word segmentation word, wherein the hot event comprises a hot word and a hot clause;
wherein the determining the heat weight of each participle word in all the texts to be determined according to the first topic conditional probability comprises:
acquiring a second topic conditional probability that at least one word-segmentation word in each text to be determined belongs to different topics;
determining the theme weight of at least one word segmentation word in each text to be determined according to the first theme conditional probability and the second theme conditional probability;
determining the heat weight of each word segmentation word according to the theme weight;
the determining a hot event from the plurality of texts to be determined according to the heat weight of each word segmentation word comprises:
acquiring hot words with preset word quantity according to the heat weight of each word segmentation word;
obtaining clauses to be determined containing the hot words from all the texts to be determined;
sequencing a plurality of clause words included in the clause to be determined in a descending order according to the theme weight to obtain a sequencing result;
when the weight ranking of the hot words in the ranking result is less than or equal to a preset ranking, determining the clause to be determined as a target clause, and acquiring a hot clause from the target clause;
and determining the hot words and the hot clauses as the hot events.
2. The method according to claim 1, wherein the obtaining a second topic conditional probability that at least one of the participle words in each of the texts to be determined belongs to a different topic comprises:
determining the occurrence probability of at least one word segmentation word in the corresponding text to be determined;
calculating the sum value of the conditional probabilities of the first subjects corresponding to the same subject to obtain the subject probability corresponding to the same subject;
obtaining word conditional probability of at least one word segmentation word in each text to be determined under different topics according to the topic model;
and determining a second topic conditional probability according to the topic probability, the occurrence probability and the word conditional probability.
3. The method of claim 2, wherein when the preset time period comprises a time period, the determining the heat weight of each word segmentation word according to the theme weight comprises: acquiring a first weight of each word segmentation word in all the texts to be determined through a weight acquisition step, and determining the first weight as the heat weight;
when the preset time period comprises a plurality of time periods, the determining the heat weight of each word segmentation word according to the theme weight comprises the following steps: respectively obtaining a first weight of each word segmentation word in all the texts to be determined in each time period through a weight obtaining step, and obtaining the heat weight of each word segmentation word according to the first weight.
4. The method of claim 3, wherein the weight obtaining step comprises:
acquiring position information of each word segmentation word in each text to be determined; the position information comprises a text title position or a text body position;
when the position information of the word segmentation words is the text title position, determining the product of the theme weight of the word segmentation words and a preset parameter as a second weight of the word segmentation words in each text to be determined;
when the position information of the word segmentation words is the text body position, determining the theme weight of the word segmentation words as a second weight of the word segmentation words in each text to be determined;
and respectively calculating the average value of the second weights of the same word-segmentation word in all the texts to be determined as the first weight of the same word-segmentation word.
5. The method according to claim 3 or 4, wherein the obtaining the heat weight of each word segmentation word according to the first weight comprises:
determining a third weight of the same word segmentation word according to the first weight corresponding to the same word segmentation word in each time period;
determining the heat weight of each word segmentation word according to the third weight and the first weight of each word segmentation word.
6. An apparatus for determining a hotspot event, the apparatus comprising:
the acquisition module is used for acquiring a plurality of texts to be determined in a preset time period;
the processing module is used for acquiring topic models corresponding to all the texts to be determined in the preset time period and determining a first topic conditional probability that each text to be determined belongs to different topics according to the topic models; the theme model comprises a plurality of themes;
the first determining module is used for determining the heat weight of each word segmentation word in all the texts to be determined according to the first subject conditional probability;
the second determining module is used for determining a hot event from the texts to be determined according to the heat weight of each word segmentation word, wherein the hot event comprises a hot word and a hot clause;
wherein the first determining module comprises:
the first obtaining submodule is used for obtaining a second topic conditional probability that at least one word segmentation word in each text to be determined belongs to different topics;
a first determining submodule, configured to determine a topic weight of at least one word segmentation word in each text to be determined according to the first topic conditional probability and the second topic conditional probability;
the second determining submodule is used for determining the heat weight of each word segmentation word according to the topic weight;
the second determining module includes:
the second obtaining submodule is used for obtaining hot words with preset word quantity according to the heat weight of each word segmentation word;
a third obtaining submodule, configured to obtain to-be-determined clauses including the hot words from all the to-be-determined texts;
the sorting submodule is used for sorting a plurality of clause words included in the clause to be determined in descending order of topic weight to obtain a sorting result;
a third determining submodule, configured to determine that the clause to be determined is a target clause when the weight rank of the hot word in the sorting result is less than or equal to a preset rank, and obtain a hot clause from the target clause;
and the fourth determining submodule is used for determining the hot words and the hot clauses as the hot events.
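The selection performed by the second determining module can be sketched as follows. This is an illustrative sketch only: the function names, the representation of a clause as a list of words, the single word-to-topic-weight mapping, and the default weight of 0.0 for unknown words are all assumptions made for the example. The claim does not state how the hot clause is finally obtained from the target clauses, so the sketch stops at the target clauses.

```python
def top_hot_words(heat_weight, n):
    """Return the n words with the highest heat weight."""
    return sorted(heat_weight, key=heat_weight.get, reverse=True)[:n]


def target_clauses(clauses, hot_words, topic_weight, max_rank):
    """Keep clauses in which a hot word ranks within max_rank by topic weight.

    clauses: list of clauses, each a list of words (hypothetical layout).
    topic_weight: dict word -> topic weight; missing words count as 0.0.
    max_rank: the preset rank; rank 1 is the heaviest word in the clause.
    """
    hot = set(hot_words)
    selected = []
    for clause in clauses:
        present = hot.intersection(clause)
        if not present:
            continue  # not a clause to be determined: no hot word in it
        # Deduplicate while keeping order, then sort by descending weight.
        unique_words = list(dict.fromkeys(clause))
        ranked = sorted(unique_words,
                        key=lambda w: topic_weight.get(w, 0.0),
                        reverse=True)
        if any(ranked.index(w) + 1 <= max_rank for w in present):
            selected.append(clause)
    return selected


heat = {"flood": 0.9, "rain": 0.4, "city": 0.1}
hot = top_hot_words(heat, 1)
clauses = [
    ["the", "flood", "hit", "city"],
    ["rain", "fell"],
    ["city", "flood", "warning", "issued"],
]
tw = {"flood": 0.8, "city": 0.3, "warning": 0.9, "issued": 0.85}
strict = target_clauses(clauses, hot, tw, max_rank=1)
loose = target_clauses(clauses, hot, tw, max_rank=3)
```

Here the only hot word is "flood". It is the top-ranked word in the first clause but only third in the last clause, so a preset rank of 1 keeps the first clause alone and a preset rank of 3 keeps both.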
7. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
8. An electronic device, comprising:
the computer-readable storage medium recited in claim 7; and
one or more processors to execute the program in the computer-readable storage medium.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711484349.7A CN108228808B (en) 2017-12-29 2017-12-29 Method and device for determining hot event, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711484349.7A CN108228808B (en) 2017-12-29 2017-12-29 Method and device for determining hot event, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN108228808A CN108228808A (en) 2018-06-29
CN108228808B CN108228808B (en) 2020-07-31

Family

ID=62647311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711484349.7A Active CN108228808B (en) 2017-12-29 2017-12-29 Method and device for determining hot event, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN108228808B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151498B (en) * 2018-09-03 2021-02-09 北京达佳互联信息技术有限公司 Hotspot event processing method and device, server and storage medium
CN109739975B (en) * 2018-11-15 2021-03-09 东软集团股份有限公司 Hot event extraction method and device, readable storage medium and electronic equipment
CN109710944A (en) * 2018-12-29 2019-05-03 新华网股份有限公司 Hot word extracting method, device, electronic equipment and computer readable storage medium
CN112528018A (en) * 2020-12-01 2021-03-19 天津中科智能识别产业技术研究院有限公司 Hot news discovery method based on text mining
CN113076489B (en) * 2021-04-14 2022-09-13 合肥工业大学 Method for classifying social media user roles in public sentiment event
CN113822069B (en) * 2021-09-17 2024-03-12 国家计算机网络与信息安全管理中心 Sudden event early warning method and device based on meta-knowledge and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2211282A2 (en) * 2009-01-27 2010-07-28 Palo Alto Research Center Incorporated System and method for managing user attention by detecting hot and cold topics in social indexes
CN103970756A (en) * 2013-01-28 2014-08-06 腾讯科技(深圳)有限公司 Hot topic extracting method, device and server
WO2014127673A1 (en) * 2013-02-25 2014-08-28 Tencent Technology (Shenzhen) Company Limited Method and apparatus for acquiring hot topics
CN104216875A (en) * 2014-09-26 2014-12-17 中国科学院自动化研究所 Automatic microblog text abstracting method based on unsupervised key bigram extraction
CN104281645A (en) * 2014-08-27 2015-01-14 北京理工大学 Method for identifying emotion key sentence on basis of lexical semantics and syntactic dependency
CN105824959A (en) * 2016-03-31 2016-08-03 首都信息发展股份有限公司 Public opinion monitoring method and system

Also Published As

Publication number Publication date
CN108228808A (en) 2018-06-29

Similar Documents

Publication Publication Date Title
CN108228808B (en) Method and device for determining hot event, storage medium and electronic equipment
US11138378B2 (en) Intelligently summarizing and presenting textual responses with machine learning
US11093854B2 (en) Emoji recommendation method and device thereof
CN108804512B (en) Text classification model generation device and method and computer readable storage medium
CN106649818B (en) Application search intention identification method and device, application search method and server
CN110444198B (en) Retrieval method, retrieval device, computer equipment and storage medium
Alam et al. Social media sentiment analysis through parallel dilated convolutional neural network for smart city applications
KR102288249B1 (en) Information processing method, terminal, and computer storage medium
CN107480162B (en) Search method, device and equipment based on artificial intelligence and computer readable storage medium
CN110888990B (en) Text recommendation method, device, equipment and medium
KR101830061B1 (en) Identifying activities using a hybrid user-activity model
EP3128439A1 (en) Text classification and transformation based on author
CN111428010B (en) Man-machine intelligent question-answering method and device
CN112395385B (en) Text generation method and device based on artificial intelligence, computer equipment and medium
CN111930792B (en) Labeling method and device for data resources, storage medium and electronic equipment
CN108345612B (en) Problem processing method and device for problem processing
CN111144120A (en) Training sentence acquisition method and device, storage medium and electronic equipment
CN111274797A (en) Intention recognition method, device and equipment for terminal and storage medium
CN113360622A (en) User dialogue information processing method and device and computer equipment
US11640420B2 (en) System and method for automatic summarization of content with event based analysis
CN110738056B (en) Method and device for generating information
CN112395391A (en) Concept graph construction method and device, computer equipment and storage medium
CN114722832A (en) Abstract extraction method, device, equipment and storage medium
CN111555960A (en) Method for generating information
CN113609833B (en) Dynamic file generation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant