CN112307336B - Hot spot information mining and previewing method and device, computer equipment and storage medium - Google Patents

Info

Publication number: CN112307336B
Authority: CN (China)
Prior art keywords: hot spot, spot information, information, propagation, abstract
Legal status: Active (the listed status is an assumption, not a legal conclusion)
Application number: CN202011189110.9A
Other languages: Chinese (zh)
Other versions: CN112307336A (en)
Inventor: 蔡静
Current Assignee: Ping An Life Insurance Company of China Ltd (the listed assignee may be inaccurate)
Original Assignee: Ping An Life Insurance Company of China Ltd
Application filed by Ping An Life Insurance Company of China Ltd, with priority to CN202011189110.9A; published as CN112307336A and, after grant, as CN112307336B.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a hot spot information mining and previewing method, a device, computer equipment and a storage medium, and relates to the technical field of artificial intelligence, wherein the method comprises the following steps: constructing a news topic identification classifier according to the pre-acquired interest content types of the users, and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier; extracting target hot spot information with high heat from the hot spot information according to the propagation frequency and propagation speed of the hot spot information; and extracting elements from the target hot spot information according to a pre-trained event extraction model, and extracting a core abstract from the extracted elements to generate an abstract express matched with the interest content type of the user. The method and the system can screen the hot spot information which accords with the user concern from the mass information, and generate the corresponding abstract express, so that the user can conveniently and quickly acquire valuable information.

Description

Hot spot information mining and previewing method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a hotspot information mining and previewing method, apparatus, computer device, and storage medium.
Background
At present, most users obtain their news from Internet media. However, Internet media spread and report content so broadly that information is overwhelming, and users find it difficult to screen out the information that is useful to them. In addition, each Internet media outlet has its own traffic support strategy, so it cannot focus on the news, general life knowledge, product knowledge and the like that a target user group cares about.
To solve the above problem, the prior art generally relies on manually screening and refining dynamic news and hot topics that match the target users. However, manual screening and refinement require a great deal of search time to pick out high-quality content, and a great deal of effort to extract and write the abstracts, which is extremely inefficient.
Disclosure of Invention
The invention aims to provide a hot spot information mining and previewing method, a hot spot information mining and previewing device, computer equipment and a storage medium, so as to solve the low efficiency and inconvenience of hot spot information mining and previewing in the prior art.
In a first aspect, an embodiment of the present invention provides a hotspot information mining and previewing method, including:
constructing a news topic identification classifier according to the pre-acquired interest content types of the users, and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier;
extracting target hot spot information with high heat from the hot spot information according to the propagation frequency and propagation speed of the hot spot information;
and extracting elements from the target hot spot information according to a pre-trained event extraction model, and extracting a core abstract from the extracted elements to generate an abstract express matched with the interest content type of the user.
In a second aspect, an embodiment of the present invention provides a hotspot information mining and previewing device, including:
the search tracking unit is used for constructing a news topic identification classifier according to the pre-acquired interest content types of the users, and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier;
a sorting unit for extracting high-heat target hot spot information from the hot spot information according to the propagation frequency and propagation rate of the hot spot information;
and the aggregation unit is used for extracting the elements of the target hot spot information according to a pre-trained event extraction model, and extracting the core abstract of the extracted elements to generate abstract express matched with the type of the interest content of the user.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the hotspot information mining and previewing method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the hotspot information mining and previewing method according to the first aspect.
The embodiment of the invention provides a hot spot information mining and previewing method, a device, computer equipment and a storage medium, wherein the method comprises the following steps: constructing a news topic identification classifier according to the pre-acquired interest content types of the users, and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier; extracting target hot spot information with high heat from the hot spot information according to the propagation frequency and propagation speed of the hot spot information; and extracting elements from the target hot spot information according to a pre-trained event extraction model, and extracting a core abstract from the extracted elements to generate an abstract express matched with the interest content type of the user. The method provided by the embodiment of the invention can screen the hot spot information which accords with the user concern from the massive information according to the personalized requirements of the user, and generate the corresponding abstract express, thereby facilitating the user to quickly acquire the valuable information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a hot spot information mining and previewing method according to an embodiment of the present invention;
FIG. 2 is a schematic sub-flowchart of a hot spot information mining and previewing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another sub-flowchart of a hot spot information mining and previewing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another sub-flowchart of a hot spot information mining and previewing method according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a hot spot information mining and previewing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram illustrating a sub-unit of a hotspot information mining and previewing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram illustrating another sub-unit of a hotspot information mining and previewing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of another sub-unit of a hotspot information mining and previewing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, fig. 1 is a flowchart of a hot spot information mining and previewing method according to an embodiment of the present invention, which includes steps S101 to S103:
S101, constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hot spot information matched with the interest content type of the user by utilizing the news topic identification classifier;
S102, extracting target hot spot information with high heat from the hot spot information according to the propagation frequency and propagation speed of the hot spot information;
and S103, extracting elements from the target hot spot information according to a pre-trained event extraction model, and extracting a core abstract from the extracted elements to generate an abstract express matched with the type of the interest content of the user.
The method provided by the embodiment of the invention can screen the hot spot information which accords with the user concern from the massive information according to the personalized requirements of the user, and generate the corresponding abstract express, thereby facilitating the user to quickly acquire the valuable information.
In one embodiment, as shown in fig. 2, the step S101 includes steps S201 to S203:
S201, acquiring in advance an information sample set matched with the interest content type of the user, and performing classification training with a naive Bayes classifier to construct the news topic identification classifier;
First, the interest content type of the user is acquired. The interest content type may be actively selected by the user, such as sports, finance or military, or it may be learned from user behavior, that is, the user's operations over a past period, such as the types of news browsed, the time spent browsing each type of news, and the types of news shared and commented on. The interest content type can be determined in this way. There may of course be more than one: several news types can be selected as interest content types for the user.
After the interest content type of the user is determined, the news topic identification classifier can be constructed. The news topic identification classifier is mainly used to determine the topic of a piece of news. The interest content type of the user is input into the classifier, so that the classifier can classify news according to the interest content type and identify the news that belongs to it.
To construct the news topic identification classifier, an information sample set matched with the interest content type of the user is acquired, and classification training is then performed with a naive Bayes classifier. The naive Bayes classifier is a classification algorithm that models the joint probability P(x, c) directly and applies Bayes' theorem to obtain the target probability value.
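The training described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a multinomial naive Bayes model with Laplace smoothing over pre-tokenized samples, and the function names `train_naive_bayes` and `classify` as well as the toy samples are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (tokens, label). Returns (priors, word_counts, vocab)."""
    label_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(label_counts.values())
    priors = {c: n / total for c, n in label_counts.items()}
    return priors, word_counts, vocab

def classify(tokens, priors, word_counts, vocab):
    """Return the label c that maximises log P(c) + sum_i log P(x_i | c)."""
    best_label, best_score = None, -math.inf
    for c, prior in priors.items():
        # Laplace smoothing (an assumption): add 1 to every word count.
        denom = sum(word_counts[c].values()) + len(vocab)
        score = math.log(prior)
        for t in tokens:
            score += math.log((word_counts[c][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label

# Hypothetical toy sample set for two interest content types.
samples = [
    (["match", "goal", "league"], "sports"),
    (["team", "coach", "goal"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "rate", "bond"], "finance"),
]
model = train_naive_bayes(samples)
print(classify(["goal", "coach"], *model))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.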
S202, tracking hot spot information propagation dynamics in the vertical field by utilizing the news topic identification classifier;
In this step, the news topic identification classifier is used to track the propagation dynamics of hot spot information in the vertical field, that is, the subdivided field corresponding to the user's news interest type, so that hot spot information of the corresponding type can be tracked more comprehensively.
This step mainly tracks the propagation dynamics of the hot spot information, such as its total network propagation amount, its number of comments, and its propagation concurrency per unit time. In this way, the propagation state of the hot spot information can be recorded.
S203, determining the number of keywords in the hot spot information and the domain to which the keywords belong, carrying out weighted synthesis on the keywords in the hot spot information according to preset domain weights to obtain a domain value with the highest score of the hot spot information, and determining the hot spot information matched with the interest content type of the user according to the domain value.
A piece of hot spot information may combine content from different domains. For example, the same piece of news may involve both entertainment stars and sports stars, so the specific type of the news needs to be determined. This embodiment does so by weighted synthesis: for example, the proportions of entertainment content and sports content in a piece of news are counted and a weight is set for each, so that a final score is obtained. The type of the hot spot information is determined according to the score, and the hot spot information matched with the interest content type of the user is thereby obtained.
Specifically, the number of keywords in the hot spot information and the domain to which each keyword belongs are determined, and a domain weight is preset for each domain, so that the domain value of the hot spot information in each domain can be calculated. For example, suppose a piece of hot spot information contains 10 keywords, of which 5 belong to the sports domain, 2 to the financial domain and 3 to the entertainment domain, and the weights of the sports, financial and entertainment domains are 0.1, 0.15 and 0.12, respectively. The domain value of the hot spot information is then 0.1×5=0.5 in the sports domain, 0.15×2=0.3 in the financial domain, and 0.12×3=0.36 in the entertainment domain. The sports domain has the highest domain value, so the hot spot information belongs to the sports domain. In this way, hot spot information matched with the interest content type of the user can be selected for the user.
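The worked example above can be reproduced with a short sketch. The keyword counts and weights are taken from the text; the function name `domain_values` and the way keywords are represented (one domain label per keyword) are illustrative assumptions.

```python
from collections import Counter

def domain_values(keyword_domains, domain_weights):
    """Multiply each domain's keyword count by its preset domain weight."""
    counts = Counter(keyword_domains)
    return {d: domain_weights[d] * n for d, n in counts.items()}

# 10 keywords: 5 sports, 2 finance, 3 entertainment (figures from the text).
keyword_domains = ["sports"] * 5 + ["finance"] * 2 + ["entertainment"] * 3
domain_weights = {"sports": 0.1, "finance": 0.15, "entertainment": 0.12}

values = domain_values(keyword_domains, domain_weights)
best_domain = max(values, key=values.get)  # domain with the highest domain value
print(best_domain)
```

The hot spot information is then kept only if `best_domain` is one of the user's interest content types.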
In one embodiment, as shown in fig. 3, the step S102 includes steps S301 to S304:
S301, acquiring the network propagation total quantity and the comment number of the hot spot information, and determining the propagation frequency of the hot spot information according to the network propagation total quantity and the comment number;
In this step, the propagation frequency of the hot spot information is determined from its total network propagation amount and its number of comments. Because these quantities may differ by orders of magnitude between different pieces of hot spot information, the embodiment of the invention can normalize the total network propagation amount and the number of comments so that different pieces of hot spot information can be compared on the same scale, and the propagation frequency of the hot spot information is then determined.
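One possible way to carry out this normalization is sketched below. The text does not name a normalization method or a rule for combining the two signals, so min-max scaling and a simple mean are assumptions, as are the function names and the sample figures.

```python
def min_max(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def propagation_frequency(totals, comments):
    """Assumed combination: mean of the two normalised signals per item."""
    norm_totals, norm_comments = min_max(totals), min_max(comments)
    return [(t + c) / 2 for t, c in zip(norm_totals, norm_comments)]

# Hypothetical figures for three pieces of hot spot information.
totals = [120_000, 3_000, 45_000]   # total network propagation amount
comments = [800, 50, 1_200]         # number of comments
freq = propagation_frequency(totals, comments)
print(freq)
```

After scaling, a total in the hundreds of thousands and a comment count in the hundreds contribute on the same 0 to 1 scale.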
S302, acquiring the propagation concurrency of the hot spot information in unit time, and determining the propagation rate of the hot spot information according to the propagation concurrency;
In this step, the propagation rate of the hot spot information is determined from its propagation concurrency per unit time, that is, the number of times the hot spot information is propagated per unit time. The propagation rate represents how fast the hot spot information spreads and therefore reflects its heat to a certain extent. Similarly, the embodiment of the invention can normalize the propagation concurrency per unit time, so that the propagation concurrency of different pieces of hot spot information can be compared on the same scale.
S303, weighting the propagation depth and the propagation breadth of the hot spot information according to the weights of the propagation frequency and the propagation rate of the hot spot information, to obtain a hot spot value of each piece of hot spot information;
The propagation frequency and the propagation rate reflect the heat of the hot spot information in different dimensions: the propagation frequency represents the propagation breadth, and the propagation rate represents the propagation depth. The embodiment of the invention needs to consider the data in both dimensions together. Weights are therefore set for the propagation frequency and the propagation rate, and the propagation breadth and the propagation depth of the hot spot information are weighted accordingly to obtain the hot spot value, which represents the heat level of the hot spot information.
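The weighting can be illustrated with a minimal sketch. The text does not give the weight values or the exact formula, so the linear weighted sum, the default weights 0.6 and 0.4, and the sample inputs are all assumptions.

```python
def hot_value(frequency, rate, w_frequency=0.6, w_rate=0.4):
    """Weighted sum of normalised propagation frequency (breadth) and rate (depth)."""
    return w_frequency * frequency + w_rate * rate

# Hypothetical normalised inputs in [0, 1] for two pieces of hot spot information.
items = {
    "news_a": {"frequency": 0.9, "rate": 0.2},
    "news_b": {"frequency": 0.4, "rate": 0.8},
}
scores = {name: hot_value(v["frequency"], v["rate"]) for name, v in items.items()}
print(scores)
```

Changing the two weights shifts the balance between widely spread items and fast spreading items.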
S304, ordering the weighted hot spot information according to the order of the hot spot values from big to small to obtain a hot spot information queue, and sequentially extracting a predetermined number of target hot spot information from the beginning to the end in the hot spot information queue.
This step sorts the hot spot information in descending order of hot spot value, so that items ranked earlier have higher heat and items ranked later have lower heat.
After sorting, the predetermined number of top-ranked items are extracted from the queue in order from the head, giving the predetermined number of pieces of target hot spot information. This avoids selecting too much hot spot information and keeps the pushed content concise.
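The sorting and extraction step amounts to a descending sort followed by taking the head of the queue. In the sketch below, the function name `top_hot_spots` and the sample hot spot values are hypothetical.

```python
def top_hot_spots(hot_values, n):
    """Return the n item ids with the largest hot spot values, highest first."""
    queue = sorted(hot_values, key=hot_values.get, reverse=True)
    return queue[:n]

# Hypothetical hot spot values keyed by item id.
hot_values = {"news_a": 0.82, "news_b": 0.35, "news_c": 0.67, "news_d": 0.12}
targets = top_hot_spots(hot_values, 2)
print(targets)
```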
In one embodiment, as shown in fig. 4, the step S103 includes steps S401 to S403:
S401, constructing an event extraction model, and training and testing the event extraction model by adopting an information sample;
In the embodiment of the invention, an event extraction model is first constructed. The function of the event extraction model is to identify and extract elements from the hot spot information.
The embodiment of the invention uses information samples to train and test the event extraction model. Before training, the elements of each information sample are calibrated (labeled). The event extraction model is then used to identify and extract elements, the extracted elements are compared with the calibrated elements, and the model is optimized with a loss function until it converges. Finally, the trained event extraction model is tested, and it can be put into online operation once it meets the test requirements.
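The comparison between extracted and calibrated elements during testing could, for example, be scored as follows. The text does not name an evaluation metric, so precision, recall and F1 over (role, text) pairs are assumptions, and the sample elements are hypothetical.

```python
def element_f1(predicted, calibrated):
    """Precision, recall and F1 over sets of (role, text) element pairs."""
    predicted, calibrated = set(predicted), set(calibrated)
    correct = len(predicted & calibrated)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(calibrated) if calibrated else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical calibrated (labeled) elements and model output for one sample.
calibrated = [("who", "Team A"), ("what", "won the final"), ("when", "Sunday")]
predicted = [("who", "Team A"), ("what", "won the final"), ("where", "Beijing")]
precision, recall, f1 = element_f1(predicted, calibrated)
print(precision, recall, f1)
```

A test requirement could then be expressed as a minimum F1 on a held-out sample set before the model goes online.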
S402, performing text extraction and segmentation on the hot spot information by adopting the event extraction model to obtain elements of the hot spot information;
In this step, the trained event extraction model is used to identify and extract elements from the predetermined number of pieces of target hot spot information. The identification and extraction process is similar to the training process, except that the object processed in this step is actual hot spot information.
The elements may be 5W1H, where 5W refers to Why (reason), What (object), Where (place), When (time) and Who (person), and 1H refers to How (method).
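A simple container for the 5W1H elements might look like the sketch below. The class name `EventElements`, its `missing` helper and the sample values are hypothetical and are not part of the described method.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional

@dataclass
class EventElements:
    """One slot per 5W1H element; None means the element was not extracted."""
    who: Optional[str] = None
    what: Optional[str] = None
    when: Optional[str] = None
    where: Optional[str] = None
    why: Optional[str] = None
    how: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of the elements that are still empty, in field order."""
        return [name for name, value in asdict(self).items() if value is None]

elements = EventElements(who="Team A", what="won the final")
print(elements.missing())
```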
S403, performing sequence labeling on the extracted elements, and then executing a sentence sorting task to generate abstract express matched with the interest content types of the users.
In this step, the extracted elements are subjected to core abstract extraction, and the purpose of the core abstract extraction is to further extract key information from the elements, so that the information content is further simplified, and a user can know the news profile without spending too much time. The result of the core abstract extraction is an abstract express matching the user's interest content type.
According to the input type, the abstract express in the embodiment of the invention can be divided into single-document abstracts and multi-document abstracts. A single-document abstract is generated from one given document, and a multi-document abstract is generated from a given set of topic-related documents.
The abstract express in the embodiment of the invention can be generated in an extractive manner, that is, keywords and key sentences are selected directly from the original text to form the abstract. This approach has a low error rate in grammar and syntax, which ensures a certain level of quality.
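Extractive generation can be illustrated with a deliberately simple sketch that scores each sentence by the average frequency of its words and keeps the top sentences in their original order. This frequency heuristic is an assumption chosen for brevity, not the scoring used in the embodiment, and the sample text is hypothetical.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")

def extractive_summary(text, k=2):
    """Pick the k highest-scoring sentences and return them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(WORD.findall(text.lower()))

    def score(sentence):
        tokens = WORD.findall(sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

text = ("The league final was played on Sunday. Fans filled the stadium for "
        "the final. Weather was mild. The final ended with a late goal.")
summary = extractive_summary(text, k=2)
print(summary)
```

Because every output sentence is copied verbatim from the input, the summary inherits the grammar of the original text.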
In the process of extracting and executing the core abstract in the step, the elements can be marked in sequence first, and then the sentence sorting task is executed.
Before sequence labeling, a word segmentation operation can be performed, for example using a dictionary-based maximum matching word segmentation method, a method based on word sequence labeling, or a transition-based word segmentation method.
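Of the segmentation methods listed, dictionary-based maximum matching is the simplest to sketch. The version below is forward maximum matching; the function name and the tiny dictionary are hypothetical, and a character not covered by the dictionary is emitted as a single-character token.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedily take the longest dictionary word starting at each position."""
    max_len = max_len or max(len(w) for w in dictionary)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

dictionary = {"热点", "信息", "热点信息", "挖掘"}
print(forward_max_match("热点信息挖掘", dictionary))
```

The longest candidate is tried first, so "热点信息" is preferred over the shorter "热点" followed by "信息".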
The sequence labeling process mainly comprises the steps of part-of-speech labeling, semantic role labeling, information extraction and information integration.
Part-of-speech tagging assigns a category to each word in a given sentence; the category may be noun, verb, adjective and so on.
The information extraction is to process unstructured/semi-structured text input (such as news web pages, commodity pages, microblogs, forum pages and the like), extract various structured information such as entities, relationships, commodity records, lists, attributes and the like, and integrate the information at different levels in a manner such as knowledge deduplication, knowledge linking, knowledge system construction and the like. The information extraction is mainly completed through named entity recognition, relation extraction, event extraction and information integration steps.
Named entity recognition is the task of identifying entities of specified categories in text, mainly including person names, place names, institution names, proper nouns, etc. Named entity recognition consists of two parts: entity boundary recognition and entity classification, wherein the entity boundary recognition is to judge whether a character string is an entity or not, and the entity classification is to divide the recognized entity into different preset classes.
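The two parts of named entity recognition can be seen in how tagged output is decoded: the tag prefix marks the entity boundary and the tag suffix gives the entity class. The sketch below assumes the common BIO tagging scheme, which the text does not specify, and uses a hypothetical tagged sentence.

```python
def decode_bio(tokens, tags):
    """Turn BIO tags into (entity_text, entity_type) pairs."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag closes any open entity and starts a new one.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O" (or an inconsistent I- tag) closes any open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

tokens = ["Alice", "Smith", "visited", "Paris"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(decode_bio(tokens, tags))
```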
Relationship extraction is the detection and identification of semantic relationships between entities in text, and the linking of "mentions" that represent the same semantic relationship. The output of relationship extraction is a triplet (entity 1, relationship class, entity 2), indicating that a semantic relationship of a particular class exists between entity 1 and entity 2. The relationship categories can be preset, or they can be discovered automatically as required (that is, open-domain information extraction). Relationship extraction consists of two core parts: relationship detection and relationship classification. Relationship detection judges whether a semantic relationship exists between two entities, and relationship classification assigns an entity pair that has a semantic relationship to one of the pre-designated categories. In certain scenarios and tasks, relationship discovery may also be included, the primary purpose of which is to discover the semantic relationship categories that exist between entities.
Event extraction is the extraction of event information from unstructured text and its presentation in structured form. Event extraction includes event type identification and event element population. Event type recognition is used to determine whether a sentence expresses a particular type of event. The event type determines the template of the event representation, with different types of events having different templates. Event elements are key elements for forming an event, and event element filling is to extract corresponding elements according to an associated event template and label the corresponding elements.
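The two stages of event extraction can be sketched with a toy template table. Everything here is hypothetical: the event types, trigger words and role names are invented for illustration, and trigger-word lookup stands in for whatever event type recognizer the trained model provides.

```python
# Hypothetical templates: each event type has trigger words and its own roles.
EVENT_TEMPLATES = {
    "acquisition": {"triggers": {"acquired", "bought"},
                    "roles": ["buyer", "target", "time"]},
    "match_result": {"triggers": {"defeated", "beat"},
                     "roles": ["winner", "loser", "time"]},
}

def identify_event_type(tokens):
    """Event type identification: return the first type whose trigger appears."""
    for event_type, template in EVENT_TEMPLATES.items():
        if template["triggers"] & set(tokens):
            return event_type
    return None

def fill_template(event_type, arguments):
    """Event element filling: keep only the roles defined by the type's template."""
    roles = EVENT_TEMPLATES[event_type]["roles"]
    return {role: arguments.get(role) for role in roles}

event_type = identify_event_type("Team A defeated Team B on Sunday".split())
event = fill_template(event_type, {"winner": "Team A", "loser": "Team B"})
print(event_type, event)
```

Roles that could not be filled stay as None, which makes incomplete events easy to detect downstream.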
Entities, relationships and events each represent information at a different granularity within a single text, whereas the embodiment of the invention needs to integrate information from different data sources and different texts in order to make decisions. The embodiment of the invention therefore adopts an information integration method to cluster the information, which specifically comprises coreference resolution and entity linking. Coreference resolution detects different mentions of the same entity, relationship or event and links them together. Entity linking determines the real-world entity that an entity name refers to.
In general, sequence labeling assigns a label of 0 or 1 to each sentence representation. By contrast, the sentence sorting task outputs, for each sentence, the probability that it is a summary sentence, and several sentences are finally selected as the summary according to these probabilities.
In the embodiment of the invention, the generation of the abstract express can be completed by combining a Seq2Seq model and reinforcement learning on the basis of sequence labeling. Specifically, a sentence compression model is learned with Seq2Seq and is used to measure the quality of the selected sentences, and model training is completed in combination with reinforcement learning. In this embodiment, the model is not trained by computing a label-level loss from the sequence labeling; instead, sequence labeling serves as an intermediate step. After the probability distribution of the sequence labels is obtained, a set of candidate abstracts is sampled, and the loss is calculated by comparing the candidates with the reference abstract, so that the information in the reference abstract is better utilized.
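The difference between the two outputs can be shown with a small sketch: per-sentence probabilities are ranked, the top sentences are kept in document order, and the equivalent 0/1 labels are derived from that choice. The function name, the cut-off `k` and the sample probabilities are hypothetical.

```python
def select_summary(sentences, probabilities, k=2):
    """Keep the k most probable summary sentences, in their original order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: probabilities[i], reverse=True)
    chosen = sorted(ranked[:k])
    labels = [1 if i in chosen else 0 for i in range(len(sentences))]
    return [sentences[i] for i in chosen], labels

sentences = ["s0", "s1", "s2", "s3"]
probabilities = [0.2, 0.9, 0.1, 0.7]   # hypothetical model outputs
summary, labels = select_summary(sentences, probabilities, k=2)
print(summary, labels)
```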
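The sampling and loss calculation can be sketched under strong simplifying assumptions. The text names neither the reinforcement learning algorithm nor the reward, so the sketch uses a REINFORCE-style loss and replaces the Seq2Seq sentence compression model with a unigram-overlap F1 against the reference abstract. All function names and sample data are hypothetical, and no gradient step is shown.

```python
import math
import random

def overlap_f1(candidate, reference):
    """Unigram-overlap F1, used here as a stand-in reward (an assumption)."""
    c, r = set(candidate.split()), set(reference.split())
    common = len(c & r)
    if common == 0:
        return 0.0
    precision, recall = common / len(c), common / len(r)
    return 2 * precision * recall / (precision + recall)

def sample_selection(probabilities, rng):
    """Sample a 0/1 choice per sentence; return it with its log-probability."""
    selection, log_prob = [], 0.0
    for p in probabilities:
        pick = 1 if rng.random() < p else 0
        selection.append(pick)
        log_prob += math.log(p if pick else 1.0 - p)
    return selection, log_prob

def policy_gradient_loss(sentences, probabilities, reference, n_samples=8, seed=0):
    """Average of -reward * log-probability over sampled candidate abstracts."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_samples):
        selection, log_prob = sample_selection(probabilities, rng)
        candidate = " ".join(s for s, keep in zip(sentences, selection) if keep)
        losses.append(-overlap_f1(candidate, reference) * log_prob)
    return sum(losses) / len(losses)

sentences = ["team a won the final", "weather was mild", "fans celebrated"]
loss = policy_gradient_loss(sentences, [0.8, 0.1, 0.6],
                            reference="team a won the final and fans celebrated")
print(loss)
```

Minimizing this loss raises the probability of selections whose candidate abstracts overlap more with the reference, which is how the reference abstract informs training without a label-level loss.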
In one embodiment, the hotspot information mining and previewing method further includes:
constructing a user portrait according to the interest content type of the user;
cross-comparing the attribute characteristics of the user portraits with the content labels of the hot spot information to obtain matched hot spot information;
and carrying out element identification and extraction on the hot spot information, and generating abstract express matched with the interest content type of the user.
In this embodiment, to construct the user portrait, user reading preference data may be introduced on top of extensive content classification understanding, semantically associated events, associated knowledge and the like to obtain user attribute features, and the user portrait is constructed from these attribute features.
Real-time hot spot mining is then performed according to the attribute features to obtain the matched hot spot information, and element identification and extraction are performed to generate the abstract express. For the element identification and extraction and the abstract express generation process, reference may be made to the methods of the previous embodiments. Unlike the previous embodiments, this embodiment mines and recommends hot news in combination with the user portrait, thereby incorporating the user attribute features. For example, given the input "suitable for female readers", an information reporting scheme can be assembled automatically from the current hot information content in fields such as civil policy, fashion, child care and cosmetology.
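The cross-comparison of portrait attribute features with content labels can be sketched as a set intersection. The field names `title` and `labels`, the `min_overlap` parameter, and ranking by the number of shared labels are assumptions made for illustration only.

```python
def match_hotspots(user_features, hotspots, min_overlap=1):
    """Return titles of hot spot items whose content labels overlap the
    user's attribute features, best overlap first.

    user_features: set of attribute feature strings (hypothetical format).
    hotspots: list of dicts with "title" and "labels" keys (hypothetical).
    """
    matched = []
    for item in hotspots:
        shared = user_features & set(item["labels"])
        if len(shared) >= min_overlap:
            matched.append((len(shared), item["title"]))
    # Stable sort: items with equal overlap keep their input order.
    matched.sort(key=lambda pair: -pair[0])
    return [title for _, title in matched]
```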
In one embodiment, the hotspot information mining and previewing method further includes:
establishing, based on specified dates with specified meanings, a feature library of the specified dates and their associated historical events, associated knowledge and associated social phenomena;
when the current date is a specified date, extracting date features from the feature library, and mining hot spot information matched with the date features;
and carrying out element identification and extraction on the hot spot information, and generating abstract express matched with the interest content type of the user.
In this embodiment, a feature library of specified dates and their associated historical events, associated knowledge and associated social phenomena can be established based on common domestic and foreign festivals or dates with special meaning. When the current date reaches a specified date, date features can be extracted from the feature library, content for the corresponding scenario is selected using the date features, and an information reporting scheme is assembled automatically to generate the abstract express. For example, given the input "Spring Festival", the information reporting scheme can be assembled automatically from the Spring Festival ticketing schedule, tourist attraction updates, popular social gifts, seasonal everyday knowledge and skills, and so on. Unlike the foregoing embodiments, this embodiment performs hot spot mining based on the date, so the mined hot spot information is date-related content; for the element identification and extraction and the abstract express generation process, reference may be made to the methods of the foregoing embodiments.
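A date feature library of this kind can be sketched as a lookup keyed by month and day. The library entry, feature names and item fields below are hypothetical; festivals that follow a lunar calendar, such as the Spring Festival, would need a calendar conversion that is not shown.

```python
import datetime

# Hypothetical feature library: (month, day) -> date features.
FEATURE_LIBRARY = {
    (1, 1): {"name": "New Year's Day", "features": {"travel", "gifts"}},
}


def date_features(today, library):
    """Return the feature set for today, or an empty set on ordinary days."""
    entry = library.get((today.month, today.day))
    return set(entry["features"]) if entry else set()


def mine_by_date(today, library, hotspots):
    """Keep hot spot items whose labels share a feature with today's date."""
    features = date_features(today, library)
    return [h["title"] for h in hotspots if features & set(h["labels"])]
```

On a date that is not in the library the feature set is empty, so no date-driven items are selected.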
Referring to fig. 5, fig. 5 is a schematic block diagram of a hot spot information mining and previewing apparatus according to an embodiment of the present invention, wherein the hot spot information mining and previewing apparatus 500 includes:
a search tracking unit 501, configured to construct a news topic identification classifier according to a pre-acquired interest content type of a user, and acquire hot spot information matched with the interest content type of the user by using the news topic identification classifier;
a sorting unit 502, configured to extract high-heat target hotspot information from the hotspot information according to the propagation frequency and propagation rate of the hotspot information;
and the aggregation unit 503 is configured to extract elements from the target hotspot information according to a pre-trained event extraction model, and extract a core abstract from the extracted elements to generate an abstract express matching with the interest content type of the user.
In one embodiment, as shown in fig. 6, the search tracking unit 501 includes:
the classifier construction unit 601 is configured to obtain in advance an information sample set matched with the interest content type of the user, and adopt a naive bayes classifier for classification training to construct the news topic identification classifier;
the dynamic tracking unit 602 is configured to track the hot spot information propagation dynamics in the vertical domain by using the news topic identification classifier;
and the weighting synthesis unit 603 is configured to determine the number of keywords in the hotspot information and the domain to which each keyword belongs, perform weighting synthesis on each keyword in the hotspot information according to a preset domain weight, obtain a domain value with the highest score of the hotspot information, and determine the hotspot information matched with the interest content type of the user according to the domain value.
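The weighted synthesis performed by unit 603 can be sketched as follows. The specification does not give the exact formula, so this sketch assumes that each keyword contributes the preset weight of its domain and that the domain with the highest total score is taken as the domain value; the keyword-to-domain mapping and the weights are hypothetical inputs.

```python
from collections import defaultdict


def best_domain(keywords, keyword_domain, domain_weights):
    """Score each domain as (keyword count x preset domain weight) and
    return the highest-scoring domain with its score.

    keyword_domain: dict mapping a keyword to its domain (hypothetical).
    domain_weights: dict of preset weights; unlisted domains default to 1.0.
    """
    scores = defaultdict(float)
    for kw in keywords:
        domain = keyword_domain.get(kw)
        if domain is None:
            continue  # keyword does not belong to any known domain
        scores[domain] += domain_weights.get(domain, 1.0)
    if not scores:
        return None, 0.0
    domain = max(scores, key=scores.get)
    return domain, scores[domain]
```

The returned domain can then be compared with the interest content type of the user to decide whether the item matches.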
In one embodiment, as shown in fig. 7, the sorting unit 502 includes:
a propagation frequency calculating unit 701, configured to obtain a total network propagation amount and a comment number of the hotspot information, and determine a propagation frequency of the hotspot information according to the total network propagation amount and the comment number;
a propagation rate calculating unit 702, configured to obtain a propagation concurrency amount of the hotspot information in a unit time, and determine a propagation rate of the hotspot information according to the propagation concurrency amount;
a weighting unit 703, configured to weight the propagation depth and propagation breadth of the hotspot information according to the propagation frequency and propagation rate of the hotspot information to obtain a hotspot value of each piece of hotspot information;
and a screening unit 704, configured to sort the weighted hotspot information in descending order of hotspot value to obtain a hotspot information queue, and to extract a predetermined number of pieces of target hotspot information in order from the head of the hotspot information queue.
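The cooperation of units 701 to 704 can be sketched as below. The formulas are assumptions, since the specification does not define them: propagation frequency is taken as total propagation plus comment count, propagation rate as concurrent propagation divided by the length of the time window, and the hotspot value as a weighted sum of the two. All field names and weights are hypothetical.

```python
def hot_value(item, w_freq=0.5, w_rate=0.5):
    """Combine propagation frequency and rate into one hotspot value."""
    # Assumed definition: frequency grows with total shares and comments.
    frequency = item["total_propagation"] + item["comments"]
    # Assumed definition: rate is concurrent propagation per unit time.
    rate = item["concurrent_propagation"] / item["window_hours"]
    return w_freq * frequency + w_rate * rate


def top_hotspots(items, n):
    """Sort by hotspot value, largest first, and take the first n items."""
    queue = sorted(items, key=hot_value, reverse=True)
    return queue[:n]
```

With equal weights, an item that spreads quickly within the time window can outrank one with a larger cumulative total, which is the effect of including the propagation rate.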
In one embodiment, as shown in fig. 8, the aggregation unit 503 includes:
a training test unit 801, configured to construct an event extraction model, and train and test the event extraction model by using an information sample;
an extraction and segmentation unit 802, configured to extract and segment text of the hotspot information by using the event extraction model, so as to obtain elements of the hotspot information;
and the labeling and sorting unit 803 is configured to perform sequence labeling on the extracted elements, then execute a sentence sorting task, and generate an abstract express matched with the interest content type of the user.
In one embodiment, the sequence labeling includes part-of-speech tagging, semantic role labeling, information extraction and information integration.
In one embodiment, the hotspot information mining and previewing apparatus 500 further comprises:
the portrait construction unit is used for constructing a user portrait according to the interest content type of the user;
the portrait matching unit is used for carrying out cross comparison on the attribute characteristics of the user portrait and the content labels of the hot spot information to obtain the matched hot spot information;
and the portrait mining unit is used for carrying out element identification and extraction on the hot spot information and generating abstract express matched with the interest content type of the user.
In one embodiment, the hotspot information mining and previewing apparatus 500 further comprises:
a date establishing unit, configured to establish, based on specified dates with specified meanings, a feature library of the specified dates and their associated historical events, associated knowledge and associated social phenomena;
a date matching unit, configured to extract date features from the feature library when a current date is the specified date, and mine hotspot information matched with the date features according to the date features;
and the date mining unit is used for carrying out element identification and extraction on the hot spot information and generating abstract express matched with the interest content type of the user.
According to the device provided by the embodiment of the invention, hot spot information that matches the user's interests is screened from massive information according to the personalized requirements of the user, and the corresponding abstract express is generated, so that the user can conveniently and quickly obtain valuable information.
The hot spot information mining and previewing apparatus 500 may be implemented in the form of a computer program, which may run on a computer device as illustrated in FIG. 9.
Referring to fig. 9, fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 900 is a server, and the server may be a stand-alone server or a server cluster formed by a plurality of servers.
With reference to fig. 9, the computer device 900 includes a processor 902, a memory, and a network interface 905, which are connected by a system bus 901, wherein the memory may include a non-volatile storage medium 903 and an internal memory 904.
The non-volatile storage medium 903 may store an operating system 9031 and a computer program 9032. The computer program 9032, when executed, causes the processor 902 to perform the hotspot information mining and preview method.
The processor 902 is operative to provide computing and control capabilities supporting the operation of the entire computer device 900.
The internal memory 904 provides an environment for the execution of the computer program 9032 in the non-volatile storage medium 903, which computer program 9032, when executed by the processor 902, causes the processor 902 to perform the hotspot information mining and previewing method.
The network interface 905 is used for network communication, such as transmitting data information. It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of part of the structure relevant to the solution of the present invention and does not limit the computer device 900 to which the solution is applied; a particular computer device 900 may include more or fewer components than those shown, may combine some components, or may have a different arrangement of components.
The processor 902 is configured to execute a computer program 9032 stored in a memory, so as to implement the following functions: constructing a news topic identification classifier according to the pre-acquired interest content types of the users, and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier; extracting target hot spot information with high heat from the hot spot information according to the propagation frequency and propagation rate of the hot spot information; and extracting elements from the target hot spot information according to a pre-trained event extraction model, and extracting a core abstract from the extracted elements to generate an abstract express matched with the interest content type of the user.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 9 does not limit the specific construction of the computer device; in other embodiments, the computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 9 and will not be described again.
It should be appreciated that in an embodiment of the invention, the processor 902 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, performs the steps of: constructing a news topic identification classifier according to the pre-acquired interest content types of the users, and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier; extracting target hot spot information with high heat from the hot spot information according to the propagation frequency and propagation rate of the hot spot information; and extracting elements from the target hot spot information according to a pre-trained event extraction model, and extracting a core abstract from the extracted elements to generate an abstract express matched with the interest content type of the user.
In the description, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. It should be noted that various modifications and adaptations can be made by those skilled in the art without departing from the principles of the invention, and these modifications and adaptations are intended to fall within the scope of the invention as defined in the following claims.
It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (7)

1. The hot spot information mining and previewing method is characterized by comprising the following steps:
constructing a news topic identification classifier according to the pre-acquired interest content types of the users, and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier;
extracting target hot spot information with high heat from the hot spot information according to the propagation frequency and propagation speed of the hot spot information;
extracting elements from the target hot spot information according to a pre-trained event extraction model, and extracting a core abstract from the extracted elements to generate an abstract express matched with the interest content type of the user;
the method for constructing a news topic identification classifier according to the pre-acquired interest content types of the users and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier comprises the following steps: an information sample set matched with the interest content type of the user is obtained in advance, and a naive Bayesian classifier is adopted for classification training to construct the news topic identification classifier; tracking hot information propagation dynamics in the vertical field by utilizing the news topic identification classifier, wherein the propagation dynamics comprise: the network transmission total quantity, comment quantity and transmission concurrency quantity in unit time of the hot spot information; determining the number of keywords in the hot spot information and the domain to which the keywords belong, weighting and synthesizing the keywords in the hot spot information according to preset domain weights to obtain a domain value with the highest score of the hot spot information, and determining hot spot information matched with the interest content type of the user according to the domain value;
the extracting the target hot spot information with high heat from the hot spot information according to the propagation frequency and propagation speed of the hot spot information comprises the following steps: acquiring the network propagation total quantity and the comment number of the hot spot information, and determining the propagation frequency of the hot spot information according to the network propagation total quantity and the comment number; acquiring the propagation concurrency of the hot spot information in unit time, and determining the propagation rate of the hot spot information according to the propagation concurrency; weighting the propagation depth and the propagation breadth of the hot spot information according to the propagation frequency and the propagation speed weight of the hot spot information to obtain a hot spot value of each hot spot information; ordering the weighted hot spot information according to the order of hot spot values from big to small to obtain a hot spot information queue, and sequentially extracting a preset number of target hot spot information from the beginning to the end in the hot spot information queue;
the element extraction is carried out on the target hot spot information according to a pre-trained event extraction model, and core abstract extraction is carried out on the extracted element to generate abstract express matched with the interest content type of the user, and the method comprises the following steps: constructing an event extraction model, and training and testing the event extraction model by adopting an information sample; text extraction and segmentation are carried out on the hot spot information by adopting the event extraction model, so that elements of the hot spot information are obtained; and performing sequence labeling on the extracted elements, then executing a sentence sorting task, after obtaining probability distribution of sequence labeling, learning a sentence compression model by using the Seq2Seq, measuring the quality of sentences by using the sentence compression model, sampling a candidate abstract set, comparing the candidate abstract set with a standard abstract, calculating loss, and completing model training by combining reinforcement learning to generate abstract express matched with the interest content type of the user.
2. The hotspot information mining and previewing method according to claim 1, wherein the sequence labeling comprises part-of-speech tagging, semantic role labeling, information extraction and information integration.
3. The hotspot information mining and previewing method according to claim 1, further comprising:
constructing a user portrait according to the interest content type of the user;
cross-comparing the attribute characteristics of the user portraits with the content labels of the hot spot information to obtain matched hot spot information;
and carrying out element identification and extraction on the hot spot information, and generating abstract express matched with the interest content type of the user.
4. The hotspot information mining and previewing method according to claim 1, further comprising:
establishing a feature library of the appointed date, date association historical event, association knowledge and associated social phenomenon based on the appointed date with the appointed meaning;
when the current date is the appointed date, extracting date features from the feature library, and mining hot spot information matched with the date features according to the date features;
and carrying out element identification and extraction on the hot spot information, and generating abstract express matched with the interest content type of the user.
5. A hotspot information mining and previewing device, comprising:
the search tracking unit is used for constructing a news topic identification classifier according to the pre-acquired interest content types of the users, and acquiring hot spot information matched with the interest content types of the users by utilizing the news topic identification classifier;
a sorting unit for extracting high-heat target hot spot information from the hot spot information according to the propagation frequency and propagation rate of the hot spot information;
the aggregation unit is used for extracting elements from the target hot spot information according to a pre-trained event extraction model, and extracting a core abstract from the extracted elements to generate an abstract express matched with the type of the interest content of the user;
the search tracking unit includes:
the classifier construction unit is used for acquiring an information sample set matched with the interest content type of the user in advance, and constructing the news topic identification classifier by adopting a naive Bayesian classifier for classification training;
the dynamic tracking unit is used for tracking the hot information propagation dynamic in the vertical field by utilizing the news topic identification classifier, and the propagation dynamic comprises: the network transmission total quantity, comment quantity and transmission concurrency quantity in unit time of the hot spot information;
the weighting synthesis unit is used for determining the number of the keywords in the hot spot information and the domain to which the keywords belong, carrying out weighting synthesis on the keywords in the hot spot information according to preset domain weights to obtain a domain value with the highest score of the hot spot information, and determining hot spot information matched with the interest content type of the user according to the domain value;
the sorting unit includes:
the propagation frequency calculation unit is used for acquiring the network propagation total quantity and the comment number of the hot spot information, and determining the propagation frequency of the hot spot information according to the network propagation total quantity and the comment number;
the transmission rate calculation unit is used for obtaining the transmission concurrency of the hot spot information in unit time and determining the transmission rate of the hot spot information according to the transmission concurrency;
the weighting unit is used for weighting the propagation depth and the propagation breadth of the hot spot information according to the propagation frequency and the propagation speed weight of the hot spot information to obtain a hot spot value of each hot spot information;
the screening unit is used for sequencing the weighted hot spot information according to the sequence of the hot spot values from large to small to obtain a hot spot information queue, and sequentially extracting a preset number of target hot spot information from the beginning to the end in the hot spot information queue;
the aggregation unit includes:
the training test unit is used for constructing an event extraction model and training and testing the event extraction model by adopting an information sample;
the extraction and segmentation unit is used for carrying out text extraction and segmentation on the hot spot information by adopting the event extraction model to obtain elements of the hot spot information;
the annotation ordering unit is used for carrying out sequence annotation on the extracted elements, then executing sentence ordering task, learning a sentence compression model by using the Seq2Seq after obtaining probability distribution of the sequence annotation, using the sentence compression model to measure the quality of sentences, sampling candidate abstract sets, comparing the candidate abstract sets with standard abstract sets to calculate loss, and combining reinforcement learning to finish model training to generate abstract express matched with the interest content types of the users.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the hotspot information mining and preview method of any of claims 1 to 4 when the computer program is executed by the processor.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which when executed by a processor causes the processor to perform the hotspot information mining and preview method according to any one of claims 1 to 4.
CN202011189110.9A 2020-10-30 2020-10-30 Hot spot information mining and previewing method and device, computer equipment and storage medium Active CN112307336B (en)

Publications (2)

Publication Number Publication Date
CN112307336A CN112307336A (en) 2021-02-02
CN112307336B true CN112307336B (en) 2024-04-16

Family

ID=74332542

Country Status (1)

Country Link
CN (1) CN112307336B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836110B (en) * 2021-02-07 2022-09-16 四川封面传媒有限责任公司 Hotspot information mining method and device, computer equipment and storage medium
CN113158671B (en) * 2021-03-25 2023-08-11 胡明昊 Open domain information extraction method combined with named entity identification
CN113407842B (en) * 2021-06-28 2024-03-22 携程旅游信息技术(上海)有限公司 Model training method, theme recommendation reason acquisition method and system and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077190A (en) * 2012-12-20 2013-05-01 人民搜索网络股份公司 Hot event ranking method based on order learning technology
CN104657496A (en) * 2015-03-09 2015-05-27 杭州朗和科技有限公司 Method and equipment for calculating information hot value
CN107992531A (en) * 2017-11-21 2018-05-04 吉浦斯信息咨询(深圳)有限公司 News personalization intelligent recommendation method and system based on deep learning
CN109033074A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 News in brief generation method, device, equipment and computer-readable medium
CN109800350A (en) * 2018-12-21 2019-05-24 中国电子科技集团公司信息科学研究院 A kind of Personalize News recommended method and system, storage medium
CN110489542A (en) * 2019-08-10 2019-11-22 刘莎 A kind of auto-abstracting method of internet web page and text information
CN111241410A (en) * 2020-01-22 2020-06-05 深圳司南数据服务有限公司 Industry news recommendation method and terminal
CN111753197A (en) * 2020-06-18 2020-10-09 达而观信息科技(上海)有限公司 News element extraction method and device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062389A (en) * 2017-12-15 2018-05-22 北京百度网讯科技有限公司 Bulletin generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant