CN112307336A - Hotspot information mining and previewing method and device, computer equipment and storage medium - Google Patents

Hotspot information mining and previewing method and device, computer equipment and storage medium

Publication number
CN112307336A
Authority
CN
China
Prior art keywords
information
hotspot information
user
hot spot
hotspot
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011189110.9A
Other languages
Chinese (zh)
Other versions
CN112307336B (en)
Inventor
蔡静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202011189110.9A
Publication of CN112307336A
Application granted
Publication of CN112307336B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a hotspot information mining and previewing method and device, computer equipment, and a storage medium, relating to the technical field of artificial intelligence. The method comprises the following steps: constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and using the classifier to acquire hotspot information matching the interest content type of the user; extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information; and extracting elements of the target hotspot information with a pre-trained event extraction model, then performing core abstract extraction on the extracted elements to generate an abstract express delivery (a short summary digest) matching the interest content type of the user. The invention can pick out, from massive information, the hotspot information that the user cares about and generate the corresponding abstract express delivery, so that the user can quickly obtain valuable information.

Description

Hotspot information mining and previewing method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for mining and previewing hotspot information, computer equipment and a storage medium.
Background
At present, users obtain most of their news through internet media. However, internet media report on too broad a range of content and information is overabundant, so users struggle to pick out what is useful. In addition, each internet media outlet has its own traffic-promotion strategy, so the news, general life knowledge, product knowledge, and other content that users care about cannot be reported in a focused way for target user groups.
To solve the above problems, the prior art generally relies on manually screening and refining the dynamic news and hot topics that match the target users. However, manual screening and refining requires a great deal of search time to select high-quality content, plus considerable effort to summarize, extract, and write, which is extremely inefficient.
Disclosure of Invention
The invention aims to provide a hotspot information mining and previewing method, a hotspot information mining and previewing device, computer equipment and a storage medium, and aims to solve the problems of low hotspot information mining and previewing efficiency, inconvenience and the like in the prior art.
In a first aspect, an embodiment of the present invention provides a method for mining and previewing hotspot information, where the method includes:
constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hotspot information matching the interest content type of the user by using the news topic identification classifier;
extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information;
and extracting elements of the target hotspot information according to a pre-trained event extraction model, and performing core abstract extraction on the extracted elements to generate abstract express delivery matched with the interest content types of the users.
In a second aspect, an embodiment of the present invention provides a hotspot information mining and previewing apparatus, including:
the searching and tracking unit is used for constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hotspot information matching the interest content type of the user by using the news topic identification classifier;
the sorting unit is used for extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information;
and the aggregation unit is used for extracting elements of the target hotspot information according to a pre-trained event extraction model, and performing core abstract extraction on the extracted elements to generate abstract express delivery matched with the interest content type of the user.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the hotspot information mining and previewing method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the hotspot information mining and previewing method according to the first aspect.
The embodiment of the invention provides a hotspot information mining and previewing method and device, computer equipment, and a storage medium, wherein the method comprises the following steps: constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hotspot information matching the interest content type of the user by using the news topic identification classifier; extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information; and extracting elements of the target hotspot information according to a pre-trained event extraction model, and performing core abstract extraction on the extracted elements to generate an abstract express delivery matching the interest content type of the user. The method provided by the embodiment of the invention can screen, from massive information and according to the personalized requirements of the user, the hotspot information that the user cares about and generate the corresponding abstract express delivery, so that the user can quickly obtain valuable information.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for mining and previewing hot spot information according to an embodiment of the present invention;
FIG. 2 is a schematic sub-flow chart of a method for mining and previewing hot spot information according to an embodiment of the present invention;
FIG. 3 is a schematic view of another sub-process of a method for mining and previewing hot spot information according to an embodiment of the present invention;
FIG. 4 is a schematic view of another sub-process of a method for mining and previewing hot spot information according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a hotspot information mining and previewing device according to an embodiment of the present invention;
FIG. 6 is a schematic block diagram of a sub-unit of a hotspot information mining and previewing device according to an embodiment of the present invention;
FIG. 7 is a schematic block diagram of another sub-unit of an apparatus for mining and previewing hot spot information according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of another sub-unit of an apparatus for mining and previewing hot spot information according to an embodiment of the present invention;
FIG. 9 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1, which is a schematic flowchart of a hotspot information mining and previewing method according to an embodiment of the present invention, the method includes steps S101 to S103:
S101, constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hotspot information matching the interest content type of the user by using the news topic identification classifier;
S102, extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information;
S103, extracting elements of the target hotspot information according to a pre-trained event extraction model, and performing core abstract extraction on the extracted elements to generate an abstract express delivery matching the interest content type of the user.
The method provided by the embodiment of the invention can screen, from massive information and according to the personalized requirements of the user, the hotspot information that the user cares about and generate the corresponding abstract express delivery, so that the user can quickly obtain valuable information.
In one embodiment, as shown in fig. 2, the step S101 includes steps S201 to S203:
S201, acquiring in advance an information sample set matching the interest content type of the user, and constructing the news topic identification classifier through naive Bayes classification training;
First, the interest content type of the user is obtained. It can be actively selected by the user (for example sports, finance, or military affairs) or learned from user behavior, where user behavior refers to the user's operations over a past period, such as the types of news browsed, the browsing time for each type of news, and the types of news shared and commented on. In this way the interest content type can be determined. There can of course be more than one: several news types can be selected as the interest content types of the user.
Once the interest content type of the user has been determined, a news topic identification classifier can be constructed. The classifier is mainly used to determine news topics: the interest content types of the user are input into the classifier so that it can classify news by those types and identify the news that belongs to them.
To construct the news topic identification classifier, an information sample set matching the interest content type of the user is acquired first, and classification training is then carried out with a naive Bayes classifier. Naive Bayes is a classification algorithm that obtains the target probability value by directly modeling the joint probability P(x, c) by means of Bayes' theorem.
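The training step described above can be illustrated with a minimal sketch, assuming a bag-of-words multinomial naive Bayes with Laplace smoothing. The patent does not specify the feature representation or smoothing, so the class name `NaiveBayesTopicClassifier`, the toy samples, and the labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopicClassifier:
    """Multinomial naive Bayes over word tokens (hypothetical sketch)."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def log_joint(self, tokens, label):
        # log P(x, c) = log P(c) + sum of log P(word | c), using the
        # naive assumption that words are independent given the class.
        log_prob = math.log(self.class_counts[label] / self.total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            if token not in self.vocabulary:
                continue  # skip words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            log_prob += math.log((self.word_counts[label][token] + 1) / denom)
        return log_prob

    def predict(self, tokens):
        # Choose the class with the largest joint probability.
        return max(self.class_counts, key=lambda c: self.log_joint(tokens, c))


# Toy information sample set (illustrative only).
samples = [
    (["team", "wins", "match"], "sports"),
    (["player", "scores", "goal"], "sports"),
    (["stock", "market", "falls"], "finance"),
    (["bank", "raises", "rates"], "finance"),
]
classifier = NaiveBayesTopicClassifier().fit(
    [tokens for tokens, _ in samples], [label for _, label in samples]
)
predicted = classifier.predict(["player", "wins", "goal"])
```

In practice an established library implementation would likely be preferable; the sketch only shows how the joint probability P(x, c) drives the classification decision.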
S202, tracking hot spot information propagation dynamics in a vertical field by using the news topic identification classifier;
In this step, the news topic identification classifier is used to track the propagation dynamics of hotspot information in the vertical field. The vertical field here means the subdivided field corresponding to the user's news interest type, which allows hotspot information of the corresponding type to be tracked more comprehensively.
The step is mainly to track the propagation dynamics of the hot spot information, such as the total network propagation amount, the number of comments, the propagation concurrency amount in unit time, and the like of the hot spot information. Thus, the propagation status of the hot spot information can be recorded.
S203, determining the number of the keywords in the hotspot information and the field to which the keywords belong, performing weighted synthesis on the keywords in the hotspot information according to preset field weight to obtain a field value with the highest score of the hotspot information, and determining the hotspot information matched with the interest content type of the user according to the field value.
A piece of hotspot information may blend content from different fields; for example, the same news item may involve both entertainment stars and sports stars, so the specific type of the news needs to be determined. This embodiment can combine the fields by weighting: for example, the proportions of entertainment content and sports content in a news item are counted and a weight is set for each, yielding a final score. The type of the hotspot information is determined from the score and matched against the interest content type of the user, giving the hotspot information that matches the interest content type of the user.
Specifically, the number of keywords in the hotspot information and the domain to which each keyword belongs can be determined, and a domain weight is preset for each domain, so that the domain value of each domain in the hotspot information can be computed. For example, suppose a piece of hotspot information contains 10 keywords, of which 5 belong to the sports domain, 2 to the finance domain, and 3 to the entertainment domain, and the domain weights are 0.1 for sports, 0.15 for finance, and 0.12 for entertainment. The domain value of the hotspot information is then 0.1 × 5 = 0.5 in the sports domain, 0.15 × 2 = 0.3 in the finance domain, and 0.12 × 3 = 0.36 in the entertainment domain. The hotspot information therefore belongs to the sports domain. In this way the domain of each piece of hotspot information can be obtained, and the hotspot information matching the interest content type of the user can be screened out.
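The worked example above (5 sports, 2 finance, and 3 entertainment keywords with weights 0.1, 0.15, and 0.12) can be reproduced with a short sketch. The function name `domain_values` and the dictionary layout are assumptions made for illustration.

```python
def domain_values(keyword_counts, domain_weights):
    """Multiply each domain's keyword count by its preset domain weight."""
    return {
        domain: count * domain_weights[domain]
        for domain, count in keyword_counts.items()
    }


# Keyword counts and weights taken from the example in the text.
keyword_counts = {"sports": 5, "finance": 2, "entertainment": 3}
domain_weights = {"sports": 0.1, "finance": 0.15, "entertainment": 0.12}

values = domain_values(keyword_counts, domain_weights)

# The domain with the highest value is taken as the domain of the item.
top_domain = max(values, key=values.get)
```

Here `top_domain` is the sports domain, since its domain value of roughly 0.5 exceeds the values for entertainment (about 0.36) and finance (about 0.3).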
In one embodiment, as shown in fig. 3, the step S102 includes steps S301 to S304:
S301, acquiring the network propagation total amount and the number of comments of the hotspot information, and determining the propagation frequency of the hotspot information according to the network propagation total amount and the number of comments;
In this step, the propagation frequency of the hotspot information is determined from its total network propagation amount and its number of comments. Because these quantities may differ by orders of magnitude across different pieces of hotspot information, the embodiment of the present invention may further normalize them so that different pieces of hotspot information can be compared on the same scale before the propagation frequency is finally determined.
S302, acquiring the propagation concurrency of the hotspot information in unit time, and determining the propagation rate of the hotspot information according to the propagation concurrency;
In this step, the propagation rate of the hotspot information is determined from the propagation concurrency in unit time, that is, the number of times the hotspot information propagates per unit time. The propagation rate reflects how fast the hotspot information spreads and therefore, to a certain extent, how popular it is. Similarly, the embodiment of the invention can normalize the propagation concurrency in unit time so that different pieces of hotspot information can be compared on the same scale.
S303, weighting the propagation width and the propagation depth of the hotspot information, as measured by its propagation frequency and propagation rate respectively, to obtain a hotspot value for each piece of hotspot information;
The propagation frequency and the propagation rate capture the heat of the hotspot information along different dimensions: the propagation frequency represents the propagation width, and the propagation rate represents the propagation depth.
S304, sorting the weighted hot spot information according to the order of the hot spot values from large to small to obtain a hot spot information queue, and sequentially extracting a preset amount of target hot spot information from the beginning to the end of the hot spot information queue.
The step can sort the hot spot information according to the order of the hot spot values of the hot spot information from big to small, wherein the hot spot information in the front of the sorting is higher in hot spot degree, and the hot spot information in the back of the sorting is lower in hot spot degree.
After sorting, a predetermined number of hot spot information sorted in the front can be extracted from the hot spot information, namely, the hot spot information is sequentially extracted from the beginning to the end, so that a predetermined number of target hot spot information is extracted, excessive hot spot information is prevented from being selected, and the pushed content is simplified.
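Steps S301 to S304 can be sketched as follows. The patent names neither the normalization method, the way total propagation and comment count are combined, nor the weights, so the min-max normalization, the simple average, the equal weights, and all field names here are assumptions.

```python
def min_max(values):
    """Scale values to [0, 1] so different magnitudes become comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_hotspots(items, freq_weight=0.5, rate_weight=0.5, top_n=2):
    """Return the top_n (hotspot_value, title) pairs, highest value first."""
    totals = min_max([item["total_shares"] for item in items])
    comments = min_max([item["comments"] for item in items])
    rates = min_max([item["shares_per_hour"] for item in items])

    scored = []
    for item, total, comment, rate in zip(items, totals, comments, rates):
        # Assumption: propagation frequency is the mean of the two
        # normalized signals (total propagation and comment count).
        frequency = (total + comment) / 2
        hotspot_value = freq_weight * frequency + rate_weight * rate
        scored.append((hotspot_value, item["title"]))

    # Sort from largest to smallest hotspot value, then take the head.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Illustrative data only.
items = [
    {"title": "A", "total_shares": 10000, "comments": 500, "shares_per_hour": 50},
    {"title": "B", "total_shares": 4000, "comments": 900, "shares_per_hour": 400},
    {"title": "C", "total_shares": 1000, "comments": 100, "shares_per_hour": 10},
]
top = rank_hotspots(items)
```

With these sample numbers, item B ranks first because its high propagation rate outweighs the larger total propagation of item A.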
In one embodiment, as shown in fig. 4, the step S103 includes steps S401 to S403:
S401, constructing an event extraction model, and training and testing the event extraction model by adopting an information sample;
In the embodiment of the invention, an event extraction model is constructed first, and the event extraction model is used to identify and extract elements from the hotspot information.
The embodiment of the invention first trains and tests the event extraction model using information samples. Before training, the elements of each information sample are annotated. The event extraction model then identifies and extracts elements, the extracted elements are compared with the annotated elements, and the model is optimized with a loss function until it converges. Finally, the trained event extraction model is tested, and it can go online once the test requirements are met.
S402, text extraction and segmentation are carried out on the hot spot information by adopting the event extraction model to obtain elements of the hot spot information;
In this step, the trained event extraction model is used to identify and extract elements from the predetermined amount of hotspot information. The identification and extraction process is similar to the training process, except that the input processed in this step is actual hotspot information.
The elements may be 5W1H, where 5W refers to reason (Why), object (What), place (Where), time (When), and person (Who), and 1H refers to method (How).
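One way to hold the extracted 5W1H elements is a small record type, sketched below. The class name `EventElements`, its field names, and the sample values are hypothetical; the patent does not define a storage format.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class EventElements:
    """Container for the 5W1H elements of one piece of hotspot information."""

    why: Optional[str] = None    # reason
    what: Optional[str] = None   # object
    where: Optional[str] = None  # place
    when: Optional[str] = None   # time
    who: Optional[str] = None    # person
    how: Optional[str] = None    # method

    def missing(self):
        """List the element names the extractor could not fill."""
        return [name for name, value in asdict(self).items() if value is None]


# Illustrative values only.
elements = EventElements(
    who="central bank", what="raised interest rates", when="Tuesday"
)
unfilled = elements.missing()
```

Tracking which elements are unfilled makes it easy to decide whether an item carries enough information to be summarized.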
S403, performing sequence labeling on the extracted elements, then executing a sentence ranking task, and generating an abstract express delivery matching the interest content type of the user.
In the step, the extracted elements are subjected to core abstract extraction, and the purpose of the core abstract extraction is to further extract key information from the elements, so that the information content is further simplified, and a user can know news general information without spending too much time. The result of performing the core abstract extraction is abstract express delivery matched with the interest content type of the user.
The abstract express delivery of the embodiment of the invention can be divided into single-document abstracts and multi-document abstracts according to the input type. A single-document abstract is generated from one given document, and a multi-document abstract is generated from a given set of topic-related documents.
The abstract express delivery in the embodiment of the invention can be generated as an extractive abstract, that is, keywords and key sentences are selected directly from the original text to form the abstract. This approach has a low rate of grammatical and syntactic errors and guarantees a baseline level of quality.
When the core abstract extraction is executed in this step, the elements can first be labeled by sequence labeling, and then a sentence ranking task is executed.
Before the sequence labeling is performed, a word segmentation operation may be performed, for example using a dictionary-based maximum matching word segmentation method, a word sequence tagging-based method, or a transition-based word segmentation method.
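As an illustration of the dictionary-based maximum matching approach, here is a minimal forward maximum matching sketch. The patent does not say whether matching runs forward, backward, or in both directions, so the forward direction, the function name, the toy dictionary, and the maximum word length are assumptions.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Segment text by greedily taking the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary (illustrative only).
dictionary = {"热点", "信息", "热点信息", "挖掘"}
tokens = forward_max_match("热点信息挖掘", dictionary)
```

Because the longest match wins, "热点信息" is kept as one token rather than being split into "热点" and "信息".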
The sequence labeling process mainly comprises the steps of part-of-speech labeling, semantic role labeling, information extraction and information integration.
Part-of-speech tagging takes a given sentence and assigns a category to each word in it, where the category may be noun, verb, adjective, and so on.
Information extraction processes unstructured or semi-structured text input (such as news web pages, commodity pages, microblogs, and forum pages), extracts various kinds of structured information such as entities, relationships, commodity records, lists, and attributes, and integrates the information at different levels. Integration may involve, for example, knowledge deduplication, knowledge linking, and knowledge system construction. Information extraction is mainly completed through named entity recognition, relation extraction, event extraction, and information integration.
Named entity recognition is the task of recognizing entities of specified categories in text, mainly names of people, places, and organizations, proper nouns, and the like. Named entity recognition consists of two parts: entity boundary identification and entity classification. Entity boundary identification judges whether a character string is an entity, and entity classification assigns the identified entities to predefined types.
Relationship extraction detects and recognizes semantic relationships between entities in text and links the "mentions" that express the same semantic relationship. The output of relationship extraction is a triple (entity 1, relationship class, entity 2) indicating that a semantic relationship of a particular class exists between entity 1 and entity 2. The relationship categories can be preset, or they can be discovered automatically as needed (that is, through open-domain information extraction). Relationship extraction contains two core components: relation detection and relation classification. Relation detection judges whether a semantic relationship exists between two entities, and relation classification assigns entity pairs that have a semantic relationship to pre-specified categories. In certain scenarios and tasks, relationship discovery may also be included, whose main purpose is to discover entities and the semantic relationship categories that exist between them.
Event extraction is the extraction of event information from unstructured text and its presentation in structured form. Event extraction includes event type identification and event element population. Event type identification is used to determine whether a sentence expresses an event of a particular type. The event type determines the template of the event representation, with different types of events having different templates. Event elements are key elements forming an event, and event element filling is to extract corresponding elements according to the event template to which the elements belong and label the elements with element labels.
Entities, relationships, and events each represent information of different granularity within a single text. In the embodiment of the invention, information from different data sources and different texts needs to be integrated for decision making, so an information integration method is used to cluster the information, specifically through coreference resolution and entity linking. Coreference resolution detects different mentions of the same entity, relationship, or event and links them together. Entity linking determines the real-world entity to which an entity name refers.
In general, sequence labeling assigns a 0 or 1 label to each sentence representation. In contrast, the sentence ranking task outputs, for each sentence, the probability that it is a summary sentence, and several sentences are finally selected as the final summary according to these probabilities.
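The final selection step, choosing several sentences according to their probabilities, can be sketched as below. The patent does not state how many sentences are chosen or how they are ordered, so the `max_sentences` cutoff, the restoration of original order, and the sample probabilities are assumptions.

```python
def select_summary(sentences, probabilities, max_sentences=2):
    """Pick the most probable summary sentences, kept in document order."""
    # Rank sentence indices by summary probability, highest first.
    ranked = sorted(
        range(len(sentences)), key=lambda i: probabilities[i], reverse=True
    )
    # Keep the top candidates, then restore their original order so the
    # resulting summary reads naturally.
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]


# Illustrative sentences and model outputs.
sentences = [
    "Markets opened flat.",
    "The central bank raised rates.",
    "Weather was mild.",
    "Analysts expect further increases.",
]
probabilities = [0.2, 0.9, 0.1, 0.7]
summary = select_summary(sentences, probabilities)
```

A 0/1 sequence labeling decision corresponds to thresholding these probabilities, whereas ranking lets the number of selected sentences be fixed in advance.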
In the embodiment of the invention, the generation of the abstract express delivery can be completed by combining a Seq2Seq model and reinforcement learning on the basis of sequence labeling. Specifically, a sentence compression model is learned by using the Seq2Seq, the sentence compression model is used for measuring the quality of a selected sentence, and the model training is completed by combining reinforcement learning. In this embodiment, instead of training the model by calculating the loss at the label level using the sequence labeling method, sequence labeling is used as an intermediate step. After the probability distribution of the sequence labels is obtained, a candidate abstract set is sampled from the sequence labels, and loss is calculated by comparing the candidate abstract set with the standard abstract, so that information in the standard abstract can be better utilized.
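The sampling-based loss described above can be illustrated with a simplified policy-gradient style sketch. It omits the Seq2Seq sentence compression model entirely and uses a token-overlap F1 score as a stand-in reward; the reward, the mean-reward baseline, and every function name are assumptions rather than the patent's method.

```python
import math
import random


def sample_labels(probabilities, rng):
    """Sample a 0/1 label per sentence from its summary probability."""
    return [1 if rng.random() < p else 0 for p in probabilities]


def log_likelihood(labels, probabilities):
    """Log-probability of a sampled label sequence (needs 0 < p < 1)."""
    return sum(
        math.log(p if y == 1 else 1.0 - p) for y, p in zip(labels, probabilities)
    )


def overlap_reward(labels, sentences, reference_tokens):
    """Token-level F1 between the selected sentences and the reference."""
    chosen = set()
    for label, sentence in zip(labels, sentences):
        if label:
            chosen.update(sentence.split())
    reference = set(reference_tokens)
    overlap = len(chosen & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(chosen)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def sampled_loss(probabilities, sentences, reference_tokens, num_samples=8, seed=0):
    """Average of -(reward - baseline) * log-likelihood over sampled summaries."""
    rng = random.Random(seed)
    samples = [sample_labels(probabilities, rng) for _ in range(num_samples)]
    rewards = [overlap_reward(s, sentences, reference_tokens) for s in samples]
    baseline = sum(rewards) / len(rewards)  # reduces gradient variance
    total = sum(
        (reward - baseline) * log_likelihood(sample, probabilities)
        for sample, reward in zip(samples, rewards)
    )
    return -total / num_samples


# Illustrative inputs only.
sentences = ["rates rise today", "weather is mild", "banks react quickly"]
probabilities = [0.8, 0.1, 0.6]
reference = ["rates", "rise", "banks", "react"]
loss = sampled_loss(probabilities, sentences, reference)
```

In a real training loop the probabilities would come from a differentiable model and the loss would be minimized with an automatic differentiation framework; the sketch only shows how candidate summaries are compared with the standard abstract instead of computing a label-level loss.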
In an embodiment, the method for mining and previewing hotspot information further includes:
constructing a user portrait according to the interest content type of the user;
cross-comparing the attribute characteristics of the user portrait with the content tags of the hot spot information to obtain the matched hot spot information;
and performing element identification and extraction on the hot spot information, and generating a summary express matched with the interest content type of the user.
In this embodiment, the user portrait is constructed by introducing user reading preference data on top of extensive content classification, semantically associated events, associated knowledge and the like to obtain user attribute characteristics, from which the user portrait is built.
Real-time hotspot mining is then performed according to the attribute characteristics to obtain matched hotspot information, and element identification and extraction are performed to generate the summary express. For element identification and extraction and for generation of the summary express, refer to the method of the preceding embodiment. Unlike the foregoing embodiments, this embodiment performs hot news mining and recommendation in combination with the user portrait, fusing the user attribute characteristics. For example, for a profile suited to female readers, the information reporting scheme may automatically combine current hot content in fields such as policy, fashion, child care, and beauty.
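The cross-comparison of user portrait attributes with hotspot content tags can be sketched as a weighted tag overlap. The weighting scheme (1.0 for a declared interest, plus 0.5 per occurrence in the reading history), the score threshold, and the data layout are assumptions chosen for illustration only.

```python
def build_user_portrait(interest_types, reading_history_tags):
    """Combine declared interests with reading-history tags into tag weights."""
    portrait = {t: 1.0 for t in interest_types}
    for tag in reading_history_tags:
        portrait[tag] = portrait.get(tag, 0.0) + 0.5
    return portrait


def match_hotspots(portrait, hotspots, min_score=1.0):
    """Return titles of hotspots whose tags overlap the portrait, best first.

    hotspots: list of dicts with 'title' and 'tags' keys (hypothetical layout).
    """
    matched = []
    for item in hotspots:
        score = sum(portrait.get(tag, 0.0) for tag in set(item["tags"]))
        if score >= min_score:
            matched.append((score, item["title"]))
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in matched]
```

The matched items would then be passed to element extraction and summary generation as in the preceding embodiment.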
In an embodiment, the method for mining and previewing hotspot information further includes:
establishing a characteristic library of the appointed date, date-associated historical events, associated knowledge and associated social phenomena based on the appointed date with appointed significance;
when the current date is the appointed date, extracting date features from the feature library, and mining hotspot information matched with the date features according to the date features;
and performing element identification and extraction on the hot spot information, and generating a summary express matched with the interest content type of the user.
In this embodiment, a feature library of the specified dates and their associated historical events, associated knowledge, and associated social phenomena can be established based on common domestic and foreign festivals or dates of particular significance. When the current date reaches a specified date, date features are extracted from the feature library and used to select content appropriate to the time and scene, and an information reporting scheme is automatically combined to generate the summary express. For example, at the Spring Festival the information reporting scheme may automatically combine Spring Festival travel ticketing schedules, scenic spot updates, popular gift-giving trends, seasonal everyday knowledge and skills, and the like. Unlike the previous embodiment, hotspot mining in this embodiment is based on the date, and the mined hotspot information is content related to that date; for element identification and extraction and for generation of the summary express, refer to the method of the previous embodiment.
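A date feature library of this kind can be sketched as a lookup keyed by calendar date. The entries, tags, and function names below are hypothetical, and the sketch only handles fixed Gregorian dates; festivals that follow the lunar calendar, such as the Spring Festival, would need an additional date conversion step that is not shown.

```python
from datetime import date

# Hypothetical feature library keyed by (month, day).
DATE_FEATURES = {
    (10, 1): {"name": "National Day",
              "features": {"travel", "holiday traffic", "scenic spots"}},
    (6, 1): {"name": "Children's Day",
             "features": {"child care", "toys", "education"}},
}


def date_features(today):
    """Return the feature set for a specified date, or an empty set."""
    entry = DATE_FEATURES.get((today.month, today.day))
    return entry["features"] if entry else set()


def mine_by_date(today, hotspots):
    """Keep hotspots whose tags share at least one feature with the date."""
    feats = date_features(today)
    return [h["title"] for h in hotspots if feats & set(h["tags"])]
```

On any date without an entry in the library the function returns no date-specific hotspots, so the other mining paths remain the only source of content.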
Referring to fig. 5, fig. 5 is a schematic block diagram of a hotspot information mining and previewing apparatus according to an embodiment of the present invention, wherein the hotspot information mining and previewing apparatus 500 includes:
the search tracking unit 501 is configured to construct a news topic identification classifier according to a pre-acquired interest content type of a user, and acquire hot spot information matched with the interest content type of the user by using the news topic identification classifier;
a sorting unit 502, configured to extract high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information;
and an aggregation unit 503, configured to perform element extraction on the target hotspot information according to a pre-trained event extraction model, and perform core summary extraction on the extracted elements to generate a summary express matching the interest content type of the user.
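The news topic identification classifier used by the search tracking unit is trained with a naive Bayes classifier according to the specification. As a minimal sketch of that idea, assuming a multinomial model with Laplace smoothing and whitespace tokenization (both assumptions, as are the class and method names), a classifier can be written as follows.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopicClassifier:
    """Tiny multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_doc_counts = Counter()        # documents per topic
        self.word_counts = defaultdict(Counter)  # word counts per topic
        self.vocab = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[label][word]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The information sample set matching the user's interest content types would serve as the training documents, with the content type as the label.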
In one embodiment, as shown in fig. 6, the search tracking unit 501 includes:
a classifier construction unit 601, configured to obtain an information sample set matching the interest content type of the user in advance, and construct the news topic identification classifier by using naive bayes classifier classification training;
a dynamic tracking unit 602, configured to track hot spot information propagation dynamics in a vertical field by using the news topic identification classifier;
the weighted synthesis unit 603 is configured to determine the number of each keyword in the hotspot information and the domain to which each keyword belongs, perform weighted synthesis on each keyword in the hotspot information according to a preset domain weight to obtain a domain value with the highest score of the hotspot information, and determine the hotspot information matched with the interest content type of the user according to the domain value.
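The weighted synthesis performed by unit 603 can be illustrated as follows: count the keywords, map each to its domain, multiply by a preset domain weight, and take the highest-scoring domain. The keyword lexicon, the weights, and the function names are hypothetical values chosen for the sketch.

```python
from collections import Counter, defaultdict

# Hypothetical keyword -> domain lexicon and preset domain weights.
KEYWORD_DOMAIN = {"premium": "insurance", "policy": "insurance",
                  "stock": "finance", "fund": "finance"}
DOMAIN_WEIGHT = {"insurance": 1.5, "finance": 1.0}


def domain_scores(text):
    """Weighted keyword counts per domain for one piece of hotspot text."""
    counts = Counter(w for w in text.lower().split() if w in KEYWORD_DOMAIN)
    scores = defaultdict(float)
    for keyword, n in counts.items():
        domain = KEYWORD_DOMAIN[keyword]
        scores[domain] += n * DOMAIN_WEIGHT[domain]
    return dict(scores)


def top_domain(text):
    """Return the highest-scoring domain, or None if no keyword matched."""
    scores = domain_scores(text)
    if not scores:
        return None
    return max(scores, key=scores.get)


def matches_interest(text, interest_domains):
    """A hotspot matches when its top domain is one of the user's interests."""
    return top_domain(text) in interest_domains
```

Raising a domain's weight makes hotspots with mixed keywords more likely to be assigned to that domain, which is how preset weights can bias matching toward a vertical field.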
In one embodiment, as shown in fig. 7, the sorting unit 502 includes:
a propagation frequency calculation unit 701, configured to obtain a total network propagation amount and a number of comments of the hotspot information, and determine a propagation frequency of the hotspot information according to the total network propagation amount and the number of comments;
a propagation rate calculating unit 702, configured to obtain a propagation concurrency amount of the hotspot information in unit time, and determine a propagation rate of the hotspot information according to the propagation concurrency amount;
the weighting unit 703 is configured to weight the propagation depth and the propagation width of the hotspot information according to the weights of the propagation frequency and the propagation rate of the hotspot information, so as to obtain a hotspot value of each hotspot information;
the screening unit 704 is configured to sort the weighted hot spot information according to a sequence from a large hot spot value to a small hot spot value to obtain a hot spot information queue, and sequentially extract a predetermined number of target hot spot information from a beginning to an end in the hot spot information queue.
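The ranking performed by units 701 to 704 can be sketched as a weighted score followed by a sort and a cut-off. The exact formulas are not reproduced here; as labeled assumptions, the propagation frequency is taken as the sum of the total network propagation amount and the comment count, the propagation rate as the concurrency per unit time, and the hotspot value as a weighted sum of the two with hypothetical weights.

```python
from dataclasses import dataclass


@dataclass
class Hotspot:
    title: str
    total_propagation: int      # total network propagation amount
    comments: int               # number of comments
    concurrent_per_hour: float  # propagation concurrency per unit time


def propagation_frequency(h):
    """Assumed form: total propagation amount plus comment count."""
    return h.total_propagation + h.comments


def propagation_rate(h):
    """Assumed form: concurrency per unit time used directly as the rate."""
    return h.concurrent_per_hour


def hot_value(h, w_freq=0.6, w_rate=0.4):
    """Weighted hotspot value; the weights are hypothetical."""
    return w_freq * propagation_frequency(h) + w_rate * propagation_rate(h)


def top_hotspots(items, n, **weights):
    """Sort by hotspot value, largest first, and take the first n items."""
    queue = sorted(items, key=lambda h: hot_value(h, **weights), reverse=True)
    return queue[:n]
```

In practice the two quantities would likely be normalized to comparable scales before weighting, since raw counts and rates can differ by orders of magnitude.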
In one embodiment, as shown in fig. 8, the aggregation unit 503 includes:
the training and testing unit 801 is used for constructing an event extraction model, and training and testing the event extraction model by adopting an information sample;
an extracting and dividing unit 802, configured to perform text extraction and division on the hotspot information by using the event extraction model to obtain elements of the hotspot information;
and a labeling and sorting unit 803, configured to perform sequence labeling on the extracted elements, then execute a sentence sorting task, and generate a summary express matching the interest content type of the user.
In one embodiment, the sequence labeling includes part-of-speech tagging, semantic role labeling, information extraction, and information integration.
In one embodiment, the hotspot information mining and previewing device 500 further comprises:
the portrait construction unit is used for constructing a user portrait according to the interest content type of the user;
the portrait matching unit is used for performing cross comparison on the attribute characteristics of the user portrait and the content tags of the hotspot information to acquire the matched hotspot information;
and the portrait mining unit is used for performing element identification and extraction on the hotspot information and generating a summary express matched with the interest content type of the user.
In one embodiment, the hotspot information mining and previewing device 500 further comprises:
the date establishing unit is used for establishing a characteristic library of the appointed date, the date-associated historical event, the associated knowledge and the associated social phenomenon based on the appointed date with the appointed significance;
the date matching unit is used for extracting date features from the feature library when the current date is the specified date, and mining hotspot information matched with the date features according to the date features;
and the date mining unit is used for performing element identification and extraction on the hotspot information and generating a summary express matched with the interest content type of the user.
The device provided by the embodiment of the invention screens, from massive information, the hotspot information that matches the user's interests according to the user's personalized requirements, and generates a corresponding summary express, so that the user can obtain valuable information conveniently and quickly.
The hotspot information mining and previewing apparatus 500 may be implemented in the form of a computer program, which may run on a computer device as shown in fig. 9.
Referring to fig. 9, fig. 9 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 900 is a server, which may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 9, the computer device 900 includes a processor 902, memory, and a network interface 905 connected by a system bus 901, where the memory may include a non-volatile storage medium 903 and an internal memory 904.
The non-volatile storage medium 903 may store an operating system 9031 and a computer program 9032. The computer program 9032, when executed, causes the processor 902 to perform a hotspot information mining and previewing method.
The processor 902 is used to provide computing and control capabilities, supporting the operation of the overall computer device 900.
The internal memory 904 provides an environment for running a computer program 9032 in the non-volatile storage medium 903, and the computer program 9032 when executed by the processor 902 causes the processor 902 to perform the hotspot information mining and previewing method.
The network interface 905 is used for network communication, such as providing data information transmission. Those skilled in the art will appreciate that the configuration shown in fig. 9 is a block diagram of only the portion of the configuration associated with aspects of the present invention and does not limit the computer device 900 to which aspects of the present invention may be applied; a particular computer device 900 may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
The processor 902 is configured to run the computer program 9032 stored in the memory to implement the following functions: constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hotspot information matched with the interest content type of the user by using the news topic identification classifier; extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information; and performing element extraction on the target hotspot information according to a pre-trained event extraction model, and performing core summary extraction on the extracted elements to generate a summary express matched with the interest content type of the user.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 9 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 9, and are not described herein again.
It should be understood that, in the embodiment of the present invention, the Processor 902 may be a Central Processing Unit (CPU), and the Processor 902 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the steps of: constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hotspot information matched with the interest content type of the user by using the news topic identification classifier; extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information; and performing element extraction on the target hotspot information according to a pre-trained event extraction model, and performing core summary extraction on the extracted elements to generate a summary express matched with the interest content type of the user.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. It should be noted that those skilled in the art can make various improvements and modifications to the present invention without departing from its principle, and those improvements and modifications also fall within the scope of the claims of the present invention.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for mining and previewing hotspot information is characterized by comprising the following steps:
constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hotspot information matched with the interest content type of the user by using the news topic identification classifier;
extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information;
and performing element extraction on the target hotspot information according to a pre-trained event extraction model, and performing core summary extraction on the extracted elements to generate a summary express matched with the interest content type of the user.
2. The method for mining and previewing the hot spot information according to claim 1, wherein the step of constructing a news topic identification classifier according to the type of the user's interest content acquired in advance, and acquiring the hot spot information matched with the type of the user's interest content by using the news topic identification classifier comprises the steps of:
acquiring an information sample set matched with the interest content type of the user in advance, and constructing the news topic identification classifier by adopting naive Bayes classifier classification training;
tracking hot spot information propagation dynamics in a vertical field by using the news topic identification classifier;
determining the number of keywords in the hotspot information and the field to which each keyword belongs, performing weighted synthesis on each keyword in the hotspot information according to preset field weight to obtain a field value with the highest score of the hotspot information, and determining the hotspot information matched with the interest content type of the user according to the field value.
3. The method for mining and previewing hotspot information according to claim 1, wherein the extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and propagation rate of the hotspot information comprises:
acquiring the network propagation total amount and the number of comments of the hotspot information, and determining the propagation frequency of the hotspot information according to the network propagation total amount and the number of comments;
acquiring the propagation concurrency of the hotspot information in unit time, and determining the propagation rate of the hotspot information according to the propagation concurrency;
weighting the propagation depth and the propagation width of the hotspot information according to the propagation frequency and the propagation rate weight of the hotspot information to obtain a hotspot value of each hotspot information;
and sequencing the weighted hot spot information according to the sequence of hot spot values from large to small to obtain a hot spot information queue, and sequentially extracting a preset amount of target hot spot information from the beginning to the end of the hot spot information queue.
4. The method for mining and previewing hotspot information according to claim 1, wherein said extracting elements from the target hotspot information according to a pre-trained event extraction model, and performing core summarization on the extracted elements to generate a summary express matching with the interest content type of the user comprises:
constructing an event extraction model, and training and testing the event extraction model by adopting an information sample;
adopting the event extraction model to extract and divide the text of the hot spot information to obtain the elements of the hot spot information;
and carrying out sequence marking on the extracted elements, then executing a sentence sequencing task, and generating a summary express delivery matched with the interest content type of the user.
5. The method for mining and previewing hot spot information according to claim 4, wherein said sequence labels include part-of-speech labels, semantic role labels, information extraction and information integration.
6. The method for mining and previewing hot spot information according to claim 1, further comprising:
constructing a user portrait according to the interest content type of the user;
cross-comparing the attribute characteristics of the user portrait with the content tags of the hot spot information to obtain the matched hot spot information;
and performing element identification and extraction on the hot spot information, and generating a summary express matched with the interest content type of the user.
7. The method for mining and previewing hot spot information according to claim 1, further comprising:
establishing a characteristic library of the appointed date, date-associated historical events, associated knowledge and associated social phenomena based on the appointed date with appointed significance;
when the current date is the appointed date, extracting date features from the feature library, and mining hotspot information matched with the date features according to the date features;
and performing element identification and extraction on the hot spot information, and generating a summary express matched with the interest content type of the user.
8. An apparatus for mining and previewing hotspot information, comprising:
the search tracking unit is used for constructing a news topic identification classifier according to a pre-acquired interest content type of a user, and acquiring hotspot information matched with the interest content type of the user by using the news topic identification classifier;
the sorting unit is used for extracting high-heat target hotspot information from the hotspot information according to the propagation frequency and the propagation rate of the hotspot information;
and the aggregation unit is used for performing element extraction on the target hotspot information according to a pre-trained event extraction model, and performing core summary extraction on the extracted elements to generate a summary express matched with the interest content type of the user.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the hotspot information mining and previewing method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the hotspot information mining and previewing method of any one of claims 1 to 7.
CN202011189110.9A 2020-10-30 2020-10-30 Hot spot information mining and previewing method and device, computer equipment and storage medium Active CN112307336B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011189110.9A CN112307336B (en) 2020-10-30 2020-10-30 Hot spot information mining and previewing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011189110.9A CN112307336B (en) 2020-10-30 2020-10-30 Hot spot information mining and previewing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112307336A true CN112307336A (en) 2021-02-02
CN112307336B CN112307336B (en) 2024-04-16

Family

ID=74332542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011189110.9A Active CN112307336B (en) 2020-10-30 2020-10-30 Hot spot information mining and previewing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112307336B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103077190A (en) * 2012-12-20 2013-05-01 人民搜索网络股份公司 Hot event ranking method based on order learning technology
CN104657496A (en) * 2015-03-09 2015-05-27 杭州朗和科技有限公司 Method and equipment for calculating information hot value
CN107992531A (en) * 2017-11-21 2018-05-04 吉浦斯信息咨询(深圳)有限公司 News personalization intelligent recommendation method and system based on deep learning
CN109033074A (en) * 2018-06-29 2018-12-18 北京百度网讯科技有限公司 News in brief generation method, device, equipment and computer-readable medium
CN109800350A (en) * 2018-12-21 2019-05-24 中国电子科技集团公司信息科学研究院 A kind of Personalize News recommended method and system, storage medium
US20190188329A1 (en) * 2017-12-15 2019-06-20 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for generating briefing
CN110489542A (en) * 2019-08-10 2019-11-22 刘莎 A kind of auto-abstracting method of internet web page and text information
CN111241410A (en) * 2020-01-22 2020-06-05 深圳司南数据服务有限公司 Industry news recommendation method and terminal
CN111753197A (en) * 2020-06-18 2020-10-09 达而观信息科技(上海)有限公司 News element extraction method and device, computer equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836110A (en) * 2021-02-07 2021-05-25 四川封面传媒有限责任公司 Hotspot information mining method and device, computer equipment and storage medium
CN112836110B (en) * 2021-02-07 2022-09-16 四川封面传媒有限责任公司 Hotspot information mining method and device, computer equipment and storage medium
CN113158671A (en) * 2021-03-25 2021-07-23 胡明昊 Open domain information extraction method combining named entity recognition
CN113158671B (en) * 2021-03-25 2023-08-11 胡明昊 Open domain information extraction method combined with named entity identification
CN113407842A (en) * 2021-06-28 2021-09-17 携程旅游信息技术(上海)有限公司 Model training method, method and system for obtaining theme recommendation reason and electronic equipment
CN113407842B (en) * 2021-06-28 2024-03-22 携程旅游信息技术(上海)有限公司 Model training method, theme recommendation reason acquisition method and system and electronic equipment

Also Published As

Publication number Publication date
CN112307336B (en) 2024-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant