CN113127576A - Hotspot discovery method and system based on user content consumption analysis - Google Patents


Info

Publication number
CN113127576A
Authority
CN
China
Prior art keywords
data
hotspot
user
content
hot spot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110405034.9A
Other languages
Chinese (zh)
Other versions
CN113127576B (en)
Inventor
王东星
李云辉
鄂佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weimeng Chuangke Network Technology China Co Ltd
Original Assignee
Weimeng Chuangke Network Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weimeng Chuangke Network Technology China Co Ltd
Priority to CN202110405034.9A
Publication of CN113127576A
Application granted
Publication of CN113127576B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioethics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Economics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Game Theory and Decision Science (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a hotspot discovery method and system based on user content consumption analysis. The method comprises: storing the data of the various information streams generated on a platform with interactive functions to form massive user consumption data; classifying the platform's massive user consumption data to form standby data of the following types: content data, user desensitization cold data, and relationship data; and screening and associating the content data through a trained hotspot structured model, grouping associated content data into classes, computing each class of content data to form its own hotspot, and associating each hotspot, through the relationship data, with the users in the user desensitization cold data. By analyzing and classifying massive content data, a large content library is built and the content data is kept current in real time, so that a hotspot can be anticipated when it first emerges and its whole life cycle can be tracked.

Description

Hotspot discovery method and system based on user content consumption analysis
Technical Field
The invention relates to the Internet field, and in particular to a hotspot discovery method and system based on user content consumption analysis.
Background
In the prior art, hotspot discovery on a big data platform is performed by analyzing large volumes of the platform's text. In the course of implementing the invention, the applicant found at least the following problems in the prior art: the analysis recognizes text only by means of terms, and hotspots are extracted on that basis. There is no real-time, effective content support, so once the terms are not updated in time, the earliest opportunity to discover a hotspot is missed and the real-time performance of hotspot discovery cannot be guaranteed.
Disclosure of Invention
Embodiments of the invention provide a hotspot discovery method and system based on user content consumption analysis. By analyzing and classifying massive content data, they build a large content library and keep the content data current in real time, so that a hotspot can be anticipated when it first emerges and its whole life cycle can then be tracked.
To achieve the above object, in one aspect, an embodiment of the present invention provides a hotspot discovery method based on user content consumption analysis, including:
storing the data of the various information streams generated on a platform with interactive functions to form massive user consumption data;
classifying the platform's massive user consumption data to form standby data of the following types: content data, user desensitization cold data, and relationship data; wherein the content data is data on the consumption behavior that users actually generate on the platform, the user desensitization cold data is multi-granularity user portrait data produced by user-portrait techniques after cleaning and desensitization, and the relationship data is the interaction relationship between users and content;
and screening and associating the content data through a trained hotspot structured model, grouping associated content data into classes, computing each class of content data to form its own hotspot, and associating each hotspot, through the relationship data, with the users in the user desensitization cold data.
In another aspect, an embodiment of the present invention provides a hotspot discovery system based on user content consumption analysis, including:
a data acquisition unit, configured to store the data of the various information streams generated on a platform with interactive functions to form massive user consumption data;
a data processing unit, configured to classify the platform's massive user consumption data to form standby data of the following types: content data, user desensitization cold data, and relationship data; wherein the content data is data on the consumption behavior that users actually generate on the platform, the user desensitization cold data is multi-granularity user portrait data produced by user-portrait techniques after cleaning and desensitization, and the relationship data is the interaction relationship between users and content;
and a hotspot discovery unit, configured to screen and associate the content data through a trained hotspot structured model, group associated content data into classes, compute each class of content data to form its own hotspot, and associate each hotspot, through the relationship data, with the users in the user desensitization cold data.
The above technical solution has the following beneficial effects: by analyzing and classifying massive content data, a large content library is built and the content data is kept current in real time, so that a hotspot can be anticipated when it first emerges and its whole life cycle can be tracked.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a hotspot discovery method based on user content consumption analysis according to an embodiment of the present invention;
FIG. 2 is a block diagram of a hotspot discovery system based on user content consumption analysis according to an embodiment of the invention;
FIG. 3 is a flowchart of another hotspot discovery method based on user content consumption analysis according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a hotspot discovery method based on user content consumption analysis, including:
S101: storing the data of the various information streams generated on a platform with interactive functions to form massive user consumption data;
S102: classifying the platform's massive user consumption data to form standby data of the following types: content data, user desensitization cold data, and relationship data; wherein the content data is data on the consumption behavior that users actually generate on the platform, the user desensitization cold data is multi-granularity user portrait data produced by user-portrait techniques after cleaning and desensitization, and the relationship data is the interaction relationship between users and content;
S103: screening and associating the content data through a trained hotspot structured model, grouping associated content data into classes, computing each class of content data to form its own hotspot, and associating each hotspot, through the relationship data, with the users in the user desensitization cold data.
Preferably, the method further comprises step S104:
obtaining historical internet hotspot data and tagging the content data corresponding to each item of historical internet hotspot data, wherein the tags include primary hotspot tags or secondary hotspot tags, and training an algorithm model that supports information extraction on the tagged content data to obtain a hotspot tag structured model;
annotating the topic of the content data corresponding to each item of historical internet hotspot data, thereby marking the hotspot topic keywords of each item of content data, and training a probabilistic topic model on the content data so annotated to obtain a hotspot topic analysis model;
and using the hotspot tag structured model and the hotspot topic analysis model together as the hotspot structured model.
Preferably, in step S103, screening and associating the content data through the hotspot structured model, grouping associated content data into classes, and computing each class of content data to form its own hotspot specifically includes:
S1031: determining the tag of each item of content data through the hotspot tag structured model, and automatically screening out the content data whose tags are associated and grouping them into one class;
S1032: for each class of screened content data, determining its hotspot topic keywords through the hotspot topic analysis model and determining the corresponding hotspot from those keywords, wherein the hotspot topic keywords include entity words recognized in each class of content data by an entity recognition model.
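The two sub-steps can be illustrated with a minimal Python sketch. It assumes each content item has already been given a tag and a list of entity words by upstream models; the field names `label` and `entities` are hypothetical. A plain frequency count stands in for the hotspot topic analysis model, which the source does not specify in this detail.

```python
from collections import Counter, defaultdict

# Hypothetical input: each item already carries a tag (as a tag model would
# assign in S1031) and entity words (as an entity recognition model would find).
items = [
    {"id": 1, "label": "sports", "entities": ["world cup", "final"]},
    {"id": 2, "label": "sports", "entities": ["world cup"]},
    {"id": 3, "label": "finance", "entities": ["interest rate"]},
]

def group_by_label(items):
    """S1031 analogue: put content items that share a tag into one class."""
    groups = defaultdict(list)
    for item in items:
        groups[item["label"]].append(item)
    return dict(groups)

def top_keywords(group, k=1):
    """S1032 analogue: take the k most frequent entity words as hotspot keywords.

    Frequency counting is a stand-in for a trained topic model.
    """
    counts = Counter(word for item in group for word in item["entities"])
    return [word for word, _ in counts.most_common(k)]

hotspots = {label: top_keywords(group)
            for label, group in group_by_label(items).items()}
```

Here `hotspots` maps each class to its candidate hotspot keywords; a real system would replace both helper functions with the trained models.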
Preferably, the method further comprises the following steps:
S105: performing causal reasoning on each hotspot through a hotspot causal model to judge whether the hotspot gives rise to a new hotspot;
S106: monitoring, through a real-time hotspot monitoring model, whether each hotspot becomes a hot search topic on the platform.
Preferably, the method further comprises the following step:
S107: applying the hotspot structured model to other platforms with interactive functions in the industry: obtaining historical information stream data from those platforms, discovering historical hotspots in that data, and adding the hotspot topic keywords and tags of each historical hotspot to a hotspot library, the hotspot library being a database that stores the hotspot topic keywords and tags of the industry.
As shown in FIG. 2, an embodiment of the present invention provides a hotspot discovery system based on user content consumption analysis, including:
the data acquisition unit 21, configured to store the data of the various information streams generated on a platform with interactive functions to form massive user consumption data;
the data processing unit 22, configured to classify the platform's massive user consumption data to form standby data of the following types: content data, user desensitization cold data, and relationship data; wherein the content data is data on the consumption behavior that users actually generate on the platform, the user desensitization cold data is multi-granularity user portrait data produced by user-portrait techniques after cleaning and desensitization, and the relationship data is the interaction relationship between users and content;
the hotspot discovery unit 23, configured to screen and associate the content data through a trained hotspot structured model, group associated content data into classes, compute each class of content data to form its own hotspot, and associate each hotspot, through the relationship data, with the users in the user desensitization cold data.
Preferably, the system further comprises:
the hotspot structured model establishing unit 24, configured to obtain historical internet hotspot data and tag the content data corresponding to each item of historical internet hotspot data, wherein the tags include primary hotspot tags or secondary hotspot tags, and to train an algorithm model that supports information extraction on the tagged content data to obtain a hotspot tag structured model;
to annotate the topic of the content data corresponding to each item of historical internet hotspot data, thereby marking the hotspot topic keywords of each item of content data, and to train a probabilistic topic model on the content data so annotated to obtain a hotspot topic analysis model;
and to use the hotspot tag structured model and the hotspot topic analysis model together as the hotspot structured model.
Preferably, the hotspot discovery unit 23 includes:
the hotspot tag determining subunit 231, configured to determine the tag of each item of content data through the hotspot tag structured model, and to automatically screen out the content data whose tags are associated and group them into one class;
and the hotspot determining subunit 232, configured to determine, for each class of screened content data, its hotspot topic keywords through the hotspot topic analysis model and to determine the corresponding hotspot from those keywords, wherein the hotspot topic keywords include entity words recognized in each class of content data by an entity recognition model.
Preferably, the system further comprises:
the hotspot reasoning unit 25, configured to perform causal reasoning on each hotspot through a hotspot causal model and judge whether the hotspot gives rise to a new hotspot;
and the hotspot monitoring unit 26, configured to monitor, through a real-time hotspot monitoring model, whether each hotspot becomes a hot search topic on the platform.
Preferably, the system further comprises:
the external service unit 27, configured to apply the hotspot structured model to other platforms with interactive functions in the industry: to obtain historical information stream data from those platforms, discover historical hotspots in that data, and add the hotspot topic keywords and tags of each historical hotspot to a hotspot library, the hotspot library being a database that stores the hotspot topic keywords and tags of the industry.
The beneficial effects obtained by the invention are as follows:
1. Timeliness and scale of hotspot content
By analyzing and classifying massive content data, a large content library is built and the content data is kept current in real time, so that a hotspot can be anticipated when it first emerges and its whole life cycle can be tracked.
2. Authenticity of hotspots and strong association with users
The invention strongly associates content data with user data through behavior data. This ensures that real interactions such as forwarding, commenting, and liking exist between the data and the users, so that a discovered hotspot is one that users genuinely paid attention to and helped to spread.
3. Trend tracking of hotspots
Powerful data computation and modeling capabilities make it possible to monitor hotspot trends and to follow the whole course of a hotspot: emergence, rise, decline, and disappearance. The trend of a hotspot can thus be reasonably predicted.
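As a rough illustration of trend tracking, the sketch below counts interactions per fixed time window and labels the latest movement. The window size, the phase names, and the rule of comparing only the last two windows are assumptions made for illustration; the source does not prescribe a particular rule.

```python
from collections import Counter

def volume_per_window(timestamps, window_seconds):
    """Count interactions in fixed-size windows, ordered from first to last window."""
    buckets = Counter(int(t // window_seconds) for t in timestamps)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    # Windows with no interactions are reported as 0 so gaps stay visible.
    return [buckets.get(w, 0) for w in range(first, last + 1)]

def phase(volumes):
    """Label the latest movement by comparing the last two windows (a simple heuristic)."""
    if len(volumes) < 2:
        return "emerging"
    previous, current = volumes[-2], volumes[-1]
    if current > previous:
        return "rising"
    if current < previous:
        return "falling"
    return "stable"

# Hypothetical interaction times in seconds (likes, comments, forwards on one hotspot).
volumes = volume_per_window([0, 10, 65, 70, 80, 130], window_seconds=60)
current_phase = phase(volumes)
```

A production system would smooth over several windows and handle late-arriving events; this sketch only shows the counting-then-comparing idea.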
The above technical solutions of the embodiments of the present invention are described in detail below with reference to specific application examples, and reference may be made to the foregoing related descriptions for technical details that are not described in the implementation process.
The invention relates to a hotspot discovery system based on multi-granularity user content consumption analysis. It addresses the problems that hotspot discovery lacks real-time, effective content support, that the earliest opportunity to discover a hotspot is missed once terms are not updated in time, and that the real-time performance of hotspot discovery therefore cannot be guaranteed.
Moreover, because real-time, effective volume data (the amount of discussion a topic receives) is available, hotspots are easy to find and a feedback loop is formed: once a hotspot is found, it is verified against the data, which prevents a wrongly identified hotspot from misleading users. The volume data also supports trend tracking, so that it can be judged when a hotspot is rising and when it is falling. Recognizing entity words means that the analysis does not rely on word segmentation alone, which avoids heavy dependence on word-segmentation performance and on web crawling; when the quality of the recognized document content is low, however, the confidence of the analyzed regional hotspots may also be low. The method can be used in research on target classification, group portraits, and precise dissemination for rule-of-law publicity.
The invention starts from users' real behavior and aggregates and analyzes data from multiple channels, which ensures both the timeliness and the authenticity of the discovered hotspots. Trend data is also used to verify hotspots and analyze their trends, helping users analyze and anticipate developments more accurately and more quickly.
As shown in FIG. 3, the method and system provided by the invention screen and associate content data according to users' consumption behavior, compute and output hotspot topics through the models, and then keep tracking hotspot trends as user consumption is updated in real time. The invention comprises a hotspot discovery method based on content consumption and a method of associating user consumption behavior with content hotspots. The specific steps are as follows:
1. Information collection and classification
The technique is built on a data engine for processing massive data, mainly the microblog information hot stream, relationship stream, search stream, and the like, and it stores and computes massive data from multiple channels reasonably and efficiently. The data comprises content data, user desensitization data (legally usable user data from which private information has been removed), and the associations between the two. After processing, there are three types of data: content data, user data, and relationship data. The most important is the content data, which records the consumption behavior, such as liking, forwarding, and commenting, that users actually generate on an internet content platform such as a microblog. A strong relationship therefore exists between content and users, and the relational database built on users' consumption of content ensures authenticity, timeliness, and strong association. A powerful real-time computing system counts the huge volume of consumption behavior data and tracks how hotspots change, so that the whole course of a hotspot, from emergence through rise and decline to disappearance, can be better followed. Here, user data means cleaned and desensitized user portrait data, for example "born in the 1990s, male"; relationship data means the interactions between users and content, such as browsing, liking, and forwarding.
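A minimal sketch of this classification step follows. The event fields (`user_id`, `age_group`, `gender`, `content_id`, `text`, `action`) and the salted-hash pseudonymization are assumptions made for illustration; the source does not say how desensitization is performed, and a salted hash alone is generally not sufficient anonymization in practice.

```python
import hashlib

def pseudonymize(user_id, salt="demo-salt"):
    """Replace a raw user id with a truncated salted SHA-256 digest (illustrative only)."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:12]

def classify_events(events):
    """Split raw interaction events into content data, user data, and relationship data."""
    content, users, relations = {}, {}, []
    for event in events:
        uid = pseudonymize(event["user_id"])
        content[event["content_id"]] = {"text": event["text"]}
        # Only coarse portrait attributes are kept; the raw id is dropped.
        users[uid] = {"age_group": event["age_group"], "gender": event["gender"]}
        relations.append((uid, event["action"], event["content_id"]))
    return content, users, relations

# Hypothetical raw events from an information stream.
events = [
    {"user_id": "u1", "age_group": "1990s", "gender": "male",
     "content_id": "c1", "text": "post A", "action": "like"},
    {"user_id": "u2", "age_group": "1980s", "gender": "female",
     "content_id": "c1", "text": "post A", "action": "forward"},
]
content, users, relations = classify_events(events)
```

The relationship tuples are what later link a hotspot, computed from content, back to the pseudonymous user portraits.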
2. Sample data tagging
Automatic classification or prediction of samples is achieved by model training, which requires basic training data. To train the hotspot structured model, historical internet hotspot data (content data) must be manually annotated as a hotspot training corpus. The hotspot structured model includes a hotspot tag model, for whose training the data must be annotated with primary hotspot tags, secondary hotspot tags, and so on; it also includes a hotspot topic analysis model, for whose training the data must be annotated with hotspot topic keywords. A hotspot topic is the name of a hotspot.
3. Model training
Model training requires a training environment, so an algorithm platform that supports information extraction is built. The algorithm models on the platform include: tag classification models implemented with deep learning techniques such as CNN (convolutional neural network), RNN (recurrent neural network), DRN (deep residual network), and RBM (restricted Boltzmann machine); probabilistic topic models such as PLSA (probabilistic latent semantic analysis) and LDA (latent Dirichlet allocation); named entity recognition models such as HMMs (hidden Markov models) and dictionary-based methods; and real-time hotspot monitoring and hotspot causal models built on, for example, incremental clustering and decision tree algorithms. Finally, the following models are constructed:
(1) Real-time hotspot monitoring and hotspot causal models
Using the Sina microblog back end's capability to track hot topics in real time, the generation, spread, and development of a hotspot are tracked and updated in real time, and it is judged whether the hotspot can become a microblog hot search topic or hot topic. Causal reasoning is also performed on hotspots to judge whether one hotspot can give rise to another, related hotspot. A hot topic is a topic that attracts wide public attention and discussion in the media.
(2) Content tag recognition and topic analysis models
Content is tagged with text classification models such as TextCNN, fastText, and BERT. The models are trained on the tagged content data and then classify given content, recognizing the tags of the content and of users so that hotspots can be associated with users.
The topic of the content is determined with a topic analysis model; topic models include LDA and the like. In a topic model, a topic is a probability distribution whose support is all the words in the text; it represents how frequently each word appears in the topic, so words highly relevant to the topic have a higher probability of appearing. When a text has several topics, the probability distribution of every topic covers all words, but a given word takes different values in the distributions of different topics. A topic model attempts to capture this property of documents in a mathematical framework. It automatically analyzes each document, counts the words in it, and judges from these statistics which topics the document contains and in what proportion.
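The mixture view described above, which PLSA and LDA share, can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the source: K is the number of topics, \phi_{k,w} is the probability of word w under topic k, and \theta_{d,k} is the proportion of topic k in document d.

```latex
\[
\begin{aligned}
p(w \mid d) &= \sum_{k=1}^{K} p(w \mid z = k)\, p(z = k \mid d)
             = \sum_{k=1}^{K} \phi_{k,w}\, \theta_{d,k}, \\
\sum_{w} \phi_{k,w} &= 1 \quad \text{for each topic } k,
\qquad
\sum_{k=1}^{K} \theta_{d,k} = 1 \quad \text{for each document } d .
\end{aligned}
\]
```

Each topic k assigns a probability to every word, which matches the statement that every topic's distribution covers all words while the same word takes different values under different topics; \theta_{d,k} is the "proportion of each topic" that the model infers for a document.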
4. Hotspot structured model acquisition
Based on the annotated hotspot training corpus, a hotspot tag classifier is trained with the algorithm models provided by the algorithm platform, such as CNN (convolutional neural network) and RNN (recurrent neural network), to obtain hotspot tag structured models such as a primary hotspot tag classification model and a secondary hotspot tag classification model.
An HMM (hidden Markov model) is trained and combined with models such as the dictionary method to obtain a named entity recognition model that can recognize entity words such as person names, place names, and times.
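The dictionary method mentioned here can be sketched as forward maximum matching: at each position, take the longest lexicon entry that matches. The lexicon, its entity types, and the sample sentence are hypothetical, and the HMM component is not shown.

```python
def dictionary_ner(text, lexicon):
    """Forward maximum matching over a lexicon that maps entity strings to types.

    Returns a list of (entity, entity_type, start_index) tuples.
    """
    max_len = max(len(entry) for entry in lexicon)
    found, i = [], 0
    while i < len(text):
        match = None
        # Try the longest candidate first so longer entities win over their prefixes.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match:
            found.append((match, lexicon[match], i))
            i += len(match)
        else:
            i += 1
    return found

# Hypothetical lexicon and sentence.
lexicon = {"Beijing": "PLACE", "Beijing Winter Olympics": "EVENT", "2022": "TIME"}
text = "The Beijing Winter Olympics opened in 2022"
entities = dictionary_ner(text, lexicon)
```

Because matching is character-based, this sketch can also match inside longer tokens; a practical recognizer would add boundary checks or combine the dictionary with a statistical model, as the text suggests.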
Topic models such as LDA (latent Dirichlet allocation) and PLSA (probabilistic latent semantic analysis) are trained to generate a hotspot content topic model (the hotspot topic analysis model) that identifies hotspot topic features.
A hotspot causal model that can analyze and reason about causal relationships among hotspots is established from features such as the semantic information of hotspots, statistics of hotspot word co-occurrence, and the similarity distance between hotspots.
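One way to picture two of these features is the following sketch, which proposes candidate links between hotspots when their keyword sets overlap and one hotspot starts before the other. Jaccard similarity, the threshold, and the "earlier may lead to later" rule are assumptions chosen for illustration; they are not the causal model of the source, and keyword overlap alone does not establish causation.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_links(hotspots, threshold=0.3):
    """Return (earlier, later, similarity) for hotspot pairs with enough keyword overlap."""
    ordered = sorted(hotspots, key=lambda h: h["start"])
    links = []
    for i, earlier in enumerate(ordered):
        for later in ordered[i + 1:]:
            similarity = jaccard(earlier["keywords"], later["keywords"])
            if similarity >= threshold:
                links.append((earlier["name"], later["name"], round(similarity, 2)))
    return links

# Hypothetical hotspots with start times and keyword sets.
hotspots = [
    {"name": "typhoon landfall", "start": 1, "keywords": ["typhoon", "landfall", "coast"]},
    {"name": "flight delays", "start": 2, "keywords": ["typhoon", "coast", "flight"]},
    {"name": "movie premiere", "start": 3, "keywords": ["movie", "premiere"]},
]
links = candidate_links(hotspots)
```

The resulting pairs are only candidates; a real model would weigh them with semantic features before asserting that one hotspot produced another.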
A query method is established on the basis of information retrieval technology and a vector space model is built; combined with methods such as incremental clustering, decision trees, and a windowing strategy, this yields a real-time hotspot monitoring model that tracks hotspots on the same topic and discovers new hotspots.
The trained hotspot structured model is also offered as an external service, that is, a service for organizations in other industries, outside the company, that need hotspot discovery. It is used to recognize and structure the historical social hotspots of those organizations and to build out the propagation hotspot library. The hotspot library stores the hotspot topic keywords and the various tag systems of the industry, so that hotspots occurring in real time, or historical hotspots, can be quickly recognized or associated with the relevant tags.
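The combination of a vector space model with incremental clustering can be sketched as single-pass clustering: each new document joins the most similar existing cluster or starts a new one. The bag-of-words vectors, whitespace tokenization, and the 0.5 cosine threshold are illustrative assumptions; the decision tree and windowing parts are not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def incremental_cluster(docs, threshold=0.5):
    """Single-pass clustering: assign each document to the best cluster or open a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc indices]}
    for index, text in enumerate(docs):
        vector = Counter(text.lower().split())
        best, best_similarity = None, 0.0
        for cluster in clusters:
            similarity = cosine(vector, cluster["centroid"])
            if similarity > best_similarity:
                best, best_similarity = cluster, similarity
        if best is not None and best_similarity >= threshold:
            # Summed term counts serve as the centroid; cosine ignores the scale.
            best["centroid"].update(vector)
            best["members"].append(index)
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return clusters

# Hypothetical stream of short posts.
docs = ["typhoon hits coast", "typhoon hits city coast", "new movie premiere"]
clusters = incremental_cluster(docs)
members = [cluster["members"] for cluster in clusters]
```

A document that opens a new cluster corresponds to a newly discovered hotspot, while one that joins an existing cluster continues the tracking of a known topic.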
The beneficial effects obtained by the invention are as follows:
1. Timeliness and magnitude of hot content
By analyzing and classifying massive content data, a large content library is constructed and the real-time nature of the content data is guaranteed, so that a hot spot can be anticipated as it begins to emerge and its whole life cycle can be followed.
2. The authenticity of the hotspot is strongly correlated with the user
The invention strongly associates content data with user data through behavior data, ensuring that real interactive behaviors such as forwarding, commenting and liking exist between the data and the users, so that each hot spot is one genuinely generated by users' attention and further propagation.
3. Trend tracking of hotspots
Through powerful data computation and modeling capabilities, the method monitors hot spot trends and covers the whole life cycle of a hot spot: occurrence, rise, fall and disappearance. The trend of a hot spot can thus be reasonably predicted.
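The association of hot spots with users through behavior data, and the life cycle phases mentioned above, can be sketched as follows. The record layout `(user_id, hotspot, action, hour)`, the phase names and the rule that assigns a phase from consecutive hourly counts are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical relationship records: (user_id, hotspot, action, hour).
RECORDS = [
    ("u1", "quake", "forward", 1),
    ("u2", "quake", "comment", 2),
    ("u3", "quake", "like", 2),
    ("u1", "quake", "like", 2),
    ("u2", "quake", "forward", 3),
    ("u4", "quake", "comment", 3),
    ("u5", "concert", "like", 2),
]


def users_per_hotspot(records):
    """Associate each hot spot with the users who actually interacted with it."""
    users = defaultdict(set)
    for user, hotspot, _action, _hour in records:
        users[hotspot].add(user)
    return users


def hourly_counts(records, hotspot, n_hours):
    """Count interactions with one hot spot in each hourly window."""
    counts = [0] * n_hours
    for _user, name, _action, hour in records:
        if name == hotspot and 0 <= hour < n_hours:
            counts[hour] += 1
    return counts


def lifecycle(counts):
    """Label each window by comparing its count with the previous window."""
    phases, prev, seen = [], 0, False
    for c in counts:
        if c == 0:
            phase = "disappeared" if seen else "none"
        elif prev == 0:
            phase = "occurrence"
        elif c > prev:
            phase = "rise"
        elif c < prev:
            phase = "fall"
        else:
            phase = "steady"
        seen = seen or c > 0
        phases.append(phase)
        prev = c
    return phases


quake_users = users_per_hotspot(RECORDS)["quake"]
quake_counts = hourly_counts(RECORDS, "quake", n_hours=5)
quake_phases = lifecycle(quake_counts)
```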
It should be understood that the specific order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy presented.
In the foregoing detailed description, various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments of the subject matter require more features than are expressly recited in each claim. Rather, as the following claims reflect, invention lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
What has been described above includes examples of one or more embodiments. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the aforementioned embodiments, but one of ordinary skill in the art may recognize that many further combinations and permutations of various embodiments are possible. Accordingly, the embodiments described herein are intended to embrace all such alterations, modifications and variations that fall within the scope of the appended claims. Furthermore, to the extent that the term "includes" is used in either the detailed description or the claims, such term is intended to be inclusive in a manner similar to the term "comprising" as "comprising" is interpreted when employed as a transitional word in a claim. Furthermore, any use of the term "or" in the specification or the claims is intended to mean a "non-exclusive or".
Those of skill in the art will further appreciate that the various illustrative logical blocks, units, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, elements, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The various illustrative logical blocks, or elements, described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside in different components in a user terminal.
In one or more exemplary designs, the functions described above in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media that facilitate transfer of a computer program from one place to another. Storage media may be any available media that can be accessed by a general purpose or special purpose computer. For example, such computer-readable media can include, but is not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store program code in the form of instructions or data structures and which can be read by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Additionally, any connection is properly termed a computer-readable medium, and, thus, is included if the software is transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wirelessly, e.g., infrared, radio, and microwave. Disk and disc, as used herein, include compact discs, laser discs, optical discs, DVDs, floppy disks and Blu-ray discs, where disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included in the computer-readable medium.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A hotspot discovery method based on user content consumption analysis is characterized by comprising the following steps:
storing data of various information streams generated in a platform with an interactive function to form massive user consumption data;
carrying out classification processing on mass user consumption data of the platform to form standby data, wherein the standby data comprises the following types: content data, user desensitization cold data, relationship data; the content data refers to data of consumer behavior actually generated on the platform by a user, the user desensitization cold data refers to multi-granularity user portrait data formed by a user portrait technology after cleaning and desensitization, and the relationship data refers to an interaction relationship between the user and the content;
and screening and associating the content data through the hot spot structural model obtained through training, classifying the content data with association, respectively calculating various content data to form respective hot spots, and respectively associating each hot spot with the user in the user desensitization cold data through the relational data.
2. The hotspot discovery method based on user content consumption analysis of claim 1, further comprising:
obtaining historical internet hotspot data, labeling content data corresponding to the historical internet hotspot data respectively, wherein the labels corresponding to the historical internet hotspot data comprise primary hotspot labels or secondary hotspot labels, and training the labeled content data through an algorithm model supporting information extraction to obtain a hotspot label structured model;
respectively carrying out theme marking on content data corresponding to historical hotspot data of each internet, thereby marking hotspot theme keywords of each content data, and training the content data marked with the hotspot theme keywords through a probability theme model to obtain a hotspot theme analysis model;
and taking the hotspot tag structural model and the hotspot topic analysis model as hotspot structural models.
3. The hotspot discovery method based on user content consumption analysis according to claim 2, wherein the screening and associating of the content data through the hotspot structural model, the classifying of the content data with association, and the respective computing of each type of content data to form respective hotspots specifically comprise:
determining the labels of the content data through a hotspot label structured model, and automatically screening the content data corresponding to the associated labels to be classified into one category;
and aiming at each type of screened content data, determining hot topic keywords of the screened content data through a hot topic analysis model, and determining corresponding hot spots according to the hot topic keywords, wherein the hot topic keywords comprise entity words which are identified from each type of content data through an entity identification model.
4. The hotspot discovery method based on user content consumption analysis of claim 2, further comprising:
causality reasoning is carried out on each hotspot through a hotspot causality model, and whether each hotspot generates a new hotspot is judged; and monitoring whether each hotspot becomes a hot search topic of the platform or not through a real-time hotspot monitoring model.
5. The hotspot discovery method based on user content consumption analysis of claim 1, further comprising:
the hotspot structural model is used for other platforms with interactive functions in the industry, historical information flow data are obtained from other platforms with interactive functions, historical hotspots are found from the obtained historical information flow data, hotspot subject keywords and tags corresponding to the historical hotspots are added into a hotspot library, and the hotspot library is a database for storing the hotspot subject keywords and tags in the industry.
6. A hotspot discovery system based on user content consumption analysis, comprising:
the data acquisition unit is used for storing data of various information streams generated in the platform with the interaction function to form mass user consumption data;
the data processing unit is used for carrying out classification processing on mass user consumption data of the platform to form standby data, and the standby data comprises the following types: content data, user desensitization cold data, relationship data; the content data refers to data of consumer behavior actually generated on the platform by a user, the user desensitization cold data refers to multi-granularity user portrait data formed by a user portrait technology after cleaning and desensitization, and the relationship data refers to an interaction relationship between the user and the content;
and the hot spot discovery unit is used for screening and associating the content data through the trained hot spot structural model, classifying the content data with association, respectively calculating various content data to form respective hot spots, and respectively associating each hot spot with the user in the user desensitization cold data through the relational data.
7. The hotspot discovery system based on user content consumption analysis of claim 6, further comprising:
the hot spot structured model building unit is used for obtaining internet historical hot spot data, labeling content data corresponding to the internet historical hot spot data respectively, wherein the labels corresponding to the internet historical hot spot data comprise primary hot spot labels or secondary hot spot labels, and training the labeled content data through an algorithm model supporting information extraction to obtain a hot spot label structured model; respectively carrying out theme marking on content data corresponding to historical hotspot data of each internet, thereby marking hotspot theme keywords of each content data, and training the content data marked with the hotspot theme keywords through a probability theme model to obtain a hotspot theme analysis model; and taking the hotspot tag structural model and the hotspot topic analysis model as hotspot structural models.
8. The system of claim 7, wherein the hotspot discovery unit comprises:
the hot spot label determining subunit is used for determining the labels of the content data through the hot spot label structural model and automatically screening the content data corresponding to the associated labels to be classified into one category;
and the hotspot determining subunit is used for determining a hotspot topic keyword of each type of screened content data through a hotspot topic analysis model, and determining a corresponding hotspot according to the hotspot topic keyword, wherein the hotspot topic keyword comprises an entity word which is identified from each type of content data through an entity identification model.
9. The hotspot discovery system based on user content consumption analysis of claim 7, further comprising:
the hot spot reasoning unit is used for performing causal reasoning on each hot spot through a hot spot causal model and judging whether each hot spot generates a new hot spot;
and the hot spot monitoring unit is used for monitoring whether each hot spot becomes a hot search topic of the platform or not through a real-time hot spot monitoring model.
10. The hotspot discovery system based on user content consumption analysis of claim 6, further comprising:
and the external service unit is used for applying the hotspot structural model to other platforms with interactive functions in the industry, acquiring historical information flow data from other platforms with interactive functions, finding historical hotspots from the acquired historical information flow data, and adding hotspot subject keywords and tags corresponding to the historical hotspots into a hotspot library, wherein the hotspot library is a database for storing the hotspot subject keywords and tags in the industry.
CN202110405034.9A 2021-04-15 2021-04-15 Hot spot discovery method and system based on user content consumption analysis Active CN113127576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110405034.9A CN113127576B (en) 2021-04-15 2021-04-15 Hot spot discovery method and system based on user content consumption analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110405034.9A CN113127576B (en) 2021-04-15 2021-04-15 Hot spot discovery method and system based on user content consumption analysis

Publications (2)

Publication Number Publication Date
CN113127576A true CN113127576A (en) 2021-07-16
CN113127576B CN113127576B (en) 2024-05-24

Family

ID=76776509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110405034.9A Active CN113127576B (en) 2021-04-15 2021-04-15 Hot spot discovery method and system based on user content consumption analysis

Country Status (1)

Country Link
CN (1) CN113127576B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281607A (en) * 2013-07-08 2015-01-14 上海锐英软件技术有限公司 Microblog hot topic analyzing method
CN104281882A (en) * 2014-09-16 2015-01-14 中国科学院信息工程研究所 Method and system for predicting social network information popularity on basis of user characteristics
CN106649380A (en) * 2015-11-02 2017-05-10 天脉聚源(北京)科技有限公司 Hot spot recommendation method and system based on tag
CN109086355A (en) * 2018-07-18 2018-12-25 北京航天云路有限公司 Hot spot association relationship analysis method and system based on theme of news word
CN109325171A (en) * 2018-08-08 2019-02-12 微梦创科网络科技(中国)有限公司 User interest analysis method and system based on domain knowledge
WO2019184217A1 (en) * 2018-03-26 2019-10-03 平安科技(深圳)有限公司 Hotspot event classification method and apparatus, and storage medium
CN110909230A (en) * 2019-11-27 2020-03-24 北京天元创新科技有限公司 Network hotspot analysis method and system
WO2020143156A1 (en) * 2019-01-11 2020-07-16 平安科技(深圳)有限公司 Hotspot video annotation processing method and apparatus, computer device and storage medium
CN112597280A (en) * 2020-12-28 2021-04-02 上海朝阳永续信息技术股份有限公司 Method for automatically discovering hot keywords and hot news

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘华真 等: "基于用户浏览行为的个性化推荐研究综述", 《计算机应用研究》, vol. 38, no. 8, pages 2268 - 2277 *
刘培磊 等: "基于词向量语义聚类的微博热点挖掘方法", 《计算机工程与科学》, no. 2, pages 127 - 133 *

Also Published As

Publication number Publication date
CN113127576B (en) 2024-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant