CN113656695A - Hot data generation method and device, data processing method and electronic equipment - Google Patents

Info

Publication number
CN113656695A
Authority
CN
China
Prior art keywords
hot
topic
target
platform
topics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110948195.2A
Other languages
Chinese (zh)
Inventor
万国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110948195.2A priority Critical patent/CN113656695A/en
Publication of CN113656695A publication Critical patent/CN113656695A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application relates to a hot spot data generation method, a hot spot data generation device, a data processing method and electronic equipment. The hot spot data generation method comprises: obtaining hot topics based on public opinion information; obtaining target resource content in a target platform, wherein the association degree between the target resource content and the hot topics meets a preset condition; and associating each hot topic with its corresponding target resource content to obtain hot data corresponding to the hot topic. Because hot topics are obtained and associated with target resource content before the topic data of the hot topics is generated, the topics of the platform reflect not only the topics that users inside the platform follow but also the hot topics that off-site users follow, and therefore reflect the hot topics of the whole network.

Description

Hot data generation method and device, data processing method and electronic equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a hot spot data generation method and apparatus, a data processing method, and an electronic device.
Background
With the advent of the internet era, many network media platforms have appeared, and users are now spread across different platforms. Each network media platform aggregates its own hot spot information into topic data for users to view.
Because each platform builds its topic data from the hot spot information within that platform, the hot spot information, and therefore the topic data, differs from platform to platform and reflects only what is popular on that platform. The hot spot information of any single platform is not comprehensive, and the platforms even tend to complement one another. As a result, the topic data seen by a user of a single platform may differ from the hot topics of the whole network, the user cannot learn the whole-network hot topics through that platform alone, and the user experience suffers.
Disclosure of Invention
In order to solve the problem that a single network media platform cannot find hot topics of the whole network in the related technology, the application provides a hot data generation method and device, a data processing method and electronic equipment.
According to a first aspect of the present application, a hotspot data generation method is provided, the method comprising:
acquiring hot topics based on public opinion information;
acquiring target resource content in a target platform, wherein the association degree of the target resource content and the hot topic meets a preset condition;
and associating the hot topic with the target resource content corresponding to the hot topic to obtain hot data corresponding to the hot topic.
In an optional embodiment, the obtaining target resource content in the target platform includes:
inputting the hot topics into a preset search engine, and acquiring initial resource content obtained by the search engine through searching from resource content of a target platform according to the hot topics;
aiming at any initial resource content, inputting the initial resource content and the hot topic into a pre-trained content relevance calculation model to obtain the relevance of the initial resource content and the hot topic;
and judging whether the association degree meets a preset condition, and if so, determining the initial resource content as the target resource content.
In an alternative embodiment, the content relevance computation model includes a text generation layer and a similarity computation layer;
the step of inputting the initial resource content and the hot topic into a pre-trained content relevance degree calculation model aiming at any initial resource content to obtain the relevance degree of the initial resource content and the hot topic comprises the following steps:
aiming at any initial resource content, inputting the hot topic and the initial resource content into the text generation layer, and acquiring a first text abstract corresponding to the hot topic and a second text abstract corresponding to the initial resource content;
inputting the first text abstract and the second text abstract into the similarity calculation layer to obtain the similarity of the first text abstract and the second text abstract;
determining the similarity of the determined first text abstract and the second text abstract as the association degree of the initial resource content and the hot topic.
In an optional embodiment, the determining whether the association degree satisfies a preset condition includes:
determining a target relevance threshold corresponding to a topic source platform of the hot topic according to a mapping relation between a preset platform and a relevance threshold;
and if the similarity is greater than or equal to the target association threshold, judging that the similarity meets a preset condition.
In an optional embodiment, the obtaining of the hot topic based on the public opinion information includes:
acquiring initial topics contained in public opinion information in an off-site platform, wherein the off-site platform is a platform except the target platform in a preset platform set;
inputting the initial topic into a pre-trained text classification model, and acquiring the probability that the initial topic belongs to a target type;
and if the probability is larger than a preset threshold value, determining the initial topic as a hot topic.
In an optional embodiment, the method further comprises:
for any target hot topic in all the hot topics, determining display position data corresponding to the target hot topic according to a preset display algorithm;
and displaying the hot spot data corresponding to the target hot spot topic according to the display position data.
In an optional implementation manner, the determining, according to a preset display algorithm, display position data corresponding to any target hot topic among all the hot topics includes:
for any target hot topic in all the hot topics, determining the number of the hot topics which belong to the same one off-site platform as the target hot topic from all the hot topics corresponding to the generated hot data;
determining a quality weight corresponding to the off-site platform corresponding to the target hot topic according to a preset mapping relation between the off-site platform and the quality weight;
and determining display position data of the target hot topic according to the quality weight and the quantity and a preset sorting algorithm.
In an optional embodiment, the determining, according to the quality weight and the number, display position data of the hot topic according to a preset sorting algorithm includes:
and multiplying the quantity by the quality weight to obtain display position data corresponding to the hot topic.
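The multiplication described above can be sketched as follows. This is a minimal illustration only: the function name `display_scores`, the sample topics and the weight values are hypothetical, and the sketch assumes the count includes the target hot topic itself. How the resulting value is turned into an on-page position is not specified here, so the sketch stops at the product.

```python
from collections import Counter


def display_scores(topic_sources, quality_weights):
    """Return {topic: count * weight} for every hot topic.

    topic_sources maps each hot topic to its off-site source platform;
    quality_weights maps each off-site platform to its quality weight.
    """
    # Number of hot topics that come from each off-site platform.
    counts = Counter(topic_sources.values())
    return {
        topic: counts[platform] * quality_weights[platform]
        for topic, platform in topic_sources.items()
    }


# Hypothetical sample data, not taken from the application.
topic_sources = {
    "topic 1": "Platform A",
    "topic 2": "Platform A",
    "topic 3": "Platform C",
}
quality_weights = {"Platform A": 0.5, "Platform C": 2.0}
scores = display_scores(topic_sources, quality_weights)
```

With these sample values, the two topics from Platform A each score 2 × 0.5 and the single topic from Platform C scores 1 × 2.0.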
According to a second aspect of the present application, there is provided a data processing method, the method comprising:
capturing topics from a target network, and determining the captured topics as hot topics;
based on the method according to the first aspect of the present application, hot spot data corresponding to the hot spot topic is matched from an in-station resource;
and pushing the hot topic and the hot data corresponding to the hot topic to a client.
According to a third aspect of the present application, there is provided a data processing method, the method comprising:
responding to a page content acquisition request of client equipment, and acquiring the hot topics obtained by the method of the first aspect of the application and the hot data corresponding to the hot topics;
and sending the hot data corresponding to the hot topic to the client device.
According to a fourth aspect of the present application, there is provided a hotspot data generation device, which includes:
the acquisition module is used for acquiring hot topics based on public opinion information;
the screening module is used for acquiring target resource content in the target platform, wherein the association degree of the target resource content and the hot topic meets a preset condition;
and the association module is used for associating the hot topic with the target resource content corresponding to the hot topic to obtain the hot data corresponding to the hot topic.
According to a fifth aspect of the present application, there is provided an electronic apparatus, comprising: at least one processor and memory;
the processor is configured to execute the program stored in the memory to implement the method of the first, second or third aspect of the present application.
According to a sixth aspect of the present application, there is provided a storage medium, characterized in that the storage medium stores one or more programs which, when executed, implement the method of the first, second or third aspect of the present application.
The technical scheme provided by the application can have the following beneficial effects. Hot topics are obtained based on public opinion information; target resource content whose association degree with the hot topics meets a preset condition is obtained in a target platform; and each hot topic is associated with its corresponding target resource content to obtain hot data corresponding to the hot topic. Because hot topics are obtained and associated with target resource content before the hot data of the hot topics is generated, the topics of the platform reflect not only the topics that users inside the platform follow but also the hot topics that off-site users follow, and the hot data therefore corresponds to the hot topics of the whole network.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic flowchart of a hot spot data generation method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart illustrating a process of obtaining a hot topic based on public opinion information according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of filtering target resource content according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of obtaining similarity between initial resource content and the hot topic according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of determining display location data corresponding to the hot topic according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a hot spot data generating apparatus according to another embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a hot spot data generating method according to an embodiment of the present disclosure.
As shown in fig. 1, the hot spot data generating method provided in this embodiment may include:
Step S101, acquiring hot topics based on public opinion information.
In this step, the public opinion information is collected from each platform in a preset platform set. The preset platform set contains all mainstream platforms on the internet, and a user can add platforms to or delete platforms from the set as needed. The target platform is the platform in the preset platform set for which hot data needs to be generated, and the user can mark it in advance. The scheme provided by this embodiment can therefore be applied to any platform simply by setting that platform as the target platform.
In a specific example, the platforms in the preset platform set and whether the pre-marked platform is the target platform may be specifically shown in table 1.
Preset platform set    Whether it is a target platform
Platform A             No
Platform B             Yes
Platform C             No
Platform D             No
TABLE 1
As can be seen from Table 1, Platform B is the target platform, and Platform A, Platform C and Platform D are therefore the off-site platforms.
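As a small illustration of how the off-site platforms follow from the marking in Table 1, the sketch below stores the preset platform set as a mapping from platform name to a target flag. The function name `split_platforms` and the check for exactly one target platform are assumptions made for the example.

```python
def split_platforms(platform_flags):
    """Split a preset platform set into (target platform, off-site platforms)."""
    targets = [name for name, is_target in platform_flags.items() if is_target]
    if len(targets) != 1:
        # Assumption for this sketch: exactly one platform is marked as target.
        raise ValueError("expected exactly one target platform")
    target = targets[0]
    # Every other platform in the preset set is an off-site platform.
    off_site = [name for name in platform_flags if name != target]
    return target, off_site


# Table 1 expressed as data: True marks the target platform.
preset_platforms = {
    "Platform A": False,
    "Platform B": True,
    "Platform C": False,
    "Platform D": False,
}
target, off_site = split_platforms(preset_platforms)
```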
The process of acquiring hot topics based on public opinion information in an off-site platform or in the target platform is shown in fig. 2, which is a schematic flow diagram of acquiring a hot topic based on public opinion information according to an embodiment of the present application.
As shown in fig. 2, this embodiment takes the acquisition of hot topics from public opinion information in an off-site platform as an example; hot topics in the public opinion information of the target platform can be acquired in the same way. The process of acquiring the hot topic based on the public opinion information provided in this embodiment may include:
Step S201, obtaining an initial topic contained in public opinion information in an off-site platform, wherein the off-site platform is a platform other than the target platform in a preset platform set.
In this step, the public opinion information may include a topic list, that is, a list of topics in the off-site platform sorted by popularity. A topic list generally contains a plurality of topics, and this step only needs to extract the topics in the list as initial topics.
Specifically, this step can use web crawler technology to crawl topics directly from the public opinion information of the off-site platform as initial topics. Alternatively, an API may be set up with the off-site platform in advance, and the topic list of the off-site platform may be acquired directly through that API.
Step S202, inputting the initial topic into a pre-trained text classification model, and acquiring the probability that the initial topic belongs to the target type.
It should be noted that the text classification model in this step may be a BERT-based text classification model, which combines a pre-trained model with a downstream task model and supports text classification tasks. When training the text classification model in this embodiment, a pre-trained model can therefore be selected, such as the pre-trained model open-sourced by Google and trained on Chinese Wikipedia, and used as a warm-start model. Fine-tuning on a small number of corpus samples on this basis can then achieve very high classification accuracy.
For the corpus samples required during training, corpora of several types can be selected, such as finance, real estate, stocks, education, science and technology, society, current politics, sports, games and entertainment, and a certain number of corpus samples can be extracted for each type. Because this embodiment identifies the type of a topic, and titles and topics are generally of similar length (for example, between 15 and 30 words), only the title of each corpus sample and its corresponding type need to be used as the final training sample.
To verify whether the accuracy of the trained text classification model reaches the standard, the training samples can be divided into a training set and a test set in a certain proportion. The samples in the training set are used to train the text classification model and adjust its parameters, and the samples in the test set are used to verify the classification accuracy of the model after its parameters have been adjusted.
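The split and the accuracy check can be sketched with the standard library alone. This is an illustration under assumptions: the 80/20 ratio, the fixed seed, the helper names and the toy samples are all hypothetical, and `predict` stands in for whatever trained classifier is being evaluated.

```python
import random


def split_samples(samples, test_ratio=0.2, seed=0):
    """Shuffle (title, type) samples and split them into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, test_set):
    """Fraction of test samples whose predicted type matches the label."""
    correct = sum(1 for title, label in test_set if predict(title) == label)
    return correct / len(test_set)


# Toy (title, type) samples; real samples would be corpus titles.
samples = [
    (f"title {i}", "entertainment" if i % 2 else "sports") for i in range(10)
]
train_set, test_set = split_samples(samples)


def toy_predict(title):
    """Stand-in classifier that happens to know the labelling rule."""
    return "entertainment" if int(title.split()[1]) % 2 else "sports"


test_accuracy = accuracy(toy_predict, test_set)
```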
In this step, the target type is preset, and is the type of topic that the target platform needs to add. For example, if the target platform mainly carries entertainment content, it probably needs to pay more attention to entertainment topics, so the entertainment type may be set as the target type and only the probability corresponding to the entertainment type needs to be checked against the condition.
In a specific example, the text classification model has 10 types of classification results, and for an input text it outputs the probability of the text belonging to each of the 10 types.
Taking the 10 types of finance, real estate, stock, education, science and technology, society, fashion, sports, games and entertainment as examples, after a certain initial topic is input into the text classification model, the probabilities of the types are finance 0.1, real estate 0.05, stock 0.1, education 0.25, science 0.08, society 0.9, fashion 0.3, sports 0.4, games 0.2 and entertainment 0.8.
In this example, if the target platform sets the entertainment type as the target type, then the probability corresponding to the entertainment type, that is, 0.8, may be obtained at this time.
Step S203, if the probability is larger than a preset threshold value, determining the initial topic as a hot topic.
In this embodiment, the condition for determining the initial topic as the hot topic may be that the probability of the target type acquired in step S202 needs to be greater than a preset threshold, and if the probability of the target type is greater than the preset threshold, the initial topic may be considered as belonging to the target type, and meets the requirement of the target platform, and at this time, the initial topic may be determined as the hot topic.
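The check in step S203 amounts to a single comparison. The sketch below reuses the example probabilities given for step S202 and the threshold of 0.6 used in the description's own example; the function name `is_hot_topic` is hypothetical.

```python
def is_hot_topic(type_probs, target_type, threshold):
    """Return True when the target-type probability exceeds the threshold."""
    return type_probs[target_type] > threshold


# Per-type probabilities from the example in step S202.
probs = {
    "finance": 0.1,
    "real estate": 0.05,
    "stock": 0.1,
    "education": 0.25,
    "science and technology": 0.08,
    "society": 0.9,
    "fashion": 0.3,
    "sports": 0.4,
    "games": 0.2,
    "entertainment": 0.8,
}

entertainment_is_hot = is_hot_topic(probs, "entertainment", 0.6)
games_is_hot = is_hot_topic(probs, "games", 0.6)
```

Note that the example probabilities do not sum to 1, so each type is scored on its own rather than in competition with the others.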
Of course, the probability of the target type may be greater than the preset threshold without being the largest of all the obtained probabilities. Taking the 10 types above as an example again, with the entertainment type as the target type and the preset threshold set to 0.6, the probability 0.8 is greater than 0.6, but the probability that the initial topic belongs to the society type is 0.9, which is higher than 0.8, so the initial topic can be considered to lean more towards the society type.
If the number of hot topics currently acquired by the target platform exceeds a certain threshold, initial topics in this situation can be removed, so that the resulting hot topics better fit the needs of the target platform.
Specifically, for each initial topic, the output probability of each type may be stored, and when the number of hot topics currently acquired by the target platform has exceeded a certain threshold, for any initial topic determined as a hot topic in the stored content, if the probability of the target type is not the maximum among the probabilities of all types, the initial topic determined as a hot topic is removed.
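The removal rule can be sketched as a filter over the stored per-type probabilities. The names `prune_hot_topics` and `max_topics` are hypothetical, and treating a tie with the highest probability as "maximum" is an assumption of this sketch.

```python
def prune_hot_topics(stored_probs, target_type, max_topics):
    """Drop hot topics whose target-type probability is not the maximum.

    stored_probs maps each hot topic to its stored {type: probability}.
    Pruning only happens once the number of hot topics exceeds max_topics.
    """
    if len(stored_probs) <= max_topics:
        return dict(stored_probs)
    return {
        topic: probs
        for topic, probs in stored_probs.items()
        if probs[target_type] >= max(probs.values())
    }


# Hypothetical stored outputs for two topics already marked as hot.
stored = {
    "topic 1": {"society": 0.9, "entertainment": 0.8},
    "topic 2": {"society": 0.2, "entertainment": 0.7},
}
kept_when_over_limit = prune_hot_topics(stored, "entertainment", max_topics=1)
kept_when_under_limit = prune_hot_topics(stored, "entertainment", max_topics=5)
```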
Step S102, target resource content in a target platform is obtained, and the association degree of the target resource content and the hot topic meets a preset condition.
It should be noted that the resource content of the target platform may be text content, video content, or even audio content in the target platform, and since the hot topic is content in a text form, when the target resource content is screened from the resource content, similarity of text explanations or titles corresponding to the video content and the audio content may be compared, or pictures and sounds in the video content may be identified in advance to form corresponding text, sounds in the audio content may be identified in advance to form corresponding text, and then similarity with the hot topic is compared.
Specifically, to reduce the workload of similarity calculation, a preset search engine may first be used to search all resource content as a coarse screening, and a pre-trained content relevance calculation model may then be used for fine screening. Refer to fig. 3, which is a schematic flow diagram of screening target resource content provided in an embodiment of the present application.
As shown in fig. 3, the process of filtering the target resource content provided by this embodiment may include:
Step S301, inputting the hot topic into a preset search engine, and acquiring the initial resource content that the search engine finds in the resource content of the target platform according to the hot topic.
In this step, the preset search engine may be the target platform's own search engine. The hot topic is input into the search engine, which searches all resource content of the target platform for resource content containing keywords of the hot topic and returns it as the initial resource content.
It should be noted that the principle process of searching resource content by the search engine is not the focus of attention of the present application, and the search principle is not described herein again.
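Purely to illustrate what the coarse screening returns, the sketch below stands in for the search engine with a naive keyword overlap test. It is not the platform's search engine: the whitespace tokenisation, the function name `coarse_filter` and the sample resources are assumptions, and Chinese text would need a proper word segmenter.

```python
def coarse_filter(hot_topic, resources):
    """Return IDs of resources sharing at least one keyword with the topic.

    resources maps a resource ID to its text (for example, its title).
    """
    keywords = {word.lower() for word in hot_topic.split()}
    return [
        resource_id
        for resource_id, text in resources.items()
        if keywords & {word.lower() for word in text.split()}
    ]


# Hypothetical resource titles on the target platform.
resources = {
    "r1": "Film festival opening night highlights",
    "r2": "Cooking pasta at home",
    "r3": "Festival red carpet interviews",
}
initial_ids = coarse_filter("film festival awards", resources)
```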
Step S302, aiming at any initial resource content, inputting the initial resource content and the hot topic into a pre-trained content relevance degree calculation model, and obtaining the relevance degree of the initial resource content and the hot topic.
In this step, the content relevance calculation model may include at least a text generation layer and a similarity calculation layer.
Because sentence grammar and word order can affect the similarity calculation, and in order to improve the accuracy of the semantic similarity calculated between the two, the content relevance calculation model in this embodiment may first use its text generation layer to generate the first text abstract of the hot topic and the second text abstract of the initial resource content, and then calculate the similarity between the first text abstract and the second text abstract and use it as the final association degree between the hot topic and the initial resource content.
Specifically, referring to fig. 4, fig. 4 is a schematic flowchart illustrating a process of obtaining similarity between initial resource content and a hot topic according to an embodiment of the present application.
As shown in fig. 4, the process of obtaining the similarity between the initial resource content and the hot topic provided by this embodiment may include:
Step S401, for any initial resource content, inputting the hot topic and the initial resource content into the text generation layer, and acquiring a first text abstract corresponding to the hot topic and a second text abstract corresponding to the initial resource content.
It should be noted that the text generation layer may use a sequence-to-sequence (seq2seq) framework, with which it can generate an abstract of a text, that is, a short text that expresses the semantics of the original text as fully as possible.
Of course, for the resource content of which the initial resource content is audio, the audio may be recognized as a text, and then the second text abstract is extracted from the recognized text; for the resource content of which the initial resource content is a video or an image, the content in the video or the image can be identified, a main body included in the content can be identified, and the main body can be used as a second text abstract.
Step S402, inputting the first text abstract and the second text abstract into a similarity calculation layer, and obtaining the similarity of the first text abstract and the second text abstract.
In this step, in order to fully consider the semantics of the sentence, the similarity calculation layer may map the first text abstract and the second text abstract into the form of embedding vectors, and then perform similarity calculation. The specific similarity calculation may use dot product calculation between mapped embedding vectors to obtain the similarity between the first text abstract and the second text abstract.
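The dot-product step can be sketched on plain Python lists. The embedding vectors here are made-up numbers rather than model outputs, and the L2 normalisation before the dot product is an assumption of the sketch (it keeps the score within [-1, 1]); the description itself only specifies a dot product between the mapped vectors.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot_similarity(vec_a, vec_b):
    """Dot product of the two vectors after L2 normalisation."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    a, b = l2_normalize(vec_a), l2_normalize(vec_b)
    return sum(x * y for x, y in zip(a, b))


# Made-up embeddings for the first and second text abstracts.
same_direction = dot_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = dot_similarity([1.0, 0.0], [0.0, 1.0])
partial = dot_similarity([3.0, 4.0], [1.0, 0.0])
```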
Step S403, determining the similarity between the first text abstract and the second text abstract as the association between the initial resource content and the hot topic.
It should be noted that, in a specific example, the scheme shown in fig. 4 may be implemented with a SimBERT model, that is, the SimBERT model is used as the content relevance calculation model in this embodiment.
Step S303, judging whether the association degree meets a preset condition, and if so, determining the initial resource content as the target resource content.
In this step, the preset condition may be that the association degree is greater than a preset threshold. As long as the association degree is greater than the preset threshold, the initial resource content can be considered related to the hot topic and can be determined as the target resource content.
Because topic quality varies between off-site platforms, different preset thresholds can be set for off-site platforms of different topic quality. For example, a higher preset threshold can be set for an off-site platform with better topic quality, so that the target resource content associated with that platform's topics is more accurate, and a lower preset threshold can be set for an off-site platform with poorer topic quality, so that the target resource content associated with that platform's topics is richer.
Specifically, the process of determining whether the association degree satisfies the preset condition may be as follows:
A target association threshold corresponding to the off-site platform of the hot topic is determined according to a preset mapping relation between off-site platforms and association thresholds, and if the association degree is greater than or equal to the target association threshold, the association degree is judged to meet the preset condition.
In a specific example, the mapping relationship between the preset off-site platform and the association threshold may be as shown in table 2.
Off-site platform | Association degree threshold
Platform A | a
Platform C | c
Platform D | d
Table 2
For example, if the off-site platform corresponding to the hot topic is platform A, the target association degree threshold is a. If the association degree between the initial resource content and the hot topic is greater than or equal to a, the association degree meets the preset condition.
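The per-platform threshold check of step S303 can be sketched in Python as follows. The numeric thresholds are hypothetical stand-ins for the symbolic values a, c and d of table 2, and the function names are illustrative only.

```python
# Hypothetical numeric values standing in for a, c and d in table 2.
ASSOCIATION_THRESHOLDS = {
    "Platform A": 0.8,
    "Platform C": 0.6,
    "Platform D": 0.5,
}


def meets_preset_condition(platform, association_degree,
                           thresholds=ASSOCIATION_THRESHOLDS):
    """Return True if the degree reaches the source platform's threshold."""
    target_threshold = thresholds[platform]
    return association_degree >= target_threshold


def select_target_content(platform, scored_content,
                          thresholds=ASSOCIATION_THRESHOLDS):
    """Keep the IDs of initial resource contents that pass the check.

    scored_content is an iterable of (content_id, association_degree) pairs.
    """
    return [
        content_id
        for content_id, degree in scored_content
        if meets_preset_condition(platform, degree, thresholds)
    ]
```

For instance, `select_target_content("Platform D", [("v1", 0.55), ("v2", 0.4)])` keeps only `"v1"` under these assumed thresholds.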
Step S103, associating the hot topics and the target resource content corresponding to the hot topics to obtain hot data corresponding to the hot topics.
It should be noted that each resource content in the target platform has a corresponding identification ID. In this step, associating the hot topic with its target resource content means directly mapping the hot topic to the identification ID of each target resource content.
In addition, the heat value of each target resource content may also be associated. For any target resource content, the heat value may be a count metric of that content, such as its play count or view count. It should be noted that the heat value of the hot topic can also be obtained from the heat values of its target resource contents; for example, the sum of the heat values of all target resource contents corresponding to the hot topic is used as the heat value of the hot topic.
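The association of step S103 and the heat value aggregation can be sketched as below. The `HotData` structure and its field names are hypothetical; the sum is the aggregation example given in the text, and other aggregations would be equally possible.

```python
from dataclasses import dataclass, field


@dataclass
class HotData:
    """Hot data for one hot topic (field names are illustrative)."""
    topic: str
    content_ids: list = field(default_factory=list)
    content_heat: dict = field(default_factory=dict)

    @property
    def topic_heat(self) -> int:
        # Heat value of the topic: sum over its target resource contents.
        return sum(self.content_heat.values())


def associate(topic, target_contents):
    """Map a hot topic to the IDs and heat values of its target contents.

    target_contents is an iterable of (content_id, heat_value) pairs,
    where heat_value is for example a play count.
    """
    hot_data = HotData(topic=topic)
    for content_id, heat_value in target_contents:
        hot_data.content_ids.append(content_id)
        hot_data.content_heat[content_id] = heat_value
    return hot_data
```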
In addition, the embodiment also determines the display position of the hot data, specifically, for any target hot topic in all the hot topics, the display position data corresponding to the target hot topic is determined according to a preset display algorithm; and displaying the hot spot data corresponding to the target hot spot topic according to the display position data.
In this step, the display position data corresponding to the hot topic is determined according to a preset display algorithm, as shown in fig. 5, which is a schematic flow chart of determining the display position data corresponding to the hot topic according to an embodiment of the present application.
As shown in fig. 5, the process of determining the display position data corresponding to the hot topic provided by this embodiment may include:
Step S501, determining, from all the hot topics corresponding to the generated topic data, the number of hot topics that belong to the same off-site platform as the current hot topic.
It should be noted that the process of generating topic data in the present embodiment is periodic, and topic data of a plurality of hot topics from different off-site platforms are generated each time, so in this step, the topic data that has been generated refers to the topic data generated in this period.
In a specific example, topic data of 6 hot topics has already been generated and the current hot topic is the 7th; the first 6 hot topics and their corresponding off-site platforms may be as shown in table 3.
Hot topic | Off-site platform
Topic 1 | Platform A
Topic 2 | Platform C
Topic 3 | Platform C
Topic 4 | Platform A
Topic 5 | Platform D
Topic 6 | Platform D
Table 3
If the off-site platform corresponding to the current hot topic is platform A, the hot topics belonging to the same platform A are topic 1 and topic 4, so the number is 2.
Step S502, determining the corresponding quality weight of the off-site platform corresponding to the hot topic according to the preset mapping relation between the off-site platform and the quality weight.
The quality weight in this step is used to scatter hot topics from each off-site platform, and for convenience of setting the quality weight, the quality weight may be set directly according to the topic quality.
In a specific example, the mapping relationship between the off-site platform and the quality weight can be as shown in table 4.
Off-site platform | Quality weight
Platform A | 1
Platform C | 3
Platform D | 5
Table 4
Continuing the foregoing example, platform A is the off-site platform corresponding to the hot topic, so the quality weight is 1, as can be seen from table 4.
Step S503, determining display position data of the hot topic according to the quality weight and the number, using a preset sorting algorithm.
Specifically, the preset sorting algorithm may multiply the number by the quality weight to obtain the display position data corresponding to the hot topic. The display position data is a numerical value, and when all the hot topics are displayed, they can be displayed in ascending order of this value.
In a specific example, the product of the number and the quality weight is 2 × 1 = 2. To illustrate more clearly how the display position data is generated, the display position data of the 7 hot topics is calculated below based on tables 3 and 4.
For topic 1, among the hot topics whose topic data was generated before that of topic 1, the number belonging to the same platform A is 0 and the quality weight of platform A is 1, so the product is 0.
For topic 2, among the hot topics whose topic data was generated before that of topic 2, the number belonging to the same platform C is 0 and the quality weight of platform C is 3, so the product is 0.
For topic 3, among the hot topics whose topic data was generated before that of topic 3, the number belonging to the same platform C is 1 and the quality weight of platform C is 3, so the product is 3.
For topic 4, among the hot topics whose topic data was generated before that of topic 4, the number belonging to the same platform A is 1 and the quality weight of platform A is 1, so the product is 1.
For topic 5, among the hot topics whose topic data was generated before that of topic 5, the number belonging to the same platform D is 0 and the quality weight of platform D is 5, so the product is 0.
For topic 6, among the hot topics whose topic data was generated before that of topic 6, the number belonging to the same platform D is 1 and the quality weight of platform D is 5, so the product is 5.
For the current hot topic, the product obtained by the foregoing is 2.
Therefore, the presentation position data of topic 1 is 0, topic 2 is 0, topic 3 is 3, topic 4 is 1, topic 5 is 0, topic 6 is 5, and the current hot topic is 2.
Topics with the same numerical value can be ordered according to the order in which their topic data was generated; sorting by numerical value then yields the order shown in table 5.
Serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7
Hot topic | Topic 1 | Topic 2 | Topic 5 | Topic 4 | Current topic | Topic 3 | Topic 6
Off-site platform | Platform A | Platform C | Platform D | Platform A | Platform A | Platform C | Platform D
Table 5
In the order of table 3, there are two places where topics from the same platform are adjacent (topics 2 and 3 from platform C, and topics 5 and 6 from platform D). After sorting by the display position data obtained according to this embodiment, as shown in table 5, there is only one place where topics from the same platform are adjacent, and both previously adjacent pairs have been broken up.
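Steps S501 to S503 and the worked example of tables 3 to 5 can be reproduced with a short Python sketch. The function names are illustrative; the weights and topic order are taken from tables 3 and 4, and the tie-breaking by generation order relies on Python's stable sort.

```python
from collections import defaultdict

# Quality weights from table 4.
QUALITY_WEIGHTS = {"Platform A": 1, "Platform C": 3, "Platform D": 5}

# Topics in generation order (table 3), followed by the current topic.
TOPICS = [
    ("Topic 1", "Platform A"),
    ("Topic 2", "Platform C"),
    ("Topic 3", "Platform C"),
    ("Topic 4", "Platform A"),
    ("Topic 5", "Platform D"),
    ("Topic 6", "Platform D"),
    ("Current topic", "Platform A"),
]


def display_positions(topics, weights=QUALITY_WEIGHTS):
    """Return (topic, platform, position) triples in generation order."""
    seen = defaultdict(int)  # earlier topics generated so far, per platform
    positions = []
    for name, platform in topics:
        # S501: count of earlier same-platform topics; S502: weight lookup;
        # S503: position value is the product of the two.
        positions.append((name, platform, seen[platform] * weights[platform]))
        seen[platform] += 1
    return positions


def display_order(topics, weights=QUALITY_WEIGHTS):
    """Sort by position value; equal values keep generation order."""
    return sorted(display_positions(topics, weights), key=lambda item: item[2])


for name, platform, position in display_order(TOPICS):
    print(position, name, platform)
```

Running the sketch prints the topics in the order of table 5: topic 1, topic 2, topic 5, topic 4, the current topic, topic 3, topic 6.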
It should be noted that, because the hot data of this embodiment is generated periodically, a version number may be set for each period; for example, any time within the period is used as the version number. Each version number corresponds to the topic data of all hot topics generated in one period, and the topic data of each period is mapped to and stored under its version number, which makes it convenient to obtain the topic data of any period later.
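The version-numbered storage can be sketched as an in-memory mapping. This is only an illustration under assumptions: the class name, the use of a Unix timestamp as the version number, and the in-memory dictionary are all hypothetical choices, and a real system would likely use persistent storage.

```python
import time


class VersionedTopicStore:
    """Stores the topic data of each generation period under a version number."""

    def __init__(self):
        self._by_version = {}

    def save_period(self, topic_data, version=None):
        """Store one period's topic data and return its version number.

        If no version is given, the current Unix time (a time within the
        period) is used, which is an assumed convention.
        """
        if version is None:
            version = int(time.time())
        self._by_version[version] = list(topic_data)
        return version

    def load_period(self, version):
        """Return the topic data stored under the given version number."""
        return self._by_version[version]

    def latest(self):
        """Return the topic data of the most recent period."""
        return self._by_version[max(self._by_version)]
```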
In this embodiment, a hot topic is obtained based on public opinion information, target resource content whose association degree with the hot topic meets a preset condition is then obtained from the target platform, and the hot topic is associated with its corresponding target resource content to obtain the hot data corresponding to the hot topic. Because hot topics are obtained and associated with target resource content to generate their topic data, the topics of the platform are adapted not only to the topics that users inside the platform care about but also to the hot topics that off-site users care about, so that the hot topics of the whole network are reflected.
In addition, as for the hot spot data generation method provided in the foregoing embodiment, another embodiment of the present application provides a data processing method, which is used for actively pushing the hot spot data. The method specifically comprises the following steps:
Step one, capturing topics from a target network, and determining the captured topics as hot topics.
Step two, based on the hot data generation method provided by the foregoing embodiment, matching the hot data corresponding to the hot topic from in-site resources.
And step three, pushing the hot topics and the hot data corresponding to the hot topics to a client.
In this embodiment, the target network refers to any network environment, such as the internet or a local area network. Topics captured from the internet or a local area network may be used as hot topics; hot data corresponding to the hot topics is then generated by the hot data generation method provided in the foregoing embodiment, and the hot topics and the hot data are pushed to the client. Specific application scenarios of the method include advertisement pushing, interest data pushing, and other information pushing scenarios.
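The three steps of this pushing method can be sketched as a small pipeline. All three callables are hypothetical placeholders supplied by the caller, and skipping topics with no matched hot data is an assumption that the text does not specify.

```python
def push_hot_data(capture_topics, generate_hot_data, push_to_client):
    """Run the three-step push flow and return the topics that were pushed.

    capture_topics():          returns topics captured from the target network
    generate_hot_data(topic):  returns hot data matched from in-site resources
    push_to_client(topic, hd): delivers the topic and its hot data to a client
    """
    pushed = []
    for topic in capture_topics():           # step one: captured topics are hot topics
        hot_data = generate_hot_data(topic)  # step two: match in-site hot data
        if not hot_data:
            continue  # assumption: topics without matched content are not pushed
        push_to_client(topic, hot_data)      # step three: push to the client
        pushed.append(topic)
    return pushed
```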
Another embodiment of the present application provides a data processing method, which is used for displaying hot spot data in a page. The method specifically comprises the following steps:
Step one, in response to a page content obtaining request of a client device, obtaining a hot topic obtained by the hot data generation method provided by the foregoing embodiment and the hot data corresponding to the hot topic.
Step two, the hotspot data corresponding to the hotspot topics are sent to the client device.
In this embodiment, when the client device has a page content acquisition request, the hot data generation method provided in the foregoing embodiment may be executed to acquire the obtained hot topic and the hot data corresponding to the hot topic. And finally, sending the hot data corresponding to the hot topic to the client equipment for display.
In a specific scenario, the page content obtaining request of the client device may be a request for the recommended content of the home page, sent when an APP on the client device is opened, or a request sent when a sub-level page, such as an entertainment section, is clicked.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a hot spot data generating device according to another embodiment of the present application.
As shown in fig. 6, the hot spot data generating apparatus provided in this embodiment may include:
an obtaining module 601, configured to obtain a hot topic based on public opinion information;
a screening module 602, configured to obtain target resource content in the target platform, where a correlation degree between the target resource content and the hot topic meets a preset condition;
an association module 603, configured to associate the hot topic with the target resource content corresponding to the hot topic to obtain hot data corresponding to the hot topic.
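The cooperation of the three modules can be sketched as a class that wires them together. The class name and the representation of each module as a callable are hypothetical; the sketch only shows the order in which modules 601, 602 and 603 are applied.

```python
class HotDataGenerator:
    """Illustrative composition of the three modules of fig. 6."""

    def __init__(self, obtain_topics, screen_content, associate):
        self.obtain_topics = obtain_topics    # obtaining module 601
        self.screen_content = screen_content  # screening module 602
        self.associate = associate            # association module 603

    def run(self):
        """Return a mapping from each hot topic to its hot data."""
        hot_data = {}
        for topic in self.obtain_topics():
            contents = self.screen_content(topic)
            hot_data[topic] = self.associate(topic, contents)
        return hot_data
```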
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
As shown in fig. 7, the electronic device provided in this embodiment includes: at least one processor 701, memory 702, at least one network interface 703, and other user interfaces 704. The various components in the electronic device 700 are coupled together by a bus system 705. It is understood that the bus system 705 is used to enable communications among the components. The bus system 705 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as the bus system 705 in fig. 7.
The user interface 704 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, trackball, touch pad, or touch screen).
It is to be understood that the memory 702 in embodiments of the present invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 702 described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 702 stores the following elements, executable units or data structures, or a subset thereof, or an expanded set thereof: an operating system 7021 and application programs 7022.
The operating system 7021 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 7022 includes various applications, such as a Media Player (Media Player), a Browser (Browser), and the like, for implementing various application services. Programs that implement methods in accordance with embodiments of the present invention can be included within application program 7022.
In the embodiment of the present invention, the processor 701 is configured to execute the method steps provided by the method embodiments by calling a program or an instruction stored in the memory 702, which may be, in particular, a program or an instruction stored in the application 7022.
The method disclosed in the above embodiments of the present invention may be applied to the processor 701, or implemented by the processor 701. The processor 701 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 701 or by instructions in the form of software. The processor 701 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software elements in the decoding processor. The software elements may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 702, and the processor 701 reads the information in the memory 702 and performs the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the Processing units may be implemented in one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions of the present Application, or a combination thereof.
For a software implementation, the techniques herein may be implemented by means of units performing the functions herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
The embodiment of the invention also provides a storage medium (computer readable storage medium). The storage medium herein stores one or more programs. Among others, the storage medium may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the memory may also comprise a combination of memories of the kind described above.
The one or more programs in the storage medium are executable by one or more processors to implement the above-described method performed on the electronic device side.
The processor is adapted to execute the program stored in the memory to implement the steps of the method provided by the aforementioned method embodiments performed on the electronic device side.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (13)

1. A hotspot data generation method is characterized by comprising the following steps:
acquiring hot topics based on public opinion information;
acquiring target resource content in a target platform, wherein the association degree of the target resource content and the hot topic meets a preset condition;
and associating the hot topic with the target resource content corresponding to the hot topic to obtain hot data corresponding to the hot topic.
2. The method of claim 1, wherein obtaining target resource content in a target platform comprises:
inputting the hot topics into a preset search engine, and acquiring initial resource content obtained by the search engine through searching from resource content of a target platform according to the hot topics;
aiming at any initial resource content, inputting the initial resource content and the hot topic into a pre-trained content relevance calculation model to obtain the relevance of the initial resource content and the hot topic;
and judging whether the association degree meets a preset condition, and if so, determining the initial resource content as the target resource content.
3. The method according to claim 2, wherein the content relevance calculation model includes a text generation layer and a similarity calculation layer;
the step of inputting the initial resource content and the hot topic into a pre-trained content relevance degree calculation model aiming at any initial resource content to obtain the relevance degree of the initial resource content and the hot topic comprises the following steps:
aiming at any initial resource content, inputting the hot topic and the initial resource content into the text generation layer, and acquiring a first text abstract corresponding to the hot topic and a second text abstract corresponding to the initial resource content;
inputting the first text abstract and the second text abstract into the similarity calculation layer to obtain the similarity of the first text abstract and the second text abstract;
determining the similarity of the determined first text abstract and the second text abstract as the association degree of the initial resource content and the hot topic.
4. The method according to claim 2, wherein the determining whether the association degree satisfies a preset condition comprises:
determining a target relevance threshold corresponding to a topic source platform of the hot topic according to a mapping relation between a preset platform and a relevance threshold;
and if the association degree is greater than or equal to the target association degree threshold value, judging that the association degree meets a preset condition.
5. The method of claim 1, wherein the obtaining of the hot topic based on the public opinion information comprises:
acquiring initial topics contained in public opinion information in an off-site platform, wherein the off-site platform is a platform except the target platform in a preset platform set;
inputting the initial topic into a pre-trained text classification model, and acquiring the probability that the initial topic belongs to a target type;
and if the probability is larger than a preset threshold value, determining the initial topic as a hot topic.
6. The method of claim 1, further comprising:
for any target hot topic in all the hot topics, determining display position data corresponding to the target hot topic according to a preset display algorithm;
and displaying the hot spot data corresponding to the target hot spot topic according to the display position data.
7. The method as claimed in claim 6, wherein the determining, according to a preset display algorithm, display position data corresponding to any one target hot topic among all the hot topics comprises:
for any target hot topic in all the hot topics, determining the number of the hot topics which belong to the same one off-site platform as the target hot topic from all the hot topics corresponding to the generated hot data;
determining a quality weight corresponding to the off-site platform corresponding to the target hot topic according to a preset mapping relation between the off-site platform and the quality weight;
and determining display position data of the target hot topic according to the quality weight and the quantity and a preset sorting algorithm.
8. The method of claim 7, wherein the determining the display position data of the target hot topic according to the quality weight and the quantity according to a preset sorting algorithm comprises:
and multiplying the quantity by the quality weight to obtain display position data corresponding to the target hot topic.
9. A method of data processing, the method comprising:
capturing topics from a target network, and determining the captured topics as hot topics;
matching, based on the method according to any one of claims 1-8, hot data corresponding to the hot topic from in-site resources;
and pushing the hot topic and the hot data corresponding to the hot topic to a client.
10. A method of data processing, the method comprising:
responding to a page content acquisition request of client equipment, acquiring a hot topic obtained by the method of any one of claims 1-8 and hot data corresponding to the hot topic;
and sending the hot data corresponding to the hot topic to the client device.
11. An apparatus for generating hotspot data, the apparatus comprising:
the acquisition module is used for acquiring hot topics based on public opinion information;
the screening module is used for acquiring target resource content in the target platform, and the association degree of the target resource content and the hot topic meets a preset condition;
and the association module is used for associating the hot topic with the target resource content corresponding to the hot topic to obtain the hot data corresponding to the hot topic.
12. An electronic device, comprising: at least one processor and memory;
the processor is configured to execute a program stored in the memory to implement the method of any of claims 1-10.
13. A storage medium, characterized in that the storage medium stores one or more programs which, when executed, implement the method of any one of claims 1-10.
CN202110948195.2A 2021-08-18 2021-08-18 Hot data generation method and device, data processing method and electronic equipment Pending CN113656695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110948195.2A CN113656695A (en) 2021-08-18 2021-08-18 Hot data generation method and device, data processing method and electronic equipment

Publications (1)

Publication Number Publication Date
CN113656695A true CN113656695A (en) 2021-11-16

Family

ID=78480856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110948195.2A Pending CN113656695A (en) 2021-08-18 2021-08-18 Hot data generation method and device, data processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN113656695A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424278A (en) * 2013-08-29 2015-03-18 腾讯科技(深圳)有限公司 Method and device for acquiring hotspot information
CN109871433A (en) * 2019-02-21 2019-06-11 北京奇艺世纪科技有限公司 Calculation method, device, equipment and the medium of document and the topic degree of correlation
CN110457599A (en) * 2019-08-15 2019-11-15 中国电子信息产业集团有限公司第六研究所 Hot topic method for tracing, device, server and readable storage medium storing program for executing
CN111046281A (en) * 2019-12-05 2020-04-21 中国银行股份有限公司 Hot topic construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination