CN111310058B - Information theme recommendation method, device, terminal and storage medium - Google Patents


Info

Publication number
CN111310058B
Authority
CN
China
Prior art keywords
information
theme
topics
topic
determining
Prior art date
Legal status
Active
Application number
CN202010227922.1A
Other languages
Chinese (zh)
Other versions
CN111310058A (en)
Inventor
蔡远俊
盛广智
陈奇石
郑烨翰
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010227922.1A
Publication of CN111310058A
Application granted
Publication of CN111310058B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, device, terminal, and storage medium for recommending information topics, and relates to the technical field of intelligent search. The method is implemented as follows: determining topics of information; determining the writability of the topics; screening out target topics according to the similarity between the topics and the writability of the topics; and sending the target topics to a recommender user. The method can perform intelligent search over information across the whole network and provide the recommender user with topics of higher popularity and timeliness, enabling the recommender user to create information that is more popular and more timely.

Description

Information theme recommendation method, device, terminal and storage medium
Technical Field
The present disclosure relates to intelligent search technology in the field of data processing, and in particular to a method, an apparatus, a terminal, and a storage medium for recommending information topics.
Background
With the development of the internet, the information content available on the network has become increasingly rich, and information that meets a user's needs can be recommended to that user.
Currently, when recommending information that meets a user's needs, information with a high search rate can be identified from users' historical search records and then recommended to the user.
However, this approach serves only readers. A recommender user (an author) needs to identify suitable topics in good time and write information on those topics, which readers can then view. The prior art cannot provide suitable topics to the recommender user, so the recommender user cannot produce information with higher popularity and timeliness.
Disclosure of Invention
The method, device, terminal, and storage medium for recommending information topics provided by the present application can perform intelligent search over information across the whole network, provide the recommender user with topics of higher popularity and timeliness, and enable the recommender user to create information with higher popularity and timeliness.
In a first aspect, an embodiment of the present application provides a method for recommending information topics, the method including:
determining topics of information;
determining the writability of the topics;
screening out target topics according to the similarity between the topics and the writability of the topics; and
sending the target topics to a recommender user.
In this embodiment, topics are screened according to both their mutual similarity and their writability, so that topics with higher popularity and timeliness are found across the whole network and the recommender user can create information with higher popularity and timeliness.
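The four steps of the first aspect compose into a simple pipeline. The Python sketch below is illustrative only: it assumes each stage is supplied as a callable, and the function name `recommend_topics` and its parameters are hypothetical rather than taken from the patent.

```python
from typing import Callable, Dict, Iterable, List


def recommend_topics(
    items: Iterable[str],
    extract_topics: Callable[[str], List[str]],
    writability: Callable[[str], float],
    screen: Callable[[Dict[str, float]], List[str]],
    send: Callable[[List[str]], None],
) -> List[str]:
    """Run the four steps of the first aspect; every stage is a pluggable callable."""
    # Step 1: determine the topics of the information (deduplicated across items).
    topics = {topic for item in items for topic in extract_topics(item)}
    # Step 2: determine the writability of each topic.
    scores = {topic: writability(topic) for topic in topics}
    # Step 3: screen out target topics (similarity and writability are handled by `screen`).
    targets = screen(scores)
    # Step 4: send the target topics to the recommender user.
    send(targets)
    return targets


if __name__ == "__main__":
    outbox: List[List[str]] = []
    toy_scores = {"chip": 0.9, "phone": 0.5, "football": 0.1}
    result = recommend_topics(
        items=["chip phone", "phone football"],
        extract_topics=str.split,
        writability=lambda t: toy_scores.get(t, 0.0),
        screen=lambda s: sorted(s, key=lambda t: -s[t])[:2],
        send=outbox.append,
    )
    print(result)  # ['chip', 'phone']
```

Keeping each stage injectable mirrors the modular device of the second aspect, where each step is a separate module.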
In one possible design, determining the topics of the information includes:
acquiring information to be processed;
classifying the information to be processed through a classification model to obtain information whose quality score is greater than a first preset value; and
determining N topics corresponding to each piece of classified information, where the N topics include M entity-type topics and N-M topic-label topics.
In this embodiment, the information is classified by classification models, for example models for advertisement detection, for pornographic or politically sensitive content detection, and for inventory information, which together divide the information into high-quality and low-quality information. Low-quality information is filtered out and only high-quality information undergoes topic determination, which screens out high-quality content and reduces the amount of data processed in subsequent steps.
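A minimal sketch of the quality filter, assuming each discrimination model returns the probability that an item is low quality. The combination rule (quality equals 1 minus the worst low-quality probability), the `Item` type, and the default threshold are all assumptions; the patent only states that items scoring above a first preset value are kept.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical discriminator: maps text to the probability that it is low quality
# (e.g. an advertisement, or pornographic / politically sensitive content).
Discriminator = Callable[[str], float]


@dataclass
class Item:
    item_id: str
    text: str


def quality_score(text: str, discriminators: Dict[str, Discriminator]) -> float:
    """Assumed rule: quality is 1 minus the worst low-quality probability."""
    if not discriminators:
        return 1.0
    return 1.0 - max(d(text) for d in discriminators.values())


def filter_high_quality(
    items: List[Item],
    discriminators: Dict[str, Discriminator],
    first_preset_value: float = 0.6,
) -> List[Item]:
    """Keep only items whose quality score exceeds the first preset value."""
    return [it for it in items if quality_score(it.text, discriminators) > first_preset_value]


if __name__ == "__main__":
    toy = {"ad": lambda t: 0.9 if "buy now" in t.lower() else 0.1}
    items = [Item("a", "Buy now: cheap phones"), Item("b", "New chip announced")]
    print([it.item_id for it in filter_high_quality(items, toy)])  # ['b']
```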
In one possible design, determining the N topics corresponding to each piece of classified information includes:
performing knowledge extraction and representation on the information content based on an existing general knowledge graph to obtain entity link relations;
finding, according to the entity link relations, entity knowledge and entity association information related to the input content in the knowledge graph, to obtain the entity-type topics of the information;
extracting keywords of the information; and
determining the topic-label topics of the information according to statistics of the keywords in the information, where the statistics include the frequency of each keyword, its part of speech, and the degree to which it matches the subject of the article.
In this embodiment, for each piece of information, knowledge is extracted and represented based on an existing general knowledge graph, and entity knowledge and entity association information related to the input content are found in the knowledge graph through entity linking, which determines the entity-type topics of the information. In addition, keywords are extracted from each piece of information, and its topic-label topics are derived from statistics such as keyword frequency, part of speech, and how well each keyword matches the article. The topics of the information can thus be divided accurately, making it easier for users to read information by category.
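The keyword statistics can be combined into a ranking score. The sketch below is one possible reading, not the patented formula: it assumes tokens arrive already tagged with part of speech, uses hypothetical POS weights, and approximates "conformity to the article" by whether the keyword also appears in the title.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed part-of-speech weights (n = noun, nz = other proper noun, v = verb);
# tokens with any other tag are ignored.
POS_WEIGHTS: Dict[str, float] = {"n": 1.0, "nz": 1.2, "v": 0.5}


def keyword_topics(tokens: List[Tuple[str, str]], title: str, k: int = 3) -> List[str]:
    """Rank candidate keywords by frequency x POS weight x conformity to the article."""
    kept = [(w, pos) for w, pos in tokens if pos in POS_WEIGHTS]
    freq = Counter(w for w, _ in kept)
    pos_of = dict(kept)  # the last observed tag for each word wins

    def score(word: str) -> float:
        # Hypothetical conformity proxy: keywords that also appear in the title count double.
        conformity = 2.0 if word in title else 1.0
        return freq[word] * POS_WEIGHTS[pos_of[word]] * conformity

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(freq, key=lambda w: (-score(w), w))[:k]


if __name__ == "__main__":
    toks = [("chip", "n"), ("launch", "v"), ("chip", "n"), ("phone", "n"), ("launch", "v"), ("the", "u")]
    print(keyword_topics(toks, "New chip", k=2))  # ['chip', 'launch']
```

Entity-type topics would come from the knowledge-graph linking step instead and are not modelled here.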
In one possible design, determining the writability of the topics includes:
extracting topic features of each topic, where the topic features include: the amount of information under the topic, user behavior data for the information under the topic within a time window, the click-through rate of the topic within the time window, a semantic-distance score between the topic and the user's domain, and an event-probability score of the topic; and
inputting the topic features into a topic click-through-rate model to obtain a writability score of the topic, where the topic click-through-rate model is obtained by iterative training on a first topic data set pre-labeled with click-through rates and writability scores.
In this embodiment, the topic features are extracted and fed into the click-through-rate model, which outputs a writability score for each topic, so the writability of each topic can be judged directly: the higher the writability score, the higher the popularity and timeliness of the topic.
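The description later names logistic regression (LR) as one admissible structure for the topic click-through-rate model. The sketch below shows LR-style scoring over the five listed features; the weights, bias, and log scaling of counts are hypothetical placeholders for values that would be learned from the first topic data set.

```python
import math
from dataclasses import dataclass


@dataclass
class TopicFeatures:
    info_count: int         # amount of information under the topic
    behavior_count: int     # e.g. views + comments + likes within the time window
    window_ctr: float       # click-through rate of the topic within the time window
    domain_distance: float  # semantic-distance score to the author's domain (lower = closer)
    event_prob: float       # event-probability score of the topic


# Hypothetical weights; in practice they would be fitted on the labeled topic data set.
WEIGHTS = {"info": 0.3, "behavior": 0.4, "ctr": 2.0, "distance": -1.5, "event": 1.0}
BIAS = -2.0


def writability_score(f: TopicFeatures) -> float:
    """Logistic-regression style score in (0, 1); higher means more writable."""
    z = (
        BIAS
        + WEIGHTS["info"] * math.log1p(f.info_count)          # log-scale raw counts
        + WEIGHTS["behavior"] * math.log1p(f.behavior_count)
        + WEIGHTS["ctr"] * f.window_ctr
        + WEIGHTS["distance"] * f.domain_distance
        + WEIGHTS["event"] * f.event_prob
    )
    return 1.0 / (1.0 + math.exp(-z))
```

A GBDT or DNN model could replace the linear combination without changing the feature interface.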
In one possible design, screening out the target topics according to the similarity between the topics and the writability of the topics includes:
determining the similarity between every two topics through a similarity discrimination model, where the model is obtained by iterative training on a second topic data set pre-labeled with similarities, each element of which is a pair of topics;
de-duplicating topics whose similarity is greater than a second preset value to obtain candidate topics; and
selecting at least one of the candidate topics as the target topic in descending order of writability score.
In this embodiment, the second topic sample data set (topic pairs labeled with their similarity) is used to train the topic similarity discrimination model. For the topics to be processed, each pair of topics is input into the trained model to obtain their similarity (i.e., the probability that they are similar). Topics whose similarity exceeds the preset threshold are then de-duplicated in descending order of writability score, filtering out repeated topics. This yields target topics with high popularity and timeliness.
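The de-duplication and selection steps can be expressed as a greedy pass in descending writability order: a topic is kept only if it is not too similar to any higher-scored topic already kept. In this sketch a word-set Jaccard similarity stands in for the trained similarity discrimination model; that substitution and the default parameter values are assumptions.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Stand-in for the trained similarity model: word-set Jaccard similarity."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def select_target_topics(
    scores: Dict[str, float],
    similarity: Callable[[str, str], float] = jaccard,
    second_preset_value: float = 0.8,
    top_k: int = 3,
) -> List[str]:
    """Greedy de-duplication in descending writability order, then keep the top_k."""
    kept: List[str] = []
    for topic in sorted(scores, key=lambda t: (-scores[t], t)):
        # A topic survives only if it is not too similar to any higher-scored survivor.
        if all(similarity(topic, other) <= second_preset_value for other in kept):
            kept.append(topic)
    return kept[:top_k]


if __name__ == "__main__":
    demo = {"new phone launch": 0.9, "phone launch event": 0.85, "chip shortage": 0.7, "football final": 0.6}
    print(select_target_topics(demo, second_preset_value=0.4))
    # ['new phone launch', 'chip shortage', 'football final']
```

Processing in score order guarantees that when two topics are near-duplicates, the more writable one is the one retained.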
In one possible design, before the target topics are sent to the recommender user, the method further includes:
acquiring user information of the recommender user; and
determining the target topics that match the user information.
In this embodiment, the target topics can additionally be checked against the user information, so that the topics finally recommended fall within the domains the recommender user is good at writing about, making the recommendation more accurate.
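One simple form of the matching check is to keep only those target topics that share at least one domain with the recommender user. The sketch assumes the user information has already been reduced to a set of domain names; `match_author` is a hypothetical name.

```python
from typing import Dict, List, Set


def match_author(
    target_topics: List[str],
    topic_domains: Dict[str, Set[str]],
    author_domains: Set[str],
) -> List[str]:
    """Keep target topics sharing at least one domain with the author, preserving rank order."""
    # Topics with no known domain get an empty set and are therefore dropped.
    return [t for t in target_topics if topic_domains.get(t, set()) & author_domains]


if __name__ == "__main__":
    domains = {"a": {"tech"}, "b": {"sports"}, "c": {"tech", "finance"}}
    print(match_author(["a", "b", "c", "d"], domains, {"tech"}))  # ['a', 'c']
```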
In one possible design, before determining the writability of the topics, the method further includes:
determining a domain label corresponding to each topic; and
selecting, according to the user information of the recommender user, the topics whose domain labels match the user information.
In this embodiment, domain labels are assigned to the topics before their writability is analyzed, yielding a finer-grained division of topics by domain and therefore more accurate topic screening.
In one possible design, determining the domain label corresponding to each topic includes:
aggregating the information by topic to obtain the pieces of information under each topic;
classifying the information through a multi-class classification model and labeling the domain of each piece of information, where the multi-class model is obtained by iterative training on an information data set labeled with domains; and
taking the P most frequent domain labels as the domain labels of the topic.
This embodiment refines the domain labels of the topics to a finer granularity, addressing the insufficient subdivision of topics in the prior art.
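The top-P rule is a frequency count over the labels of all information under a topic. In this sketch the `classify` callable stands in for the trained multi-class domain model, and the function name and default `p` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple


def domain_labels_by_topic(
    items: List[Tuple[str, str]],
    classify: Callable[[str], str],
    p: int = 2,
) -> Dict[str, List[str]]:
    """items are (topic, text) pairs; classify stands in for the multi-class domain model."""
    labels_per_topic: Dict[str, List[str]] = defaultdict(list)
    # Aggregate the information by topic and label each piece with a domain.
    for topic, text in items:
        labels_per_topic[topic].append(classify(text))
    # The P most frequent labels become the topic's domain labels
    # (Counter.most_common orders equal counts by first occurrence).
    return {
        topic: [label for label, _ in Counter(labels).most_common(p)]
        for topic, labels in labels_per_topic.items()
    }


if __name__ == "__main__":
    data = [("t1", "tech: a"), ("t1", "tech: b"), ("t1", "finance: c"), ("t2", "sports: d")]
    print(domain_labels_by_topic(data, lambda s: s.split(":")[0]))
    # {'t1': ['tech', 'finance'], 't2': ['sports']}
```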
In a second aspect, an embodiment of the present application provides a device for recommending information topics, the device including:
a first determining module, configured to determine topics of information;
a second determining module, configured to determine the writability of the topics;
a screening module, configured to screen out target topics according to the similarity between the topics and the writability of the topics; and
a recommending module, configured to send the target topics to a recommender user.
In this embodiment, topics are screened according to both their mutual similarity and their writability, so that topics with higher popularity and timeliness are found across the whole network and the recommender user can create information with higher popularity and timeliness.
In one possible design, the first determining module is specifically configured to:
acquire information to be processed;
classify the information to be processed through a classification model to obtain information whose quality score is greater than a first preset value; and
determine N topics corresponding to each piece of classified information, where the N topics include M entity-type topics and N-M topic-label topics.
In this embodiment, the information is classified by classification models, for example models for advertisement detection, for pornographic or politically sensitive content detection, and for inventory information, which together divide the information into high-quality and low-quality information. Low-quality information is filtered out and only high-quality information undergoes topic determination, which screens out high-quality content and reduces the amount of data processed in subsequent steps.
In one possible design, the first determining module is specifically configured to:
perform knowledge extraction and representation on the information content based on an existing general knowledge graph to obtain entity link relations;
find, according to the entity link relations, entity knowledge and entity association information related to the input content in the knowledge graph, to obtain the entity-type topics of the information;
extract keywords of the information; and
determine the topic-label topics of the information according to statistics of the keywords in the information, where the statistics include the frequency of each keyword, its part of speech, and the degree to which it matches the subject of the article.
In this embodiment, for each piece of information, knowledge is extracted and represented based on an existing general knowledge graph, and entity knowledge and entity association information related to the input content are found in the knowledge graph through entity linking, which determines the entity-type topics of the information. In addition, keywords are extracted from each piece of information, and its topic-label topics are derived from statistics such as keyword frequency, part of speech, and how well each keyword matches the article. The topics of the information can thus be divided accurately, making it easier for users to read information by category.
In one possible design, the second determining module is specifically configured to:
extract topic features of each topic, where the topic features include: the amount of information under the topic, user behavior data for the information under the topic within a time window, the click-through rate of the topic within the time window, a semantic-distance score between the topic and the user's domain, and an event-probability score of the topic; and
input the topic features into a topic click-through-rate model to obtain a writability score of the topic, where the topic click-through-rate model is obtained by iterative training on a first topic data set pre-labeled with click-through rates and writability scores.
In this embodiment, the topic features are extracted and fed into the click-through-rate model, which outputs a writability score for each topic, so the writability of each topic can be judged directly: the higher the writability score, the higher the popularity and timeliness of the topic.
In one possible design, the screening module is specifically configured to:
determine the similarity between every two topics through a similarity discrimination model, where the model is obtained by iterative training on a second topic data set pre-labeled with similarities, each element of which is a pair of topics;
de-duplicate topics whose similarity is greater than a second preset value to obtain candidate topics; and
select at least one of the candidate topics as the target topic in descending order of writability score.
In this embodiment, the second topic sample data set (topic pairs labeled with their similarity) is used to train the topic similarity discrimination model. For the topics to be processed, each pair of topics is input into the trained model to obtain their similarity (i.e., the probability that they are similar). Topics whose similarity exceeds the preset threshold are then de-duplicated in descending order of writability score, filtering out repeated topics. This yields target topics with high popularity and timeliness.
In one possible design, the device further includes an acquisition module, configured to:
acquire user information of the recommender user; and
determine the target topics that match the user information.
In this embodiment, the target topics can additionally be checked against the user information, so that the topics finally recommended fall within the domains the recommender user is good at writing about, making the recommendation more accurate.
In one possible design, the device further includes a third determining module, configured to:
determine a domain label corresponding to each topic; and
select, according to the user information of the recommender user, the topics whose domain labels match the user information.
In this embodiment, domain labels are assigned to the topics before their writability is analyzed, yielding a finer-grained division of topics by domain and therefore more accurate topic screening.
In one possible design, the third determining module is specifically configured to:
aggregate the information by topic to obtain the pieces of information under each topic;
classify the information through a multi-class classification model and label the domain of each piece of information, where the multi-class model is obtained by iterative training on an information data set labeled with domains; and
take the P most frequent domain labels as the domain labels of the topic.
This embodiment refines the domain labels of the topics to a finer granularity, addressing the insufficient subdivision of topics in the prior art.
In a third aspect, the present application provides a terminal, including a processor and a memory, where the memory stores instructions executable by the processor, and the processor is configured to perform, by executing the executable instructions, the method for recommending information topics according to any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method for recommending information topics according to any implementation of the first aspect.
In a fifth aspect, an embodiment of the present application provides a program product including a computer program stored in a readable storage medium; at least one processor of a server can read the computer program from the readable storage medium and execute it, causing the server to perform the method for recommending information topics according to any implementation of the first aspect.
One embodiment of the above application has the following advantages or benefits: intelligent search can be performed over information across the whole network, and topics with higher popularity and timeliness are provided to the recommender user, enabling the recommender user to create information with higher popularity and timeliness. By determining the writability of topics, screening out target topics according to the similarity between topics and their writability, and sending the target topics to the recommender user, the application solves the technical problem that suitable topics could not be provided to the recommender user, which in turn prevented the recommender user from producing popular and timely information. Other effects of the above alternatives will be described below in connection with specific embodiments.
Drawings
The drawings are intended to aid understanding of the present solution and do not limit the present application. In the drawings:
FIG. 1 is a schematic diagram of a recommendation method for information topics in which embodiments of the present application may be implemented;
FIG. 2 is a schematic diagram according to a first embodiment of the present application;
FIG. 3 is a schematic diagram according to a second embodiment of the present application;
FIG. 4 is a schematic diagram according to a third embodiment of the present application;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present application;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present application;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present application;
FIG. 8 is a block diagram of a terminal for implementing an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terms "first," "second," "third," "fourth" and the like in the description, the claims, and the above-described figures of this application, if any, are used to distinguish between similar objects and not necessarily to describe a particular sequence or chronological order. It is to be understood that data so used are interchangeable where appropriate, so that the embodiments of the present application described herein can, for example, be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present application is described in detail below with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
In the prior art, when recommending information that meets a user's needs, information with a high search rate can be identified from users' historical search records and then recommended to the user. A recommender user (an author), however, needs to identify suitable topics in good time and write information on those topics, which readers can then view. The prior art cannot provide suitable topics to the recommender user, so the recommender user cannot produce information with higher popularity and timeliness.
To address these technical problems, the present application provides a method, device, terminal, and storage medium for recommending information topics, which can perform intelligent search over information across the whole network and provide the recommender user with topics of higher popularity and timeliness, so that the recommender user can create information with higher popularity and timeliness. The method provided by the present application can be applied to terminal devices such as mobile phones and tablet computers.
FIG. 1 is a schematic diagram of a method for recommending information topics in which embodiments of the present application can be implemented. As shown in FIG. 1, information to be processed is first acquired from across the whole network, including massive amounts of information from channels such as Baidu Search, Weibo hot search, and forums. This builds a complete view of network-wide information from multiple angles, enriches the topics of the information, avoids bias toward the interests of the site's own user base, and reflects the real situation more objectively. The information is then classified by classification models, for example models for advertisement detection, for pornographic or politically sensitive content detection, and for inventory information, which divide the information into high-quality and low-quality information. Low-quality information is filtered out and only high-quality information undergoes topic determination, which screens out high-quality content and reduces the amount of data processed in subsequent steps. Next, multiple topics corresponding to each piece of information are extracted from the high-quality information.
Determining the N topics corresponding to each piece of classified information includes: performing knowledge extraction and representation on the information content based on an existing general knowledge graph to obtain entity link relations; finding, according to the entity link relations, entity knowledge and entity association information related to the input content in the knowledge graph, to obtain the entity-type topics of the information; extracting keywords of the information; and determining the topic-label topics of the information according to statistics of the keywords, namely keyword frequency, part of speech, and the degree to which each keyword matches the subject of the article. Each topic can then be described along multiple feature dimensions in order to score its writability. Examples include: the amount of information related to the topic; user behavior data (numbers of comments, views, likes, and users) within a time window (last hour, last day, last week, or all history); the click-through rate of the topic within a time window (last hour, last day, last week, or all history), which is the actual click-through rate of the topic itself rather than of the information, e.g., how often authors click the topic; a semantic-distance score between the topic and the user domain; and an event-probability score of the topic (an "event" here refers to a research direction in the industry, e.g., whether a given occurrence counts as an event).
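The time-window features can be computed by counting behavior events whose timestamps fall inside each window. This sketch assumes events are (timestamp, kind) pairs and uses the four windows named above; the function names are hypothetical, and the click-through rate is taken as clicks divided by impressions.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# The four windows named in the description; None means "all history".
WINDOWS: Dict[str, Optional[timedelta]] = {
    "last_hour": timedelta(hours=1),
    "last_day": timedelta(days=1),
    "last_week": timedelta(weeks=1),
    "history": None,
}


def window_behavior(events: List[Tuple[datetime, str]], now: datetime) -> Dict[str, Counter]:
    """Count behavior events (e.g. 'view', 'comment', 'like') per time window."""
    return {
        name: Counter(kind for ts, kind in events if span is None or now - ts <= span)
        for name, span in WINDOWS.items()
    }


def window_ctr(clicks: int, impressions: int) -> float:
    """Topic click-through rate in a window; 0.0 when the topic was never shown."""
    return clicks / impressions if impressions else 0.0


if __name__ == "__main__":
    now = datetime(2020, 3, 27, 12, 0)
    evts = [(now - timedelta(minutes=30), "view"), (now - timedelta(hours=5), "view"),
            (now - timedelta(days=3), "comment"), (now - timedelta(days=30), "like")]
    print(window_behavior(evts, now)["last_day"])  # Counter({'view': 2})
```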
The user domain is the domain to which the recommender user (author) belongs; for example, when registering on Baidu Baijiahao, an author selects a domain such as entertainment, medicine, or history. Semantic analysis of the topic and the user domain yields the semantic-distance score between them. The topic features are then input into a topic click-through-rate model to obtain the writability score of the topic. The topic click-through-rate model is obtained by iterative training on a first topic data set pre-labeled with click-through rates and writability scores, using any of various model structures, including but not limited to logistic regression (LR), gradient boosting decision tree (GBDT), and deep neural network (DNN) models. In this way, the features of each topic are extracted and fed into the click-through-rate model, which outputs a writability score for each topic so that its writability can be judged directly: the higher the score, the higher the popularity and timeliness of the topic. Next, the similarity between every two topics can be determined through a similarity discrimination model, which is obtained by iterative training on a second topic data set pre-labeled with similarities, each element of which is a pair of topics. Topics whose similarity is greater than a second preset value are then de-duplicated to obtain candidate topics.
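The patent does not specify how the semantic-distance score is computed. One common choice, shown here purely as an assumption, is the cosine distance between an embedding of the topic and an embedding of the author's domain; the embedding vectors themselves are hypothetical inputs.

```python
import math
from typing import Sequence


def semantic_distance(topic_vec: Sequence[float], domain_vec: Sequence[float]) -> float:
    """Cosine distance in [0, 2]; 0 means the topic points the same way as the domain."""
    if len(topic_vec) != len(domain_vec):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(topic_vec, domain_vec))
    norm = math.sqrt(sum(x * x for x in topic_vec)) * math.sqrt(sum(y * y for y in domain_vec))
    # Treat a zero vector as carrying no information: neutral distance of 1.0.
    return 1.0 - dot / norm if norm else 1.0


if __name__ == "__main__":
    print(semantic_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

A lower distance would then feed the writability model as a signal that the topic fits the author's domain.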
Finally, at least one candidate topic is selected from the candidate topics as a target topic in descending order of writability score, and the target topic is sent to the user.
With this method, topics can be screened according to their mutual similarity and their writability, and topics with higher popularity and timeliness can be found across the whole network, enabling the recommender user to create more popular and timely information.
Fig. 2 is a schematic diagram according to a first embodiment of the present application. As shown in Fig. 2, the method in this embodiment may include:
S101, determining the topics of the information.
In this embodiment, the information to be processed may be acquired first. The information to be processed is then classified by a classification model to obtain the information whose quality score is greater than a first preset value. Finally, N topics corresponding to each piece of classified information are determined, where the N topics comprise M entity-type topics and N-M topic-type topics.
Specifically, the information to be processed is first acquired from across the whole network, including massive amounts of information from channels such as Baidu Search, Weibo trending searches, and forums. A complete picture of the information across the whole network can thus be built from multiple angles, the topics of the information become richer, bias from the interests of any single site's user base is avoided, and the real situation is reflected more objectively. The information is then classified by classification models; for example, a classification model for advertisement detection, a classification model for pornographic and politically sensitive content detection, and a classification model for classifying existing (stock) information are constructed, so that these models can divide the information into high-quality and low-quality information. Low-quality information is filtered out and only high-quality information undergoes topic determination, so that high-quality content is selected and the amount of data processed in subsequent steps is reduced. Finally, multiple topics corresponding to each piece of information are extracted from the high-quality information.
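The filtering step above can be sketched as follows. This is a minimal illustration, assuming (hypothetically) that each classifier returns the probability that an item is low quality, and that the quality score is one minus the worst of those probabilities; the patent does not specify how the classifier outputs are combined.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A classifier maps an item's text to the probability that the item is low quality
# (e.g. an advertisement). These callables stand in for trained models.
Classifier = Callable[[str], float]


@dataclass
class Item:
    item_id: str
    text: str


def quality_score(text: str, classifiers: Dict[str, Classifier]) -> float:
    """Hypothetical rule: quality is 1 minus the highest low-quality probability."""
    if not classifiers:
        return 1.0
    return 1.0 - max(clf(text) for clf in classifiers.values())


def filter_high_quality(items: List[Item], classifiers: Dict[str, Classifier],
                        first_preset: float) -> List[Item]:
    """Keep only items whose quality score is strictly greater than the first preset value."""
    return [it for it in items if quality_score(it.text, classifiers) > first_preset]
```

Taking the maximum means a single confident rejection (for example, from the advertisement model) is enough to drop an item; a weighted average would be a softer alternative.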
Optionally, determining the N topics corresponding to each piece of classified information includes: performing knowledge extraction and representation on the information content according to an existing general knowledge graph to obtain entity link relations; finding, according to the entity link relations, the entity knowledge and entity association information related to the input content in the knowledge graph, thereby obtaining the entity-type topics of the information; extracting keywords of the information; and determining the topic-type topics of the information according to statistical information of the keywords in the information, where the statistical information includes the frequency of each keyword, its part of speech, and the degree to which it conforms to the subject of the article.
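The keyword statistics listed above (frequency, part of speech, conformity to the article's subject) can be combined into a ranking. A minimal sketch, assuming the text is already tokenized and POS-tagged, with hypothetical POS weights and using "keyword appears in the title" as a crude stand-in for conformity to the subject:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical part-of-speech weights: nouns and proper nouns tend to make better
# topic labels than verbs; tokens with any other tag are ignored.
POS_WEIGHTS: Dict[str, float] = {"NOUN": 1.0, "PROPN": 1.2, "VERB": 0.5}


def topic_keywords(tagged_tokens: List[Tuple[str, str]], title: str,
                   top_k: int = 3) -> List[str]:
    """Rank keywords by frequency x POS weight x subject-conformity bonus."""
    counts: Counter = Counter()
    pos_of: Dict[str, str] = {}
    for word, pos in tagged_tokens:
        if pos in POS_WEIGHTS:
            counts[word] += 1
            pos_of.setdefault(word, pos)  # keep the first tag seen for each word
    title_lower = title.lower()
    scores: Dict[str, float] = {}
    for word, freq in counts.items():
        # Crude proxy for "conformity to the article's subject":
        # keywords that also appear in the title get a 2x boost.
        conformity = 2.0 if word.lower() in title_lower else 1.0
        scores[word] = freq * POS_WEIGHTS[pos_of[word]] * conformity
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

With the tokens of an article titled "Rocket launch delayed", a noun such as "rocket" that occurs twice and also appears in the title outranks a noun that occurs once and is absent from the title.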
Specifically, topics may be divided into entity-type topics and topic-type topics. Entity-type topics are coarse-grained, whereas topic-type topics are fine-grained. For each piece of information, knowledge is extracted and represented based on the existing general knowledge graph, the entity knowledge and entity association information related to the input content are found in the knowledge graph via entity linking, and the entity-type topics of the information are determined. In other words, the entity-type topics of each piece of information are extracted by associating its keywords with entities in the knowledge graph. In addition, keywords are extracted from each piece of information, and the topic-type (topic label) topics of the information are derived from statistics such as keyword frequency, keyword part of speech, and how well each keyword conforms to the article. For example, Yang Mi (a Chinese actress) is an entity-type topic, while event B is a topic-type topic. In general, an entity-type topic has an entry in Baidu Baike, so whether a topic is an entity-type topic can also be determined simply by checking whether a Baidu Baike entry exists for it. In this way, the topics of the information can be divided accurately, making it convenient for users to read the information by category.
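The entity-linking step and the "has an encyclopedia entry" check can be illustrated with a toy dictionary-based linker. This is a minimal sketch, not the knowledge-graph system the patent relies on: the alias table, entity IDs, and the small `knowledge_graph` mapping (entity to associated entities) are all hypothetical, and membership in that mapping plays the role of having a Baidu Baike entry.

```python
from typing import Dict, List, Set


def link_entities(text: str, alias_table: Dict[str, str]) -> List[str]:
    """Greedy dictionary-based entity linking: return IDs of entities whose alias occurs in text.

    Longer aliases are tried first so that "New York Times" wins over "New York".
    """
    found: List[str] = []
    remaining = text
    for alias in sorted(alias_table, key=len, reverse=True):
        if alias in remaining:
            entity_id = alias_table[alias]
            if entity_id not in found:
                found.append(entity_id)
            # Blank out the matched span so shorter, overlapping aliases cannot re-match it.
            remaining = remaining.replace(alias, " " * len(alias))
    return found


def entity_topics(text: str, alias_table: Dict[str, str],
                  knowledge_graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep linked entities that exist in the knowledge graph, with their associated entities."""
    return {e: knowledge_graph[e]
            for e in link_entities(text, alias_table) if e in knowledge_graph}
```

In the usage below, "event B" is recognized by the alias table but has no knowledge-graph entry, so it would be treated as a topic-type topic rather than an entity-type topic, mirroring the Yang Mi / event B example.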
S102, determining the writability of the topics.
In this embodiment, the topic features of each topic may be extracted first. The topic features include the amount of information under the topic, user behavior data for the information under the topic within a time window, the click-through rate of the topic within a time window, a score of the semantic distance between the topic and the user domain, an event probability score of the topic, and the like. The topic features are then input into a topic click-through rate model to obtain the writability score of the topic. The topic click-through rate model is obtained by iterative training on a first topic data set pre-labeled with click-through rates and writability scores.
Specifically, a topic may be described along multiple topic feature dimensions, for example: the amount of information related to the topic; user behavior data (numbers of comments, views, likes, and users) within a time window (the last hour, last day, last week, or all history); the click-through rate of the topic within a time window (the last hour, last day, last week, or all history), which is the true click-through rate of the topic itself, for example whether authors click on the topic, rather than the click-through rate of the information; a score of the semantic distance between the topic and the user domain; and an event probability score of the topic (event detection being a research direction in the industry, this score indicates whether a given thing is an event). The user domain is the domain to which the recommender user (author) belongs; for example, when using Baidu's Baijiahao platform, an author selects the domain he or she writes in, such as entertainment, medicine, or history. Semantic analysis of the topic and the user domain yields the semantic distance score. The topic features are then input into the topic click-through rate model to obtain the writability score of the topic. The topic click-through rate model is obtained by iterative training on a first topic data set pre-labeled with click-through rates and writability scores, using any of various model structures, including but not limited to a logistic regression (LR) model, a gradient boosting decision tree (GBDT) model, and a deep neural network (DNN) model.
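The time-window behavior and click-through features described above can be computed from a raw event log. A minimal sketch, assuming a hypothetical `Event` record with a Unix timestamp, an event kind, and a user ID; the window names and the clicks-over-impressions definition of topic CTR are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

HOUR = 3600.0
# Last hour, last day, last week, and all history (seconds before "now").
WINDOWS: Dict[str, float] = {"1h": HOUR, "1d": 24 * HOUR, "1w": 7 * 24 * HOUR,
                             "all": float("inf")}


@dataclass
class Event:
    ts: float      # Unix timestamp in seconds
    kind: str      # "view", "comment", "like", "impression" or "click"
    user_id: str


def window_features(events: List[Event], now: float) -> Dict[str, float]:
    """Per-window behavior counts, distinct users, and topic click-through rate."""
    feats: Dict[str, float] = {}
    for name, span in WINDOWS.items():
        recent = [e for e in events if now - e.ts <= span]
        for kind in ("view", "comment", "like"):
            feats[f"{kind}_{name}"] = sum(1 for e in recent if e.kind == kind)
        feats[f"users_{name}"] = len({e.user_id for e in recent})
        impressions = sum(1 for e in recent if e.kind == "impression")
        clicks = sum(1 for e in recent if e.kind == "click")
        # Topic-level CTR: clicks on the topic itself divided by times the topic was shown.
        feats[f"ctr_{name}"] = clicks / impressions if impressions else 0.0
    return feats
```

Each topic then yields one flat feature dictionary that can be concatenated with the information count, semantic distance score, and event probability score before being passed to the click-through rate model.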
In this way, the features of each topic are extracted and input into the click-through rate model, which outputs a writability score for each topic, so that the writability of each topic can be judged directly: the higher the writability score, the more popular and timely the corresponding topic.
S103, screening out target topics according to the similarity between topics and the writability of the topics.
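Of the model structures named above, logistic regression is the simplest to illustrate. A minimal sketch, assuming topic features have already been normalized and that each training example carries a writability label in [0, 1]; the class name, learning rate, and epoch count are hypothetical, and a production system would more likely use a library implementation or a GBDT/DNN.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class WritabilityLR:
    """Minimal logistic-regression scorer; targets are writability labels in [0, 1]."""

    def __init__(self, n_features: int, lr: float = 0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def score(self, x: Sequence[float]) -> float:
        """Predicted writability score in (0, 1)."""
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, data: List[Tuple[Sequence[float], float]], epochs: int = 200) -> None:
        # Iterative training: per-example gradient descent on the cross-entropy loss,
        # whose gradient with respect to the logit is (prediction - target).
        for _ in range(epochs):
            for x, y in data:
                err = self.score(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
```

Trained on a few labeled topics (for example, features such as normalized view count and topic CTR), the model assigns higher scores to topics whose features resemble the "hot" examples, which is the ranking signal used in the screening step.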
In this embodiment, the similarity between every two topics can be determined by a similarity discrimination model. The similarity discrimination model is obtained by iterative training on a second topic data set pre-labeled with similarities, where each element of the second topic data set is a pair of topics. Topics whose similarity is greater than a second preset value are then de-duplicated to obtain candidate topics. Finally, at least one of the candidate topics is selected as a target topic in descending order of writability score.
Specifically, the second topic sample data set (a data set of topic pairs, each pair labeled with its similarity) may be used to train the topic similarity discrimination model. For the topics to be processed, every pair of topics is input into the trained model to obtain their similarity (i.e., the probability that they are similar). Topics with high similarity (similarity greater than the preset threshold) are then de-duplicated in descending order of writability score, filtering out repeated topics. This yields target topics that are both popular and timely.
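The de-duplication and selection steps can be sketched as a greedy pass over topics in descending writability order. In this minimal sketch, a word-set Jaccard similarity is a toy stand-in for the trained similarity discrimination model, and `second_preset` and `k` are illustrative parameter names.

```python
from typing import Callable, Dict, List


def jaccard(a: str, b: str) -> float:
    """Toy stand-in for the trained similarity model: word-set Jaccard similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_target_topics(scores: Dict[str, float],
                         similarity: Callable[[str, str], float],
                         second_preset: float, k: int) -> List[str]:
    """De-duplicate topics in descending writability order, then return the top k."""
    kept: List[str] = []
    for topic in sorted(scores, key=lambda t: (-scores[t], t)):
        # A topic is dropped if it is too similar to any higher-scoring topic already kept.
        if all(similarity(topic, other) <= second_preset for other in kept):
            kept.append(topic)
    return kept[:k]
```

Processing in descending score order guarantees that, of two near-duplicate topics, the one with the higher writability score survives. The pairwise comparison is quadratic in the number of kept topics, which is acceptable when the candidate pool is already small.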
S104, sending the target topics to the recommender user.
In this embodiment, the target topics may be sent to the recommender user. Topics are thus screened quickly according to their mutual similarity and writability, and topics with higher popularity and timeliness are found across the whole network, enabling the recommender user to create more popular and timely information.
In this embodiment, the topics of the information are determined; the writability of the topics is determined; target topics are screened out according to the similarity between topics and their writability; and the target topics are sent to the recommender user. Topics can therefore be screened by similarity and writability, and topics with higher popularity and timeliness are found across the whole network, enabling the recommender user to create more popular and timely information.
Fig. 3 is a schematic diagram according to a second embodiment of the present application. As shown in Fig. 3, the method in this embodiment may include:
S201, determining the topics of the information.
S202, determining the writability of the topics.
S203, screening out target topics according to the similarity between topics and the writability of the topics.
S204, acquiring user information of the recommender user, and determining the target topics that match the user information.
In this embodiment, user information of the recommender user may further be acquired and the target topics checked against it, so that the finally recommended target topics match the domains the recommender user is good at writing in, achieving more accurate information recommendation.
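The matching check in S204 can be sketched as a domain-overlap filter. This minimal sketch assumes (hypothetically) that the user information is reduced to a set of domains the author writes in and that each target topic already carries a set of domain labels; the patent does not fix either representation.

```python
from typing import Dict, List, Set


def match_user(target_topics: List[str], topic_domains: Dict[str, Set[str]],
               user_domains: Set[str]) -> List[str]:
    """Keep target topics whose domain labels overlap the recommender user's domains.

    The incoming order (already ranked by writability) is preserved; topics with
    no known domain labels are dropped.
    """
    return [t for t in target_topics if topic_domains.get(t, set()) & user_domains]
```

An author registered under "entertainment" would thus receive only those target topics labeled with that domain, in their original writability order.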
S205, sending the target topics to the recommender user.
In this embodiment, for the specific implementation process and technical principle of steps S201 to S203 and step S205, refer to the related descriptions of steps S101 to S104 in the method shown in Fig. 2; details are not repeated here.
In this embodiment, the topics of the information are determined; the writability of the topics is determined; target topics are screened out according to the similarity between topics and their writability; and the target topics are sent to the recommender user. Topics can therefore be screened by similarity and writability, and topics with higher popularity and timeliness are found across the whole network, enabling the recommender user to create more popular and timely information.
In addition, this embodiment can acquire the user information of the recommender user and check the target topics against it, so that the finally recommended target topics match the domains the recommender user is good at writing in, achieving more accurate information recommendation.
Fig. 4 is a schematic diagram according to a third embodiment of the present application. As shown in Fig. 4, the method in this embodiment may include:
S301, determining the topics of the information.
S302, determining the writability of the topics.
S303, screening out target topics according to the similarity between topics and the writability of the topics.
S304, determining the domain label corresponding to each topic, and selecting, according to the user information of the recommender user, the topics whose domain labels match the user information.
In this embodiment, before the writability of the topics is analyzed, domain labels may be assigned to the topics to obtain a finer-grained domain division of the topics. The topics whose domain labels match the user information of the recommender user are then selected, achieving more accurate topic screening.
Optionally, determining the domain label corresponding to each topic includes: aggregating the information by topic to obtain multiple pieces of information under the same topic; classifying the information with a multi-class classification model and labeling each piece of information with a domain label, where the multi-class model is obtained by iterative training on an information data set labeled with domain labels; and taking the P most frequent domain labels as the domain labels of the topic.
In this embodiment, the information may be aggregated by topic to obtain multiple pieces of information under the same topic. The topic domain classification task is thereby converted into a set of information domain classification tasks: all information related to a topic is classified by an information domain classification model, the results are aggregated, and the 5 most frequent domains are selected as the final domain classification result of the topic (i.e., P = 5 in this example). The domain labels of the topics are thus subdivided more finely, that is, the domains of the topics are further refined, which solves the problem in the prior art that topics are not subdivided finely enough.
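The aggregation from per-article labels to per-topic labels can be sketched with a frequency count. In this minimal sketch, `classify` is a placeholder for the trained multi-class domain model, and ties between equally frequent labels are broken by first occurrence, which is how `collections.Counter.most_common` orders equal counts.

```python
from collections import Counter
from typing import Callable, Dict, List


def topic_domain_labels(articles_by_topic: Dict[str, List[str]],
                        classify: Callable[[str], str], p: int = 5) -> Dict[str, List[str]]:
    """Label every article under a topic, then keep the P most frequent labels per topic."""
    result: Dict[str, List[str]] = {}
    for topic, articles in articles_by_topic.items():
        counts = Counter(classify(a) for a in articles)
        # most_common orders by descending count; equal counts keep first-seen order.
        result[topic] = [label for label, _ in counts.most_common(p)]
    return result
```

Because each topic usually has many related articles, voting over their labels is more robust than classifying the short topic string on its own, which is the motivation the text gives for converting the task.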
S305, sending the target topics to the recommender user.
In this embodiment, for the specific implementation process and technical principle of steps S301 to S303 and step S305, refer to the related descriptions of steps S101 to S104 in the method shown in Fig. 2; details are not repeated here.
In this embodiment, the topics of the information are determined; the writability of the topics is determined; target topics are screened out according to the similarity between topics and their writability; and the target topics are sent to the recommender user. Topics can therefore be screened by similarity and writability, and topics with higher popularity and timeliness are found across the whole network, enabling the recommender user to create more popular and timely information.
In addition, this embodiment subdivides the domain labels of the topics more finely, that is, further refines the domains of the topics, which solves the problem in the prior art that topics are not subdivided finely enough.
Fig. 5 is a schematic diagram according to a fourth embodiment of the present application. As shown in Fig. 5, the apparatus in this embodiment may include:
a first determining module 31, configured to determine the topics of the information;
a second determining module 32, configured to determine the writability of the topics;
a screening module 33, configured to screen out target topics according to the similarity between topics and the writability of the topics; and
a recommendation module 34, configured to send the target topics to the recommender user.
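The four modules can be wired together as pluggable stages. This is a minimal sketch, assuming each module is represented by a hypothetical callable; it shows only the data flow between modules 31 to 34, not any particular implementation of them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TopicRecommender:
    """Wires the four modules of the apparatus together; each field is a pluggable callable."""
    determine_topics: Callable[[List[str]], List[str]]           # first determining module 31
    score_writability: Callable[[List[str]], Dict[str, float]]  # second determining module 32
    screen: Callable[[Dict[str, float]], List[str]]             # screening module 33
    send: Callable[[str, List[str]], None]                      # recommendation module 34

    def run(self, information: List[str], user_id: str) -> List[str]:
        """Run the pipeline once and return the target topics that were sent."""
        topics = self.determine_topics(information)
        scores = self.score_writability(topics)
        targets = self.screen(scores)
        self.send(user_id, targets)
        return targets
```

Keeping the stages as injected callables lets each module (for example, the similarity-based screening) be swapped or tested in isolation, which matches the modular apparatus structure described in the text.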
In this embodiment, the topics of the information are determined; the writability of the topics is determined; target topics are screened out according to the similarity between topics and their writability; and the target topics are sent to the recommender user. Topics can therefore be screened by similarity and writability, and topics with higher popularity and timeliness are found across the whole network, enabling the recommender user to create more popular and timely information.
In one possible design, the first determining module 31 is specifically configured to:
acquire the information to be processed;
classify the information to be processed with a classification model to obtain the information whose quality score is greater than a first preset value; and
determine N topics corresponding to each piece of classified information, where the N topics include M entity-type topics and N-M topic-type topics.
In this embodiment, the information is classified by classification models; for example, a classification model for advertisement detection, a classification model for pornographic and politically sensitive content detection, and a classification model for classifying existing (stock) information are constructed, so that these models can divide the information into high-quality and low-quality information. Low-quality information is filtered out and only high-quality information undergoes topic determination, so that high-quality content is selected and the amount of data processed in subsequent steps is reduced.
In one possible design, the first determining module 31 is specifically configured to:
perform knowledge extraction and representation on the information content according to an existing general knowledge graph to obtain entity link relations;
find, according to the entity link relations, the entity knowledge and entity association information related to the input content in the knowledge graph, thereby obtaining the entity-type topics of the information;
extract keywords of the information; and
determine the topic-type topics of the information according to statistical information of the keywords in the information, where the statistical information includes the frequency of each keyword, its part of speech, and the degree to which it conforms to the subject of the article.
In this embodiment, for each piece of information, knowledge is extracted and represented based on the existing general knowledge graph, and the entity knowledge and entity association information related to the input content are found in the knowledge graph via entity linking, so as to determine the entity-type topics of the information. In other words, the entity-type topics of each piece of information are extracted by associating its keywords with entities in the knowledge graph. In addition, keywords are extracted from each piece of information, and the topic-type topics of the information are derived from statistics such as keyword frequency, keyword part of speech, and how well each keyword conforms to the article. The topics of the information can therefore be divided accurately, making it convenient for users to read the information by category.
In one possible design, the second determining module 32 is specifically configured to:
extract the topic features of each topic, where the topic features include the amount of information under the topic, user behavior data for the information under the topic within a time window, the click-through rate of the topic within a time window, a score of the semantic distance between the topic and the user domain, and an event probability score of the topic; and
input the topic features into a topic click-through rate model to obtain the writability score of the topic, where the topic click-through rate model is obtained by iterative training on a first topic data set pre-labeled with click-through rates and writability scores.
In this embodiment, the topic features of each topic are extracted and input into the click-through rate model, which outputs a writability score for each topic, so that the writability of each topic can be judged directly: the higher the writability score, the more popular and timely the corresponding topic.
In one possible design, the screening module 33 is specifically configured to:
determine the similarity between every two topics with a similarity discrimination model, where the similarity discrimination model is obtained by iterative training on a second topic data set pre-labeled with similarities, and each element of the second topic data set is a pair of topics;
de-duplicate topics whose similarity is greater than a second preset value to obtain candidate topics; and
select at least one of the candidate topics as a target topic in descending order of writability score.
In this embodiment, a second topic sample data set (a data set of topic pairs, each pair labeled with its similarity) is used to train the topic similarity discrimination model. For the topics to be processed, every pair of topics is input into the trained model to obtain their similarity (i.e., the probability that they are similar). Topics with high similarity (similarity greater than the preset threshold) are then de-duplicated in descending order of writability score, filtering out repeated topics. This yields target topics that are both popular and timely.
The information topic recommendation apparatus of this embodiment can execute the technical solution of the method shown in Fig. 2; for its specific implementation process and technical principle, refer to the related description of the method shown in Fig. 2, which is not repeated here.
In this embodiment, the topics of the information are determined; the writability of the topics is determined; target topics are screened out according to the similarity between topics and their writability; and the target topics are sent to the recommender user. Topics can therefore be screened by similarity and writability, and topics with higher popularity and timeliness are found across the whole network, enabling the recommender user to create more popular and timely information.
Fig. 6 is a schematic diagram according to a fifth embodiment of the present application. As shown in Fig. 6, on the basis of the apparatus shown in Fig. 5, the apparatus in this embodiment may further include:
an obtaining module 35, specifically configured to:
acquire user information of the recommender user; and
determine the target topics that match the user information.
In this embodiment, the target topics may further be checked against the user information, so that the finally recommended target topics match the domains the recommender user is good at writing in, achieving more accurate information recommendation.
The information topic recommendation apparatus of this embodiment can execute the technical solutions of the methods shown in Fig. 2 and Fig. 3; for the specific implementation process and technical principle, refer to the related descriptions of the methods shown in Fig. 2 and Fig. 3, which are not repeated here.
In this embodiment, the topics of the information are determined; the writability of the topics is determined; target topics are screened out according to the similarity between topics and their writability; and the target topics are sent to the recommender user. Topics can therefore be screened by similarity and writability, and topics with higher popularity and timeliness are found across the whole network, enabling the recommender user to create more popular and timely information.
In addition, this embodiment can acquire the user information of the recommender user and check the target topics against it, so that the finally recommended target topics match the domains the recommender user is good at writing in, achieving more accurate information recommendation.
Fig. 7 is a schematic diagram according to a sixth embodiment of the present application. As shown in Fig. 7, on the basis of the apparatus shown in Fig. 5, the apparatus in this embodiment may further include:
a third determining module 36, specifically configured to:
determine the domain label corresponding to each topic; and
select, according to the user information of the recommender user, the topics whose domain labels match the user information.
In this embodiment, before the writability of the topics is analyzed, domain labels are assigned to the topics to obtain a finer-grained domain division of the topics, thereby achieving more accurate topic screening.
In one possible design, the third determining module 36 is specifically configured to:
aggregate the information by topic to obtain multiple pieces of information under the same topic;
classify the information with a multi-class classification model and label each piece of information with a domain label, where the multi-class model is obtained by iterative training on an information data set labeled with domain labels; and
take the P most frequent domain labels as the domain labels of the topic.
In this embodiment, the domain labels of the topics are subdivided more finely, that is, the domains of the topics are further refined, which solves the problem in the prior art that topics are not subdivided finely enough.
The information topic recommendation apparatus of this embodiment can execute the technical solutions of the methods shown in Fig. 2 and Fig. 4; for the specific implementation process and technical principle, refer to the related descriptions of the methods shown in Fig. 2 and Fig. 4, which are not repeated here.
In this embodiment, the topics of the information are determined; the writability of the topics is determined; target topics are screened out according to the similarity between topics and their writability; and the target topics are sent to the recommender user. Topics can therefore be screened by similarity and writability, and topics with higher popularity and timeliness are found across the whole network, enabling the recommender user to create more popular and timely information.
In addition, this embodiment subdivides the domain labels of the topics more finely, that is, further refines the domains of the topics, which solves the problem in the prior art that topics are not subdivided finely enough.
Fig. 8 is a block diagram of a terminal used to implement an embodiment of the present application. The terminal is an electronic device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in Fig. 8, the terminal includes one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). Fig. 8 takes one processor 501 as an example.
The memory 502 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the information topic recommendation method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the information topic recommendation method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 502 can store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the information topic recommendation method in the embodiments of the present application. The processor 501 runs the non-transitory software programs, instructions, and modules stored in the memory 502 to execute the various functional applications and data processing of the server, that is, to implement the information topic recommendation method in the above method embodiments.
The memory 502 may include a program storage area and a data storage area, where the program storage area may store an operating system and at least one application program required by a function, and the data storage area may store data created through use of the terminal of Fig. 8, and the like. In addition, the memory 502 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 502 may optionally include memories located remotely from the processor 501, which may be connected to the terminal of Fig. 8 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The terminal of Fig. 8 may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503, and the output device 504 may be connected by a bus or in other ways; Fig. 8 takes connection by a bus as an example.
The input device 503, such as a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, or joystick, may receive input numeric or character information and generate key signal inputs related to user settings and function control of the terminal of Fig. 8. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), GPUs (graphics processing units), FPGA (field-programmable gate array) devices, computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (11)

1. A method for recommending information topics, the method comprising:
determining topics of information, the information coming from multiple channels of a network;
determining a writing degree of each topic;
screening out target topics according to similarities between the topics and the writing degrees of the topics; and
sending the target topics to a recommender user;
wherein screening out the target topics according to the similarities between the topics and the writing degrees of the topics comprises:
determining the similarity between every two topics through a similarity judging model, the similarity judging model being obtained through iterative training on a second topic data set pre-labeled with similarities, each element of the second topic data set being a pair of topics;
performing deduplication on topics whose similarity is greater than a second preset value to obtain candidate topics; and
selecting at least one of the candidate topics as the target topic in descending order of writing degree score.
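The screening step of claim 1 (deduplicate similar topics, then rank by writing degree score) can be illustrated with a minimal sketch. It rests on stated assumptions: token-set Jaccard similarity stands in for the trained similarity judging model, `threshold` plays the role of the second preset value, and keeping the highest-scoring topic within a group of near-duplicates is an assumption, since the claim does not say which duplicate survives. All function names and example topics are hypothetical.

```python
from typing import Callable, Dict, List


def jaccard_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for the trained similarity judging model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def screen_target_topics(
    scores: Dict[str, float],  # topic -> writing degree score
    similarity: Callable[[str, str], float] = jaccard_similarity,
    threshold: float = 0.5,  # stands in for the "second preset value"
    top_k: int = 2,
) -> List[str]:
    # Visit topics from the highest to the lowest writing degree score so that,
    # within a group of near-duplicates, the best-scoring topic is kept (assumption).
    ordered = sorted(scores, key=scores.get, reverse=True)
    candidates: List[str] = []
    for topic in ordered:
        # Deduplicate: drop a topic whose similarity to any kept topic exceeds the threshold.
        if all(similarity(topic, kept) <= threshold for kept in candidates):
            candidates.append(topic)
    # Candidates are already in descending score order, so the first top_k are the targets.
    return candidates[:top_k]


scores = {
    "new phone launch": 0.9,
    "new phone launch event": 0.8,
    "stock market rally": 0.7,
    "rainy weekend weather": 0.4,
}
print(screen_target_topics(scores))  # ['new phone launch', 'stock market rally']
```

Comparing each topic only against already-kept candidates is a greedy simplification of "similarity between every two topics"; a trained pairwise model could be dropped in through the `similarity` parameter without changing the rest.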
2. The method of claim 1, wherein determining the topics of the information comprises:
acquiring information to be processed;
classifying the information to be processed through a classification model to obtain information whose quality score is greater than a first preset value; and
determining N topics corresponding to each piece of classified information, the N topics comprising M entity-type topics and N-M topic-type topics.
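Claim 2 filters information by a quality score and then attaches N topics (M entity-type plus N-M topic-type) to each surviving piece. A minimal sketch of that data flow follows, assuming a simple record type; `toy_quality` is a hypothetical length-based scorer standing in for the classification model, and the names `Info` and `filter_by_quality` are illustrative, not from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Info:
    text: str
    quality: float = 0.0
    entity_topics: List[str] = field(default_factory=list)   # the M entity-type topics
    topical_topics: List[str] = field(default_factory=list)  # the N-M topic-type topics

    @property
    def topics(self) -> List[str]:
        # The N topics of a piece of information.
        return self.entity_topics + self.topical_topics


def filter_by_quality(
    items: List[Info],
    score_fn: Callable[[str], float],  # stand-in for the classification model
    first_preset: float,
) -> List[Info]:
    kept = []
    for item in items:
        item.quality = score_fn(item.text)
        # Keep only information whose quality score exceeds the first preset value.
        if item.quality > first_preset:
            kept.append(item)
    return kept


def toy_quality(text: str) -> float:
    # Hypothetical scorer: longer texts score higher, capped at 1.0.
    return min(len(text.split()) / 10, 1.0)


items = [Info("short"), Info("a much longer piece of news text about a product launch")]
kept = filter_by_quality(items, toy_quality, first_preset=0.5)
print([i.text for i in kept])
```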
3. The method of claim 2, wherein determining the N topics corresponding to each piece of classified information comprises:
performing knowledge extraction and representation on the information content according to an existing general knowledge graph to obtain entity-linking relations;
finding, from the knowledge graph and according to the entity-linking relations, entity knowledge and entity association information related to the input content, to obtain the entity-type topics of the information;
extracting keywords of the information; and
determining the topic-type topics corresponding to the information according to statistical information of the keywords in the information, wherein the statistical information includes the frequency of each keyword, the part of speech of each keyword, and the degree to which each keyword conforms to the subject of the article.
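The two branches of claim 3 can be sketched as follows, under heavy assumptions: a tiny dictionary stands in for the general knowledge graph (exact-match lookup instead of real entity linking), and keyword scores are taken as frequency times a hypothetical part-of-speech weight times a caller-supplied relevance value representing "conformity to the subject of the article". The claim does not say how the three statistics are combined; the product used here is only an illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical stand-in for a general knowledge graph: surface form -> canonical entity.
KNOWLEDGE_GRAPH: Dict[str, str] = {"baidu": "Baidu Inc."}

# Hypothetical part-of-speech weights (nouns favored over verbs).
POS_WEIGHT: Dict[str, float] = {"noun": 1.0, "verb": 0.5}


def entity_topics(tokens: List[str]) -> List[str]:
    # Naive exact-match "entity linking" against the toy graph, without duplicates.
    found: List[str] = []
    for tok in tokens:
        entity = KNOWLEDGE_GRAPH.get(tok.lower())
        if entity is not None and entity not in found:
            found.append(entity)
    return found


def topical_topics(
    tagged: List[Tuple[str, str]],  # (keyword, part of speech) pairs
    relevance: Dict[str, float],    # assumed keyword-to-article-subject conformity
    k: int = 2,
) -> List[str]:
    freq = Counter(word for word, _ in tagged)
    pos_of = {word: pos for word, pos in tagged}
    scores = {
        w: freq[w] * POS_WEIGHT.get(pos_of[w], 0.0) * relevance.get(w, 0.0)
        for w in freq
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Keep the k best-scoring keywords that have a positive score.
    return [w for w in ranked[:k] if scores[w] > 0]


tagged = [("chip", "noun"), ("launch", "verb"), ("chip", "noun"), ("ai", "noun"), ("launch", "verb")]
relevance = {"chip": 0.9, "ai": 0.8, "launch": 0.6}
print(entity_topics(["Baidu", "released", "baidu"]))  # ['Baidu Inc.']
print(topical_topics(tagged, relevance))              # ['chip', 'ai']
```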
4. The method of claim 1, wherein determining the writing degree of the topic comprises:
extracting topic features of the topic, the topic features comprising: the amount of information under the topic, user behavior data on the information under the topic within a time window, the click-through rate of the topic within the time window, a semantic distance score between the topic and the user's domain, and an event probability score of the topic; and
inputting the topic features into a topic click-through rate model to obtain a writing degree score of the topic, the topic click-through rate model being obtained through iterative training on a first topic data set pre-labeled with click-through rates and writing degree scores.
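Claim 4 maps five topic features to a writing degree score through a trained topic click-through rate model. As a sketch only, the example below swaps in a logistic-regression scorer with hand-set, hypothetical weights and log-scaled counts; the source does not disclose the model type, the feature transforms, or any weights.

```python
import math
from dataclasses import dataclass


@dataclass
class TopicFeatures:
    info_count: int         # amount of information under the topic
    user_actions: int       # user behavior events on that information in the time window
    window_ctr: float       # click-through rate of the topic in the time window
    domain_semantic: float  # semantic distance score between the topic and the user's domain
    event_prob: float       # event probability score of the topic


# Hypothetical weights; in the claim they would be learned from the first topic data set.
WEIGHTS = (0.3, 0.2, 2.0, 1.0, 1.0)
BIAS = -3.0


def writing_degree_score(f: TopicFeatures) -> float:
    # Log-scale the raw counts so large topics do not dominate the linear term.
    x = (
        math.log1p(f.info_count),
        math.log1p(f.user_actions),
        f.window_ctr,
        f.domain_semantic,
        f.event_prob,
    )
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    # Logistic link maps the linear score into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


hot = TopicFeatures(500, 2000, 0.12, 0.8, 0.9)
cold = TopicFeatures(500, 2000, 0.01, 0.8, 0.9)
print(round(writing_degree_score(hot), 3), round(writing_degree_score(cold), 3))
```

Because every weight is positive, raising any single feature raises the score; a trained model would not necessarily behave this way.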
5. The method of any one of claims 1-4, further comprising, before sending the target topics to the recommender user:
acquiring user information of the recommender user; and
determining target topics matching the user information.
6. The method of any one of claims 1-4, further comprising, before determining the writing degree of the topics:
determining a domain label corresponding to each topic; and
selecting, according to the user information of the recommender user, the topics whose domain labels match the user information.
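The pre-filtering in claim 6 keeps only topics whose domain labels match the recommender user's information. A minimal sketch, assuming the user information has already been reduced to a set of domain names (the claim does not specify the form of the user information or the matching rule; "at least one shared label" is an assumption):

```python
from typing import Dict, List, Set


def topics_for_user(topic_domains: Dict[str, List[str]], user_domains: Set[str]) -> List[str]:
    # Keep a topic if at least one of its domain labels matches the user's domains.
    return [topic for topic, labels in topic_domains.items() if user_domains & set(labels)]


topic_domains = {
    "phone launch": ["tech"],
    "derby result": ["sports"],
    "chip export rules": ["tech", "finance"],
}
print(topics_for_user(topic_domains, {"finance"}))  # ['chip export rules']
```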
7. The method of claim 6, wherein determining the domain label corresponding to each topic comprises:
integrating the information by topic to obtain multiple pieces of information under the same topic;
classifying the information through a multi-classification model and labeling each piece of information with a domain label, the multi-classification model being obtained through iterative training on an information data set labeled with domain labels; and
taking the domain labels ranked in the top P by frequency of occurrence as the domain labels corresponding to the topic.
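The three steps of claim 7 (group information by topic, label each piece, keep the P most frequent labels) map naturally onto a group-then-count routine. In this sketch a hypothetical keyword rule, `toy_classify`, stands in for the trained multi-classification model; only the grouping and top-P counting reflect the claim.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple


def topic_domain_labels(
    infos: List[Tuple[str, str]],    # (topic, information text) pairs
    classify: Callable[[str], str],  # stand-in for the multi-classification model
    p: int = 1,
) -> Dict[str, List[str]]:
    # Integrate the information by topic.
    by_topic: Dict[str, List[str]] = defaultdict(list)
    for topic, text in infos:
        by_topic[topic].append(text)
    result: Dict[str, List[str]] = {}
    for topic, texts in by_topic.items():
        # Label each piece of information, then keep the P most frequent labels.
        counts = Counter(classify(t) for t in texts)
        result[topic] = [label for label, _ in counts.most_common(p)]
    return result


def toy_classify(text: str) -> str:
    # Hypothetical keyword rule, not a trained model.
    return "finance" if "stock" in text else "tech"


infos = [
    ("chip ban", "stock prices fall"),
    ("chip ban", "new chip design"),
    ("chip ban", "stock analysts react"),
]
print(topic_domain_labels(infos, toy_classify))  # {'chip ban': ['finance']}
```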
8. An information topic recommendation device, the device comprising:
a first determining module configured to determine topics of information, the information coming from multiple channels of a network;
a second determining module configured to determine a writing degree of each topic;
a screening module configured to screen out target topics according to similarities between the topics and the writing degrees of the topics; and
a recommending module configured to send the target topics to a recommender user;
wherein the screening module is specifically configured to:
determine the similarity between every two topics through a similarity judging model, the similarity judging model being obtained through iterative training on a second topic data set pre-labeled with similarities, each element of the second topic data set being a pair of topics;
perform deduplication on topics whose similarity is greater than a second preset value to obtain candidate topics; and
select at least one of the candidate topics as the target topic in descending order of writing degree score.
9. A terminal, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method of any one of claims 1-7.
11. A method for recommending information topics, the method comprising:
determining topics of information, the information coming from multiple channels of a network;
determining a writing degree of each topic;
screening out target topics according to the writing degrees of the topics; and
sending the target topics to a recommender user;
wherein screening out the target topics according to the writing degrees of the topics comprises:
selecting at least one of the topics as the target topic in descending order of writing degree score.
CN202010227922.1A 2020-03-27 2020-03-27 Information theme recommendation method, device, terminal and storage medium Active CN111310058B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010227922.1A CN111310058B (en) 2020-03-27 2020-03-27 Information theme recommendation method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010227922.1A CN111310058B (en) 2020-03-27 2020-03-27 Information theme recommendation method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN111310058A CN111310058A (en) 2020-06-19
CN111310058B true CN111310058B (en) 2023-08-08

Family

ID=71147430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010227922.1A Active CN111310058B (en) 2020-03-27 2020-03-27 Information theme recommendation method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111310058B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343083A (en) * 2021-05-25 2021-09-03 北京字节跳动网络技术有限公司 Theme pushing method and device, storage medium and computer equipment
CN113378555B (en) * 2021-06-22 2023-06-27 富途网络科技(深圳)有限公司 Intelligent association method for individual stocks and related products


Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US20100306249A1 (en) * 2009-05-27 2010-12-02 James Hill Social network systems and methods
WO2012137397A1 (en) * 2011-04-01 2012-10-11 パナソニック株式会社 Content-processing device, content-processing method, content-processing program, and integrated circuit
US20130066862A1 (en) * 2011-09-12 2013-03-14 Microsoft Corporation Multi-factor correlation of internet content resources
US20140278986A1 (en) * 2013-03-14 2014-09-18 Clipfile Corporation Tagging and ranking content
KR102535044B1 (en) * 2015-12-08 2023-05-23 삼성전자주식회사 Terminal, server and method for suggesting event thereof
US11086883B2 (en) * 2016-04-15 2021-08-10 Google Llc Systems and methods for suggesting content to a writer based on contents of a document
US10191990B2 (en) * 2016-11-21 2019-01-29 Comcast Cable Communications, Llc Content recommendation system with weighted metadata annotations
US10685178B2 (en) * 2018-03-26 2020-06-16 Adobe Inc. Defining and delivering personalized entity recommendations

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN103678474A (en) * 2013-09-24 2014-03-26 浙江大学 Method for acquiring large number of hot topics fast in social network
CN103577579A (en) * 2013-11-08 2014-02-12 南方电网科学研究院有限责任公司 Resource recommendation method and system based on potential demands of users
CN105095202A (en) * 2014-04-17 2015-11-25 华为技术有限公司 Method and device for message recommendation
CN105740468A (en) * 2016-03-07 2016-07-06 达而观信息科技(上海)有限公司 Personalized recommendation method and system combining content publisher information
CN109460518A (en) * 2018-12-07 2019-03-12 杭州东信北邮信息技术有限公司 Book recommendation method based on user website access records
CN110457581A (en) * 2019-08-02 2019-11-15 达而观信息科技(上海)有限公司 Information recommendation method, device, electronic equipment and storage medium
CN110457439A (en) * 2019-08-06 2019-11-15 北京如优教育科技有限公司 One-stop intelligent writing assistance method, device and system

Non-Patent Citations (1)

Title
Multi-dimensional text clustering based on user behavior features; 黎万英; 黄瑞章; 丁志远; 陈艳平; 徐立洋; 计算机应用 (Journal of Computer Applications), No. 11; full text *

Also Published As

Publication number Publication date
CN111310058A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
CN111984689B (en) Information retrieval method, device, equipment and storage medium
CN111104514B (en) Training method and device for document tag model
CN112650907B (en) Search word recommendation method, target model training method, device and equipment
US20210209416A1 (en) Method and apparatus for generating event theme
CN111831821B (en) Training sample generation method and device of text classification model and electronic equipment
CN111859982B (en) Language model training method and device, electronic equipment and readable storage medium
CN111783468B (en) Text processing method, device, equipment and medium
CN109189931B (en) Target statement screening method and device
US20210200813A1 (en) Human-machine interaction method, electronic device, and storage medium
CN111858905B (en) Model training method, information identification device, electronic equipment and storage medium
CN112329453B (en) Method, device, equipment and storage medium for generating sample chapter
CN111091006A (en) Entity intention system establishing method, device, equipment and medium
CN111563198B (en) Material recall method, device, equipment and storage medium
CN111310058B (en) Information theme recommendation method, device, terminal and storage medium
CN111400456B (en) Information recommendation method and device
CN111984775A (en) Question and answer quality determination method, device, equipment and storage medium
CN112084150A (en) Model training method, data retrieval method, device, equipment and storage medium
CN111984774A (en) Search method, device, equipment and storage medium
CN111385188A (en) Recommendation method and device for dialog elements, electronic equipment and medium
CN113342946B (en) Model training method and device for customer service robot, electronic equipment and medium
CN111291184A (en) Expression recommendation method, device, equipment and storage medium
CN112052397B (en) User characteristic generation method and device, electronic equipment and storage medium
CN111460257B (en) Thematic generation method, apparatus, electronic device and storage medium
CN112650919A (en) Entity information analysis method, apparatus, device and storage medium
CN112148979B (en) Event-associated user identification method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant