CN113836399A - Theme recommendation method and device, computing equipment and storage medium - Google Patents

Theme recommendation method and device, computing equipment and storage medium

Info

Publication number
CN113836399A
CN113836399A CN202111033672.9A CN202111033672A CN113836399A CN 113836399 A CN113836399 A CN 113836399A CN 202111033672 A CN202111033672 A CN 202111033672A CN 113836399 A CN113836399 A CN 113836399A
Authority
CN
China
Prior art keywords
keyword
character
sequence
candidate
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111033672.9A
Other languages
Chinese (zh)
Inventor
袁威强
李家诚
俞霖霖
沙雨辰
胡光龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202111033672.9A
Publication of CN113836399A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The embodiments of the disclosure provide a theme recommendation method, a theme recommendation device, a computing device and a storage medium, wherein the method comprises the following steps: acquiring at least one keyword in a hotspot text; expanding each keyword in the at least one keyword to obtain an expansion result of each keyword; obtaining at least one keyword combination according to the expansion result of the at least one keyword; obtaining a candidate theme according to each keyword combination in the at least one keyword combination; and recommending a theme according to at least one candidate theme generated by the at least one keyword combination. Because the candidate themes used for theme recommendation are generated by extracting keywords from the current hotspot text and then expanding and combining those keywords, the candidate themes conform to the hot spots while their diversity is also ensured.

Description

Theme recommendation method and device, computing equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to the field of information processing technologies, and in particular, to a theme recommendation method and apparatus, a computing device, and a storage medium.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
When creating content such as video, audio and text, the first and most basic step for a creator is to plan a theme that matches current hot spots and can attract the audience. Tracking hot spots and planning a suitable theme is time-consuming and labor-intensive for the creator.
Disclosure of Invention
In view of this, the embodiments of the present disclosure at least provide a topic recommendation method, device, computing device and storage medium, which can perform topic recommendation based on a current hotspot and provide various authoring topics that conform to the hotspot for an author.
In a first aspect of embodiments of the present disclosure, there is provided a topic recommendation method, including:
acquiring at least one keyword in the hotspot text;
expanding each keyword in the at least one keyword to obtain an expanded result of each keyword;
obtaining at least one keyword combination according to the expansion result of the at least one keyword;
obtaining a candidate theme according to each keyword combination in the at least one keyword combination;
and recommending the theme according to at least one candidate theme generated by the at least one keyword combination.
In an embodiment of the present disclosure, the acquiring at least one keyword in the hotspot text includes:
acquiring a first character sequence corresponding to the hot spot text;
performing feature extraction on the first character sequence to obtain a first feature sequence of the first character sequence;
determining a classification result of each character in the first character sequence according to the first feature sequence, wherein the classification result represents that the character is the beginning of a keyword, or is a part except the beginning of the keyword, or does not belong to the keyword;
and obtaining at least one keyword in the hot spot text according to the classification result of each character in the first character sequence.
In an embodiment of the present disclosure, the obtaining at least one keyword in the hot text according to a classification result of each character in the first character sequence includes:
taking continuous characters in the first character sequence which meet the following conditions as keywords:
the first character is followed by at least one second character, wherein the classification result of the first character indicates the beginning of the keyword, and the classification result of the second character indicates the part of the keyword except the beginning.
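The condition above can be illustrated with a minimal sketch. The label names "B" (beginning of a keyword), "I" (part of a keyword other than the beginning) and "O" (not part of a keyword) are assumptions chosen for illustration, as is the function name extract_keywords; the disclosure only describes the three classes.

```python
def extract_keywords(chars, labels, sep=""):
    """Collect runs made of one 'B' label followed by at least one 'I' label.

    `chars` and `labels` are parallel lists; the label names are illustrative.
    """
    keywords = []
    i = 0
    while i < len(chars):
        if labels[i] == "B":
            j = i + 1
            # Extend the run while the following characters continue the keyword.
            while j < len(chars) and labels[j] == "I":
                j += 1
            if j - i >= 2:  # the first character needs at least one follower
                keywords.append(sep.join(chars[i:j]))
            i = j
        else:
            i += 1
    return keywords
```

With sep="" the characters are joined directly, which suits Chinese text; for word-level sequences a space separator may be more appropriate.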
In an embodiment of the present disclosure, the expanding each keyword of the at least one keyword to obtain an expanded result of each keyword includes at least one of:
obtaining an expansion word of the keyword according to a synonym or near-synonym of the keyword;
searching by using the keywords, and obtaining the expansion words of the keywords according to matching results;
and taking the words with the word vector similarity meeting the preset requirement as the expansion words of the keywords.
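As a rough illustration of the word-vector option, the sketch below treats "similarity meeting the preset requirement" as cosine similarity at or above a threshold. The cosine measure, the threshold value, the toy vectors and the function names are all assumptions; the disclosure does not fix any of them.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_by_similarity(keyword, vectors, threshold=0.8):
    """Return words whose vector is close enough to the keyword's vector.

    `vectors` maps each known word to its embedding; the threshold is an
    illustrative stand-in for the "preset requirement".
    """
    target = vectors[keyword]
    return [
        word
        for word, vec in vectors.items()
        if word != keyword and cosine(target, vec) >= threshold
    ]
```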
In an embodiment of the disclosure, the obtaining at least one keyword combination according to the expansion result of the at least one keyword includes:
obtaining an expanded keyword set corresponding to the keyword according to the keyword and the expanded words of the keyword;
selecting a plurality of target keywords from an expanded keyword set corresponding to the at least one keyword, wherein the target keywords are the keywords or the expanded words, and the target keywords comprise at least one keyword;
and combining the plurality of target keywords to obtain the keyword combination.
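One possible way to form such combinations is sketched below. It assumes that exactly one target keyword is picked from each expanded keyword set and that a combination is kept only if it contains at least one original keyword; the disclosure permits other selection strategies, and the function name is hypothetical.

```python
from itertools import product

def keyword_combinations(expanded_sets):
    """Build keyword combinations from expanded keyword sets.

    `expanded_sets` maps each original keyword to a list of its expansion
    words. One word is drawn per set (an assumption), and combinations made
    only of expansion words are dropped.
    """
    originals = set(expanded_sets)
    pools = [[kw] + list(exts) for kw, exts in expanded_sets.items()]
    combos = []
    for combo in product(*pools):
        if any(word in originals for word in combo):
            combos.append(combo)
    return combos
```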
In an embodiment of the present disclosure, the obtaining a candidate topic according to each keyword combination of the at least one keyword combination includes:
acquiring a second character sequence corresponding to the keyword combination;
performing feature extraction on the second character sequence to obtain a second feature sequence of the second character sequence;
inputting the second sequence of features into a decoder, the decoder generating a prediction of at least one character;
and obtaining the candidate theme according to the prediction result.
In one embodiment of the present disclosure, the decoder determines a character to be generated corresponding to the (t+1)-th position in the candidate subject generated by the decoder by:
determining probability values of the characters to be generated corresponding to all characters in the preset word list according to the preset word list, the second characteristic sequence and a third characteristic sequence of t characters before the characters to be generated;
determining the character to be generated based on the maximum value in the probability values;
wherein t is a positive integer, and the third feature sequence is obtained by performing feature extraction on t characters before the character to be generated;
and the 1st character, when t is 1, is determined according to the preset word list and the second feature sequence.
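The character-by-character procedure above amounts to choosing the most probable character at each position. A minimal sketch follows; next_char_probs is a hypothetical stand-in for the trained decoder, and the end token and maximum length are assumptions added so that the loop terminates, since the disclosure does not describe a stopping rule here.

```python
def greedy_decode(next_char_probs, encoded, end_token="<eos>", max_len=30):
    """Generate a topic by taking the highest-probability character each step.

    `next_char_probs(encoded, prefix)` must return a dict mapping every
    character of the preset word list to its probability.
    """
    generated = []
    while len(generated) < max_len:
        probs = next_char_probs(encoded, generated)
        best = max(probs, key=probs.get)  # character with the maximum probability
        if best == end_token:
            break
        generated.append(best)
    return "".join(generated)

def toy_model(encoded, prefix):
    """Toy decoder for demonstration only: copies `encoded`, then stops."""
    vocab = sorted(set(encoded)) + ["<eos>"]
    target = encoded[len(prefix)] if len(prefix) < len(encoded) else "<eos>"
    return {ch: 1.0 if ch == target else 0.0 for ch in vocab}
```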
In an embodiment of the present disclosure, the obtaining a candidate topic according to each keyword combination of the at least one keyword combination includes:
obtaining the relevance score of the keyword combination and each topic in a preset topic library;
sorting the topics in the preset topic library according to the relevance scores;
and obtaining the candidate theme according to the sorting result.
In an embodiment of the present disclosure, the obtaining a relevance score of the keyword combination and each topic in a preset topic library includes:
determining a correlation score between each word and the theme according to the frequency of each word in the keyword combination in the theme, the length of the theme and the average length of all the themes in the preset theme library;
determining the weight of each word according to the number of the topics contained in the preset topic library and the occurrence frequency of each word in the keyword combination in all the topics of the preset topic library;
and obtaining the relevance score of the keyword combination and the theme according to the relevance score and the weight corresponding to each word in the keyword combination.
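The inputs listed above (word frequency in the theme, theme length, average theme length, library size, and how often a word occurs across the library) resemble those of the BM25 ranking function, although this passage does not name it. The sketch below uses a common BM25 form as an assumption; the parameters k1 and b, the use of the number of themes containing a word as its library-wide frequency, and the function name are all illustrative.

```python
import math

def relevance(combo, topic, library, k1=1.5, b=0.75):
    """BM25-style relevance of a keyword combination to one topic.

    `combo` and `topic` are lists of words; `library` is a non-empty list of
    such topics. k1 and b are conventional BM25 defaults, assumed here.
    """
    avg_len = sum(len(t) for t in library) / len(library)
    score = 0.0
    for word in combo:
        freq = topic.count(word)
        containing = sum(1 for t in library if word in t)
        # Weight: rarer words across the library count for more.
        weight = math.log((len(library) - containing + 0.5) / (containing + 0.5) + 1)
        # Per-word score: frequency damped by relative topic length.
        norm = freq + k1 * (1 - b + b * len(topic) / avg_len)
        score += weight * freq * (k1 + 1) / norm
    return score
```

Topics can then be ranked with, for example, sorted(library, key=lambda t: relevance(combo, t, library), reverse=True).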
In an embodiment of the present disclosure, the recommending a topic according to at least one candidate topic generated by the at least one keyword combination includes:
filtering the candidate topics according to a sensitive word dictionary;
and recommending the topics according to the filtered candidate topics.
In an embodiment of the present disclosure, the recommending a topic according to at least one candidate topic generated by the at least one keyword combination includes:
searching one candidate topic in the at least one candidate topic by using a search engine to obtain the previous p search results; wherein p is a positive integer;
acquiring the time interval between the time corresponding to each search result in the p search results and the current time;
determining an average time interval according to the time intervals corresponding to the p search results;
discarding the candidate topic in response to the average time interval exceeding a preset time threshold.
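The timeliness check can be sketched as follows. The search itself is omitted; result_times stands for the times of the first p search results, however they are obtained. Treating an empty result list as not timely is an assumption, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

def is_timely(result_times, now, max_age):
    """Return True if the average age of the search results is within max_age.

    `result_times` holds the datetime of each of the first p search results.
    """
    if not result_times:
        return False  # assumption: no results means the topic is not kept
    total = sum((now - t for t in result_times), timedelta())
    return total / len(result_times) <= max_age
```

A candidate topic for which is_timely returns False would be discarded.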
In an embodiment of the present disclosure, the recommending a topic according to at least one candidate topic generated by the at least one keyword combination includes:
acquiring a fourth character sequence of one candidate theme in the at least one candidate theme;
performing feature extraction on the fourth character sequence to obtain a fourth feature sequence of the fourth character sequence;
obtaining sentence characteristics of the candidate subject according to the fourth characteristic sequence;
carrying out full connection processing on the sentence characteristics to obtain an attraction degree score of the candidate theme;
discarding the candidate topic in response to the attractiveness score being below a set score threshold.
In an embodiment of the present disclosure, the obtaining, according to the fourth feature sequence, a sentence feature of the candidate topic includes:
converting the fourth feature sequence into sentence features by carrying out any one of averaging, weighted averaging, taking the maximum and taking the minimum on the features of each dimension in the fourth feature sequence; or
and converting the fourth feature sequence into sentence features by adopting an attention mechanism.
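For illustration, the sketch below shows two of the pooling options (averaging and taking the maximum per dimension) followed by a single fully connected layer that yields a score. Squashing the output with a sigmoid is an assumption, as are the function names; the weights would come from training and are plain parameters here.

```python
import math

def mean_pool(features):
    """Average each dimension over all positions of the feature sequence."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def max_pool(features):
    """Take the maximum of each dimension over all positions."""
    return [max(col) for col in zip(*features)]

def attraction_score(sentence_feature, weights, bias):
    """One fully connected layer with a sigmoid output (sigmoid is assumed)."""
    z = sum(w * x for w, x in zip(weights, sentence_feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```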
In one embodiment of the present disclosure, for a plurality of hot texts, the method further comprises:
randomly acquiring q × k candidate topics from the candidate topics corresponding to the hot spot texts; wherein q and k are positive integers, and q × k is less than the number of candidate topics;
and according to the attraction degree scores of the candidate topics, sequentially selecting k candidate topics from the q × k candidate topics for recommendation, wherein the similarity between any two candidate topics in the selected candidate topics is smaller than a set similarity threshold value.
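The selection step can be sketched as random sampling followed by a greedy pass in descending order of attraction score. The Jaccard similarity over characters is an assumed similarity measure, and the threshold value and function names are illustrative. Fewer than k topics may be returned if too many sampled candidates are similar to one another.

```python
import random

def jaccard(a, b):
    """Character-set Jaccard similarity (an assumed similarity measure)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def select_topics(candidates, scores, q, k, max_sim=0.5, seed=None):
    """Sample q*k candidates, then pick up to k dissimilar ones by score.

    `scores` maps each candidate topic to its attraction score; q*k must not
    exceed the number of candidates.
    """
    rng = random.Random(seed)
    pool = rng.sample(candidates, q * k)
    pool.sort(key=lambda c: scores[c], reverse=True)
    chosen = []
    for cand in pool:
        # Keep a candidate only if it is dissimilar to everything already chosen.
        if all(jaccard(cand, other) < max_sim for other in chosen):
            chosen.append(cand)
        if len(chosen) == k:
            break
    return chosen
```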
In a second aspect of the disclosed embodiments, there is provided a topic recommendation apparatus comprising:
the acquisition unit is used for acquiring at least one keyword in the hotspot text;
the expansion unit is used for expanding each keyword in the at least one keyword and obtaining at least one keyword combination according to the expansion result of the at least one keyword;
the generating unit is used for obtaining a candidate theme according to each keyword combination in the at least one keyword combination;
and the recommending unit is used for recommending the topics according to at least one candidate topic generated by the at least one keyword combination.
In an embodiment of the present disclosure, the obtaining unit is specifically configured to:
acquiring a first character sequence corresponding to the hot spot text;
performing feature extraction on the first character sequence to obtain a first feature sequence of the first character sequence;
determining a classification result of each character in the first character sequence according to the first feature sequence, wherein the classification result represents that the character is the beginning of a keyword, or is a part except the beginning of the keyword, or does not belong to the keyword;
and obtaining at least one keyword in the hot spot text according to the classification result of each character in the first character sequence.
In an embodiment of the disclosure, when the obtaining unit is configured to obtain at least one keyword in the hotspot text according to a classification result of each character in the first character sequence, the obtaining unit is specifically configured to:
taking continuous characters in the first character sequence which meet the following conditions as keywords:
the first character is followed by at least one second character, wherein the classification result of the first character indicates the beginning of the keyword, and the classification result of the second character indicates the part of the keyword except the beginning.
In an embodiment of the disclosure, the extension unit is specifically configured to at least one of:
obtaining an expansion word of the keyword according to a synonym or near-synonym of the keyword;
searching by using the keywords, and obtaining the expansion words of the keywords according to matching results;
and taking the words with the word vector similarity meeting the preset requirement as the expansion words of the keywords.
In an embodiment of the present disclosure, when the extension unit is configured to obtain at least one keyword combination according to an extension result of the at least one keyword, the extension unit is specifically configured to:
obtaining an expanded keyword set corresponding to the keyword according to the keyword and the expanded words of the keyword;
selecting a plurality of target keywords from an expanded keyword set corresponding to the at least one keyword, wherein the target keywords are the keywords or the expanded words, and the target keywords comprise at least one keyword;
and combining the plurality of target keywords to obtain the keyword combination.
In an embodiment of the present disclosure, the generating unit is specifically configured to:
acquiring a second character sequence corresponding to the keyword combination;
performing feature extraction on the second character sequence to obtain a second feature sequence of the second character sequence;
inputting the second sequence of features into a decoder, the decoder generating a prediction of at least one character;
and obtaining the candidate theme according to the prediction result.
In one embodiment of the present disclosure, the decoder determines a character to be generated corresponding to the (t+1)-th position in the candidate subject generated by the decoder by:
determining probability values of the characters to be generated corresponding to all characters in the preset word list according to the preset word list, the second characteristic sequence and a third characteristic sequence of t characters before the characters to be generated;
determining the character to be generated based on the maximum value in the probability values;
wherein t is a positive integer, and the third feature sequence is obtained by performing feature extraction on t characters before the character to be generated;
and the 1st character, when t is 1, is determined according to the preset word list and the second feature sequence.
In an embodiment of the disclosure, when the generating unit is configured to obtain the candidate topic according to each keyword combination of the at least one keyword combination, the generating unit is specifically configured to:
obtaining the relevance score of the keyword combination and each topic in a preset topic library;
sorting the topics in the preset topic library according to the relevance scores;
and obtaining the candidate theme according to the sorting result.
In an embodiment of the disclosure, the generating unit, when configured to obtain the correlation score between the keyword combination and each topic in the preset topic library, is specifically configured to:
determining a correlation score between each word and the theme according to the frequency of each word in the keyword combination in the theme, the length of the theme and the average length of all the themes in the preset theme library;
determining the weight of each word according to the number of the topics contained in the preset topic library and the occurrence frequency of each word in the keyword combination in all the topics of the preset topic library;
and obtaining the relevance score of the keyword combination and the theme according to the relevance score and the weight corresponding to each word in the keyword combination.
In an embodiment of the present disclosure, the recommending unit is specifically configured to:
filtering the candidate topics according to a sensitive word dictionary;
and recommending the topics according to the filtered candidate topics.
In an embodiment of the present disclosure, the recommending unit is specifically configured to:
searching one candidate topic in the at least one candidate topic by using a search engine to obtain the previous p search results; wherein p is a positive integer;
acquiring the time interval between the time corresponding to each search result in the p search results and the current time;
determining an average time interval according to the time intervals corresponding to the p search results;
discarding the candidate topic in response to the average time interval exceeding a preset time threshold.
In an embodiment of the present disclosure, the recommending unit is specifically configured to:
acquiring a fourth character sequence of one candidate theme in the at least one candidate theme;
performing feature extraction on the fourth character sequence to obtain a fourth feature sequence of the fourth character sequence;
obtaining sentence characteristics of the candidate subject according to the fourth characteristic sequence;
carrying out full connection processing on the sentence characteristics to obtain an attraction degree score of the candidate theme;
discarding the candidate topic in response to the attractiveness score being below a set score threshold.
In an embodiment of the present disclosure, the recommending unit is specifically configured to:
converting the fourth feature sequence into sentence features by carrying out any one of averaging, weighted averaging, taking the maximum and taking the minimum on the features of each dimension in the fourth feature sequence; or
and converting the fourth feature sequence into sentence features by adopting an attention mechanism.
In an embodiment of the present disclosure, the apparatus further includes a selecting unit, configured to:
randomly acquiring q × k candidate topics from the candidate topics corresponding to the hot spot texts; wherein q and k are positive integers, and q × k is less than the number of candidate topics;
and according to the attraction degree scores of the candidate topics, sequentially selecting k candidate topics from the q × k candidate topics for recommendation, wherein the similarity between any two candidate topics in the selected candidate topics is smaller than a set similarity threshold value.
In a third aspect of embodiments of the present disclosure, there is provided a computing device comprising: a processor; and a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any embodiment of the present disclosure.
In a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium comprising: a computer program which, when executed by a processor, implements the method of any embodiment of the disclosure.
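According to the topic recommendation method, the topic recommendation device, the computing equipment and the storage medium of the embodiments of the disclosure, at least one keyword in a hot text is acquired and each keyword is expanded to obtain an expansion result of each keyword; at least one keyword combination is then obtained according to the expansion result of the at least one keyword, a candidate topic is obtained according to each keyword combination, and finally topic recommendation is performed according to at least one candidate topic generated by the at least one keyword combination. Because the candidate topics are generated by extracting keywords from the current hot text and then expanding and combining them, the candidate topics conform to the hot spots while their diversity is also ensured.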
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically illustrates a flow chart of a topic recommendation method in accordance with an embodiment of the present disclosure;
FIG. 2 schematically shows a schematic diagram of a method for obtaining keywords according to an embodiment of the present disclosure;
FIG. 3 schematically shows a schematic diagram of a method of generating candidate topics according to an embodiment of the present disclosure;
FIG. 4 schematically shows a schematic diagram of an attractiveness score prediction method according to an embodiment of the present disclosure;
FIG. 5 schematically shows a schematic structural diagram of a topic recommendation apparatus according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a computer-readable storage medium according to an embodiment of the present disclosure;
FIG. 7 schematically shows a structural schematic diagram of a computing device according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to the embodiment of the disclosure, a theme recommendation method, a theme recommendation device, a computing device and a storage medium are provided. In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
When creating content such as video, audio and text, the first and most basic step for a creator is to plan a theme that matches current hot spots and can attract the audience. Tracking hot spots and planning a suitable theme is time-consuming and labor-intensive for the creator.
Therefore, the embodiments of the present disclosure provide a topic recommendation scheme in which keywords in a hot text are expanded, at least one keyword combination is obtained according to the expansion result of each keyword, candidate topics are obtained from the keyword combinations, and topics are recommended according to the candidate topics. The scheme aims to recommend topics based on current hot spots and to give an author topic selection ideas when planning topics.
The theme recommendation scheme provided by the embodiment of the disclosure can be applied to various recommendation scenes, such as authoring theme recommendation of an audio and video platform or a literature authoring platform and authoring theme recommendation in an audio and video APP, or can be applied to an independent authoring theme recommendation platform or APP, and the like.
Fig. 1 schematically shows a flow of a topic recommendation method according to an embodiment of the present disclosure. The theme recommendation method can be executed by an electronic device such as a terminal device or a server. The terminal device can be a fixed terminal or a mobile terminal, such as a mobile phone, a tablet computer, a game machine, a desktop computer, an advertisement machine, an all-in-one machine, or a vehicle-mounted terminal, and the server can be a local server or a cloud server. The method can also be implemented by a processor calling computer-readable instructions stored in a memory. As shown in fig. 1, the method may include the following processes:
In step 101, at least one keyword in the hotspot text is obtained.
The hot text refers to text content corresponding to a current hot topic. The hot text may include real-time hot data collected from the internet, such as the titles and content of hot news on content platforms such as Weibo, Baidu and Toutiao. Those skilled in the art should understand that the hot text may also be real-time hot data collected through other media, and the embodiments of the present disclosure do not limit the channel or means by which the hot text is obtained, or its specific form.
In step 102, each keyword of the at least one keyword is expanded to obtain an expanded result of each keyword.
In order to improve the richness of candidate topics, keywords extracted from the hot text can be expanded. Expanding a keyword means deriving at least one similar word from the keyword; each similar word obtained in this way is an expanded word of the keyword, and the keyword together with its expanded words forms the expansion result of the keyword.
In step 103, at least one keyword combination is obtained according to the expansion result of the at least one keyword.
The keywords extracted from the hot text and the expanded words contained in the corresponding expansion results are combined to obtain one or more keyword combinations.
Taking the hotspot text "score of college entrance examination" as an example, the keywords extracted from the hotspot text include "college entrance examination" and "score". The expansion words similar to the keyword "college entrance examination" include "exam", "middle entrance examination", and the like, and the expansion words similar to the keyword "score" include "score", "result", and the like. That is, the expansion results of the keywords obtained from the hotspot text include the above keywords and expansion words, and the keyword combinations obtained from the expansion results include "score of college entrance examination", "result of college entrance examination", "middle entrance examination score", and the like.
In step 104, a candidate topic is obtained according to each keyword combination of the at least one keyword combination.
For each keyword combination generated in step 103, one or more candidate topics may be obtained. A trained candidate topic generation network may be used to predict candidate topics from the keyword combination, candidate topics may be obtained by retrieval using the keyword combination, or candidate topics may be obtained from the keyword combination in other ways.
In step 105, a topic recommendation is made based on at least one candidate topic generated by the at least one keyword combination.
For the at least one candidate topic obtained in step 104, quality evaluation may be performed from multiple aspects such as security, timeliness, and attractiveness before topic recommendation, so as to ensure the quality of the topics recommended to the creator.
According to the topic recommendation method, device, computing equipment, and storage medium of the embodiments of the present disclosure, at least one keyword in a hot text is acquired, and each keyword of the at least one keyword is expanded to obtain an expansion result of each keyword; at least one keyword combination is then obtained according to the expansion result of the at least one keyword, a candidate topic is obtained according to each keyword combination in the at least one keyword combination, and finally topic recommendation is performed according to the at least one candidate topic generated from the at least one keyword combination.
Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.
Fig. 2 schematically shows a method for acquiring keywords according to an embodiment of the present disclosure. The method may be implemented by a trained keyword extraction neural network, wherein the keyword extraction network may include an encoder 201 and a Multi-Layer Perceptron (MLP) 202. As shown in fig. 2, the method may include the following processes:
First, a first character sequence { c1, c2, c3, ..., cn } corresponding to the hot text is obtained, wherein n is the number of characters contained in the hot text.
The first character sequence corresponding to the hot text refers to a sequence formed by the characters in the hot text. If the hot text is in Chinese, one Chinese character is taken as the minimum unit of a character; if the hot text is in a foreign language, one word is taken as the minimum unit of a character.
In a case that the hot text contains a plurality of parts of text content, the plurality of parts of text content may be first spliced into one text sentence S, and then the text sentence may be split into a plurality of characters to obtain the first character sequence. For example, when the hot text is hot search news of a certain news platform, the title and the text of the hot search news may be spliced into a text sentence S, and the text sentence may be split into a plurality of characters to obtain a first character sequence.
Then, feature extraction is performed on the first character sequence to obtain a first feature sequence of the first character sequence.
As shown in fig. 2, the first character sequence { c1, c2, c3, ..., cn } may be input to the encoder 201 for encoding, and a first feature sequence { h1, h2, h3, ..., hn } of the first character sequence { c1, c2, c3, ..., cn } is obtained. The encoder 201 may adopt a network structure in the related art, and the specific structure of the encoder is not limited in the embodiment of the present disclosure.
Then, a classification result of each character in the first character sequence is determined according to the first feature sequence.
In one example, the first feature sequence { h1, h2, h3, ..., hn } may be input to the multi-layer perceptron 202, and the multi-layer perceptron 202 outputs a classification result for each character, i.e., a class label sequence { h1, h2, h3, ..., hn }. The classification result indicates that the character is the beginning of a keyword, or is a part of a keyword other than the beginning (including the middle part and the end of the keyword), or does not belong to a keyword.
Specifically, the multi-layer perceptron 202 may output a keyword tag probability distribution yi of a character i, where i ∈ [1, n], n is the number of characters contained in the hot text, and yi is a normalized probability vector whose dimension is determined by the number of tags in the keyword tag set. That is, the multi-layer perceptron 202 may output the keyword tag probability sequence { y1, y2, y3, ..., yn } corresponding to the first character sequence.
In the embodiment of the present disclosure, the keyword tag set may be set to L = { B_Key, I_Key, O }, where the tag B_Key indicates that the character is the beginning of a keyword, the tag I_Key indicates that the character is a part of a keyword other than the beginning, and the tag O indicates that the character does not belong to a keyword. When the keyword tag set is L = { B_Key, I_Key, O }, the vector dimension of the keyword tag probability distribution yi of the character i output by the multi-layer perceptron 202 is 3, and the tag type of each character, that is, the classification result of the character, can be determined according to yi.
In an example, after the keyword tag probability distribution yi of the character i output by the multi-layer perceptron 202 is obtained, the sequence number j corresponding to the maximum value of yi may be obtained, and the tag type of the character i is then the j-th tag in the keyword tag set L. For example, if the sequence number corresponding to the maximum value of the keyword tag probability distribution yi is 2, the tag of the character is the 2nd tag in the keyword tag set L, that is, I_Key.
In the embodiment of the disclosure, the hot text is converted into the first character sequence, and the classification result of each character in the first character sequence is obtained by utilizing keyword extraction neural network prediction, that is, it is determined that each character is the beginning of a keyword, or is a part other than the beginning of the keyword, or does not belong to the keyword, so that the keyword included in the first character sequence can be accurately extracted.
In some embodiments, consecutive characters in the first character sequence that satisfy the following condition may be taken as a keyword: a first character is followed by at least one second character, wherein the classification result of the first character indicates the beginning of a keyword, and the classification result of the second character indicates a part of a keyword other than the beginning. That is, according to the class label sequence { h1, h2, h3, ..., hn } corresponding to the first character sequence, a continuous character sequence that begins with a B_Key tag and is followed by one or more I_Key tags is taken as an independent keyword, and the final keyword set is obtained from all continuous character sequences in the first character sequence that meet this condition.
For example, the first character sequence corresponding to the hot text is "college achievements published this time", where the probability distribution y3 corresponding to the 3rd character "high" (the first character of the Chinese word for "college entrance examination") is [0.9, 0.08, 0.02]. The maximum value of y3, 0.9, is at position 1, so the tag corresponding to "high" is B_Key. The probability distributions of the other characters are processed in the same way, and the whole class label sequence is obtained as { O, O, B_Key, I_Key, O, O, O, O, O }. Since consecutive characters that begin with a B_Key tag and are followed by at least one I_Key tag are taken as an independent keyword, the keyword set contained in the first character sequence can be determined to be { college entrance examination, achievement }.
It is noted that the class label sequence output by the multi-layer perceptron 202 may contain an I_Key tag that does not follow a B_Key tag, e.g., { O, O, O, I_Key, B_Key, I_Key, O, O, O, O, O, O }. Since a keyword must begin with a B_Key tag, the I_Key tag at position 4 is considered an illegal tag; it can be ignored or treated as the tag O, and is not recognized as part of a keyword.
In the embodiment of the disclosure, the keywords are determined according to the classification result of each character in the continuous characters, and the independent keywords contained in the first character sequence corresponding to the hot text can be accurately identified.
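The tag selection and keyword extraction described above can be sketched as follows. This is a minimal illustration only: the function names `argmax_tags` and `extract_keywords` are hypothetical, the tag order in `TAGS` is assumed to match the order of the tag set L, and the probability vectors would in practice come from the multi-layer perceptron 202.

```python
from typing import List, Sequence

# Tag set L from the description; the index order is an assumption.
TAGS = ["B_Key", "I_Key", "O"]


def argmax_tags(probs: Sequence[Sequence[float]]) -> List[str]:
    """For each character, pick the tag whose probability is highest."""
    return [TAGS[max(range(len(p)), key=lambda j: p[j])] for p in probs]


def extract_keywords(chars: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Collect spans that start with B_Key and continue with >= 1 I_Key.

    Characters are joined without a separator, which suits Chinese text;
    for word-level units a space separator would be needed instead.
    """
    keywords: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # A valid keyword needs the B_Key character plus at least one I_Key.
        if len(current) >= 2:
            keywords.append("".join(current))
        current.clear()

    for ch, tag in zip(chars, tags):
        if tag == "B_Key":
            flush()
            current.append(ch)
        elif tag == "I_Key" and current:
            current.append(ch)
        else:
            # Tag O, or an illegal I_Key with no preceding B_Key: treat as O.
            flush()
    flush()
    return keywords
```

An I_Key tag that does not follow a B_Key tag falls into the last branch and is therefore handled like the tag O, matching the treatment of illegal tags described above.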
In some embodiments, each keyword of the at least one keyword may be expanded to obtain an expanded result of each keyword by:
Firstly, obtaining the expansion words of the keywords according to the synonyms or near-synonyms of the keywords.
And secondly, searching by using the keywords, and obtaining the expansion words of the keywords according to matching results. For example, the keyword may be retrieved in a search engine, and the expanded word of the keyword is obtained according to a matching result obtained by the search engine.
And thirdly, taking the words with the word vector similarity meeting the preset requirement as the expansion words of the keywords. For example, a word whose word vector similarity to the keyword is higher than a set threshold may be used as the expanded word of the keyword.
It should be understood by those skilled in the art that the above method for expanding the keyword is only an example, and the keyword may also be expanded in other suitable manners, which is not limited by the embodiment of the disclosure.
In the embodiment of the present disclosure, the keywords may be expanded by combining several of the above expansion manners, so as to improve the richness of the candidate topics obtained based on the keywords.
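As an illustration of the third expansion manner, the following sketch selects expansion words whose word vectors are sufficiently similar to the keyword. Cosine similarity and the threshold value 0.8 are assumptions made here for illustration; the description only requires that the similarity meet a preset requirement. The function names and the `vectors` mapping are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_by_vectors(keyword: str,
                      vectors: Dict[str, Sequence[float]],
                      threshold: float = 0.8) -> List[str]:
    """Return words whose vector similarity to the keyword exceeds threshold."""
    if keyword not in vectors:
        return []
    base = vectors[keyword]
    return [word for word, vec in vectors.items()
            if word != keyword and cosine(base, vec) > threshold]
```

In practice the `vectors` mapping would be loaded from a pretrained word embedding model rather than written by hand.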
After the expanded words of each keyword included in the hot text are obtained, a keyword combination can be obtained in the following manner.
Firstly, obtaining an expanded keyword set corresponding to the keyword according to the keyword and the expanded words of the keyword.
Taking the hot text "college entrance examination score" as an example, the keywords extracted from the hot text include "college entrance examination" and "score", where the expanded keyword set of "college entrance examination" is { "examination", "middle school entrance examination" } and the expanded keyword set of "score" is { "mark", "result" }.
And then, selecting a plurality of target keywords from an expanded keyword set corresponding to the at least one keyword, wherein the target keywords are the keywords or the expanded words, and the target keywords comprise at least one keyword.
For example, the expanded keyword set corresponding to the keywords "college entrance examination" and "score" contained in the hot text "college entrance examination score" is { "examination", "middle school entrance examination", "score", "mark", "result" }. One or more target keywords may be selected from the set to construct a keyword combination, where each target keyword is either a keyword or an expansion word. It should be noted that, in order to ensure the relevance of the generated candidate topics, the selected target keywords need to include at least one keyword; that is, the selected target keywords cannot all be expansion words.
And finally, combining the plurality of target keywords to obtain the keyword combination.
For example, in the case that the selected target keywords include "exam", "score", the keyword combination "exam score" may be obtained by combination; in the case where the selected target keywords include "college entrance", "score", a combination of keywords "college entrance score", and the like may be obtained.
In the embodiment of the disclosure, the richness of generating candidate topics can be improved by obtaining the extended keyword set corresponding to each keyword in the hotspot text, and then selecting the target keywords from the extended keyword sets corresponding to all the keywords to perform keyword combination; and by ensuring that each keyword combination comprises an original keyword, the relevance of the generated candidate subject is ensured.
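The combination step can be sketched as follows. This is one possible reading, not the only one: it assumes that a combination takes exactly one term per original keyword (the keyword itself or one of its expansion words), and it then discards combinations that contain no original keyword. The function name `keyword_combinations` and the input mapping are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def keyword_combinations(expansions: Dict[str, List[str]]) -> List[Tuple[str, ...]]:
    """Build keyword combinations that keep at least one original keyword.

    `expansions` maps each original keyword to its expansion words.
    """
    # One slot per original keyword: the keyword itself plus its expansions.
    slots = [[kw] + [w for w in words if w != kw]
             for kw, words in expansions.items()]
    originals = set(expansions)
    # Keep only combinations containing at least one original keyword.
    return [combo for combo in product(*slots)
            if any(term in originals for term in combo)]
```

For the example above, a combination such as ("examination", "score") is kept because "score" is an original keyword, whereas ("examination", "result") is dropped because both terms are expansion words.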
In some embodiments, a trained candidate topic generation network may be utilized to generate candidate topics from the keyword combination prediction, and fig. 3 schematically illustrates a schematic diagram of a method for generating candidate topics according to an embodiment of the disclosure. The candidate topic generation network may include an encoder 301 and a decoder 302. As shown in fig. 3, the method may include the following processes:
First, a second character sequence { k11, k12, <seg>, ..., kn1, kn2, <seg> } corresponding to the keyword combination is obtained.
For each keyword combination generated in step 103, the keyword combination may be represented as a second character sequence. Specifically, each independent keyword kn in the keyword combination may first be split into single characters, resulting in a sub-character sequence { kn1, ..., knt }, where t is the length of the keyword kn; after a segmentation mark <seg> is added behind each keyword, the characters of all keywords in the keyword combination are spliced to obtain the second character sequence { k11, k12, <seg>, ..., kn1, kn2, <seg> }.
And then, performing feature extraction on the second character sequence to obtain a second feature sequence of the second character sequence.
As shown in fig. 3, the second character sequence { k11, k12, <seg>, ..., kn1, kn2, <seg> } may be input to the encoder 301, and feature extraction may be performed on the second character sequence to obtain a second feature sequence { s1, s2, s3, ..., sm } of the second character sequence, where m is the number of characters included in the keyword combination, that is, the length of the second character sequence.
The second feature sequence is then input to a decoder 302, which generates a prediction result of at least one character.
After the second feature sequence { s1, s2, s3, ..., sm } is obtained, the decoder 302 generates prediction results character by character until the end marker <EOS> is generated.
In some embodiments, the decoder 302 may obtain the prediction result of the character to be generated corresponding to the t +1 th position in the candidate topic generated by the decoder by: determining probability values of the characters to be generated corresponding to all characters in the preset word list according to the preset word list, the second characteristic sequence and a third characteristic sequence of t characters before the characters to be generated; determining the character to be generated based on a maximum value of the probability values. Wherein t is a positive integer, and the third feature sequence is obtained by performing feature extraction on t characters before the character to be generated; and, for the 1 st character when t is 1, it is determined according to the preset word list and the second characteristic sequence; the preset vocabulary usually includes a plurality of common words, and may be obtained in advance or constructed in advance.
Specifically, the t characters before the character to be generated, that is, the generated character sequence, may be expressed as { <SOS>, a1, a2, a3, ..., at }, where <SOS> is the start marker. First, a third feature sequence { e0, e1, e2, e3, ..., et } of the generated character sequence is obtained. Then the second feature sequence { s1, s2, s3, ..., sm } of the second character sequence and the third feature sequence { e0, e1, e2, e3, ..., et } of the generated character sequence are input to the decoder 302, and the decoder 302 outputs the probability value of the (t+1)-th character corresponding to each character in the preset word list; that is, the decoder 302 outputs a word list probability distribution y_{t+1} of the (t+1)-th character. The dimension of the probability distribution vector y_{t+1} is consistent with the size of the preset word list, and the value of each dimension j represents the probability of the j-th word in the preset word list. The word with the highest probability may generally be selected as the currently generated character (the (t+1)-th character).
If no character is generated, that is, the character to be generated is the 1 st character, the probability value of the 1 st character corresponding to each character in the preset vocabulary can be obtained according to the preset vocabulary and the second feature sequence. For example, the second feature sequence { s1, s2, s3, ·, sm } is input to the decoder 302, and after the decoder 302 outputs the vocabulary probability distribution y1 of the 1 st character, the sequence number r corresponding to the maximum value in y1 may be obtained, and the r-th word in the preset vocabulary corresponding to the sequence number r is taken as the 1 st character.
And finally, obtaining the candidate theme according to the prediction result.
In the embodiment of the disclosure, the character to be generated can be accurately predicted by using the second feature sequence corresponding to the keyword combination and the third feature sequence of the generated character, so as to obtain the candidate theme.
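The construction of the second character sequence and the character-by-character decoding can be sketched as follows. The callable `step` is a hypothetical stand-in for the trained encoder 301 and decoder 302: given the source sequence and the characters generated so far, it is assumed to return a probability for every entry of the preset word list. The `max_len` cap is also an assumption added so that the loop always terminates.

```python
from typing import Callable, List, Sequence

SEG, SOS, EOS = "<seg>", "<SOS>", "<EOS>"


def to_second_sequence(keywords: Sequence[str]) -> List[str]:
    """Split each keyword into characters and append <seg> after it."""
    seq: List[str] = []
    for kw in keywords:
        seq.extend(kw)   # one entry per character of the keyword
        seq.append(SEG)
    return seq


def greedy_decode(step: Callable[[Sequence[str], Sequence[str]], Sequence[float]],
                  source: Sequence[str],
                  vocab: Sequence[str],
                  max_len: int = 32) -> str:
    """Generate characters one at a time, always taking the most probable."""
    generated: List[str] = [SOS]
    while len(generated) <= max_len:
        probs = step(source, generated)            # distribution over vocab
        best = max(range(len(vocab)), key=lambda j: probs[j])
        if vocab[best] == EOS:                     # end marker: stop decoding
            break
        generated.append(vocab[best])
    return "".join(generated[1:])                  # drop the <SOS> marker
```

Taking the maximum-probability entry at each step corresponds to the selection rule described above; other decoding strategies are not covered by this sketch.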
In some embodiments, the candidate topic may be obtained by retrieving the keyword combination.
Specifically, the relevance score between the keyword combination and each topic in a preset topic library may be obtained first, the topics in the preset topic library are ranked according to the relevance score, and finally the candidate topics are obtained according to the ranking result. The preset topic library includes a large number of topics, and the sources of the topics may include, but are not limited to, various topic libraries disclosed on the internet, obtained user query topics, and the like.
By calculating the relevance between the keyword combination and each topic in a preset topic library, all topics in the preset topic library can be ranked. Generally, the topics are ranked from high to low according to their relevance to the keyword combination, so that the top-ranked topics are those more relevant to the keyword combination. Topics ranked before a preset rank may be taken as candidate topics. For example, the topics in the preset topic library whose relevance to the keyword combination ranks in the top 10 may be used as candidate topics.
In one example, the relevancy score of the keyword combination to each topic in the preset topic library may be obtained by the following method.
Firstly, determining the relevance score of each word and the theme according to the frequency of each word in the keyword combination in the theme, the length of the theme and the average length of all the themes in the preset theme library. That is, the higher the frequency of occurrence of a word in a keyword combination in a topic, the higher the relevance of the word to the topic.
And then, determining the weight of each word according to the number of the topics contained in the preset topic library and the occurrence frequency of each word in the keyword combination in all the topics of the preset topic library. That is, the more topics including a word in the preset topic library, the lower the weight of the word, that is, the lower the importance of the word in determining the relevance.
And obtaining the relevance score of the keyword combination and the theme according to the relevance score and the weight corresponding to each word in the keyword combination. For example, the relevance score corresponding to each word is weighted by the weight of each word, and then the weighted relevance scores of each word in the keyword combination are accumulated, so that the relevance score between the keyword combination and the topic can be obtained.
In the embodiment of the present disclosure, the algorithm for calculating the relevance between the keyword combination and each topic in the preset topic library may be built into a self-constructed search engine, so that each keyword combination can be searched by the search engine, and a ranking result of the relevance scores between the topics in the preset topic library and the keyword combination is obtained.
In the embodiment of the present disclosure, candidate topics more consistent with a hot text may be obtained by obtaining candidate topics according to the relevance scores of the keyword combinations and the respective topics in the preset topic library.
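The relevance scoring described above resembles the widely used BM25 ranking function, although the description does not name a specific formula. The sketch below therefore uses BM25 as an assumed instantiation: the parameters `k1` and `b`, the exact form of the word weight, and the function names are all illustrative choices, and topics are assumed to be already split into words.

```python
import math
from typing import List, Sequence


def bm25_scores(query: Sequence[str],
                topics: Sequence[Sequence[str]],
                k1: float = 1.5,
                b: float = 0.75) -> List[float]:
    """Score each tokenized topic against the keyword combination."""
    if not topics:
        return []
    n = len(topics)
    avg_len = sum(len(t) for t in topics) / n or 1.0
    scores: List[float] = []
    for topic in topics:
        total = 0.0
        for word in query:
            tf = topic.count(word)                      # frequency in this topic
            df = sum(1 for t in topics if word in t)    # topics containing word
            # Word weight: lower when more topics contain the word.
            weight = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            # Per-word relevance: uses tf, topic length, and average length.
            norm = tf + k1 * (1.0 - b + b * len(topic) / avg_len)
            total += weight * tf * (k1 + 1.0) / norm
        scores.append(total)
    return scores


def top_candidates(query: Sequence[str],
                   topics: Sequence[Sequence[str]],
                   top_n: int = 10) -> List[Sequence[str]]:
    """Rank topics by score from high to low and keep the first top_n."""
    scores = bm25_scores(query, topics)
    order = sorted(range(len(topics)), key=lambda i: scores[i], reverse=True)
    return [topics[i] for i in order[:top_n]]
```

The weighted per-word scores are accumulated over the words of the keyword combination, which corresponds to the weighting and accumulation step described above.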
For the candidate topics obtained in step 104, the embodiment of the present disclosure performs quality evaluation from multiple perspectives, such as security, timeliness, and attractiveness, to perform topic recommendation, so as to ensure the quality of the topic recommended to the creator.
From the perspective of content security, in order to avoid the occurrence of topics with potential safety hazards in the recommended topics, the candidate topics may be filtered according to the sensitive word dictionary, and the topic recommendation may be performed according to the filtered candidate topics.
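A minimal sketch of the sensitive word filtering follows. It assumes plain substring matching against the dictionary, which is only one possible matching rule; the function name `filter_sensitive` is hypothetical.

```python
from typing import Iterable, List


def filter_sensitive(topics: Iterable[str],
                     sensitive_words: Iterable[str]) -> List[str]:
    """Keep only the candidate topics that contain no sensitive word."""
    words = list(sensitive_words)
    return [topic for topic in topics
            if not any(word in topic for word in words)]
```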
From the perspective of timeliness, if a recommended topic has poor timeliness, the topic may not match the current hot spots and may not meet the creator's expectations. In view of this, the embodiment of the present disclosure provides a method for performing quality evaluation according to timeliness.
Firstly, searching one candidate topic in the at least one candidate topic by using a search engine to obtain the top p search results; wherein p is a positive integer.
For example, inputting a candidate topic into a search engine for searching may obtain a large number of search results, and from these search results, the top p search results are obtained.
And then, acquiring the time interval between the time corresponding to each search result in the p search results and the current time.
The time corresponding to a search result refers to the time at which the search result was published. Taking a search result that is a news report as an example, the time corresponding to the search result is the publication time of the news.
And then, determining an average time interval according to the time intervals corresponding to the p search results. Responsive to the average time interval exceeding a preset time threshold, discarding the candidate topic. The value of p and the preset time threshold may be set according to actual needs, for example, p is set to 20, and the preset time threshold is set to 60 days.
In the embodiment of the present disclosure, by filtering the candidate topics according to the occurrence time of the search results of the candidate topics, the timeliness of the recommended topics can be ensured.
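The timeliness check can be sketched as follows, assuming the publication times of the top p search results have already been retrieved. The function name `is_timely` is hypothetical, and treating a topic with no search results as not timely is an assumption; the 60-day default follows the example threshold above.

```python
from datetime import datetime, timedelta
from typing import Sequence


def is_timely(result_times: Sequence[datetime],
              now: datetime,
              max_avg_age: timedelta = timedelta(days=60)) -> bool:
    """Return False if the average age of the results exceeds the threshold."""
    if not result_times:
        return False  # assumption: no search results means not timely
    total_age = sum((now - t for t in result_times), timedelta())
    return total_age / len(result_times) <= max_avg_age
```

A candidate topic for which `is_timely` returns False would be discarded.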
On the basis of meeting the safety and timeliness, the embodiment of the disclosure further provides that the candidate topics are filtered according to the attraction degree scores of the candidate topics.
In some embodiments, the attractiveness score of the candidate topic may be predicted by using a trained attractiveness evaluation network, and fig. 4 schematically illustrates a schematic diagram of an attractiveness score prediction method according to an embodiment of the disclosure. The attractiveness evaluation network may include an encoder 401, a sentence feature extraction network 402, a full connection layer 403, and an activation layer 404. As shown in fig. 4, the method may include the following processes:
First, a fourth character sequence { d1, d2, d3, ..., dt } of one candidate topic in the at least one candidate topic is obtained, wherein t is the number of characters contained in the candidate topic. The fourth character sequence of the candidate topic refers to a sequence formed by the characters in the candidate topic.
And then, performing feature extraction on the fourth character sequence to obtain a fourth feature sequence of the fourth character sequence.
As shown in fig. 4, the fourth character sequence { d1, d2, d3, ..., dt } may be input to the encoder 401 for encoding, and a fourth feature sequence { f1, f2, f3, ..., ft } of the fourth character sequence { d1, d2, d3, ..., dt } is obtained. The encoder 401 may adopt a network structure in the related art, and the specific structure of the encoder is not limited in the embodiments of the present disclosure.
And then, obtaining sentence characteristics of the candidate topic according to the fourth characteristic sequence.
As shown in fig. 4, the sentence feature extraction network 402 may be used to perform any of the following operations on the fourth feature sequence to obtain the sentence features of the candidate topic: converting the fourth feature sequence into sentence features by carrying out any one of averaging, weighted averaging, taking the maximum and taking the minimum on the features of each dimension in the fourth feature sequence; or converting the fourth feature sequence into sentence features by adopting an attention mechanism.
Then, the sentence features are subjected to full connection processing to obtain the attraction degree score of the candidate topic. As shown in fig. 4, the attraction degree score of the candidate topic may be obtained by inputting the sentence features into the full connection layer 403 and the activation layer 404, wherein the activation function used by the activation layer 404 may be, for example, Sigmoid, so that the attraction degree score is between 0 and 1.
Finally, in response to the attractiveness score being below a set score threshold, the candidate topic is discarded. That is, when the attraction degree score does not satisfy the requirement, the candidate topic is filtered out and not recommended.
In the embodiment of the disclosure, the candidate topics are filtered according to the attraction degree scores of the candidate topics, so that the attraction of the recommended topics to the audience is ensured.
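The pooling, full connection, and activation steps can be sketched as follows. The sketch picks averaging from the pooling options listed above and uses a single output unit; the `weights` and `bias` values stand in for trained parameters of the full connection layer 403, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Average each feature dimension over all characters."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def attraction_score(features: Sequence[Sequence[float]],
                     weights: Sequence[float],
                     bias: float) -> float:
    """Sentence feature -> single-unit full connection -> Sigmoid."""
    sentence = mean_pool(features)
    logit = sum(w * x for w, x in zip(weights, sentence)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

A candidate topic whose score falls below the set score threshold would then be discarded.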
According to the topic recommendation method provided by the embodiment of the disclosure, a plurality of related candidate topics can be generated for each original hot text, and a plurality of hot texts can finally generate a large number of candidate topics, among which there may be candidate topics with high similarity. If candidate topics with high similarity are recommended to a creator at the same time, the creator's experience is seriously affected, and it is not conducive to stimulating the creator's divergent thinking. On the other hand, if the large number of generated candidate topics is deduplicated directly, the time complexity is high, and the diversity of the candidate topics is damaged to a certain extent. In view of this, the embodiment of the present disclosure further provides a method for selecting topics meeting the requirements from the candidate topics for recommendation. A certain number of topics are first selected from the candidate topics, and recommendation is then performed among the selected topics according to the attraction degree scores and the similarity between candidate topics, which avoids directly deduplicating a large number of candidate topics and maintains the diversity of the candidate topics.
Specifically, q × k candidate topics are randomly obtained from the candidate topics corresponding to the hot texts, where q and k are positive integers and q × k is less than the number of candidate topics. That is, each time a topic recommendation is made, q × k candidate topics are first randomly selected from the large number of candidate topics.
Then, according to the attraction degree scores of the candidate topics, k candidate topics are sequentially selected from the q × k candidate topics for recommendation, where the similarity between any two of the selected candidate topics is smaller than a set similarity threshold.
The selected q × k candidate topics are first sorted from high to low by attraction degree score, and k candidate topics to be recommended are then selected in turn. When each candidate topic is selected, its similarity to every previously selected candidate topic must be lower than the set similarity threshold. For example, the similarity threshold may be set to 0.9, q may be 5, and k may be 5.
The similarity between candidate topics can be measured in various ways, including but not limited to a trained similarity neural network model, cosine similarity based on sentence vectors, the Jaccard similarity coefficient, and the like.
In the embodiment of the disclosure, the candidate topics meeting the requirements are selected from the candidate topics for recommendation in the above manner, so that the recommendation method is flexible and convenient, and the diversity of the recommended topics is also ensured.
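The selection procedure can be sketched as follows. The sketch uses the Jaccard similarity coefficient over character sets, which is one of the measures listed above; the function names, the `scores` mapping from topic to attraction degree score, and the optional `rng` argument are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity coefficient over the character sets of two topics."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def select_diverse(candidates: Sequence[str],
                   scores: Dict[str, float],
                   q: int = 5,
                   k: int = 5,
                   threshold: float = 0.9,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Sample q*k candidates, then greedily pick up to k dissimilar ones."""
    rng = rng or random.Random()
    # The description assumes q*k is below the candidate count; min() guards
    # against smaller inputs.
    pool = rng.sample(list(candidates), min(q * k, len(candidates)))
    pool.sort(key=lambda topic: scores[topic], reverse=True)  # high to low
    chosen: List[str] = []
    for topic in pool:
        if all(jaccard(topic, prev) < threshold for prev in chosen):
            chosen.append(topic)
            if len(chosen) == k:
                break
    return chosen
```

Because similarity is only computed against topics that have already been chosen, at most q × k × k comparisons are needed per recommendation, rather than comparing all pairs in the full candidate set.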
In order to realize the theme recommendation method of any embodiment of the disclosure, the embodiment of the disclosure also provides a theme recommendation device. Fig. 5 schematically shows a structural diagram of a theme recommendation apparatus according to an embodiment of the present disclosure, which may be applied to a computing device, for example. In the following description, the functions of the respective modules of the apparatus will be briefly described, and the detailed processing thereof may be combined with the description of the subject recommendation method referring to any one of the embodiments of the present disclosure described above.
As shown in fig. 5, the apparatus may include:
an obtaining unit 501, configured to obtain at least one keyword in a hotspot text;
an expansion unit 502, configured to expand each keyword of the at least one keyword, and obtain at least one keyword combination according to an expansion result of the at least one keyword;
a generating unit 503, configured to obtain a candidate topic according to each keyword combination in the at least one keyword combination;
a recommending unit 504, configured to perform topic recommendation according to at least one candidate topic generated from the at least one keyword combination.
In an embodiment of the present disclosure, the obtaining unit is specifically configured to:
acquiring a first character sequence corresponding to the hot spot text;
performing feature extraction on the first character sequence to obtain a first feature sequence of the first character sequence;
determining a classification result of each character in the first character sequence according to the first feature sequence, wherein the classification result represents that the character is the beginning of a keyword, or is a part except the beginning of the keyword, or does not belong to the keyword;
and obtaining at least one keyword in the hot spot text according to the classification result of each character in the first character sequence.
In an embodiment of the disclosure, when the obtaining unit is configured to obtain at least one keyword in the hotspot text according to a classification result of each character in the first character sequence, the obtaining unit is specifically configured to:
taking continuous characters in the first character sequence which meet the following conditions as keywords:
the first character is followed by at least one second character, wherein the classification result of the first character indicates the beginning of the keyword, and the classification result of the second character indicates the part of the keyword except the beginning.
In an embodiment of the disclosure, the extension unit is specifically configured to perform at least one of the following:
obtaining an expansion word of the keyword according to the synonyms or near-synonyms of the keyword;
searching by using the keywords, and obtaining the expansion words of the keywords according to matching results;
and taking the words with the word vector similarity meeting the preset requirement as the expansion words of the keywords.
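As an illustration of the word vector option only, the following Python sketch treats the preset requirement as a minimum cosine similarity. The choice of cosine similarity, the threshold value, and the names `cosine` and `expand_by_vectors` are assumptions made for this example and are not taken from the disclosure.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_by_vectors(keyword: str,
                      vectors: Dict[str, List[float]],
                      min_similarity: float = 0.8) -> List[str]:
    """Return words whose vector is similar enough to the keyword's vector.

    `min_similarity` stands in for the "preset requirement" (assumed form).
    """
    target = vectors[keyword]
    return [word for word, vec in vectors.items()
            if word != keyword and cosine(target, vec) >= min_similarity]
```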
In an embodiment of the present disclosure, when the extension unit is configured to obtain at least one keyword combination according to an extension result of the at least one keyword, the extension unit is specifically configured to:
obtaining an expanded keyword set corresponding to the keyword according to the keyword and the expansion words of the keyword;
selecting a plurality of target keywords from the expanded keyword sets corresponding to the at least one keyword, wherein each target keyword is either a keyword or an expansion word, and the plurality of target keywords includes at least one keyword;
and combining the plurality of target keywords to obtain the keyword combination.
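One possible way to enumerate such combinations is sketched below in Python. The disclosure does not say how target keywords are picked, so drawing one word from each of `size` different expanded keyword sets is an assumption of this sketch, as are the function name and parameters.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple


def keyword_combinations(expanded: Dict[str, List[str]],
                         size: int) -> List[Tuple[str, ...]]:
    """Enumerate keyword combinations of `size` target keywords.

    `expanded` maps each original keyword to its expansion words.
    Assumption: one target keyword is drawn from each of `size` different
    expanded keyword sets. Every combination keeps at least one original
    keyword, as the text requires.
    """
    originals = set(expanded)
    # Expanded keyword set = the keyword itself plus its expansion words.
    expanded_sets = [[kw] + words for kw, words in expanded.items()]
    result = []
    for chosen_sets in combinations(expanded_sets, size):
        for combo in product(*chosen_sets):
            has_original = any(word in originals for word in combo)
            if len(set(combo)) == size and has_original:
                result.append(combo)
    return result
```

For two keywords with one expansion word each, this yields three combinations of size 2; the pair made only of expansion words is excluded.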
In an embodiment of the present disclosure, the generating unit is specifically configured to:
acquiring a second character sequence corresponding to the keyword combination;
performing feature extraction on the second character sequence to obtain a second feature sequence of the second character sequence;
inputting the second feature sequence into a decoder, the decoder generating a prediction result for at least one character;
and obtaining the candidate topic according to the prediction result.
In one embodiment of the present disclosure, the decoder determines the character to be generated at the (t+1)-th position of the candidate topic generated by the decoder by:
determining, for each character in a preset word list, a probability value that the character to be generated is that character, according to the preset word list, the second feature sequence, and a third feature sequence of the t characters preceding the character to be generated;
determining the character to be generated based on the maximum value among the probability values;
wherein t is a positive integer, and the third feature sequence is obtained by performing feature extraction on the t characters preceding the character to be generated;
and the 1st character, that is, the preceding character when t is 1, is determined according to the preset word list and the second feature sequence.
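The character-by-character procedure above corresponds to a greedy decoding loop, sketched here in Python. The scoring function `next_char_probs`, the end token `<eos>` and the length cap `max_len` are hypothetical placeholders: the disclosure does not specify the decoder model or its stopping rule.

```python
from typing import Callable, List, Sequence


def greedy_decode(vocab: Sequence[str],
                  encoder_features: object,
                  next_char_probs: Callable[[object, List[str]], Sequence[float]],
                  end_token: str = "<eos>",
                  max_len: int = 30) -> str:
    """Generate a candidate topic one character at a time.

    `next_char_probs(encoder_features, prefix)` is a hypothetical stand-in
    for the decoder: it returns one probability per entry of `vocab`, given
    the second feature sequence and the characters generated so far. For
    the first character the prefix is empty.
    """
    generated: List[str] = []
    while len(generated) < max_len:
        probs = next_char_probs(encoder_features, generated)
        # Pick the vocabulary entry with the maximum probability value.
        best = max(range(len(vocab)), key=lambda i: probs[i])
        if vocab[best] == end_token:  # assumed stopping rule
            break
        generated.append(vocab[best])
    return "".join(generated)
```

Taking the maximum at every step is the simplest strategy consistent with the text; alternatives such as beam search are not described in this passage.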
In an embodiment of the disclosure, when the generating unit is configured to obtain the candidate topic according to each keyword combination of the at least one keyword combination, the generating unit is specifically configured to:
obtaining a relevance score between the keyword combination and each topic in a preset topic library;
sorting the topics in the preset topic library according to the relevance scores;
and obtaining the candidate topic according to the sorting result.
In an embodiment of the disclosure, the generating unit, when configured to obtain the relevance score between the keyword combination and each topic in the preset topic library, is specifically configured to:
determining a relevance score between each word in the keyword combination and the topic according to the frequency of the word in the topic, the length of the topic, and the average length of all topics in the preset topic library;
determining a weight of each word according to the number of topics contained in the preset topic library and the occurrence frequency of the word in all topics of the preset topic library;
and obtaining the relevance score between the keyword combination and the topic according to the relevance score and the weight corresponding to each word in the keyword combination.
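The quantities listed above (word frequency in the topic, topic length, average topic length, library size, and how often a word occurs across the library) are the inputs of a BM25-style score. The disclosure does not name BM25 in this passage, so the Python sketch below is an assumed instantiation: the constants `k1` and `b`, the smoothed logarithmic weight, and the use of document frequency for the weight are all choices made for illustration.

```python
import math
from typing import List


def bm25_score(query_words: List[str],
               topic: List[str],
               library: List[List[str]],
               k1: float = 1.5,
               b: float = 0.75) -> float:
    """BM25-style relevance of a keyword combination to one tokenized topic.

    `library` is the preset topic library (must be non-empty), with each
    topic given as a list of words. k1 and b are conventional BM25
    constants, assumed here.
    """
    n_topics = len(library)
    avg_len = sum(len(t) for t in library) / n_topics
    score = 0.0
    for word in query_words:
        freq = topic.count(word)
        # Number of topics in the library that contain the word.
        doc_freq = sum(1 for t in library if word in t)
        # Weight: rarer words across the library count for more.
        weight = math.log((n_topics - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
        # Per-word relevance, normalized by topic length vs. average length.
        relevance = freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(topic) / avg_len))
        score += weight * relevance
    return score


def rank_topics(query_words: List[str], library: List[List[str]]) -> List[List[str]]:
    """Sort the library by descending relevance to the keyword combination."""
    return sorted(library, key=lambda t: bm25_score(query_words, t, library), reverse=True)
```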
In an embodiment of the present disclosure, the recommending unit is specifically configured to:
filtering the candidate topics according to a sensitive word dictionary;
and recommending the topics according to the filtered candidate topics.
In an embodiment of the present disclosure, the recommending unit is specifically configured to:
searching for one candidate topic of the at least one candidate topic by using a search engine to obtain the top p search results, wherein p is a positive integer;
acquiring the time interval between the time corresponding to each search result in the p search results and the current time;
determining an average time interval according to the time intervals corresponding to the p search results;
discarding the candidate topic in response to the average time interval exceeding a preset time threshold.
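The timeliness check above reduces to averaging p time intervals and comparing the result with a threshold. A minimal Python sketch follows; the search step itself is omitted, and the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def is_stale(result_times: List[datetime],
             now: datetime,
             threshold: timedelta) -> bool:
    """Return True if the candidate topic should be discarded.

    `result_times` holds the time associated with each of the top p search
    results (how those are retrieved is outside this sketch).
    """
    if not result_times:
        raise ValueError("at least one search result is required")
    intervals = [now - t for t in result_times]
    # Average time interval over the p search results.
    average = sum(intervals, timedelta()) / len(intervals)
    return average > threshold
```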
In an embodiment of the present disclosure, the recommending unit is specifically configured to:
acquiring a fourth character sequence of one candidate topic of the at least one candidate topic;
performing feature extraction on the fourth character sequence to obtain a fourth feature sequence of the fourth character sequence;
obtaining sentence features of the candidate topic according to the fourth feature sequence;
applying fully connected processing to the sentence features to obtain an attractiveness score of the candidate topic;
and discarding the candidate topic in response to the attractiveness score being below a set score threshold.
In an embodiment of the present disclosure, the recommending unit is specifically configured to:
converting the fourth feature sequence into sentence features by applying any one of averaging, weighted averaging, taking the maximum, and taking the minimum to the features of each dimension in the fourth feature sequence; or
converting the fourth feature sequence into sentence features by using an attention mechanism.
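The pooling options above can be sketched in plain Python as follows. The feature sequence is represented as a list of per-character feature vectors; the dot-product attention with a single query vector is one simple form of attention assumed for this example, since the disclosure does not specify the mechanism.

```python
import math
from typing import List, Optional


def pool_features(feature_seq: List[List[float]],
                  mode: str = "mean",
                  weights: Optional[List[float]] = None) -> List[float]:
    """Collapse a sequence of feature vectors into one sentence feature vector."""
    columns = list(zip(*feature_seq))  # one tuple per feature dimension
    if mode == "mean":
        return [sum(c) / len(c) for c in columns]
    if mode == "max":
        return [max(c) for c in columns]
    if mode == "min":
        return [min(c) for c in columns]
    if mode == "weighted":
        if weights is None:
            raise ValueError("weights are required for weighted averaging")
        total = sum(weights)
        return [sum(w * x for w, x in zip(weights, c)) / total for c in columns]
    raise ValueError(f"unknown mode: {mode}")


def attention_pool(feature_seq: List[List[float]], query: List[float]) -> List[float]:
    """Weighted average whose weights are a softmax over dot products
    with a query vector (assumed attention form)."""
    scores = [sum(q * x for q, x in zip(query, row)) for row in feature_seq]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    return pool_features(feature_seq, "weighted", exps)
```

In a trained model the query vector and any weights would be learned parameters; here they are plain inputs.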
In an embodiment of the present disclosure, the topic recommendation apparatus further includes a selecting unit, configured to:
randomly acquiring q × k candidate topics from the candidate topics corresponding to the hotspot text, wherein q and k are positive integers, and q × k is less than the number of candidate topics;
and according to the attractiveness scores of the candidate topics, sequentially selecting k candidate topics from the q × k candidate topics for recommendation, wherein the similarity between any two of the selected candidate topics is smaller than a set similarity threshold.
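The selection step above can be read as sampling q × k candidates and then greedily keeping the best-scoring ones that are not too similar to those already kept. The Python sketch below follows that reading. Visiting candidates in descending score order and using a character-level Jaccard similarity are assumptions of this example; the disclosure does not fix the similarity measure.

```python
import random
from typing import Callable, Dict, List, Optional


def jaccard(a: str, b: str) -> float:
    """Character-set Jaccard similarity (an assumed similarity measure)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def select_diverse(candidates: List[str],
                   scores: Dict[str, float],
                   similarity: Callable[[str, str], float],
                   q: int,
                   k: int,
                   threshold: float,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Sample q*k candidates, then pick up to k mutually dissimilar ones."""
    rng = rng or random.Random()
    pool = rng.sample(candidates, q * k)  # requires q*k <= len(candidates)
    # Visit candidates from the highest to the lowest attractiveness score.
    pool.sort(key=lambda c: scores[c], reverse=True)
    selected: List[str] = []
    for cand in pool:
        if all(similarity(cand, s) < threshold for s in selected):
            selected.append(cand)
            if len(selected) == k:
                break
    return selected
```

If the sampled pool contains fewer than k mutually dissimilar candidates, this sketch returns fewer than k topics.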
It should be noted that although several units/modules or sub-units/modules of the topic recommendation apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
The embodiment of the disclosure also provides a computer readable storage medium. As shown in fig. 6, the storage medium stores a computer program 601 thereon, and the computer program 601, when executed by a processor, can carry out the topic recommendation method according to any embodiment of the disclosure.
The disclosed embodiments also provide a computing device, which may include a memory for storing computer instructions executable on a processor, and a processor for implementing the topic recommendation method of any of the disclosed embodiments when executing the computer instructions.
FIG. 7 illustrates one configuration of the computing device, and as shown in FIG. 7, the computing device 70 may include, but is not limited to: a processor 71, a memory 72, and a bus 73 that connects the various system components, including the memory 72 and the processor 71.
The memory 72 stores computer instructions executable by the processor 71, so that the processor 71 can perform the topic recommendation method of any of the embodiments of the present disclosure. The memory 72 may include a random access memory (RAM) unit 721, a cache memory unit 722, and/or a read-only memory (ROM) unit 723. The memory 72 may also include a program tool 725 having a set of program modules 724, the program modules 724 including, but not limited to, an operating system, one or more application programs, other program modules, and program data; each of these, or some combination of them, may include an implementation of a network environment.
The bus 73 may include, for example, a data bus, an address bus, a control bus, and the like. The computing device 70 may also communicate with external devices 75 through the I/O interface 74, the external devices 75 may be, for example, a keyboard, a bluetooth device, etc. The computing device 70 may also communicate with one or more networks, such as a local area network, a wide area network, a public network, etc., through the network adapter 76. The network adapter 76 may also communicate with other modules of the computing device 70 via the bus 73, as shown in FIG. 7.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be broken down into multiple steps for execution.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects mean that features in those aspects cannot be combined to advantage; the division is for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A topic recommendation method, comprising:
acquiring at least one keyword in the hotspot text;
expanding each keyword in the at least one keyword to obtain an expanded result of each keyword;
obtaining at least one keyword combination according to the expansion result of the at least one keyword;
obtaining a candidate topic according to each keyword combination in the at least one keyword combination;
and recommending a topic according to at least one candidate topic generated from the at least one keyword combination.
2. The method of claim 1, wherein the obtaining at least one keyword in the hotspot text comprises:
acquiring a first character sequence corresponding to the hotspot text;
performing feature extraction on the first character sequence to obtain a first feature sequence of the first character sequence;
determining a classification result of each character in the first character sequence according to the first feature sequence, wherein the classification result indicates that the character is the beginning of a keyword, is a part of a keyword other than the beginning, or does not belong to any keyword;
and obtaining at least one keyword in the hotspot text according to the classification result of each character in the first character sequence.
3. The method according to claim 2, wherein obtaining at least one keyword in the hotspot text according to the classification result of each character in the first character sequence comprises:
taking consecutive characters in the first character sequence that satisfy the following condition as a keyword:
a first character immediately followed by at least one second character, wherein the classification result of the first character indicates the beginning of a keyword, and the classification result of each second character indicates a part of the keyword other than the beginning.
4. The method of claim 1, wherein the expanding each keyword of the at least one keyword to obtain an expanded result of the each keyword comprises at least one of:
obtaining an expansion word of the keyword according to a synonym or near-synonym of the keyword;
searching by using the keyword, and obtaining expansion words of the keyword according to the matching results;
and taking words whose word vector similarity to the keyword meets a preset requirement as expansion words of the keyword.
5. The method according to claim 4, wherein the deriving at least one keyword combination according to the expanded result of the at least one keyword comprises:
obtaining an expanded keyword set corresponding to the keyword according to the keyword and the expansion words of the keyword;
selecting a plurality of target keywords from the expanded keyword sets corresponding to the at least one keyword, wherein each target keyword is either a keyword or an expansion word, and the plurality of target keywords includes at least one keyword;
and combining the plurality of target keywords to obtain the keyword combination.
6. The method according to any one of claims 1 to 5, wherein obtaining a candidate topic from each of the at least one keyword combination comprises:
acquiring a second character sequence corresponding to the keyword combination;
performing feature extraction on the second character sequence to obtain a second feature sequence of the second character sequence;
inputting the second feature sequence into a decoder, the decoder generating a prediction result for at least one character;
and obtaining the candidate topic according to the prediction result.
7. The method of claim 6, wherein the decoder determines the character to be generated at the (t+1)-th position of the candidate topic generated by the decoder by:
determining, for each character in a preset word list, a probability value that the character to be generated is that character, according to the preset word list, the second feature sequence, and a third feature sequence of the t characters preceding the character to be generated;
determining the character to be generated based on the maximum value among the probability values;
wherein t is a positive integer, and the third feature sequence is obtained by performing feature extraction on the t characters preceding the character to be generated;
and the 1st character, that is, the preceding character when t is 1, is determined according to the preset word list and the second feature sequence.
8. A topic recommendation apparatus, comprising:
the acquisition unit is used for acquiring at least one keyword in the hotspot text;
the expansion unit is used for expanding each keyword in the at least one keyword and obtaining at least one keyword combination according to the expansion result of the at least one keyword;
the generating unit is used for obtaining a candidate topic according to each keyword combination in the at least one keyword combination;
and the recommending unit is used for recommending a topic according to a quality evaluation result of the at least one candidate topic generated from the at least one keyword combination.
9. A computing device, comprising:
a processor; and
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the recommendation method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized by comprising a computer program which, when executed by a processor, implements the recommendation method according to any one of claims 1 to 7.
CN202111033672.9A 2021-09-03 2021-09-03 Theme recommendation method and device, computing equipment and storage medium Pending CN113836399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111033672.9A CN113836399A (en) 2021-09-03 2021-09-03 Theme recommendation method and device, computing equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111033672.9A CN113836399A (en) 2021-09-03 2021-09-03 Theme recommendation method and device, computing equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113836399A true CN113836399A (en) 2021-12-24

Family

ID=78962303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111033672.9A Pending CN113836399A (en) 2021-09-03 2021-09-03 Theme recommendation method and device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113836399A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115186050A (en) * 2022-09-08 2022-10-14 粤港澳大湾区数字经济研究院(福田) Method, system and related equipment for recommending selected questions based on natural language processing
CN115186050B (en) * 2022-09-08 2023-01-10 粤港澳大湾区数字经济研究院(福田) Method, system and related equipment for recommending selected questions based on natural language processing

Similar Documents

Publication Publication Date Title
CN106649818B (en) Application search intention identification method and device, application search method and server
JP5257071B2 (en) Similarity calculation device and information retrieval device
CN105824959B (en) Public opinion monitoring method and system
US8051080B2 (en) Contextual ranking of keywords using click data
US8229949B2 (en) Apparatus, method and program product for presenting next search keyword
US8577882B2 (en) Method and system for searching multilingual documents
CN110717106B (en) Information pushing method and device
US8271502B2 (en) Presenting multiple document summarization with search results
US9390161B2 (en) Methods and systems for extracting keyphrases from natural text for search engine indexing
KR20160107187A (en) Coherent question answering in search results
WO2010014082A1 (en) Method and apparatus for relating datasets by using semantic vectors and keyword analyses
CN107679070B (en) Intelligent reading recommendation method and device and electronic equipment
CN112989208B (en) Information recommendation method and device, electronic equipment and storage medium
AU2018250372B2 (en) Method to construct content based on a content repository
CN108038099B (en) Low-frequency keyword identification method based on word clustering
KR20200087977A (en) Multimodal ducument summary system and method
CN112257452A (en) Emotion recognition model training method, device, equipment and storage medium
CN114021577A (en) Content tag generation method and device, electronic equipment and storage medium
CN112989824A (en) Information pushing method and device, electronic equipment and storage medium
CN109634436B (en) Method, device, equipment and readable storage medium for associating input method
CN113836399A (en) Theme recommendation method and device, computing equipment and storage medium
CN110851560B (en) Information retrieval method, device and equipment
CN115618873A (en) Data processing method and device, computer equipment and storage medium
CN107609094B (en) Data disambiguation method and device and computer equipment
CN107818091B (en) Document processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination