CN115080867A - Recommendation method and device for proposal theme, computer equipment and storage medium
- Publication number
- CN115080867A (application CN202211013812.0A)
- Authority
- CN
- China
- Prior art keywords
- word
- document
- news
- proposal
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application belongs to the field of computers, and relates to a recommendation method of a proposal theme, which comprises the following steps: acquiring news data and historical behavior data of a user; the news data includes news words; the historical behavioral data includes documents; according to the news data, the heat value of each news word is obtained by combining Bayesian transformation; according to the historical behavior data of the user, calculating the news reading similarity and the news content similarity between the proposal user and other users to obtain the final similarity, and calculating the interest of the user in each document to obtain an interested document; segmenting the interested document to obtain document words, and calculating the TF-IDF value of each document word; obtaining a vocabulary entry set according to the news words and the document words; and obtaining recommended words according to the heat value of each news word and the TF-IDF value of each document word, and finishing the recommendation of the proposal theme. By adopting the method, the proposal theme can be recommended, and the proposal quality is improved.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for recommending a proposal topic, a computer device, and a storage medium.
Background
A proposal is a written opinion or suggestion submitted to a meeting by participating departments, groups, and participants.
In the prior art, participants often rely on subjective judgment when submitting a proposal, because suitable data and reference material are not available.
However, subjective judgment does not reflect real needs and current hot topics; a proposal submitted this way lacks supporting data and is of limited quality.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device and a storage medium for recommending a proposal topic, which can recommend a proposal topic and improve the quality of the proposal.
A recommendation method of a proposal topic, comprising: acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
according to the news data, the heat value of each news word is obtained by combining Bayesian transformation;
calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
In one embodiment, further comprising:
acquiring historical proposal data; the historical proposal data comprises a plurality of proposal documents;
performing word segmentation on the proposal document to obtain a plurality of proposal words, and calculating the TF-IDF value of each proposal word;
obtaining a vocabulary entry set according to the news words, the document words and the proposal words; and obtaining a recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word.
In one embodiment, obtaining the heat value of each news word according to the news data by combining Bayesian transformation comprises:
calculating an initial heat value of the news word on a given day according to the word frequency on that day and the word frequency over the total number of statistical days:
where the quantities in the formula are, in order: the initial heat value of the news word on day i; the frequency of occurrence of the news word on day i; the frequency of occurrence of the news word over H days; a given news word; the i-th statistical day; and H, the total number of statistical days;
correcting the initial heat value by using Bayesian transformation to obtain a corrected heat value of the news word on a given day:
where the first quantity is the corrected heat value of the news word on day i, C is the average word frequency, j indexes the j-th news word, I is the total number of news words, and m is a prior mean score;
obtaining a final heat value of the news word according to the corrected heat value of the news word on a given day:
where the quantity in the formula is the final heat value of the news word.
In one embodiment, the calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain a final similarity, and calculating the interest of the user in each document according to the final similarity to obtain the document of interest includes:
calculating the news reading similarity of the proposal user u and other users q:
where sim_1 represents the news reading similarity of the proposal user u and another user q; S is the total number of documents in the data set; and the two remaining quantities are the numbers of clicks on the i-th document by user u and by user q, respectively;
calculating the news content similarity of the proposal user u and another user q:
where the quantities are, in order: the number of documents that both user u and user q have viewed, and the numbers of documents for which user u and user q, respectively, have generated historical behavior;
obtaining the final similarity of the proposal user u and another user q according to the news reading similarity and the news content similarity:
taking the M users with the largest final similarity as the M-neighbor user set of user u, and calculating the interest degree of the proposal user u in each document clicked by those M users, the interest degree of the proposal user u in document j being:
where the quantities are, in order: the M-neighbor user set of the proposal user u; the final similarity of user u and user q; the number of clicks by user q on the j-th document; and the popularity of document j;
and obtaining the documents of interest of the user according to the interest degree.
In one embodiment, segmenting the document of interest to obtain a plurality of document words and calculating the TF-IDF value of each document word, and segmenting the proposal document to obtain a plurality of proposal words and calculating the TF-IDF value of each proposal word, comprise:
calculating an initial TF value of the document word or proposal word:
where the quantities are, in order: the total number of words in the document of interest or proposal document, and the total number of occurrences of the document word or proposal word d;
introducing a word frequency control model to optimize the initial TF value:
where the quantities are, in order: the word frequency control coefficient; the total number of samples introduced; the sample average document length; and the TF value of the document word or proposal word;
calculating the IDF value of the document word or proposal word:
where idf_d is the IDF value of the document word or proposal word; the next quantity is the number of documents of interest containing the document word, or the number of proposal documents containing the proposal word; a further quantity represents the relevance of the document word to the document of interest, or of the proposal word to the proposal document; and the remaining two quantities are adjustment factors;
calculating the TF-IDF value of the document word or proposal word d according to its TF value and its IDF value:
where tfidf_d is the TF-IDF value of the document word or proposal word.
In one embodiment, obtaining the entry set according to the news words, the document words, and the proposal words, and obtaining the recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word, and the TF-IDF value of each proposal word, comprise:
the news words, document words, and proposal words are entries and together form the entry set;
obtaining the recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word, and the TF-IDF value of each proposal word:
where the quantities are, in order: the recommended value of an entry; the normalized heat value of the news word; the normalized TF-IDF value of the document word; the normalized TF-IDF value of the proposal word; the heat value of news word d; the sum of the heat values of all news words in set A; the TF-IDF value of document word d; the sum of the TF-IDF values of all document words in set B; the TF-IDF value of proposal word d; and the sum of the TF-IDF values of all proposal words in set C.
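The formula itself did not survive in this text, so the following is only one plausible reading of the definitions above, not the patent's confirmed expression. It assumes each normalized term is an entry's value divided by the sum over its set, and that the three terms are combined with adjustment factors. The symbols R, H, T_B, T_C and the factor m_3 are names introduced here for illustration.

```latex
R(d) \;=\; m_1\,\frac{H(d)}{\sum_{w \in A} H(w)}
      \;+\; m_2\,\frac{T_B(d)}{\sum_{w \in B} T_B(w)}
      \;+\; m_3\,\frac{T_C(d)}{\sum_{w \in C} T_C(w)}
```

Under this reading, an entry that is hot in the news, frequent in the user's documents of interest, and frequent in historical proposals receives a contribution from all three terms.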
In one embodiment, the news words are obtained after performing word segmentation, stop word processing and unknown word processing on news data.
A recommendation apparatus of a proposal topic, comprising:
the acquisition module is used for acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
the popularity module is used for obtaining the popularity value of each news word by combining Bayesian transformation according to the news data;
the word frequency module is used for calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
the recommendation module is used for obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and completing the recommendation of the proposal theme.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
according to the news data, the heat value of each news word is obtained by combining Bayesian transformation;
calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
according to the news data, the heat value of each news word is obtained by combining Bayesian transformation;
calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
According to the above recommendation method, apparatus, computer device, and storage medium for a proposal topic, the heat value of each news word is obtained from the news data; documents of interest and a plurality of document words are obtained from the user's historical behavior data, and the TF-IDF value of each document word is then obtained; on this basis, the recommended words are obtained and the recommendation of the proposal topic is completed. Because the recommended words are derived from news data and the user's historical behavior data, the recommendation draws on hot news and historical data and can improve the quality of the proposal.
Drawings
FIG. 1 is a diagram illustrating an exemplary implementation of a proposed topic recommendation method;
FIG. 2 is a flow diagram that illustrates a method for recommending a proposed topic, according to one embodiment;
FIG. 3 is a flowchart illustrating a method for recommending a proposed topic in another embodiment;
FIG. 4 is a flow diagram that illustrates the calculation of a heat value for a news word in one embodiment;
FIG. 5 is a flow diagram that illustrates the calculation of TF-IDF values for a document word, under an embodiment;
FIG. 6 is a flow diagram of computing TF-IDF values for a proposal word in one embodiment;
FIG. 7 is a block diagram showing the structure of a recommending apparatus for proposing a subject in one embodiment;
FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all the directional indicators (such as up, down, left, right, front, and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are only used for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "connected," "secured," and the like are to be construed broadly, e.g., "secured" may be a fixed connection, a removable connection, or an integral part; the connection can be mechanical connection, electrical connection, physical connection or wireless communication connection; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination of technical solutions should not be considered to exist, and is not within the protection scope of the present invention.
The method provided by the application can be applied to the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network, the terminal 102 may include but is not limited to various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be a server corresponding to various portal websites and working system backgrounds.
As shown in fig. 2, the present application provides a method for recommending a proposed subject, which is described by taking the method as an example applied to the terminal in fig. 1, and in one embodiment, the method includes:
step 202, acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data includes a number of documents.
The news data is a data set built from the year-round news published by major authoritative WeChat news accounts, newspapers, and periodicals.
The historical behavior data of the users is a data set built from the historical behavior data (number of news clicks, browsing time) of each committee member.
Step 204: obtaining the heat value of each news word according to the news data by combining Bayesian transformation.
Specifically, the method comprises the following steps:
calculating an initial heat value of a news word on a given day according to the word frequency on that day (the daily word frequency) and the word frequency over the total number of statistical days (the total word frequency), for example as the ratio of the two:
where the quantities are, in order: the initial heat value of the entry on day i; the frequency of occurrence of the entry on day i, namely the daily word frequency; the frequency of occurrence of the entry over H days, namely the total word frequency; a given news word; the i-th statistical day; and H, the total number of statistical days.
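The ratio mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming the initial heat value is exactly the daily word frequency divided by the total word frequency. The function name `initial_heat` and the sample data are hypothetical.

```python
from collections import Counter


def initial_heat(daily_counts, word, day):
    """Ratio of a word's frequency on one day to its frequency over all H days."""
    total = sum(counts.get(word, 0) for counts in daily_counts)
    if total == 0:
        # A word that never occurs has no heat on any day.
        return 0.0
    return daily_counts[day].get(word, 0) / total


# daily_counts[i] maps each news word to its frequency on statistical day i (H = 3).
daily_counts = [
    Counter({"housing": 2, "education": 1}),
    Counter({"housing": 6}),
    Counter({"housing": 2, "education": 3}),
]

housing_day1 = initial_heat(daily_counts, "housing", 1)  # 6 occurrences out of 10
```

A word concentrated on a single day gets a high value for that day, while a word spread evenly over the H days gets roughly 1/H per day.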
Correcting the initial heat value by using Bayesian transformation to obtain a corrected heat value of the news word on a given day:
where the first quantity is the corrected heat value of the news word on day i, C is the average word frequency, j indexes the j-th news word, I is the total number of news words, and m is a prior mean score.
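The correction formula is not legible in this text. The definitions of C (average word frequency) and m (prior mean score) match the common Bayesian-average form, so the sketch below assumes that form: the raw ratio is pulled toward m, and more strongly when the word's total frequency is small. The function name, argument names, and sample numbers are all hypothetical.

```python
def bayesian_corrected_heat(day_freq, total_freq, avg_total_freq, prior_mean):
    """Bayesian average: shrink day_freq / total_freq toward prior_mean.

    Assumed form: (C * m + n * r) / (C + n), where n = total_freq and
    r = day_freq / total_freq, so that n * r simplifies to day_freq.
    """
    return (avg_total_freq * prior_mean + day_freq) / (avg_total_freq + total_freq)


# Total frequencies over all H days for each news word (illustrative numbers).
totals = {"housing": 100, "education": 20, "rare_term": 1}
C = sum(totals.values()) / len(totals)  # average word frequency over the I words
m = 0.2  # assumed prior mean heat

frequent = bayesian_corrected_heat(50, 100, C, m)  # raw ratio 0.5, well supported
rare = bayesian_corrected_heat(1, 1, C, m)         # raw ratio 1.0, one occurrence
```

With these numbers the rare word's raw ratio of 1.0 is shrunk to a value below the frequent word's corrected heat, which is the usual reason for applying such a correction.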
Obtaining a final heat value of the news word according to the corrected heat value of the news word on a given day:
where the quantity in the formula is the final heat value of the news word.
In this step, the news words are obtained after document preprocessing (including word segmentation, stop word processing, and unregistered word processing) is performed on the news data.
Word segmentation: the documents to be segmented are loaded and segmented with the jieba word segmentation library, which provides three modes: precise mode, which segments the text exactly; full mode, which lists every possible word in the text one by one; and search engine mode, which further splits long words.
Stop word removal: establish a list of common stop words; compare the stop word list with the word segmentation list; and delete the stop words from the word segmentation list.
Unknown word processing: load the hot words compiled by authoritative news outlets (which can be retrieved from the home pages of their websites) into the word segmentation vocabulary.
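The stop word removal step described above amounts to filtering the segmented word list against the stop word list. A minimal sketch, with a hypothetical function name and English sample tokens standing in for segmented news text:

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that do not appear in the stop word list, keeping order."""
    stop_set = set(stop_words)  # set lookup keeps the comparison cheap
    return [token for token in tokens if token not in stop_set]


tokens = ["the", "housing", "policy", "of", "the", "city"]
stop_words = ["the", "of", "a"]
filtered = remove_stop_words(tokens, stop_words)
```

Repeated tokens are kept on purpose, since later steps rely on word frequencies.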
Specifically, the method comprises the following steps:
calculating the news reading similarity of the proposal user u and another user q:
where sim_1 represents the news reading similarity of the proposal user u and another user q; S is the total number of documents in the data set; and the two remaining quantities are the numbers of clicks on the i-th document by user u and by user q, respectively;
calculating the news content similarity of the proposal user u and another user q:
where the quantities are, in order: the number of documents that both user u and user q have viewed, and the numbers of documents for which user u and user q, respectively, have generated historical behavior (including clicking and browsing);
merging the news reading similarity and the news content similarity to obtain the final similarity of the proposal user u and another user q:
where the weight factor takes a value between 0 and 1 (for example, 0.5);
sorting all final similarities in descending order, taking the M users with the largest final similarity (for example, the top 20) as the M-neighbor user set of user u, and calculating the interest degree of the proposal user u in each document clicked by those M users; for example, if document j has been clicked by the M users, the interest degree of the proposal user u in document j is calculated as:
where the quantities are, in order: the M-neighbor user set of the proposal user u; the final similarity of user u and user q; the number of clicks by user q on the j-th document; and the popularity of document j, namely the ratio of the cumulative number of users who clicked document j to the total number of users.
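The similarity formulas are missing from this text, so the sketch below substitutes common choices that fit the stated definitions. It is an assumption, not the patent's confirmed method. Reading similarity is taken as the cosine of the two click-count vectors over the S documents. Content similarity is taken as the number of shared documents divided by the geometric mean of each user's document count. The two are merged with a weight `alpha` between 0 and 1. A click count above zero stands in for "has generated historical behavior".

```python
import math


def reading_similarity(clicks_u, clicks_q):
    """Cosine similarity between two users' per-document click counts."""
    dot = sum(a * b for a, b in zip(clicks_u, clicks_q))
    norm_u = math.sqrt(sum(a * a for a in clicks_u))
    norm_q = math.sqrt(sum(b * b for b in clicks_q))
    if norm_u == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_u * norm_q)


def content_similarity(clicks_u, clicks_q):
    """Shared documents over the geometric mean of each user's document count."""
    docs_u = {i for i, c in enumerate(clicks_u) if c > 0}
    docs_q = {i for i, c in enumerate(clicks_q) if c > 0}
    if not docs_u or not docs_q:
        return 0.0
    return len(docs_u & docs_q) / math.sqrt(len(docs_u) * len(docs_q))


def final_similarity(clicks_u, clicks_q, alpha=0.5):
    """Weighted merge of the two similarities; alpha is the weight factor."""
    return (alpha * reading_similarity(clicks_u, clicks_q)
            + (1 - alpha) * content_similarity(clicks_u, clicks_q))


# Click counts over S = 4 documents (illustrative data).
user_u = [3, 0, 1, 0]
user_q = [3, 0, 1, 0]  # same reading pattern as user_u
user_r = [0, 2, 0, 5]  # no document in common with user_u
```

Identical click patterns give a final similarity of about 1, and users with no document in common give 0, whatever value `alpha` takes.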
Obtaining the documents of interest of the user according to the interest degree. Specifically, all interest degrees of the proposal user are sorted in descending order, and the N documents with the highest interest degree (the value of N can be chosen according to the actual situation) are taken as the user's documents of interest.
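Neighbor selection and document ranking can be sketched as follows. The interest formula is not legible in this text, so the scoring here is an assumption: each neighbor's clicks on a document are weighted by that neighbor's final similarity, and the sum is divided by `1 + popularity` so that widely clicked documents are damped. How popularity really enters the patent's formula is unknown. All names and data are hypothetical.

```python
def top_neighbors(similarities, m):
    """Return the m user ids with the largest final similarity, best first."""
    return sorted(similarities, key=similarities.get, reverse=True)[:m]


def interest(doc, neighbors, similarities, clicks, popularity):
    """Similarity-weighted neighbor clicks on doc, damped by popularity (assumed)."""
    weighted = sum(similarities[q] * clicks[q].get(doc, 0) for q in neighbors)
    return weighted / (1.0 + popularity.get(doc, 0.0))


def documents_of_interest(similarities, clicks, popularity, m, n):
    """Score every document clicked by the m nearest neighbors and keep the top n."""
    neighbors = top_neighbors(similarities, m)
    candidates = {doc for q in neighbors for doc in clicks[q]}
    scores = {doc: interest(doc, neighbors, similarities, clicks, popularity)
              for doc in candidates}
    # Sort by descending score; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:n]


similarities = {"q1": 0.9, "q2": 0.6, "q3": 0.1}  # final similarity to user u
clicks = {"q1": {"docA": 2, "docB": 1}, "q2": {"docB": 3}, "q3": {"docC": 9}}
popularity = {"docA": 0.5, "docB": 1.0, "docC": 0.25}  # share of users who clicked
result = documents_of_interest(similarities, clicks, popularity, m=2, n=2)
```

Here `docC` is never scored, because its only reader falls outside the two nearest neighbors.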
Segmenting the document of interest to obtain a plurality of document words, and calculating the initial TF value of each document word:
where the quantities are, in order: the total number of words in the document of interest, and the total number of occurrences of the document word d;
introducing a word frequency control model to optimize the initial TF value:
where the quantities are, in order: the word frequency control coefficient, an empirical value determined through a large number of experiments; the total number of samples introduced (that is, the total number of documents of interest); the sample average document length; and the TF value of the document word. Here the document length of a document is its number of words; the average document length is the ratio of the total number of words to the total number of documents; the samples are the documents of interest; and the sample average document length is therefore the ratio of the total number of words in the documents of interest to the total number of documents of interest;
introducing a word frequency-document correlation model to calculate the IDF value of the document word:
where idf_d is the IDF value of the document word; the next quantity is the number of documents of interest containing the document word; a further quantity represents the relevance of the document word to the document of interest; and the remaining two quantities are adjustment factors;
calculating the TF-IDF value of the document word d according to the TF value of the document word and the IDF value of the document word:
where tfidf_d is the TF-IDF value of the document word.
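The overall shape of this computation is a TF value multiplied by an IDF value. The sketch below shows only that baseline. It uses the plain initial TF (occurrences divided by document length) and a textbook smoothed IDF, log((1 + N) / (1 + n_d)) + 1, as stand-ins. The word frequency control model and the word frequency-document correlation model are not reproduced, because their formulas are not legible in this text. Function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Initial TF: occurrences of each word divided by the document's word count."""
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}


def inverse_document_frequency(word, documents):
    """Smoothed IDF (a stand-in): log((1 + N) / (1 + n_d)) + 1."""
    containing = sum(1 for doc in documents if word in doc)
    return math.log((1 + len(documents)) / (1 + containing)) + 1.0


def tf_idf(tokens, documents):
    """TF-IDF of every word in one document, relative to the document collection."""
    tf = term_frequencies(tokens)
    return {word: tf[word] * inverse_document_frequency(word, documents)
            for word in tf}


# Already segmented documents of interest (illustrative data).
documents = [
    ["housing", "policy", "housing", "reform"],
    ["education", "policy"],
    ["transport", "policy"],
]
scores = tf_idf(documents[0], documents)
```

In this example "policy" appears in every document, so it scores lowest, while "housing" is both frequent in the first document and absent elsewhere, so it scores highest.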
The news words and the document words are entries and together form the entry set.
Note: after the heat value of each news word is obtained, all news words and their heat values are stored in the format "news word - heat value" and together form set A; after the TF-IDF value of each document word is obtained, all document words and their TF-IDF values are stored in the format "document word - TF-IDF value" and together form set B.
Obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word:
where the quantities are, in order: the recommended value of an entry; the normalized heat value of the news word; the normalized TF-IDF value of the document word; the heat value of news word d in set A; the sum of the heat values of all news words in set A; the TF-IDF value of document word d in set B; and the sum of the TF-IDF values of all document words in set B.
Note: if news word d does not appear in set A, its heat value is taken as 0; if document word d does not appear in set B, its TF-IDF value is taken as 0.
The proposal user selects Y entries from the X entries with the largest recommended values as the proposal topic. The specific values of X and Y may be determined as needed, for example X = 15 and Y = 3.
In the formula, m_1 and m_2 are adjustment factors that can take different values according to how much weight the user places on the news data and on the historical behavior data; whatever values the adjustment factors take, the recommendation of the proposal topic can be completed and the quality of the proposal topic further improved.
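The scoring step can be sketched as follows, assuming the recommended value is a weighted sum of the two normalized terms, with each normalization being an entry's value divided by the sum over its set. An entry missing from a set is assumed to contribute 0 for that term. The function names, default weights, and sample sets are hypothetical.

```python
def normalize(values):
    """Divide each value by the sum over the set; an all-zero set stays all zero."""
    total = sum(values.values())
    if total == 0:
        return {key: 0.0 for key in values}
    return {key: value / total for key, value in values.items()}


def recommended_values(heat, tfidf, m1=0.5, m2=0.5):
    """Weighted sum of normalized heat (set A) and normalized TF-IDF (set B)."""
    heat_n = normalize(heat)
    tfidf_n = normalize(tfidf)
    entries = set(heat) | set(tfidf)
    # An entry absent from one set contributes 0 for that term (assumption).
    return {d: m1 * heat_n.get(d, 0.0) + m2 * tfidf_n.get(d, 0.0) for d in entries}


def top_entries(values, x):
    """The x entries with the largest recommended value, ties broken by name."""
    return sorted(values, key=lambda d: (-values[d], d))[:x]


set_a = {"housing": 6.0, "education": 3.0, "transport": 1.0}  # word -> heat value
set_b = {"housing": 1.0, "pension": 3.0, "education": 1.0}    # word -> TF-IDF value
values = recommended_values(set_a, set_b)
shortlist = top_entries(values, x=3)
```

The user would then pick Y entries from `shortlist` as the proposal topic. Raising `m1` relative to `m2` favors current news over the user's own reading history.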
According to the recommendation method, device, computer equipment and storage medium for the proposal theme, the heat value of each news word is obtained according to news data; obtaining an interested document and a plurality of document words according to the historical behavior data of the user, and further obtaining the TF-IDF value of each document word; on the basis, the recommendation words are obtained, and recommendation of the proposal subject is completed.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, further comprising:
acquiring historical proposal data; the historical proposal data comprises a plurality of proposal documents;
performing word segmentation on the proposal document to obtain a plurality of proposal words, and calculating the TF-IDF value of each proposal word;
the entry set further includes proposal words, that is, the entry set is obtained according to the news words, the document words and the proposal words; and the recommended value of each entry in the entry set is further obtained according to the TF-IDF value of each proposal word, that is, the recommended value of each entry in the entry set is obtained according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word.
Specifically, the method comprises the following steps:
the historical proposal data refers to data of proposals that have already been submitted.
Performing word segmentation on the proposal document to obtain a plurality of proposal words, and calculating the initial TF value of each proposal word:
in the formula, the two quantities are the total number of words in the proposal document and the total number of occurrences of proposal word d;
introducing a word frequency control model to optimize the initial TF value:
where the quantities are, in order: the word frequency control coefficient, an empirical value determined through a large number of experiments; the total number of samples introduced (that is, the total number of proposal documents); the sample average document length; and the TF value of the proposal word. Here the document length of a document is its number of words, the average document length is the ratio of the total number of words across the documents to the total number of documents, a sample is a proposal document, and the sample average document length is therefore the ratio of the total number of words across the proposal documents to the total number of proposal documents;
introducing a word frequency-document correlation model to calculate the IDF value of the proposal word:
where idf_d denotes the IDF value of the proposal word; the remaining quantities are the number of proposal documents containing the proposal word, the relevance of the proposal word to the proposal document, and two adjustment factors;
calculating the TF-IDF value of proposal word d according to the TF value of the proposal word and the IDF value of the proposal word:
where tfidf_d denotes the TF-IDF value of the proposal word.
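As a rough illustration of the first and last steps above, the following Python sketch computes the initial TF value as described (occurrences of a word divided by the total number of words in the document) and multiplies it by an IDF value. The word frequency control model and the word frequency-document correlation model are not reproduced in this text, so they are not implemented; a classic smoothed IDF is swapped in purely as a stand-in, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def initial_tf(words: List[str]) -> Dict[str, float]:
    """Initial TF: occurrences of each word / total words in the document."""
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def stand_in_idf(documents: List[List[str]]) -> Dict[str, float]:
    """Classic smoothed IDF, used here only as a placeholder for the
    correlation-based IDF model, which is not implemented."""
    n_docs = len(documents)
    doc_freq: Counter = Counter()
    for words in documents:
        doc_freq.update(set(words))  # count each word once per document
    return {
        word: math.log((1 + n_docs) / (1 + df)) + 1.0
        for word, df in doc_freq.items()
    }


def tf_idf(words: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """TF-IDF of each word in one segmented document: TF value * IDF value."""
    return {word: tf * idf.get(word, 0.0) for word, tf in initial_tf(words).items()}
```

The input is assumed to be already segmented into words; word segmentation itself is outside the scope of this sketch.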
News words, document words and proposal words are entries which together form an entry set;
It should be noted that after the heat value (popularity value) of each news word is obtained, all news words and their heat values are stored in the format 'news word-heat value' and together form a set A; after the TF-IDF value of each document word is obtained, all document words and their TF-IDF values are stored in the format 'document word-TF-IDF value' and together form a set B; after the TF-IDF value of each proposal word is obtained, all proposal words and their TF-IDF values are stored in the format 'proposal word-TF-IDF value' and together form a set C.
Obtaining a recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word:
in the formula, the quantities are, in order: the recommended value of the entry; the normalized heat value of the news word; the normalized TF-IDF value of the document word; the normalized TF-IDF value of the proposal word; the heat value of news word d in set A; the sum of the heat values of all news words in set A; the TF-IDF value of document word d in set B; the sum of the TF-IDF values of all document words in set B; the TF-IDF value of proposal word d in set C; and the sum of the TF-IDF values of all proposal words in set C.
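It should be noted that if news word d does not appear in set A, its heat value term is assigned a fixed default; if document word d does not appear in set B, its TF-IDF value term is likewise assigned a fixed default; and if proposal word d does not appear in set C, its TF-IDF value term is likewise assigned a fixed default.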
The proposal user selects one or two entries from the ten entries with the largest recommended values as the proposal theme.
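The three-set variant can be sketched in the same way. The description treats the news and document data as positive factors and the historical proposal data as a negative factor, so this sketch subtracts the proposal term; the exact formula is not reproduced in this text, and the weight names m1, m2, m3, their defaults, and the use of 0 for missing entries are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Divide each score by the sum of all scores in its set."""
    total = sum(scores.values())
    if total == 0:
        return {word: 0.0 for word in scores}
    return {word: value / total for word, value in scores.items()}


def recommend_with_proposals(
    set_a: Dict[str, float],  # news word -> heat value (set A)
    set_b: Dict[str, float],  # document word -> TF-IDF value (set B)
    set_c: Dict[str, float],  # proposal word -> TF-IDF value (set C)
    m1: float = 1.0,          # assumed weight, positive factor
    m2: float = 1.0,          # assumed weight, positive factor
    m3: float = 1.0,          # assumed weight, negative factor
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Rank entries so that words already common in past proposals sink."""
    norm_a, norm_b, norm_c = normalize(set_a), normalize(set_b), normalize(set_c)
    entries = set(set_a) | set(set_b) | set(set_c)  # the entry set
    scored = {
        # Assumption: an entry absent from a set contributes 0 for that set.
        d: m1 * norm_a.get(d, 0.0) + m2 * norm_b.get(d, 0.0) - m3 * norm_c.get(d, 0.0)
        for d in entries
    }
    ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

With this sign convention, an entry that dominates the historical proposals is pushed down the list, which matches the stated goal of avoiding repeated or similar proposals.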
The method extracts hot entries over the whole year using a keyword extraction method combined with Bayesian transformation; it takes into account the daily popularity score of each entry rather than relying on the word frequency score alone, and thereby obtains the heat value of each news word. Taking the historical behavior data of all proposal users as input, it selects the news most relevant to the daily work of the proposal user and extracts the key entries of that news, taking the correlation between keywords and documents into account during extraction, to obtain the document words; the TF-IDF values of the document words are obtained with an improved TF-IDF method. The same improved TF-IDF method is used to screen the keyword entries in the proposal library and obtain the TF-IDF values of the proposal words. Finally, the first two kinds of data are taken as positive factors and the third as a negative factor and weighted together, and the proposal user selects one or two entries from the ten entries with the largest total heat values as the proposal theme.
The method integrates the entry heat value with entry-document correlation features, provides a new keyword extraction method combined with Bayesian transformation together with an improved TF-IDF method, and screens out themes similar to proposals already submitted, which effectively avoids repeated or similar proposals and thereby improves proposal quality.
In one embodiment, as shown in fig. 7, there is provided a recommendation apparatus for a proposal topic, including: an obtaining module 702, a popularity module 704, a word frequency module 706, and a recommending module 708, wherein:
an obtaining module 702, configured to obtain news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
the popularity module 704 is used for obtaining a popularity value of each news word according to the news data by combining Bayesian transformation;
the word frequency module 706 is configured to calculate, according to the historical behavior data of the user, a news reading similarity and a news content similarity between the proposal user and another user to obtain a final similarity, and calculate, according to the final similarity, an interest degree of the user in each document to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
a recommending module 708, configured to obtain a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
For the specific definition of the recommendation device for the proposal subject, reference may be made to the above definition of the recommendation method for the proposal subject, which is not described herein again. The various modules in the above-described apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a recommendation method for a proposed topic. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method in the above embodiments when the processor executes the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method in the above-mentioned embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered to be within the scope of the present disclosure.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present patent shall be subject to the appended claims.
Claims (10)
1. A recommendation method for a proposal topic, comprising:
acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
according to the news data, the heat value of each news word is obtained by combining Bayesian transformation;
calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
2. The method of claim 1, further comprising:
acquiring historical proposal data; the historical proposal data comprises a plurality of proposal documents;
performing word segmentation on the proposal document to obtain a plurality of proposal words, and calculating the TF-IDF value of each proposal word;
obtaining a vocabulary entry set according to news words, document words and proposal words; and obtaining a recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word.
3. The method of claim 2, wherein obtaining a heat value for each news word from the news data in combination with Bayesian transformation comprises:
calculating an initial heat value of the news word in a certain day according to the word frequency of the certain day and the word frequency of the total statistical days:
where the quantities are, in order: the initial heat value of the news word on the i-th day; the frequency of occurrence of the news word on the i-th day; the frequency of occurrence of the news word over the H days; a given news word; and the i-th statistical day; H is the total number of statistical days;
and correcting the initial heat value by using Bayesian transformation to obtain a corrected heat value of the news word in a certain day:
where the first quantity is the corrected heat value of the news word on the i-th day, C is the average word frequency, j denotes the j-th news word, I is the total number of news words, and m is the prior average score;
obtaining a final heat value of the news word according to the corrected heat value of the news word in a certain day:
4. The method of claim 3, wherein calculating news reading similarity and news content similarity between the proposed user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the user interest degree of each document according to the final similarity to obtain the interested document comprises:
calculating the news reading similarity of the proposal user u and other users q:
in the formula, sim_1 denotes the news reading similarity between the proposal user u and another user q; S is the total number of documents contained in the data set; and the two remaining quantities are the number of clicks by user u on the i-th document and the number of clicks by user q on the i-th document;
calculating the similarity of news contents of the proposal user u and other users q:
in the formula, the quantities are, in order: the number of documents that user u and user q have both viewed; and the numbers of documents on which user u and user q, respectively, have generated historical behavior;
and obtaining the final similarity of the proposal user u and other users q according to the news reading similarity and the news content similarity:
taking the M users with the largest final similarity as the M-neighbor user set of user u, and calculating the interest degree of the proposal user u in each document clicked by those M users, the interest degree of the proposal user u in document j being:
where the quantities are, in order: the M-neighbor user set of the proposal user u; the final similarity between user u and user q; the number of clicks by user q on the j-th document; and the popularity of document j;
and obtaining the interesting document of the user according to the interestingness.
5. The method of any of claims 2 to 4, wherein performing word segmentation on the document of interest to obtain a plurality of document words and calculating a TF-IDF value for each document word, and performing word segmentation on the proposal document to obtain a plurality of proposal words and calculating a TF-IDF value for each proposal word, comprises:
calculating an initial TF value for the document word or proposal word:
in the formula, the two quantities are the total number of words in the document of interest or proposal document, and the total number of occurrences of document word or proposal word d;
introducing a word frequency control model to optimize the initial TF value:
where the quantities are, in order: the word frequency control coefficient; the total number of samples introduced; the sample average document length; and the TF value of the document word or proposal word;
calculating the IDF value of the document word or proposal word:
where idf_d denotes the IDF value of the document word or proposal word; the remaining quantities are the number of documents of interest containing the document word or the number of proposal documents containing the proposal word, the relevance of the document word to the document of interest or of the proposal word to the proposal document, and two adjustment factors;
calculating the TF-IDF value of document word or proposal word d according to the TF value of the document word or proposal word and the IDF value of the document word or proposal word:
where tfidf_d denotes the TF-IDF value of the document word or proposal word.
6. The method according to any one of claims 2 to 4, wherein a set of terms is obtained from news words, document words, and proposal words; obtaining a recommendation value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word, wherein the recommendation value of each entry in the entry set comprises the following steps:
news words, document words and proposal words are entries which together form an entry set;
obtaining a recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word:
in the formula, the quantities are, in order: the recommended value of the entry; the normalized heat value of the news word; the normalized TF-IDF value of the document word; the normalized TF-IDF value of the proposal word; the heat value of news word d; the sum of the heat values of all news words in set A; the TF-IDF value of document word d; the sum of the TF-IDF values of all document words in set B; the TF-IDF value of proposal word d; and the sum of the TF-IDF values of all proposal words in set C.
7. The method according to any one of claims 2 to 4, wherein the news words are obtained after performing word segmentation, stop word processing and unknown word processing on news data.
8. A recommendation device for a proposed topic, comprising:
the acquisition module is used for acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
the popularity module is used for obtaining the popularity value of each news word by combining Bayesian transformation according to the news data;
the word frequency module is used for calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
the recommendation module is used for obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211013812.0A CN115080867B (en) | 2022-08-23 | 2022-08-23 | Recommendation method and device for proposal theme, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115080867A (en) | 2022-09-20 |
CN115080867B (en) | 2022-11-15 |
Family
ID=83245454
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211013812.0A Active CN115080867B (en) | 2022-08-23 | 2022-08-23 | Recommendation method and device for proposal theme, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115080867B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115827877A (en) * | 2023-02-07 | 2023-03-21 | 湖南正宇软件技术开发有限公司 | Proposal auxiliary combination method, device, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160269344A1 (en) * | 2015-03-13 | 2016-09-15 | International Business Machines Corporation | Recommending hashtags to be used in composed message to increase propagation speed and enhance desired sentiment of composed message |
CN109271574A (en) * | 2018-08-28 | 2019-01-25 | 麒麟合盛网络技术股份有限公司 | A kind of hot word recommended method and device |
CN110188265A (en) * | 2019-04-26 | 2019-08-30 | 中国科学院计算技术研究所 | A kind of network public-opinion focus recommendation method and system of fusion user portrait |
CN110334202A (en) * | 2019-03-28 | 2019-10-15 | 平安科技(深圳)有限公司 | User interest label construction method and relevant device based on news application software |
US20200074475A1 (en) * | 2018-08-30 | 2020-03-05 | Dariusz Zabrzenski | Intelligent system enabling automated scenario-based responses in customer service |
Non-Patent Citations (2)
Title |
---|
张舒雅等: "基于Spark和改进的TF-IDF算法的用户特征分析", 《软件工程》 * |
鲁燃: "融合人工蜂群的微博话题推荐算法", 《山西大学学报(自然科学版)》 * |
Also Published As
Publication number | Publication date |
---|---|
CN115080867B (en) | 2022-11-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||