CN115080867A - Recommendation method and device for proposal theme, computer equipment and storage medium - Google Patents

Info

Publication number
CN115080867A
Authority
CN
China
Prior art keywords
word
document
news
proposal
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211013812.0A
Other languages
Chinese (zh)
Other versions
CN115080867B (en)
Inventor
刘跃华
王新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Zhengyu Software Technology Development Co ltd
Original Assignee
Hunan Zhengyu Software Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Zhengyu Software Technology Development Co ltd
Priority to CN202211013812.0A
Publication of CN115080867A
Application granted
Publication of CN115080867B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the field of computers and relates to a method for recommending a proposal topic, comprising: acquiring news data and historical behavior data of users, the news data including news words and the historical behavior data including documents; obtaining a heat value for each news word from the news data by means of Bayesian transformation; calculating, from the historical behavior data of the users, the news reading similarity and the news content similarity between the proposal user and other users to obtain a final similarity, and calculating the user's interest in each document to obtain documents of interest; segmenting the documents of interest to obtain document words, and calculating the TF-IDF value of each document word; obtaining an entry set from the news words and the document words; and obtaining recommended words from the heat value of each news word and the TF-IDF value of each document word, thereby completing the recommendation of the proposal topic. The method recommends proposal topics and thereby improves proposal quality.

Description

Recommendation method and device for proposal theme, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for recommending a proposal topic, a computer device, and a storage medium.
Background
A proposal is a written opinion or suggestion submitted to a meeting by a participating department, group or individual participant.
In the prior art, a participant drafting a proposal has no suitable data or data references to draw on and therefore often relies on subjective judgment.
However, subjective judgment does not reflect real needs and current hot topics; a proposal submitted on that basis lacks supporting data and is of low quality.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device and a storage medium for recommending a proposal topic, which can recommend a proposal topic and improve the quality of the proposal.
A recommendation method of a proposal topic, comprising: acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
according to the news data, the heat value of each news word is obtained by combining Bayesian transformation;
calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
In one embodiment, further comprising:
acquiring historical proposal data; the historical proposal data comprises a plurality of proposal documents;
performing word segmentation on the proposal document to obtain a plurality of proposal words, and calculating the TF-IDF value of each proposal word;
obtaining a vocabulary entry set according to the news words, the document words and the proposal words; and obtaining a recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word.
In one embodiment, obtaining the heat value of each news word from the news data by means of Bayesian transformation comprises:
calculating an initial heat value of the news word on a given day from the word frequency on that day and the word frequency over the total statistical days:
[Formula image not reproduced in this text: initial heat value of the news word on the i-th day]
wherein the symbols of the formula denote, in order: the initial heat value of the news word on the i-th day; the frequency of occurrence of the news word on the i-th day; the frequency of occurrence of the news word over H days; a given news word; the i-th statistical day; and H, the total number of statistical days;
correcting the initial heat value by Bayesian transformation to obtain a corrected heat value of the news word on a given day:
[Three formula images not reproduced in this text]
wherein the first symbol denotes the corrected heat value of the news word on the i-th day, C is the average word frequency, j indexes the j-th news word, I is the total number of news words, and m is the prior average score;
obtaining a final heat value of the news word from its corrected heat values on the individual days:
[Formula image not reproduced in this text: final heat value of the news word]
wherein the defined symbol denotes the final heat value of the news word.
In one embodiment, the calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain a final similarity, and calculating the interest of the user in each document according to the final similarity to obtain the document of interest includes:
calculating the news reading similarity between the proposal user u and another user q:
[Formula image not reproduced in this text: news reading similarity sim_1]
wherein sim_1 denotes the news reading similarity between the proposal user u and the other user q; S is the total number of documents in the data set; and the two remaining symbols denote the number of clicks on the i-th document by user u and by user q, respectively;
calculating the news content similarity between the proposal user u and the other user q:
[Formula image not reproduced in this text: news content similarity]
wherein the symbols denote, in order: the number of documents that user u and user q have both viewed; and the numbers of documents on which user u and user q, respectively, have generated historical behaviors;
obtaining the final similarity between the proposal user u and the other user q from the news reading similarity and the news content similarity:
[Formula image not reproduced in this text: final similarity]
wherein the remaining symbol is a weight factor;
taking the M users with the highest final similarity as the M-neighbor user set of user u, and calculating the interest degree of the proposal user u in each document clicked by those M users, the interest degree of the proposal user u in document j being:
[Formula image not reproduced in this text: interest degree of user u in document j]
wherein the symbols denote, in order: the M-neighbor user set of the proposal user u; the final similarity between user u and user q; the number of clicks by user q on the j-th document; and the popularity of document j;
and obtaining the documents of interest of the user according to the interest degrees.
In one embodiment, segmenting the document of interest to obtain a plurality of document words and calculating the TF-IDF value of each document word, and segmenting the proposal document to obtain a plurality of proposal words and calculating the TF-IDF value of each proposal word, comprise:
calculating an initial TF value of the document word or proposal word:
[Formula image not reproduced in this text: initial TF value]
wherein the symbols denote, in order: the total number of words in the document of interest or proposal document; and the total number of occurrences of the document word or proposal word d;
introducing a word frequency control model to optimize the initial TF value:
[Formula image not reproduced in this text: optimized TF value]
wherein the symbols denote, in order: the word frequency control coefficient; the total number of samples introduced; the average document length of the samples; and the TF value of the document word or proposal word;
calculating the IDF value of the document word or proposal word:
[Two formula images not reproduced in this text]
wherein idf_d denotes the IDF value of the document word or proposal word; the next symbol denotes the number of documents of interest containing the document word, or the number of proposal documents containing the proposal word; the following symbol (rendered as two images in the source) denotes the relevance of the document word to the document of interest, or of the proposal word to the proposal document; and the last two symbols are regulatory factors;
calculating the TF-IDF value of the document word or proposal word d from its TF value and its IDF value:
[Formula image not reproduced in this text: TF-IDF value]
wherein tfidf_d denotes the TF-IDF value of the document word or proposal word.
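The TF-IDF step can be illustrated with a minimal sketch. The initial TF (occurrences of a word divided by the total number of words in the document) follows the definitions given in this embodiment. The word frequency control model, the relevance term and the regulatory factors are not recoverable from this text, so the sketch substitutes a textbook smoothed IDF, ln((1 + N) / (1 + df)) + 1, purely as a stand-in. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {word: tf-idf} dict per document.

    documents: list of token lists (already segmented, stop words removed).
    TF is count / document length, as in the text. The IDF below is a
    textbook smoothed IDF, NOT the modified IDF of the patent.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in counts.items()
        })
    return scores

# Hypothetical documents of interest, already tokenised.
docs = [["housing", "housing", "transit"], ["transit", "schools"]]
scores = tf_idf(docs)
```

In this toy data, "transit" appears in every document and so receives the minimum IDF, while "housing" is both frequent in the first document and specific to it, which gives it the higher score there.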
In one embodiment, obtaining the entry set from the news words, the document words and the proposal words, and obtaining the recommended value of each entry in the entry set from the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word, comprise:
taking the news words, the document words and the proposal words as entries, which together form the entry set;
obtaining the recommended value of each entry in the entry set from the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word:
[Four formula images not reproduced in this text]
wherein the symbols denote, in order: the recommended value of an entry; the normalized heat value of the news word; the normalized TF-IDF value of the document word; the normalized TF-IDF value of the proposal word; the heat value of news word d; the sum of the heat values of all news words in set A; the TF-IDF value of document word d; the sum of the TF-IDF values of all document words in set B; the TF-IDF value of proposal word d; and the sum of the TF-IDF values of all proposal words in set C.
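The normalization described here (each value divided by the sum over its own set A, B or C) can be sketched as follows. How the three normalized values are combined into the recommended value is not recoverable from this text; the sketch assumes a plain sum, which is an assumption rather than the formula of the patent. The names `normalise` and `recommend_words` and the sample scores are hypothetical.

```python
def normalise(values):
    """Divide each value by the sum over its set (sets A, B, C in the text)."""
    total = sum(values.values())
    if not total:
        return {word: 0.0 for word in values}
    return {word: value / total for word, value in values.items()}

def recommend_words(news_heat, doc_tfidf, proposal_tfidf, top_k=3):
    """Rank entries by the sum of their normalised scores.

    ASSUMPTION: the recommended value is the unweighted sum of the three
    normalised values; an entry missing from a set contributes 0 for it.
    """
    parts = [normalise(news_heat), normalise(doc_tfidf), normalise(proposal_tfidf)]
    entries = set().union(*parts)  # the entry set: union of all words
    scores = {w: sum(p.get(w, 0.0) for p in parts) for w in entries}
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k], scores

# Hypothetical inputs.
news_heat = {"housing": 3.0, "transit": 1.0}
doc_tfidf = {"housing": 0.2, "schools": 0.6}
proposal_tfidf = {"transit": 0.5, "schools": 0.5}
top, scores = recommend_words(news_heat, doc_tfidf, proposal_tfidf, top_k=2)
```

Normalizing each set separately keeps heat values and TF-IDF values, which live on different scales, from dominating one another before they are combined.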
In one embodiment, the news words are obtained after performing word segmentation, stop word processing and unknown word processing on news data.
A recommendation apparatus of a proposal topic, comprising:
the acquisition module is used for acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
the popularity module is used for obtaining the popularity value of each news word by combining Bayesian transformation according to the news data;
the word frequency module is used for calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
the recommendation module is used for obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and completing the recommendation of the proposal theme.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
according to the news data, the heat value of each news word is obtained by combining Bayesian transformation;
calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
according to the news data, the heat value of each news word is obtained by combining Bayesian transformation;
calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
With the above recommendation method, apparatus, computer device and storage medium for a proposal topic, the heat value of each news word is obtained from the news data; the documents of interest and a plurality of document words are obtained from the historical behavior data of the user, and the TF-IDF value of each document word is then calculated; on this basis the recommended words are obtained and the recommendation of the proposal topic is completed. Because the recommended words are derived from news data and from the user's historical behavior data, the resulting proposals are grounded in hot news and historical data, which improves proposal quality.
Drawings
FIG. 1 is a diagram of an application environment of the proposal topic recommendation method in one embodiment;
FIG. 2 is a flowchart of the proposal topic recommendation method in one embodiment;
FIG. 3 is a flowchart of the proposal topic recommendation method in another embodiment;
FIG. 4 is a flowchart of the calculation of the heat value of a news word in one embodiment;
FIG. 5 is a flowchart of the calculation of the TF-IDF value of a document word in one embodiment;
FIG. 6 is a flowchart of the calculation of the TF-IDF value of a proposal word in one embodiment;
FIG. 7 is a block diagram of the structure of the proposal topic recommendation apparatus in one embodiment;
FIG. 8 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all the directional indicators (such as up, down, left, right, front, and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are only used for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "connected," "secured," and the like are to be construed broadly, e.g., "secured" may be a fixed connection, a removable connection, or an integral part; the connection can be mechanical connection, electrical connection, physical connection or wireless communication connection; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination of technical solutions should not be considered to exist, and is not within the protection scope of the present invention.
The method provided by the application can be applied to the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network, the terminal 102 may include but is not limited to various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be a server corresponding to various portal websites and working system backgrounds.
As shown in fig. 2, the present application provides a method for recommending a proposed subject, which is described by taking the method as an example applied to the terminal in fig. 1, and in one embodiment, the method includes:
Step 202, acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavior data comprises a plurality of documents.
The news data is a data set built from the year-round news content of all major WeChat news outlets, newspapers and periodicals.
The historical behavior data of the users is a data set built from the historical behavior data (number of clicks on news, browsing time) of each committee member.
Step 204, obtaining the heat value of each news word from the news data by means of Bayesian transformation.
Specifically:
an initial heat value of a news word on a given day is calculated from the word frequency on that day (the daily word frequency) and the word frequency over the total statistical days (the total word frequency), for example as the ratio of the two:
[Formula image not reproduced in this text: initial heat value of the entry on the i-th day]
wherein the symbols denote, in order: the initial heat value of the entry on the i-th day; the frequency of occurrence of the entry on the i-th day, i.e. the daily word frequency; the frequency of occurrence of the entry over H days, i.e. the total word frequency; a given news word; the i-th statistical day; and H, the total number of statistical days.
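The ratio mentioned above as an example (daily word frequency divided by total word frequency over H days) can be sketched as follows. The function name `initial_heat` and the sample counts are hypothetical and serve only as illustration.

```python
from collections import Counter

def initial_heat(daily_counts, day):
    """Initial heat of each word on `day`: daily frequency / total frequency.

    daily_counts: list of Counter objects, one per statistical day (H = len).
    day: zero-based index of the day of interest.
    """
    totals = Counter()
    for counts in daily_counts:
        totals.update(counts)  # accumulate frequencies over all H days
    return {word: count / totals[word] for word, count in daily_counts[day].items()}

# Hypothetical word frequencies for H = 2 days.
daily_counts = [
    Counter({"housing": 2, "transit": 1}),
    Counter({"housing": 6, "transit": 1}),
]
heat_day1 = initial_heat(daily_counts, 1)
```

Here "housing" gets 6 / 8 = 0.75 on the second day because most of its occurrences fall on that day, while "transit" gets 1 / 2 = 0.5.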
The initial heat value is corrected by Bayesian transformation to obtain a corrected heat value of the news word on a given day:
[Three formula images not reproduced in this text]
wherein the first symbol denotes the corrected heat value of the news word on the i-th day, C is the average word frequency, j indexes the j-th news word, I is the total number of news words, and m is the prior average score.
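The correction formulas themselves are not recoverable from this text. One common form of Bayesian correction that uses exactly the quantities named here (an average word frequency C and a prior average score m) is the Bayesian average, and the sketch below assumes that form. It is an illustration under that assumption, not a confirmed reproduction of the patent's formulas; the function name and sample values are hypothetical.

```python
def bayesian_corrected_heat(initial_heat, day_freq):
    """Bayesian-average correction of per-word heat values for one day.

    ASSUMED form: corrected = (C * m + f * h) / (C + f), where
      f = the word's frequency on the day, h = its initial heat value,
      C = mean frequency over the I words, m = mean initial heat over the I words.
    """
    words = list(initial_heat)
    n_words = len(words)                                  # I
    c = sum(day_freq[w] for w in words) / n_words         # C, average word frequency
    m = sum(initial_heat[w] for w in words) / n_words     # m, prior average score
    return {
        w: (c * m + day_freq[w] * initial_heat[w]) / (c + day_freq[w])
        for w in words
    }

# Hypothetical values: "rare" was seen once, only on this day, so its raw heat is 1.0.
initial = {"housing": 0.75, "transit": 0.5, "rare": 1.0}
freq = {"housing": 6, "transit": 1, "rare": 1}
corrected = bayesian_corrected_heat(initial, freq)
```

Under this assumed form, a word with few occurrences is pulled toward the prior average m, so a word seen only once cannot outrank a frequently reported word merely because its ratio happens to be 1.0.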
A final heat value of the news word is obtained from its corrected heat values on the individual days:
[Formula image not reproduced in this text: final heat value of the news word]
wherein the defined symbol denotes the final heat value of the news word.
In this step, the news words are obtained by preprocessing the news data (word segmentation, stop word removal and unknown word processing).
Word segmentation: the documents are segmented using a word segmentation library that offers three modes: an accurate mode, which segments the text precisely; a full mode, which lists every possible word in the text; and a search engine mode, which further splits long words.
Stop word removal: a list of common stop words is built and compared against the segmented word list, and any stop words found are deleted from the segmented word list.
Unknown word processing: hot words compiled by the major news outlets and newspapers (retrievable from the home pages of their websites) are loaded into the segmentation vocabulary.
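The three preprocessing stages can be sketched in plain Python. The text does not identify the segmentation library in a recoverable way, so the sketch swaps in a simple forward maximum matching segmenter as a stand-in; libraries such as jieba provide comparable modes and user dictionaries, but whether the source uses one of them is an assumption. All names, vocabularies and the sample sentence are hypothetical.

```python
def segment(text, vocabulary):
    """Forward maximum matching: take the longest vocabulary word at each position.

    A stand-in for a real segmentation library; unknown characters fall back
    to single-character tokens.
    """
    max_len = max(map(len, vocabulary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens

def preprocess(text, vocabulary, hot_words, stop_words):
    """Segment with hot words added to the vocabulary, then drop stop words."""
    tokens = segment(text, vocabulary | hot_words)  # unknown word processing
    return [t for t in tokens if t not in stop_words]  # stop word removal

vocabulary = {"乡村", "振兴", "的", "重点"}
stop_words = {"的"}
text = "乡村振兴的重点"

without_hot = preprocess(text, vocabulary, set(), stop_words)
with_hot = preprocess(text, vocabulary, {"乡村振兴"}, stop_words)
```

Loading the hot word "乡村振兴" into the vocabulary keeps it as a single news word instead of splitting it into "乡村" and "振兴", which is the purpose of the unknown word step.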
Step 206, calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain a final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; and segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word.
Specifically:
the news reading similarity between the proposal user u and another user q is calculated:
[Formula image not reproduced in this text: news reading similarity sim_1]
wherein sim_1 denotes the news reading similarity between the proposal user u and the other user q; S is the total number of documents in the data set; and the two remaining symbols denote the number of clicks on the i-th document by user u and by user q, respectively;
the news content similarity between the proposal user u and the other user q is calculated:
[Formula image not reproduced in this text: news content similarity]
wherein the symbols denote, in order: the number of documents that user u and user q have both viewed; and the numbers of documents on which user u and user q, respectively, have generated historical behaviors (including clicking and browsing);
the news reading similarity and the news content similarity are then merged to obtain the final similarity between the proposal user u and the other user q:
[Formula image not reproduced in this text: final similarity]
wherein the remaining symbol is a weight factor taking a value between 0 and 1 (for example, 0.5);
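The exact similarity formulas are not recoverable from this text. The sketch below assumes forms that match the quantities defined here: cosine similarity over per-document click counts for the reading similarity, the number of commonly viewed documents divided by the geometric mean of the two users' document counts for the content similarity, and a convex combination with the weight factor for the final similarity. All three forms, the function names and the sample data are assumptions.

```python
import math

def reading_similarity(clicks_u, clicks_q):
    """ASSUMED form: cosine similarity of two click-count vectors of length S."""
    dot = sum(a * b for a, b in zip(clicks_u, clicks_q))
    norm = math.sqrt(sum(a * a for a in clicks_u)) * math.sqrt(sum(b * b for b in clicks_q))
    return dot / norm if norm else 0.0

def content_similarity(docs_u, docs_q):
    """ASSUMED form: |common documents| / sqrt(|docs of u| * |docs of q|)."""
    if not docs_u or not docs_q:
        return 0.0
    return len(docs_u & docs_q) / math.sqrt(len(docs_u) * len(docs_q))

def final_similarity(sim1, sim2, alpha=0.5):
    """ASSUMED form: weighted merge with weight factor alpha in [0, 1]."""
    return alpha * sim1 + (1 - alpha) * sim2

# Hypothetical data for S = 4 documents.
clicks_u = [2, 0, 1, 0]
clicks_q = [1, 1, 0, 0]
s1 = reading_similarity(clicks_u, clicks_q)
s2 = content_similarity({0, 2}, {0, 1})
s = final_similarity(s1, s2, alpha=0.5)
```

With these inputs the two users share one of their two documents each, so the content similarity is 0.5, and the final similarity lies between the two component values.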
arranging all final similarities according to a descending order, taking the former M users (for example, the former 20) with the maximum final similarity as an M adjacent user set of the user u, and respectively calculating the interest degrees between the proposed user u and the documents clicked by the M users, for example, if the proposed user u and the M users click the document j, the calculation formula of the interest degree of the proposed user u to the document j is as follows:
Figure 630558DEST_PATH_IMAGE088
where
Figure 838685DEST_PATH_IMAGE089
is the M-neighbor user set of the proposal user u,
Figure 932412DEST_PATH_IMAGE090
is the final similarity of user u and user q,
Figure 884187DEST_PATH_IMAGE091
represents the number of clicks made by user q on the j-th document, and
Figure 622336DEST_PATH_IMAGE092
represents the popularity of document j, namely the ratio of the cumulative number of users who clicked document j to the total number of users.
The documents of interest of the user are obtained according to the degree of interest. Specifically, all degrees of interest of the proposal user are sorted in descending order, and the N documents with the highest degree of interest (the value of N can be determined according to the actual situation) are taken as the documents of interest of the user.
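The neighbor selection and ranking steps above can be sketched as follows. The sketch assumes that the degree of interest is the similarity-weighted sum of the neighbors' click counts multiplied by the document popularity; how popularity actually enters the formula of this embodiment is not stated in the text, so that choice, like all names used here, is an assumption for illustration only.

```python
def top_neighbors(similarities, m=20):
    """Return the m users with the largest final similarity to the proposal user."""
    return sorted(similarities, key=similarities.get, reverse=True)[:m]


def interest(doc, neighbors, similarities, clicks, popularity):
    """Similarity-weighted neighbor clicks, scaled by popularity (assumed form)."""
    weighted = sum(
        similarities[q] * clicks.get(q, {}).get(doc, 0) for q in neighbors
    )
    return weighted * popularity.get(doc, 0.0)


def documents_of_interest(similarities, clicks, popularity, m=20, n=10):
    """Rank every document clicked by the neighbors and keep the top n."""
    neighbors = top_neighbors(similarities, m)
    candidates = {d for q in neighbors for d in clicks.get(q, {})}
    scores = {
        d: interest(d, neighbors, similarities, clicks, popularity)
        for d in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Here `similarities` maps each other user to the final similarity with the proposal user, `clicks` maps each user to a dictionary of per-document click counts, and `popularity` maps each document to the fraction of users who clicked it.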
The documents of interest are segmented to obtain a plurality of document words, and the initial TF value of each document word is calculated:
Figure 142311DEST_PATH_IMAGE093
where
Figure 333120DEST_PATH_IMAGE094
is the total number of words of the document of interest, and
Figure 714685DEST_PATH_IMAGE095
represents the total number of occurrences of the document word d;
A word frequency control model is introduced to optimize the initial TF value:
Figure 572920DEST_PATH_IMAGE096
where
Figure 122850DEST_PATH_IMAGE097
represents the word frequency control coefficient, an empirical value determined through a large number of experiments,
Figure 676322DEST_PATH_IMAGE098
is the total number of samples introduced (i.e. the total number of documents of interest),
Figure 235480DEST_PATH_IMAGE099
represents the sample average document length (the document length of a document is its number of words; the average document length is the ratio of the total number of words to the total number of documents; the samples here are the documents of interest, so the sample average document length is the ratio of the total number of words of the documents of interest to the total number of documents of interest), and
Figure 807275DEST_PATH_IMAGE100
represents the TF value of the document word;
A word frequency-document correlation model is introduced to calculate the IDF value of the document word:
Figure 793686DEST_PATH_IMAGE101
Figure 100033DEST_PATH_IMAGE102
where idf_d represents the IDF value of the document word,
Figure 197302DEST_PATH_IMAGE103
is the number of documents of interest containing the document word,
Figure 498971DEST_PATH_IMAGE104
Figure 282381DEST_PATH_IMAGE105
represents the relevance of the document word to the document of interest;
Figure 935079DEST_PATH_IMAGE106
and
Figure 711406DEST_PATH_IMAGE107
are adjustment factors,
Figure 133160DEST_PATH_IMAGE108
The TF-IDF value of the document word d is calculated from the TF value and the IDF value of the document word:
Figure 586007DEST_PATH_IMAGE109
where tfidf_d represents the TF-IDF value of the document word.
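For orientation, a baseline TF-IDF computation over segmented documents might look like the sketch below. Only the initial TF (occurrences of a word divided by the total number of words) and the final product of TF and IDF follow the description; the word frequency control model and the word frequency-document correlation model are not reproduced, and a common smoothed IDF is substituted in their place. Treating all documents of interest as one corpus for the word count, the IDF form, and the function name are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Baseline TF-IDF for a list of token lists (already segmented)."""
    total_words = sum(len(doc) for doc in documents)
    counts = Counter(word for doc in documents for word in doc)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    n_docs = len(documents)

    scores = {}
    for word, n_d in counts.items():
        tf = n_d / total_words  # initial TF as described in the text
        # Smoothed IDF: an assumed stand-in for the correlation model.
        idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1
        scores[word] = tf * idf
    return scores
```

A word that is frequent overall but concentrated in few documents scores higher than an equally frequent word spread across every document, which is the behavior the IDF term is meant to provide.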
Step 208: obtaining an entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining recommended words according to the recommended values, thereby completing the recommendation of the proposal theme.
The news words and the document words are entries and together form the entry set;
It should be noted that: after the heat value of each news word is obtained, all news words and their heat values are stored in the format "news word-heat value" and together form a set A; after the TF-IDF value of each document word is obtained, all document words and their TF-IDF values are stored in the format "document word-TF-IDF value" and together form a set B.
The recommended value of each entry in the entry set is obtained according to the heat value of each news word and the TF-IDF value of each document word:
Figure 726001DEST_PATH_IMAGE110
Figure 40439DEST_PATH_IMAGE111
Figure 316699DEST_PATH_IMAGE112
where
Figure 815814DEST_PATH_IMAGE113
represents the recommended value of an entry,
Figure 803623DEST_PATH_IMAGE114
represents the normalized heat value of the news word,
Figure 046386DEST_PATH_IMAGE115
represents the normalized TF-IDF value of the document word,
Figure 583678DEST_PATH_IMAGE116
represents the heat value of news word d in set A,
Figure 988114DEST_PATH_IMAGE117
represents the sum of the heat values of all news words in set A,
Figure 837122DEST_PATH_IMAGE118
represents the TF-IDF value of document word d in set B, and
Figure 8209DEST_PATH_IMAGE119
represents the sum of the TF-IDF values of all document words in set B.
It should be noted that: if news word d does not appear in set A, then
Figure 259062DEST_PATH_IMAGE120
; if document word d does not appear in set B, then
Figure 975345DEST_PATH_IMAGE121
The proposal user selects Y entries from the top X entries with the largest recommended value
Figure 311648DEST_PATH_IMAGE122
as the proposal theme. The specific values of X and Y may be determined according to the actual situation, for example, X = 15 and Y = 3.
Further:
Figure 510593DEST_PATH_IMAGE123
Figure 615952DEST_PATH_IMAGE124
where m1 and m2 are adjustment factors that can take different values according to the emphasis the user places on the news data and on the historical behavior data. Whatever values the adjustment factors take, the recommendation of the proposal theme can be completed, and the quality of the proposal theme can be further improved.
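The combination of sets A and B can be sketched in Python as follows. Normalization by the sum over each set follows the description; treating a word that is absent from a set as contributing zero, and combining the two normalized values as a weighted sum with m1 and m2, are assumptions, as are the function names.

```python
def normalize(values):
    """Divide each value by the sum over the whole set."""
    total = sum(values.values())
    if total == 0:
        return {key: 0.0 for key in values}
    return {key: value / total for key, value in values.items()}


def recommend(heat, tfidf, m1=0.5, m2=0.5, top_x=15):
    """Rank entries from set A (heat) and set B (tfidf) by recommended value."""
    h = normalize(heat)
    t = normalize(tfidf)
    entries = set(heat) | set(tfidf)
    # Assumption: a word missing from a set contributes 0 for that set.
    scores = {e: m1 * h.get(e, 0.0) + m2 * t.get(e, 0.0) for e in entries}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_x]
```

The returned list holds the top X entries with their recommended values, from which the proposal user would pick Y entries as the proposal theme.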
According to the recommendation method, device, computer equipment and storage medium for the proposal theme, the heat value of each news word is obtained according to news data; obtaining an interested document and a plurality of document words according to the historical behavior data of the user, and further obtaining the TF-IDF value of each document word; on the basis, the recommendation words are obtained, and recommendation of the proposal subject is completed.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, the method further comprises:
acquiring historical proposal data; the historical proposal data comprises a plurality of proposal documents;
performing word segmentation on the proposal document to obtain a plurality of proposal words, and calculating the TF-IDF value of each proposal word;
the entry set also includes the proposal words, namely: an entry set is obtained according to the news words, the document words and the proposal words; and the recommended value of each entry in the entry set is also obtained according to the TF-IDF value of each proposal word, namely: the recommended value of each entry in the entry set is obtained according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word.
Specifically, the method comprises the following steps:
the historical proposal data refers to data of proposals that have already been submitted.
The proposal documents are segmented to obtain a plurality of proposal words, and the initial TF value of each proposal word is calculated:
Figure 627771DEST_PATH_IMAGE125
where
Figure 592316DEST_PATH_IMAGE126
is the total number of words of the proposal document, and
Figure 449413DEST_PATH_IMAGE127
represents the total number of occurrences of the proposal word d;
A word frequency control model is introduced to optimize the initial TF value:
Figure 065071DEST_PATH_IMAGE128
where
Figure 247791DEST_PATH_IMAGE129
represents the word frequency control coefficient, an empirical value determined through a large number of experiments,
Figure 293107DEST_PATH_IMAGE130
is the total number of samples introduced (i.e. the total number of proposal documents),
Figure 94841DEST_PATH_IMAGE131
represents the sample average document length (the document length of a document is its number of words; the average document length is the ratio of the total number of words to the total number of documents; the samples here are the proposal documents, so the sample average document length is the ratio of the total number of words of the proposal documents to the total number of proposal documents), and
Figure 440372DEST_PATH_IMAGE132
represents the TF value of the proposal word;
A word frequency-document correlation model is introduced to calculate the IDF value of the proposal word:
Figure 420092DEST_PATH_IMAGE133
Figure 952704DEST_PATH_IMAGE134
where idf_d represents the IDF value of the proposal word,
Figure 558129DEST_PATH_IMAGE135
is the number of proposal documents containing the proposal word,
Figure 758166DEST_PATH_IMAGE136
Figure 17109DEST_PATH_IMAGE137
represents the relevance of the proposal word to the proposal document;
Figure 427231DEST_PATH_IMAGE138
and
Figure 960980DEST_PATH_IMAGE139
are adjustment factors,
Figure 625311DEST_PATH_IMAGE140
The TF-IDF value of the proposal word d is calculated from the TF value and the IDF value of the proposal word:
Figure 320734DEST_PATH_IMAGE141
where tfidf_d represents the TF-IDF value of the proposal word.
The news words, document words and proposal words are entries and together form the entry set;
It should be noted that: after the heat value of each news word is obtained, all news words and their heat values are stored in the format "news word-heat value" and together form a set A; after the TF-IDF value of each document word is obtained, all document words and their TF-IDF values are stored in the format "document word-TF-IDF value" and together form a set B; after the TF-IDF value of each proposal word is obtained, all proposal words and their TF-IDF values are stored in the format "proposal word-TF-IDF value" and together form a set C.
The recommended value of each entry in the entry set is obtained according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word:
Figure 719617DEST_PATH_IMAGE142
Figure 525899DEST_PATH_IMAGE143
Figure 434949DEST_PATH_IMAGE144
Figure 442219DEST_PATH_IMAGE145
where
Figure 171141DEST_PATH_IMAGE146
represents the recommended value of an entry,
Figure 171327DEST_PATH_IMAGE147
represents the normalized heat value of the news word,
Figure 669304DEST_PATH_IMAGE148
represents the normalized TF-IDF value of the document word,
Figure 972110DEST_PATH_IMAGE149
represents the normalized TF-IDF value of the proposal word,
Figure 329273DEST_PATH_IMAGE150
represents the heat value of news word d in set A,
Figure 742937DEST_PATH_IMAGE151
represents the sum of the heat values of all news words in set A,
Figure 987098DEST_PATH_IMAGE152
represents the TF-IDF value of document word d in set B,
Figure 460805DEST_PATH_IMAGE153
represents the sum of the TF-IDF values of all document words in set B,
Figure 305264DEST_PATH_IMAGE154
represents the TF-IDF value of proposal word d in set C, and
Figure 257040DEST_PATH_IMAGE155
represents the sum of the TF-IDF values of all proposal words in set C.
It should be noted that: if news word d does not appear in set A, then
Figure 995189DEST_PATH_IMAGE156
; if document word d does not appear in set B, then
Figure 764430DEST_PATH_IMAGE157
; if proposal word d does not appear in set C, then
Figure 220820DEST_PATH_IMAGE158
The proposal user selects one to two entries from the ten entries with the largest recommended value
Figure 586073DEST_PATH_IMAGE159
as the proposal theme.
The method uses a keyword extraction method combined with Bayesian transformation to extract hot entries over the whole year; it takes into account the daily popularity score of each entry rather than relying on word frequency alone, and thereby obtains the heat value of each news word. Taking the historical behavior data of all proposal users as input, it selects the news most relevant to the daily work of the proposal users and extracts the key entries of that news, considering the correlation between keywords and documents during extraction, to obtain the document words; the TF-IDF values of the document words are obtained with an improved TF-IDF method. The key entries in the proposal library are screened out with the improved TF-IDF method to obtain the TF-IDF values of the proposal words. Finally, the first two kinds of data are taken as positive factors and the third as a negative factor and are weighted comprehensively, and the proposal user selects one to two entries from the ten entries with the largest recommended values as the proposal theme.
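The comprehensive weighting with a negative factor can be sketched as below. The sketch assumes the negative factor is applied by subtracting the normalized proposal-word value, with a hypothetical third adjustment factor m3; the exact combination used in this embodiment is not stated in the text, so the form and all names here are illustrative assumptions.

```python
def normalize(values):
    """Divide each value by the sum over the whole set."""
    total = sum(values.values())
    if total == 0:
        return {key: 0.0 for key in values}
    return {key: value / total for key, value in values.items()}


def recommend_with_history(heat, doc_tfidf, proposal_tfidf,
                           m1=1.0, m2=1.0, m3=1.0, top_x=10):
    """Rank entries using sets A and B as positive factors and set C as negative."""
    h = normalize(heat)
    t = normalize(doc_tfidf)
    p = normalize(proposal_tfidf)
    entries = set(h) | set(t) | set(p)
    # Assumption: words already prominent in past proposals are penalized.
    scores = {
        e: m1 * h.get(e, 0.0) + m2 * t.get(e, 0.0) - m3 * p.get(e, 0.0)
        for e in entries
    }
    return sorted(scores, key=lambda e: (-scores[e], e))[:top_x]
```

With this form, an entry that is hot in the news and in the user's reading history but already dominant in the proposal library drops in the ranking, which matches the stated goal of avoiding repeated or similar proposals.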
The method integrates the entry heat value and the entry-document correlation features, provides a new keyword extraction method combined with Bayesian transformation and an improved TF-IDF method, and screens out topics similar to proposals already submitted, which effectively avoids repeated or similar proposals and thereby improves proposal quality.
In one embodiment, as shown in fig. 7, there is provided a recommendation apparatus for a proposal topic, including: an obtaining module 702, a popularity module 704, a word frequency module 706, and a recommending module 708, wherein:
an obtaining module 702, configured to obtain news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
the popularity module 704 is used for obtaining a popularity value of each news word according to the news data by combining Bayesian transformation;
the word frequency module 706 is configured to calculate, according to the historical behavior data of the user, a news reading similarity and a news content similarity between the proposal user and another user to obtain a final similarity, and calculate, according to the final similarity, an interest degree of the user in each document to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
a recommending module 708, configured to obtain a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
For the specific definition of the recommendation device for the proposal subject, reference may be made to the above definition of the recommendation method for the proposal subject, which is not repeated here. The modules in the above device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a recommendation method for a proposed topic. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method in the above embodiments when the processor executes the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method in the above-mentioned embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of the present disclosure.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A recommendation method for a proposal topic, comprising:
acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
according to the news data, the heat value of each news word is obtained by combining Bayesian transformation;
calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
obtaining a vocabulary entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, and finishing the recommendation of the proposal subject.
2. The method of claim 1, further comprising:
acquiring historical proposal data; the historical proposal data comprises a plurality of proposal documents;
performing word segmentation on the proposal document to obtain a plurality of proposal words, and calculating the TF-IDF value of each proposal word;
obtaining a vocabulary entry set according to news words, document words and proposal words; and obtaining a recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word.
3. The method of claim 2, wherein obtaining a popularity value for each news word from the news data in conjunction with a Bayesian transformation comprises:
calculating an initial heat value of the news word in a certain day according to the word frequency of the certain day and the word frequency of the total statistical days:
Figure 889416DEST_PATH_IMAGE001
where
Figure 963813DEST_PATH_IMAGE002
represents the initial heat value of news word
Figure 274709DEST_PATH_IMAGE003
on the i-th day;
Figure 76443DEST_PATH_IMAGE004
represents the frequency of occurrence of news word
Figure 156394DEST_PATH_IMAGE005
on the i-th day;
Figure 510015DEST_PATH_IMAGE006
represents the frequency of occurrence of news word
Figure 432841DEST_PATH_IMAGE005
over H days;
Figure 162899DEST_PATH_IMAGE005
represents a given news word;
Figure 238303DEST_PATH_IMAGE007
represents the i-th statistical day; H is the total number of statistical days;
correcting the initial heat value by using Bayesian transformation to obtain a corrected heat value of the news word on a certain day:
Figure 762825DEST_PATH_IMAGE008
Figure 674412DEST_PATH_IMAGE009
Figure 942582DEST_PATH_IMAGE010
where
Figure 872492DEST_PATH_IMAGE011
represents the corrected heat value of news word
Figure 567915DEST_PATH_IMAGE012
on the i-th day, C is the average word frequency, j denotes the j-th news word, I is the total number of news words, and m is the prior average score;
obtaining a final heat value of the news word according to the corrected heat value of the news word on a certain day:
Figure 465333DEST_PATH_IMAGE013
where
Figure 271615DEST_PATH_IMAGE014
represents, for news word
Figure 587190DEST_PATH_IMAGE015
the final heat value.
4. The method of claim 3, wherein calculating news reading similarity and news content similarity between the proposed user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the user interest degree of each document according to the final similarity to obtain the interested document comprises:
calculating the news reading similarity of the proposal user u and other users q:
Figure 187935DEST_PATH_IMAGE016
in the formula, sim 1 Representing the news reading similarity of the proposal user u and other users q; s is the total number of documents contained in the data set,
Figure 74114DEST_PATH_IMAGE017
indicating the number of clicks of user u on the ith document,
Figure 684087DEST_PATH_IMAGE018
representing the click times of the ith document by the user q;
calculating the similarity of news contents of the proposal user u and other users q:
Figure 447644DEST_PATH_IMAGE019
where
Figure 625815DEST_PATH_IMAGE020
represents the number of documents that both user u and user q have viewed,
Figure 107612DEST_PATH_IMAGE021
and
Figure 645910DEST_PATH_IMAGE022
respectively representing the number of documents of which the user u and the user q have generated historical behaviors;
and obtaining the final similarity of the proposal user u and other users q according to the news reading similarity and the news content similarity:
Figure 529552DEST_PATH_IMAGE023
where
Figure 613046DEST_PATH_IMAGE024
is a weight factor;
taking the top M users with the largest final similarity as the M-neighbor user set of user u, and calculating the degree of interest of the proposal user u in each document clicked by the M users, the degree of interest of the proposal user u in document j being:
Figure 582139DEST_PATH_IMAGE025
where
Figure 160013DEST_PATH_IMAGE026
is the M-neighbor user set of the proposal user u,
Figure 163741DEST_PATH_IMAGE027
for the final similarity of user u and user q,
Figure 683715DEST_PATH_IMAGE028
representing the number of clicks made by user q on the jth document,
Figure 140104DEST_PATH_IMAGE029
represents the popularity of document j;
and obtaining the interesting document of the user according to the interestingness.
5. The method of any one of claims 2 to 4, wherein segmenting the document of interest to obtain a plurality of document words and calculating a TF-IDF value for each document word, and segmenting the proposal document to obtain a plurality of proposal words and calculating a TF-IDF value for each proposal word, comprises:
calculating an initial TF value for the document word or proposal word:
Figure 20205DEST_PATH_IMAGE030
where
Figure 347281DEST_PATH_IMAGE031
is the total number of words of the document of interest or proposal document, and
Figure 162790DEST_PATH_IMAGE032
represents the total number of occurrences of the document word or proposal word d;
introducing a word frequency control model to optimize the initial TF value:
Figure 981841DEST_PATH_IMAGE033
where
Figure 432677DEST_PATH_IMAGE034
represents the word frequency control coefficient,
Figure 879838DEST_PATH_IMAGE035
is the total number of samples introduced,
Figure 600670DEST_PATH_IMAGE036
represents the sample average document length, and
Figure 907017DEST_PATH_IMAGE037
represents the TF value of the document word or proposal word;
calculating the IDF value of the document word or proposal word:
Figure 4286DEST_PATH_IMAGE038
Figure 696168DEST_PATH_IMAGE039
where idf_d represents the IDF value of the document word or proposal word,
Figure 587900DEST_PATH_IMAGE040
is the number of documents of interest containing the document word or the number of proposal documents containing the proposal word,
Figure 381544DEST_PATH_IMAGE041
Figure 282504DEST_PATH_IMAGE042
representing the relevance of the document words to the document of interest or the relevance of the proposal words to the proposal document;
Figure 330357DEST_PATH_IMAGE043
and
Figure 658570DEST_PATH_IMAGE044
are adjustment factors;
calculating the TF-IDF value of the document word or the proposal word d according to the TF value of the document word or the proposal word and the IDF value of the document word or the proposal word:
Figure 532985DEST_PATH_IMAGE045
where tfidf_d represents the TF-IDF value of the document word or proposal word.
6. The method according to any one of claims 2 to 4, wherein obtaining an entry set according to the news words, document words and proposal words, and obtaining a recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word, comprises:
news words, document words and proposal words are entries which together form an entry set;
obtaining a recommended value of each entry in the entry set according to the heat value of each news word, the TF-IDF value of each document word and the TF-IDF value of each proposal word:
Figure 113002DEST_PATH_IMAGE046
Figure 389262DEST_PATH_IMAGE047
Figure 13011DEST_PATH_IMAGE048
Figure 905880DEST_PATH_IMAGE049
where
Figure 24009DEST_PATH_IMAGE050
represents the recommended value of an entry,
Figure 154776DEST_PATH_IMAGE051
represents the normalized heat value of the news word,
Figure 450890DEST_PATH_IMAGE052
represents the normalized TF-IDF value of the document word,
Figure 565477DEST_PATH_IMAGE053
represents the normalized TF-IDF value of the proposal word,
Figure 221717DEST_PATH_IMAGE054
represents the heat value of news word d,
Figure 738149DEST_PATH_IMAGE055
represents the sum of the heat values of all news words in set A,
Figure 313487DEST_PATH_IMAGE056
represents the TF-IDF value of document word d,
Figure 40004DEST_PATH_IMAGE057
represents the sum of the TF-IDF values of all document words in set B,
Figure 624569DEST_PATH_IMAGE058
represents the TF-IDF value of proposal word d, and
Figure 605294DEST_PATH_IMAGE059
represents the sum of the TF-IDF values of all proposal words in set C.
7. The method according to any one of claims 2 to 4, wherein the news words are obtained after performing word segmentation, stop word processing and unknown word processing on news data.
8. A recommendation device for a proposed topic, comprising:
the acquisition module is used for acquiring news data and historical behavior data of a user; the news data comprises a plurality of news words; the historical behavioral data comprises a plurality of documents;
the popularity module is used for obtaining the heat value of each news word from the news data in combination with a Bayesian transformation;
the word frequency module is used for calculating the news reading similarity and the news content similarity between the proposal user and other users according to the historical behavior data of the user to obtain final similarity, and calculating the interest degree of the user in each document according to the final similarity to obtain an interested document; segmenting the interesting document to obtain a plurality of document words, and calculating the TF-IDF value of each document word;
the recommendation module is used for obtaining an entry set according to the news words and the document words; obtaining a recommended value of each entry in the entry set according to the heat value of each news word and the TF-IDF value of each document word; and obtaining a recommended word according to the recommended value, thereby completing the recommendation of the proposal topic.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
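
The scoring step recited in the claims above (normalize each score by the sum over its own set, then derive a recommended value for every entry) can be illustrated with a minimal Python sketch. The claims state only that each normalized result is a score divided by the sum over set A, B or C; how the three normalized results are combined is not recoverable from this text, so the plain sum used here is an assumption, as is treating a word absent from a set as contributing 0. The function names, the `top_k` parameter and the toy inputs are all hypothetical.

```python
from collections import defaultdict


def normalize(scores):
    """Divide each score by the sum over its set (empty or all-zero sets yield zeros)."""
    total = sum(scores.values())
    if total == 0:
        return {word: 0.0 for word in scores}
    return {word: value / total for word, value in scores.items()}


def recommended_values(news_heat, doc_tfidf, proposal_tfidf):
    """Return a recommended value for every entry in the union of the three sets.

    ASSUMPTION: the three normalized scores are simply added, and a word
    missing from a set contributes 0 for that set. The source formula for
    the combination is not available, so this is illustrative only.
    """
    combined = defaultdict(float)
    for scores in (news_heat, doc_tfidf, proposal_tfidf):
        for word, value in normalize(scores).items():
            combined[word] += value
    return dict(combined)


def recommend(values, top_k=3):
    """Pick the top_k entries by recommended value; ties are broken alphabetically."""
    return sorted(values, key=lambda w: (-values[w], w))[:top_k]


# Hypothetical toy inputs, not taken from the patent.
news_heat = {"housing": 6.0, "transit": 2.0}         # set A: heat values of news words
doc_tfidf = {"transit": 0.5, "schools": 0.5}         # set B: TF-IDF of document words
proposal_tfidf = {"schools": 0.25, "housing": 0.75}  # set C: TF-IDF of proposal words

values = recommended_values(news_heat, doc_tfidf, proposal_tfidf)
top_words = recommend(values, top_k=2)
print(values)
print(top_words)
```

Because each set is normalized separately, the heat values and the TF-IDF values become comparable even though they are on different scales before normalization. A weighted combination would be an equally plausible reading of the missing formula.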
CN202211013812.0A 2022-08-23 2022-08-23 Recommendation method and device for proposal theme, computer equipment and storage medium Active CN115080867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211013812.0A CN115080867B (en) 2022-08-23 2022-08-23 Recommendation method and device for proposal theme, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211013812.0A CN115080867B (en) 2022-08-23 2022-08-23 Recommendation method and device for proposal theme, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115080867A true CN115080867A (en) 2022-09-20
CN115080867B CN115080867B (en) 2022-11-15

Family

ID=83245454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211013812.0A Active CN115080867B (en) 2022-08-23 2022-08-23 Recommendation method and device for proposal theme, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115080867B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115827877A (en) * 2023-02-07 2023-03-21 湖南正宇软件技术开发有限公司 Proposal auxiliary combination method, device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160269344A1 (en) * 2015-03-13 2016-09-15 International Business Machines Corporation Recommending hashtags to be used in composed message to increase propagation speed and enhance desired sentiment of composed message
CN109271574A (en) * 2018-08-28 2019-01-25 麒麟合盛网络技术股份有限公司 A kind of hot word recommended method and device
CN110188265A (en) * 2019-04-26 2019-08-30 中国科学院计算技术研究所 A kind of network public-opinion focus recommendation method and system of fusion user portrait
CN110334202A (en) * 2019-03-28 2019-10-15 平安科技(深圳)有限公司 User interest label construction method and relevant device based on news application software
US20200074475A1 (en) * 2018-08-30 2020-03-05 Dariusz Zabrzenski Intelligent system enabling automated scenario-based responses in customer service

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
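张舒雅 et al.: "User feature analysis based on Spark and an improved TF-IDF algorithm", Software Engineering (《软件工程》) *
鲁燃: "Microblog topic recommendation algorithm integrating artificial bee colony", Journal of Shanxi University (Natural Science Edition) (《山西大学学报(自然科学版)》) *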

Also Published As

Publication number Publication date
CN115080867B (en) 2022-11-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant