CN117474703B - Topic intelligent recommendation method based on social network - Google Patents


Info

Publication number
CN117474703B
CN117474703B (application CN202311809535.9A)
Authority
CN
China
Prior art keywords
topic
word
text content
text
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311809535.9A
Other languages
Chinese (zh)
Other versions
CN117474703A
Inventor
方波
唐路遥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Huiyou Network Technology Co ltd
Original Assignee
Wuhan Huiyou Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Huiyou Network Technology Co ltd filed Critical Wuhan Huiyou Network Technology Co ltd
Priority to CN202311809535.9A priority Critical patent/CN117474703B/en
Publication of CN117474703A publication Critical patent/CN117474703A/en
Application granted granted Critical
Publication of CN117474703B publication Critical patent/CN117474703B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
        • G06 — COMPUTING; CALCULATING OR COUNTING
            • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q 50/00 — Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
                    • G06Q 50/01 — Social networking
            • G06F — ELECTRIC DIGITAL DATA PROCESSING
                • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
                    • G06F 16/90 — Details of database functions independent of the retrieved data types
                        • G06F 16/95 — Retrieval from the web
                            • G06F 16/953 — Querying, e.g. by the use of web search engines
                                • G06F 16/9535 — Search customisation based on user profiles and personalisation
                                • G06F 16/9536 — Search customisation based on social or collaborative filtering
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
                • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the technical field of data-mining-based recommendation, and in particular to an intelligent topic recommendation method based on a social network, comprising the following steps: acquiring the sentences in every text content of every topic's text content data set, the text vocabulary sequence of each sentence, and the word vector of each word in each sequence; acquiring the part of speech of each word in each text vocabulary sequence and counting the word frequency of every word in each text content, thereby obtaining a topic core word attribute score; building a topic high-frequency thesaurus; obtaining an importance coefficient for each sentence of each text content from the topic core word attribute score, and from it a topic core content representative coefficient for each text content; acquiring a vocabulary association index between topics from the word vectors; acquiring a topic core content association index of each text content between topics, deriving the interest coincidence degree of recommended topics, and recommending content to the user accordingly. The invention aims to solve the poor recommendation effect caused by considering only content similar to the user's history.

Description

Topic intelligent recommendation method based on social network
Technical Field
The application relates to the technical field of data-mining-based recommendation, and in particular to an intelligent topic recommendation method based on a social network.
Background
With the rapid development of the internet, more and more people use it as an important channel for information acquisition and social contact. Social networks have risen rapidly on this trend, becoming platforms where people communicate and share. Intelligent recommendation on a social network can provide personalized, accurate content according to a user's behavior, interests, and social relations, improving the user's experience of the platform. By analyzing the diversity of a user's social circle, richer and more varied content can be recommended, preventing the formation of an information cocoon (filter bubble).
An intelligent topic recommendation system for a social network must analyze the text content of the topics a user is interested in, generally using natural language processing to extract the text's key information. Text data are complex and varied, and the keyword information of different topics differs. A conventional intelligent topic recommendation method applies a topic-word extraction algorithm to obtain a user's high-frequency topic keywords on the social network, extracts the keywords of topic texts in a topic library, and recommends to the user the topics whose keywords match best. Such conventional recommendation algorithms suffer from poor recommendation effect because the user's historical behavior creates a filtering bias: only content similar to the history is considered.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an intelligent topic recommendation method based on a social network.
The intelligent topic recommendation method based on the social network adopts the following technical scheme:
the embodiment of the invention provides a social network-based topic intelligent recommendation method, which comprises the following steps of:
acquiring sentences in all text contents in a text content data set of all topics, text vocabulary sequences of all sentences and word vectors of vocabulary in each text vocabulary sequence;
acquiring the part of speech of each word in each text vocabulary sequence, and counting the word frequency of every word in each text content; obtaining a topic core word attribute score for each sentence from the word frequencies of the words in the sentence and the number of words separating them; obtaining each topic's high-frequency thesaurus from the word frequencies of words in its text content data set; obtaining an importance coefficient for each sentence in each text content from the number of its words that appear in the topic high-frequency thesaurus and its topic core word attribute score; obtaining a topic core content representative coefficient for each text content from the word frequencies of words in its sentences, the number of its words that appear in the topic high-frequency thesaurus, and the importance coefficients of its sentences; acquiring a vocabulary association index between topics from the cosine similarity of the word vectors of their words; obtaining a topic core content association index between topics for each text content from the topic core content representative coefficients and the vocabulary association index; acquiring the recommended-topic interest coincidence degree of each text content from the topic core content association index and the numbers of likes, comments, and shares the user gives the topics;
and calculating the recommended-topic interest coincidence degree of the text contents of all topics not browsed in the last three days, sorting them in descending order, and intelligently recommending the topic contents in that order.
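The final ranking step of the summary above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function and content names are assumptions for the example.

```python
# Hypothetical final ranking step: sort un-browsed text contents by their
# recommended-topic interest coincidence degree (computed earlier) in
# descending order and emit them as the recommendation queue.
def rank_recommendations(coincidence):
    """coincidence: {content_id: interest coincidence score}."""
    return [cid for cid, _ in sorted(coincidence.items(),
                                     key=lambda kv: kv[1], reverse=True)]

scores = {"post_17": 0.42, "post_3": 0.91, "post_8": 0.66}
queue = rank_recommendations(scores)
print(queue)  # highest coincidence first: ['post_3', 'post_8', 'post_17']
```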
Further, the acquiring the part of speech of each word in the text word sequence includes:
and marking the parts of speech in each text vocabulary sequence by using a BiLSTM-CRF model to obtain the parts of speech of each vocabulary.
Further, the counting word frequencies of all words in each text content includes:
using an N-gram model to perform word-frequency statistics on the words in the text vocabulary sequence, obtaining the word frequency of each word within its text content.
Further, the obtaining the topic core word attribute score of each sentence includes:
in each sentence of each text content of each topic, counting the number of nouns among the sentence's words, calculating the product of the word frequencies (within the text content) of each noun and each other word in the sentence, and counting the number of words separating each such pair;
calculating the ratio of each product to the corresponding number of separating words, and taking the sum of all such ratios in the sentence as the sentence's topic core word attribute score.
Further, the obtaining of each topic's high-frequency thesaurus includes:
counting the n words with the highest word frequency in the topic's text content data set, and sorting them in ascending order of word frequency to form the topic high-frequency thesaurus;
where n is a preset number of high-frequency words.
Further, the obtaining the importance coefficient of each sentence in each text content includes:
in each sentence of each text content of each topic, calculating the product of the number of the sentence's words that appear in the topic high-frequency thesaurus and the sentence's topic core word attribute score as the sentence's importance coefficient.
Further, the obtaining topic core content representative coefficients of each text content includes:
for each text content of each topic, calculating the product of each sentence's word frequency and importance coefficient, taking the average of these products over all sentences, and multiplying it by the number of the text content's words that appear in the topic high-frequency thesaurus to obtain the text content's topic core content representative coefficient.
Further, the obtaining the vocabulary association index between topics includes:
calculating the cosine similarity between the word vector of each word of each text content in one topic and that of each word of each text content in another topic, and taking the sum of all the cosine similarities as the vocabulary association index between the two topics.
Further, the obtaining the topic core content association index of each text content between topics includes:
calculating the product of the topic core content representative coefficients of the txt-th text content in one topic and the txt-th text content in another topic, and multiplying it by the vocabulary association index between the two topics to obtain a first product;
calculating the absolute difference between the word frequency of the j-th word in the txt-th text content of one topic and that of the j-th word in the txt-th text content of the other topic, and adding a preset adjusting parameter to obtain a sum;
calculating the ratio of the first product to the sum, and taking the average of this ratio over all words as the topic core content association index between the two topics' txt-th text contents.
Further, the obtaining the interest coincidence degree of the recommended topic of the text content includes:
for each topic, calculating the absolute difference between the total number of likes, comments, and shares the user gave the topic in the current month and that total for the previous month, and dividing by the previous month's total to obtain the topic's average change rate for the current month;
calculating an average value from the topic core content association indexes between the txt-th text contents of the browsed topics and the candidate topic together with the average change rates, and taking the absolute difference between this average and the number of times each browsed topic was viewed in the previous month as the recommended-topic interest coincidence degree of the candidate topic's txt-th text content.
The invention has at least the following beneficial effects:
according to the method, topic content frequently browsed by a user is analyzed, topic core content association indexes are constructed, the indexes can represent possible browsing degrees of the user on non-recommended topics, the interested degrees of the content to be recommended are ordered according to the topic core content association indexes, and intelligent recommendation is carried out according to the interested degree. The defect that the conventional recommendation algorithm is difficult to provide personalized topic contents is overcome, the problem that the recommendation effect is poor due to the fact that only similar contents as histories are considered is solved, and the beneficial effects of topic recommendation novelty and diversity in a social network are improved.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a social network-based topic intelligent recommendation method provided by the invention;
fig. 2 is a topic recommendation flow chart.
Detailed Description
To further describe the technical means adopted by the invention and their effects, the social-network-based intelligent topic recommendation method is described in detail below with reference to the accompanying drawings and preferred embodiments, covering its specific implementation, structure, features, and effects. In the following description, different occurrences of "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the topic intelligent recommendation method based on the social network, provided by the invention, with reference to the accompanying drawings.
The invention provides a social-network-based intelligent topic recommendation method which, referring to fig. 1, comprises the following steps:
step S001, collecting text content data sets of topics frequently focused by users in the social network, and preprocessing data.
First, the user's relevant data on the social network are obtained. Using the social media platforms' APIs, content such as posts, comments, and reposts published by the user on the social network can be collected. An API is a set of methods by which two software applications interact; as this is well known to those skilled in the art, it is not described further here. In this embodiment, one month of text content from topics the user follows is collected from three social media platforms (a learning platform, a microblog platform, and a tiger platform), with the user's authorization.
The collected topic text content data are then preprocessed. The extracted data may contain noise such as special symbols, emoticons, and HTML tags. Special symbols and emoticon noise are cleared by tokenization and lemmatization using the NLTK natural language processing library in Python. HTML tags are parsed with the Beautiful Soup library in Python to obtain plain Chinese text. Tokenization and lemmatization are known in the art and are not described in detail here.
After the plain Chinese text content of the topic is obtained, each sentence in it must be segmented into words. This embodiment uses a BERT-based word segmentation model: its input is a single sentence of the text content and its output is the segmented word sequence, with words separated by spaces. The same BERT model also converts each word of every sentence into a semantic vector, its word vector, so that each word in the text content data is represented by one word vector. The BERT word segmentation model is known to those skilled in the art and is not described further here.
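The segmentation-plus-embedding step can be exercised with a stand-in sketch. A real embodiment would load a BERT checkpoint; here a toy whitespace tokenizer and a deterministic hash-based unit vector stand in for the model, purely to show the pipeline shape (sentence to word sequence to per-word vector). All names are illustrative, not from the patent.

```python
import hashlib
import math

def segment(sentence):
    # Stand-in for the BERT word segmentation model: a real embodiment
    # would call a Chinese word-segmentation model here.
    return sentence.split()

def word_vector(word, dim=8):
    # Stand-in for a BERT embedding: a deterministic, normalized vector
    # derived from a hash of the word, so the same word always maps to
    # the same point in vector space.
    h = hashlib.sha256(word.encode("utf-8")).digest()
    v = [b / 255.0 for b in h[:dim]]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

words = segment("users share daily life topics")
vectors = {w: word_vector(w) for w in words}
print(len(vectors), len(vectors["topics"]))
```

Unlike real BERT vectors, these hash vectors carry no semantics; they only preserve the interface the later cosine-similarity step consumes.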
After segmentation, words that occur frequently but carry little information, such as conjunctions, prepositions, and pronouns, are removed. Stop-word removal is performed using a stop-word list provided by the spaCy library in Python, avoiding the negative impact of these words on subsequent analysis.
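The stop-word filtering step amounts to the following sketch. A real embodiment would load the list from spaCy; the short hand-written list here is an assumption standing in for it.

```python
# Minimal stop-word filtering sketch; a small hand-written list stands in
# for the spaCy-provided stop-word list used by the embodiment.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "it", "is"}

def remove_stop_words(tokens):
    # Keep only tokens that are not in the stop-word list (case-insensitive).
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words(["The", "price", "of", "milk", "is", "rising"]))
# → ['price', 'milk', 'rising']
```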
At this point, the sentences in all text contents of the text content data set, the text vocabulary sequence of each sentence, and the word vector of each word in each text vocabulary sequence have been acquired.
Step S002, processing and analyzing topic text data focused by the user, extracting topic keyword features, and constructing recommended topic interest conformity.
Step S001 obtains the text vocabulary of the topics the user has followed on the social network over the past month. Topics are of various types, such as daily life, entertainment, scientific and technological innovation, and social and current affairs. On a social platform, different topics are classified under different tags, and the higher-frequency keywords of different topics' content differ.
Taking daily life topics as an example, extracting and analyzing keywords of text content data of the topics.
First, the topics a user frequently browses are usually distinguished by different tag words, which are themselves entity words. To identify different topics, a combined BERT+LSTM+CRF model is applied to each text content data set to recognize entity words and thereby the tag words of each text content. The BERT, LSTM, and CRF models are well known to those skilled in the art and are not described here.
In topic recommendation, different parts of speech differ in importance to the topic's text content; nouns, verbs, and adjectives matter most. Nouns generally denote things or concepts and carry a high amount of information; verbs denote actions or changes, describing a thing's state of motion; adjectives describe a thing's attributes. Taking the daily life topic as an example, its texts use nouns describing daily goods more than other parts of speech, so nouns are the most important part of speech for that topic; the analysis below builds on this characteristic.
First, a BiLSTM-CRF model tags the part of speech of each word in each text vocabulary sequence, e.g. noun, verb, or adjective; its input is the words of the sequence and its output is each word's part-of-speech label. Then, in this embodiment, word-frequency statistics are computed for the words of each text vocabulary sequence with an N-gram model, yielding each word's frequency within its text content. The BiLSTM-CRF model and the N-gram model are known to those skilled in the art and are not described in detail here.
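For the per-word frequencies consumed by the later scores, the N-gram statistics reduce to unigram counts, which can be sketched with `collections.Counter`. The function name is illustrative.

```python
from collections import Counter

# Word-frequency statistics for one text content: count how often each
# word occurs across the text's tokenized sentences (unigram, N = 1).
def word_frequencies(sentences):
    """sentences: list of tokenized sentences (lists of words)."""
    return dict(Counter(w for s in sentences for w in s))

text = [["cat", "sat", "mat"], ["cat", "ran"]]
print(word_frequencies(text))  # {'cat': 2, 'sat': 1, 'mat': 1, 'ran': 1}
```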
A topic core word attribute score is constructed from the word frequencies and distances of words of different parts of speech within a topic:

$$Q_{x,t,k}=\sum_{i=1}^{N_k}\sum_{j=1}^{M_k}\frac{F(c_i)\cdot F(c_j)}{d(c_i,c_j)}$$

where $Q_{x,t,k}$ is the topic core word attribute score of the $k$-th sentence of the $t$-th text content in topic $x$; taking nouns as the important part of speech, $N_k$ is the number of nouns in the sentence and $c_i$ its $i$-th noun; $M_k$ is the number of words of other parts of speech and $c_j$ its $j$-th such word; $F(\cdot)$ is a word's frequency within the $t$-th text content of topic $x$; and $d(c_i,c_j)$ is the number of words between $c_i$ and $c_j$ in the sentence.

Formula logic: for topic $x$, the more nouns the $k$-th sentence contains and the higher the word frequencies of the accompanying words, the larger the product $F(c_i)\cdot F(c_j)$, and the more closely the pair is linked to topic $x$. Meanwhile, the closer two words are, the smaller $d(c_i,c_j)$ and the higher the degree of semantic association between them. The higher the resulting topic core word attribute score, the more important the information expressed in the $t$-th text content of topic $x$.
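The topic core word attribute score of one sentence can be sketched as follows: sum, over every (noun, other-word) pair in the sentence, the product of the two words' text-level frequencies divided by their in-sentence distance. Since the description's "number of interval words" would be zero for adjacent words, the positional distance |i - j| (at least 1) stands in for it here; that substitution, and all names, are assumptions of this sketch.

```python
# Sketch of the topic core word attribute score for one sentence.
# `freq` maps word -> frequency in the whole text content;
# `pos` maps word -> part-of-speech tag.
def core_word_attribute_score(sentence, freq, pos):
    score = 0.0
    for i, wi in enumerate(sentence):
        if pos.get(wi) != "NOUN":
            continue
        for j, wj in enumerate(sentence):
            if i == j or pos.get(wj) == "NOUN":
                continue
            gap = abs(i - j)  # positional distance stands in for interval-word count
            score += freq[wi] * freq[wj] / gap
    return score

sent = ["cat", "chased", "mouse"]
freq = {"cat": 3, "chased": 1, "mouse": 2}
pos = {"cat": "NOUN", "chased": "VERB", "mouse": "NOUN"}
# cat-chased contributes 3*1/1, mouse-chased contributes 2*1/1
print(core_word_attribute_score(sent, freq, pos))  # → 5.0
```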
The text content of different topics contains representative words whose frequency within the topic is high. Accordingly, the $n$ words with the highest frequency in a topic's text content data set are counted, and this set of high-frequency words is recorded as the topic high-frequency thesaurus, where the empirical value of $n$ is 500 and the words are ranked within the thesaurus in ascending order of frequency.
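Building the topic high-frequency thesaurus can be sketched as taking the n most frequent words of the topic's whole token stream and sorting them in ascending order of frequency, as the description specifies (n = 500 in the embodiment; 3 here for brevity). The function name is illustrative.

```python
from collections import Counter

# Topic high-frequency thesaurus: the n most frequent words of a topic's
# text content data set, returned in ascending order of frequency.
def high_frequency_thesaurus(all_tokens, n=500):
    top = Counter(all_tokens).most_common(n)
    return [w for w, c in sorted(top, key=lambda wc: wc[1])]

tokens = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
print(high_frequency_thesaurus(tokens, n=3))  # → ['c', 'b', 'a']
```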
The topic core content representative coefficient is constructed as:

$$R_{x,t}=G_{x,t}\cdot\frac{1}{K_{x,t}}\sum_{k=1}^{K_{x,t}}F_{x,t,k}\cdot I_{x,t,k},\qquad I_{x,t,k}=g_{x,t,k}\cdot Q_{x,t,k}$$

where $R_{x,t}$ is the topic core content representative coefficient of the $t$-th text content in topic $x$; $G_{x,t}$ is the number of its words that appear in the topic high-frequency thesaurus; $K_{x,t}$ is its total number of sentences; $F_{x,t,k}$ is the word frequency of the words of its $k$-th sentence; $I_{x,t,k}$ is the $k$-th sentence's importance coefficient; $g_{x,t,k}$ is the number of the $k$-th sentence's words that appear in the topic high-frequency thesaurus, set to 1 if the sentence contains no high-frequency word; and $Q_{x,t,k}$ is the $k$-th sentence's topic core word attribute score.

Formula logic: the more often the words of a sentence of the $t$-th text content in topic $x$ appear in the topic high-frequency thesaurus, the larger $g_{x,t,k}$; combined with a high topic core word attribute score $Q_{x,t,k}$, this yields a high importance coefficient $I_{x,t,k}$. A large $F_{x,t,k}$ indicates vocabulary the $t$-th text of topic $x$ dwells on, so a large $F_{x,t,k}\cdot I_{x,t,k}$ means the $k$-th sentence is close to the content of topic $x$, and a large sum means the whole text is tightly bound to its topic. The larger the topic core content representative coefficient $R_{x,t}$, the better these words serve as representative words of the topic and describe its core content.
In step S001, the BERT word segmentation model converted each word of each sentence into a word vector, i.e. a vector representation of the word; words with similar semantics lie closer together in the vector space, and words with dissimilar semantics lie farther apart.
A topic core content representative coefficient is calculated for each text content of every topic the user frequently browses. If topic recommendation relied only on the keyword similarity between text contents, the recommended topics would become homogeneous: the user would receive overly uniform information while browsing, and personalized recommendation could not be provided. Therefore, to achieve diverse content recommendation, a topic core content association index is calculated by combining the topic core content representative coefficients with the topics' keywords:
$$C_{x,y,t}=\frac{1}{M}\sum_{j=1}^{M}\frac{R_{x,t}\cdot R_{y,t}\cdot W_{x,y}}{\left|F_{x,t,j}-F_{y,t,j}\right|+\varepsilon}$$

where $C_{x,y,t}$ is the topic core content association index between the $t$-th text contents of topics $x$ and $y$; $M$ is the smaller of the two texts' total word counts; $R_{x,t}$ and $R_{y,t}$ are their topic core content representative coefficients; $F_{x,t,j}$ and $F_{y,t,j}$ are the word frequencies of their $j$-th words; and $\varepsilon$ is an adjusting parameter that keeps the denominator from being 0, with checked value 1.

The vocabulary association index $W_{x,y}$ between topics $x$ and $y$ is

$$W_{x,y}=\sum_{a=1}^{A_{x,t}}\sum_{b=1}^{B_{y,t}}\cos\!\left(v_a,u_b\right)$$

where $A_{x,t}$ and $B_{y,t}$ are the total word counts of the $t$-th texts in topics $x$ and $y$, $v_a$ is the word vector of the $a$-th word of topic $x$'s text content data set, $u_b$ is the word vector of the $b$-th word of topic $y$'s text content data set, and $\cos(\cdot,\cdot)$ is the cosine similarity of two word vectors. Cosine similarity is a technique known to those skilled in the art and is not described here.

Formula logic: the higher the semantic similarity between the words of different topics' texts, the larger each $\cos(v_a,u_b)$ and hence the larger the vocabulary association index $W_{x,y}$, indicating closer content between topics $x$ and $y$. The closer the $j$-th word's frequency in the two topics, the smaller $\left|F_{x,t,j}-F_{y,t,j}\right|$, indicating the two texts use more of the same words. The larger the two topic core content representative coefficients, the more faithfully each text describes its own topic, so a large product $R_{x,t}\cdot R_{y,t}$ marks topics that are distinct yet strongly related in content. The higher the resulting topic core content association index, the more likely a user who frequently browses topic $x$ is also interested in topic $y$.
After the association indexes between different social network texts are calculated, the user's behavior in the social network, such as liking, commenting, and sharing, is considered: the more of these operations a user performs on a topic, the greater the user's interest in it. Combining this with the topic core content association index, the recommended-topic interest coincidence degree is constructed:
$$P_{z,t}=\left|\frac{1}{\left|\mathcal{L}\right|}\sum_{L\in\mathcal{L}}\left(C_{L,z,t}\cdot V_L-B_L\right)\right|$$

where $P_{z,t}$ is the user's recommended-topic interest coincidence degree for the $t$-th text content of a not-yet-browsed topic $z$; $\mathcal{L}$ is the set of topics the user has browsed; $C_{L,z,t}$ is the topic core content association index between browsed topic $L$ and un-browsed topic $z$; $B_L$ is the number of times the user browsed topic $L$ in the past month; and $V_L$ is the average change rate of the user's likes, comments, and shares on topic $L$, calculated as

$$V_L=\frac{\left|A_L^{\mathrm{cur}}-A_L^{\mathrm{prev}}\right|}{A_L^{\mathrm{prev}}}$$

with $A_L^{\mathrm{cur}}$ and $A_L^{\mathrm{prev}}$ the total numbers of likes, comments, and shares on topic $L$ in the current month and the previous month respectively.

Formula logic: the larger the topic core content association index $C_{L,z,t}$, the closer the un-browsed topic's content is to what the user often browses; the larger $V_L$, the more frequently the user interacts with the topic and the higher the interest in its content. A large $B_L$, however, indicates the user's recent browsing has been overly uniform and lacks variety. The larger the resulting recommended-topic interest coincidence degree $P_{z,t}$, the more likely topic $z$ is content of interest to the user, and such topics are recommended with priority.
Step S003: personalized recommendation is performed for the user according to the recommended topic interest coincidence degree.
The text contents of topics not yet recommended are processed in the same way as in step S001, and the recommended topic interest coincidence degree of all unrecommended contents is calculated by the above method.
When calculating the recommended topic interest coincidence degree, all topic contents commonly browsed by the user (which may span different topics) are substituted for the parameter L, and all contents of topics not yet recommended are substituted for the parameter z. The parameters L and z are explained in detail in step S002 and are not described again here.
The recommended topic interest coincidence degree is calculated for the text contents of all topics not browsed in the last three days, the contents are sorted in descending order, and the topic contents are intelligently recommended in that order. Each time a recommended topic is browsed, it is substituted into the parameter L as browsed content, and the calculation is updated in real time. The topic recommendation flow chart is shown in fig. 2.
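The ranking and real-time update loop above can be sketched as follows (a hypothetical structure; the coincidence degrees are assumed to be precomputed by the formula of step S002, and all names are illustrative):

```python
def recommend(unbrowsed: dict, browsed: set) -> list:
    """Rank unbrowsed topic contents by interest coincidence, high to low.

    unbrowsed: hypothetical mapping {content_id: coincidence_degree} for the
    contents of topics not yet browsed, taken from the last three days.
    browsed: set of content ids already browsed (the parameter L in the text).
    """
    # Sort content ids by coincidence degree, in descending order.
    ranked = sorted(unbrowsed, key=unbrowsed.get, reverse=True)
    order = []
    for content_id in ranked:
        order.append(content_id)
        # Once recommended and browsed, the content joins the browsed set
        # (parameter L), so later coincidence degrees can be updated.
        browsed.add(content_id)
    return order
```

For instance, with coincidence degrees {"z1": 0.8, "z2": 0.3, "z3": 0.5}, the recommendation order is z1, z3, z2, and each content is added to the browsed set as it is served.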
It should be noted that the order of the embodiments of the present invention is for description only and does not reflect the relative merit of the embodiments; the foregoing describes specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible and may be advantageous.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it; modifications of the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, that do not depart in essence from the scope of the technical solutions of the embodiments of the present application, are all included in the protection scope of the present application.

Claims (4)

1. The intelligent topic recommendation method based on the social network is characterized by comprising the following steps of:
acquiring sentences in all text contents in a text content data set of all topics, text vocabulary sequences of all sentences and word vectors of vocabulary in each text vocabulary sequence;
acquiring the part of speech of each word in the text word sequence, and counting the word frequency of all words in each text content; obtaining the topic core word attribute score of each sentence according to the word frequencies of all words in the sentence and the number of interval words between words; acquiring the topic high-frequency word stock of each topic according to the word frequencies of words in the text content data set; acquiring the importance coefficient of each sentence in each text content according to the number of words of the sentence appearing in the topic high-frequency word stock and the topic core word attribute score; obtaining the topic core content representative coefficient of each text content according to the word frequencies of words in its sentences, the number of words of the text content appearing in the topic high-frequency word stock and the importance coefficients of the sentences; acquiring the vocabulary association index between topics according to the cosine similarity of word vectors between words in the topics; obtaining the topic core content association index of each text content between topics according to the topic core content representative coefficients and the vocabulary association index; acquiring the recommended topic interest coincidence degree of the text content according to the topic core content association index and the numbers of praises, comments and forwards of the user on the topics;
calculating the recommended topic interest coincidence degree of the text contents of all unbrowsed topics in the last three days, sorting them in descending order, and intelligently recommending the topic contents in that order;
the obtaining the topic core word attribute score of each sentence comprises the following steps:
wherein F(x,txt,k) denotes the topic core word attribute score of the kth sentence in the txt-th text content in topic x; n denotes the number of nouns in the kth sentence in the txt-th text content in topic x; c_i denotes the ith noun in the sentence; d_g denotes the gth word of a part of speech other than noun in the sentence; W(·) denotes the word frequency of the given word in the txt-th text content within topic x; u(c_i, d_g) denotes the number of interval words between the words c_i and d_g;
the obtaining the important coefficient of each sentence in each text content comprises the following steps:
wherein I(x,txt,k) denotes the importance coefficient of the kth sentence in the txt-th text content within topic x; h(x,txt,k) denotes the number of words in the kth sentence of the txt-th text content in topic x that appear in the topic high-frequency word stock; F(x,txt,k) denotes the topic core word attribute score of the kth sentence in the txt-th text content in topic x;
the obtaining topic core content representative coefficients of each text content includes:
wherein B(x,txt) denotes the topic core content representative coefficient of the txt-th text content in topic x; H(x,txt) denotes the number of words in the txt-th text content in topic x that appear in the topic high-frequency word stock; K denotes the total number of sentences in the txt-th text content within topic x; W(x,txt,k) denotes the word frequency of each word in the kth sentence of the txt-th text content in topic x; I(x,txt,k) denotes the importance coefficient of the kth sentence in the txt-th text content in topic x;
the obtaining topic core content association indexes of each text content among topics comprises the following steps:
wherein G(x,y,txt) denotes the topic core content association index of topic x and topic y for the txt-th text content; M is the minimum of the total number of words in the txt-th text content in topic x and the total number of words in the txt-th text content in topic y; B(x,txt) and B(y,txt) denote the topic core content representative coefficients of the txt-th text content in topic x and in topic y, respectively; W(x,j) and W(y,j) denote the word frequency of the jth word in the txt-th text content in topic x and in topic y, respectively; e is an adjusting parameter; Q(x,y) denotes the vocabulary association index between topic x and topic y;
the acquiring the vocabulary association indexes among topics comprises the following steps:
wherein Q(x,y) denotes the vocabulary association index between topic x and topic y; N_x and N_y denote the total numbers of words in the txt-th text content in topic x and in topic y, respectively; v_a denotes the word vector of the ath word in the text content data set in topic x; v_b denotes the word vector of the bth word in the text content data set in topic y; cos(v_a, v_b) denotes the cosine similarity calculated between the two word vectors;
the obtaining the recommended topic interest coincidence degree of the text content comprises the following steps:
wherein R(z,txt) denotes the recommended topic interest coincidence degree of the user for the txt-th text content in an unbrowsed topic z; G(L,z) denotes the topic core content association index between the txt-th text content of the topic L browsed by the user and the unbrowsed topic z; P(L) denotes the average change rate of the user's praise, comment and forwarding of the content of the browsed topic L over the past month; C(L) denotes the number of times the user browsed topic L in the past month;
the average change rate is calculated as the ratio of the absolute value of the difference between the current month's total number of praises, comments and forwards on the content of the browsed topic L and the previous month's total number of praises, comments and forwards on the content of the browsed topic L, to the previous month's total number of praises, comments and forwards on the content of the browsed topic L.
2. The intelligent recommendation method for topics based on social network as claimed in claim 1, wherein the obtaining the part of speech of each word in the text word sequence comprises:
and marking the parts of speech in each text vocabulary sequence by using a BiLSTM-CRF model to obtain the parts of speech of each vocabulary.
3. The intelligent recommendation method for topics based on social network as claimed in claim 1, wherein the counting word frequencies of all words in each text content comprises:
and carrying out word frequency statistics on the words in the text word sequence by using the N-gram model, and obtaining the word frequency of each word in the text content.
4. The intelligent recommendation method for topics based on social network as claimed in claim 1, wherein the obtaining the topic high-frequency word stock of each topic comprises:
counting the n words with the highest word frequency in the text content data set of each topic, and sorting them in order of word frequency from small to large to form the topic high-frequency word stock;
wherein n is the preset number of high-frequency words.
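As an illustration of the high-frequency word stock construction in claim 4, the lexicon could be built as follows (a sketch under the assumption that the texts are already tokenized; all names are hypothetical):

```python
from collections import Counter

def topic_high_freq_lexicon(texts, n):
    """Take the n highest-frequency words of a topic's text content data set
    and sort them by ascending word frequency, as described in claim 4.

    texts: iterable of tokenized texts (lists of words) for one topic.
    n: preset number of high-frequency words.
    """
    # Word frequencies over the whole text content data set of the topic.
    freq = Counter(word for text in texts for word in text)
    top = freq.most_common(n)           # the n highest-frequency words
    top.sort(key=lambda pair: pair[1])  # reorder by ascending word frequency
    return [word for word, _ in top]

# e.g. topic_high_freq_lexicon([["a", "a", "b", "b", "b", "c"]], 2) -> ["a", "b"]
```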
CN202311809535.9A 2023-12-26 2023-12-26 Topic intelligent recommendation method based on social network Active CN117474703B (en)

Publications (2)

Publication Number Publication Date
CN117474703A CN117474703A (en) 2024-01-30
CN117474703B true CN117474703B (en) 2024-03-26




