CN110472016A - Article recommendation method, apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
CN110472016A
Authority
CN
China
Prior art keywords
article
theme
label
consistent
library
Prior art date
Legal status
Granted
Application number
CN201910759959.6A
Other languages
Chinese (zh)
Other versions
CN110472016B (en)
Inventor
张新宇
杜颖
Current Assignee
Shenzhen Yayue Technology Co ltd
Original Assignee
Tencent Technology Beijing Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Beijing Co Ltd
Priority to CN201910759959.6A
Publication of CN110472016A
Application granted
Publication of CN110472016B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an article recommendation method, apparatus, electronic device, and storage medium. The method includes: assigning to each article in an article library the themes it belongs to; determining, according to users' focused data for an article, the influence degree of the users' attention behavior on the themes matching the article; updating the themes matching the article according to the influence degree; obtaining a user's article list; determining, among the themes to which articles in the article library belong, the themes matching the articles in the article list; and obtaining the articles in the article library that belong to those matching themes and performing a recommendation operation based on the obtained articles. The invention enables more accurate personalized article recommendation for users.

Description

Article recommendation method, apparatus, electronic device, and storage medium
Technical field
The present invention relates to recommendation technology in the field of artificial intelligence, and in particular to an article recommendation method, apparatus, electronic device, and storage medium.
Background
Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, AI is a comprehensive discipline of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that respond in ways similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines so that machines have the capabilities of perception, reasoning, and decision-making.
Recommender systems are an important application branch of AI. Article recommendation, for example, has strong timeliness requirements because new articles are generated constantly. Recommender systems in the related art usually rely on offline training, so once deployed online they often fail to accurately identify the themes of newly generated articles, which reduces recommendation accuracy.
Summary of the invention
Embodiments of the present invention provide an article recommendation method, apparatus, electronic device, and storage medium that enable accurate article recommendation.
The technical solutions of the embodiments of the present invention are implemented as follows:
An embodiment of the present invention provides an article recommendation method, including:
assigning to each article in an article library the themes it belongs to;
determining, according to users' focused data for an article, the influence degree of the users' attention behavior on the themes matching the article;
updating the themes matching the article according to the influence degree;
obtaining a user's article list;
determining, among the themes to which articles in the article library belong, the themes matching the articles in the article list;
obtaining the articles in the article library that belong to the themes matching the articles in the article list, and performing a recommendation operation based on the obtained articles.
An embodiment of the present invention further provides an article recommendation apparatus, including:
a theme distribution module, configured to assign to each article in the article library the themes it belongs to;
a first determining module, configured to determine, according to users' focused data for an article, the influence degree of the users' attention behavior on the themes matching the article;
a theme update module, configured to update the themes matching the article according to the influence degree;
a list obtaining module, configured to obtain a user's article list;
a second determining module, configured to determine, among the themes to which articles in the article library belong, the themes matching the articles in the article list;
an article recommending module, configured to obtain the articles in the article library that belong to the themes matching the articles in the article list, and to perform a recommendation operation based on the obtained articles.
In the above solution, the theme distribution module includes:
a construction unit, configured to obtain the multiple labels included in the theme to which each article in the article library belongs, together with the weights of those labels, and to construct a weight matrix;
a decomposition unit, configured to decompose the weight matrix to obtain an article vector for each article and a label vector for each of the labels included in each article;
a clustering unit, configured to cluster the article vectors of the articles in the article library to obtain the themes to which the articles in the article library respectively belong.
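The text does not specify which decomposition the decomposition unit performs or which clustering algorithm the clustering unit uses. The sketch below assumes a plain low-rank matrix factorization fitted by stochastic gradient descent, followed by k-means with farthest-point initialization. The function names `factorize` and `kmeans`, the toy matrix, and all hyperparameters are illustrative assumptions, not the patented implementation.

```python
import math
import random


def factorize(weights, k=2, lr=0.05, reg=0.01, epochs=3000, seed=0):
    """Approximate the article x label weight matrix as A @ B^T.

    Returns (article_vectors, label_vectors): one k-dimensional row per
    article and per label. Plain SGD with L2 regularisation (assumption).
    """
    rng = random.Random(seed)
    n_articles, n_labels = len(weights), len(weights[0])
    A = [[rng.random() * 0.5 for _ in range(k)] for _ in range(n_articles)]
    B = [[rng.random() * 0.5 for _ in range(k)] for _ in range(n_labels)]
    for _ in range(epochs):
        for i in range(n_articles):
            for j in range(n_labels):
                pred = sum(A[i][f] * B[j][f] for f in range(k))
                err = weights[i][j] - pred
                for f in range(k):
                    a, b = A[i][f], B[j][f]   # use pre-update values for both
                    A[i][f] += lr * (err * b - reg * a)
                    B[j][f] += lr * (err * a - reg * b)
    return A, B


def kmeans(points, k, iters=20):
    """Cluster points into k groups; returns one cluster index per point."""
    # Farthest-point initialisation keeps the result deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy weight matrix: rows are articles, columns are labels.
W = [[0.9, 0.8, 0.0, 0.0],
     [0.8, 0.7, 0.0, 0.0],
     [0.0, 0.0, 0.8, 0.6],
     [0.0, 0.0, 0.9, 0.7]]
article_vecs, label_vecs = factorize(W)
themes = kmeans(article_vecs, k=2)
print(themes)
```

Articles that share labels end up with nearby latent vectors, so they fall into the same cluster; each cluster is then treated as one theme.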
In the above solution, the construction unit is further configured to determine the score of each word included in each article in the article library;
select, according to the word scores, the words in each article that satisfy a score condition as the labels of that article;
sort the labels included in the articles of each theme in descending order of their weights;
and determine the labels ranked within a set number or proportion at the top as the labels included in the theme, using each label's score in its corresponding article as the label's weight.
In the above solution, the construction unit is further configured to determine, for each word included in each article in the article library, the term frequency of the word in that article and the inverse document frequency of the word in the article library;
and to determine the product of the term frequency and the inverse document frequency as the score of the word.
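A minimal sketch of the word scoring described above, assuming term frequency is a word's count divided by article length and inverse document frequency is `log(N / df)`; the text does not fix the exact TF-IDF variant. `select_labels` with a fixed `min_score` threshold is a hypothetical stand-in for the unspecified "score condition".

```python
import math
from collections import Counter


def tfidf_scores(articles):
    """Return one {word: tf * idf} dict per tokenised article."""
    n_docs = len(articles)
    doc_freq = Counter()
    for tokens in articles:
        doc_freq.update(set(tokens))           # count each word once per article
    scores = []
    for tokens in articles:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def select_labels(word_scores, min_score):
    """Hypothetical score condition: keep words scoring at least min_score."""
    return {w: s for w, s in word_scores.items() if s >= min_score}


docs = [["rocket", "launch", "rocket", "space"],
        ["rocket", "stock", "market"],
        ["stock", "market", "rally"]]
scores = tfidf_scores(docs)
print(select_labels(scores[0], min_score=0.25))
```

"rocket" occurs twice in the first article but also appears in a second article, so its IDF, and therefore its score, drops below that of the words unique to the first article.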
In the above solution, the construction unit is further configured to, when at least two articles in a theme include the same label, determine the sum of that label's weights in those articles as its new weight;
and to sort, in descending order, the new weights of the shared labels together with the weights of the labels unique to individual articles in the theme.
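The merging and truncation steps can be sketched as follows. The function name `theme_labels` and its `top_n` / `top_ratio` parameters are illustrative; they stand in for the "set number or proportion" mentioned above, whose concrete values the text leaves open.

```python
from collections import defaultdict


def theme_labels(article_labels, top_n=None, top_ratio=None):
    """Merge the labels of all articles in one theme and keep the strongest.

    article_labels: list of {label: score} dicts, one per article in the theme.
    A label shared by several articles gets the sum of its scores as weight.
    """
    merged = defaultdict(float)
    for labels in article_labels:
        for label, score in labels.items():
            merged[label] += score
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    if top_n is None and top_ratio is not None:
        top_n = max(1, int(len(ranked) * top_ratio))
    return ranked if top_n is None else ranked[:top_n]


theme = [{"rocket": 0.25, "launch": 0.5},
         {"rocket": 0.5, "orbit": 0.125}]
print(theme_labels(theme, top_n=2))   # [('rocket', 0.75), ('launch', 0.5)]
```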
In the above solution, the first determining module is further configured to determine, according to the focused data of the article, the user's degree of concern for the article as characterized by the user's attention behavior;
traverse the themes matching the article and perform the following processing:
multiply the score with which the article belongs to the currently traversed theme by the similarity between each label of the article and each label included in the currently traversed theme, and weight these products using the importance of each label in the article as the weighting coefficient;
take the product of the weighted result and the degree of concern as the influence degree of the user's attention behavior on each label in the currently traversed theme.
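The per-label influence computation can be sketched as below. Two points are assumptions rather than statements from the text: label similarity is taken as the cosine similarity of label vectors, and "weighting the products" is read as a weighted sum over the article's labels. All names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def influence_on_theme(theme_score, article_labels, theme_label_names,
                       label_vecs, concern):
    """Influence of one user action on every label of the current theme.

    theme_score: score with which the article belongs to the current theme.
    article_labels: {label: importance of the label in the article}.
    concern: degree of concern for the article (e.g. its click-through rate).
    """
    influence = {}
    for t in theme_label_names:
        weighted = sum(
            importance * theme_score * cosine(label_vecs[a], label_vecs[t])
            for a, importance in article_labels.items()
        )
        influence[t] = weighted * concern
    return influence


label_vecs = {"rocket": [1.0, 0.0], "launch": [1.0, 0.0], "orbit": [0.0, 1.0]}
print(influence_on_theme(0.8, {"rocket": 0.5, "orbit": 0.5},
                         ["launch", "orbit"], label_vecs, concern=0.1))
```

A theme label that is semantically close to the clicked article's labels receives a larger share of the influence; an unrelated theme label receives none.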
In the above solution, the first determining module is further configured to determine the click-through rate of the article according to the click data and exposure data of the article included in the focused data; or
to determine the comment rate of the article according to the comment data and exposure data of the article included in the focused data.
In the above solution, the apparatus further includes:
an adjustment module, configured to, after the degree of concern for the article is determined according to the focused data of the article,
adjust the value of the degree of concern so that it matches the confidence of the focused data.
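The text does not say how the degree of concern is adjusted to the confidence of the focused data. One common choice, assumed here, is additive (Bayesian-style) smoothing: an article with few exposures has its rate pulled toward a prior rate. `prior_rate` and `prior_weight` are hypothetical parameters.

```python
def rate(events, exposures):
    """Click-through rate (clicks) or comment rate (comments) per exposure."""
    return events / exposures if exposures > 0 else 0.0


def smoothed_rate(events, exposures, prior_rate=0.05, prior_weight=100):
    """Shrink the observed rate toward prior_rate when exposures are few.

    prior_weight acts like a number of pseudo-exposures: the fewer real
    exposures an article has, the closer the result stays to prior_rate.
    """
    return (events + prior_rate * prior_weight) / (exposures + prior_weight)


print(rate(3, 10), smoothed_rate(3, 10))          # few exposures: pulled toward 0.05
print(rate(300, 1000), smoothed_rate(300, 1000))  # many exposures: stays near 0.3
```

Both articles have a raw rate of 0.3, but only the one with many exposures keeps a high degree of concern after smoothing.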
In the above solution, the theme update module is further configured to add, for each label in the theme matching the article, the label's weight to the determined influence degree corresponding to that theme, obtaining the updated weight of each label.
In the above solution, the theme update module is further configured to, when the theme matching the article does not include all labels of the article, add the missing labels and their corresponding weights to the theme matching the article;
and to sort the labels included in the theme matching the article in descending order of weight, and delete some of the labels according to the sorting result.
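The update-and-prune step can be sketched as follows. Assumptions: a newly added label enters the theme with its weight from the article, and pruning keeps a fixed number `max_labels` of top labels; the text specifies neither. `update_theme` is a hypothetical name.

```python
def update_theme(theme, influence, article_labels, max_labels):
    """Return the updated {label: weight} mapping of one theme.

    theme: current {label: weight} of the theme.
    influence: {label: influence degree} computed for the theme's labels.
    article_labels: {label: weight} of the article the user interacted with.
    """
    # Existing labels: weight plus the influence of the user's behaviour.
    updated = {label: w + influence.get(label, 0.0) for label, w in theme.items()}
    # Labels of the article that the theme lacks are added (assumed weight).
    for label, w in article_labels.items():
        if label not in updated:
            updated[label] = w
    # Sort by weight, descending, and drop the tail.
    ranked = sorted(updated.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:max_labels])


print(update_theme({"launch": 0.5, "orbit": 0.25}, {"launch": 0.25},
                   {"rocket": 0.375, "orbit": 0.5}, max_labels=2))
```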
In the above solution, the second determining module is further configured to determine the similarity between the article vector of each article in the article list and the theme vectors of the multiple themes to which articles in the article library belong;
and to determine the themes whose similarity satisfies a similarity condition as the themes matching the articles in the article list.
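Following the glossary definitions of article and theme vectors (weighted combinations of label vectors), theme matching can be sketched as below. Cosine similarity and a fixed threshold `min_sim` are assumptions standing in for the unspecified "similarity condition"; the toy label vectors are made up for illustration.

```python
import math


def weighted_vector(label_weights, label_vecs):
    """Weighted sum of label vectors (article vector or theme vector)."""
    dim = len(next(iter(label_vecs.values())))
    vec = [0.0] * dim
    for label, weight in label_weights.items():
        for i, x in enumerate(label_vecs[label]):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def matching_themes(article_labels, themes, label_vecs, min_sim=0.5):
    """Return {theme: similarity} for themes whose similarity >= min_sim."""
    article_vec = weighted_vector(article_labels, label_vecs)
    sims = {name: cosine(article_vec, weighted_vector(labels, label_vecs))
            for name, labels in themes.items()}
    return {name: s for name, s in sims.items() if s >= min_sim}


label_vecs = {"rocket": [1.0, 0.0], "launch": [0.9, 0.1],
              "stock": [0.0, 1.0], "market": [0.1, 0.9]}
themes = {"space": {"rocket": 1.0, "launch": 0.5},
          "finance": {"stock": 1.0, "market": 0.5}}
print(matching_themes({"rocket": 0.5, "launch": 0.5}, themes, label_vecs))
```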
In the above solution, the article recommending module includes:
a query unit, configured to query the article library to obtain valid articles that belong to the multiple themes matching the articles in the article list and satisfy a timeliness condition;
a computing unit, configured to multiply the score with which each valid article belongs to its corresponding theme by the score of that corresponding theme, obtaining a new score for the valid article;
a sorting unit, configured to sort the valid articles in descending order of their new scores, and to select a top portion of the sorted valid articles to perform the recommendation operation.
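A sketch of the query, scoring, and sorting units. Several details are assumptions: each article carries a single theme and score, the "score of the corresponding theme" is taken to be the theme's match score from the second determining module, the timeliness condition is a maximum age in seconds, and the "top portion" is a fixed `top_k`. The `Article` dataclass and `recall` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    article_id: str
    theme: str
    theme_score: float   # score with which the article belongs to its theme
    published_at: float  # Unix timestamp, seconds


def recall(library, matched_themes, now, max_age, top_k):
    """matched_themes: {theme: match score for this user's article list}."""
    valid = [a for a in library
             if a.theme in matched_themes and now - a.published_at <= max_age]
    scored = [(a.theme_score * matched_themes[a.theme], a.article_id) for a in valid]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article_id for _, article_id in scored[:top_k]]


now = 1_700_000_000
library = [Article("a1", "space", 0.9, now - 3600),
           Article("a2", "space", 0.5, now - 60),
           Article("a3", "finance", 0.99, now - 60),
           Article("a4", "space", 0.95, now - 10 * 86400)]
print(recall(library, {"space": 0.8}, now, max_age=86400, top_k=2))  # ['a1', 'a2']
```

"a3" is dropped because its theme does not match, and "a4" because it fails the one-day timeliness condition despite its high theme score.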
An embodiment of the present invention further provides an electronic device, including:
a memory, configured to store executable instructions;
a processor, configured to implement the article recommendation method provided by the embodiments of the present invention when executing the executable instructions stored in the memory.
An embodiment of the present invention provides a storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the article recommendation method provided by the embodiments of the present invention.
The embodiments of the present invention have the following beneficial effects:
After themes are assigned to the articles in the article pool, the themes are updated adaptively according to the degree to which user behavior influences them. Themes can thus be adjusted dynamically and promptly as user behavior changes, which improves the accuracy of article recommendation based on matching themes and, in turn, the user experience.
Brief description of the drawings
Fig. 1 is a schematic diagram of the article generation process of the LDA model provided by an embodiment of the present invention;
Fig. 2 is an optional architecture diagram of the article recommender system 100 provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the article recommendation apparatus 600 provided by an embodiment of the present invention;
Fig. 4 is an optional schematic structural diagram of the article recommender system provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of the article recommendation method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of clustering article vectors provided by an embodiment of the present invention;
Fig. 7 is a schematic flowchart of the article recommendation method provided by an embodiment of the present invention;
Fig. 8 is a schematic flowchart of obtaining latent vectors provided by an embodiment of the present invention;
Fig. 9 is a schematic interface diagram of a user triggering an article request provided by an embodiment of the present invention;
Fig. 10 is a schematic interface diagram of a terminal displaying recommended articles provided by an embodiment of the present invention;
Fig. 11 is a schematic implementation flowchart of the dynamic theme model provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present invention; all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
In the following description, "some embodiments" describes subsets of all possible embodiments. It can be understood that "some embodiments" may refer to the same subset or different subsets of all possible embodiments, and these may be combined with each other where no conflict arises.
In the following description, the terms "first/second" merely distinguish similar objects and do not denote a particular ordering of them. It can be understood that, where permitted, the specific order or sequence denoted by "first/second" may be interchanged, so that the embodiments described herein can be implemented in orders other than those illustrated or described.
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field to which the present invention belongs. The terms used herein are only for describing the embodiments of the present invention and are not intended to limit the present invention.
Before the embodiments of the present invention are described in further detail, the nouns and terms involved in the embodiments are explained; they are subject to the following interpretations.
1) Recommender system: a tool that connects users with articles. Based on users' focused data, it helps users pick out the articles they are interested in from a vast number of articles, providing personalized information services.
2) Focused data: data related to users' attention to an article while the article is being exposed, including click-through rate, click volume, and comments (like/dislike).
3) Attention mechanism: used to compute correlation, for example the degree of dependence between Chinese and English words in translation. Rather than mapping an input to a single vector, attention maps the input to three vectors (named query, key, and value). The similarity between the query and key vectors serves as the weight of the corresponding value vector, and the weighted sum of the value vectors gives the attention degree (a numerical value).
4) Influence degree: a measure of the effect that users' attention to an article (such as clicking or commenting) has on the themes matching the article.
5) Click-through rate: the ratio of an article's number of clicks to its number of exposures.
6) Comment rate: the ratio of an article's number of comments to its number of exposures.
7) Article vector: the vectorized representation of an article, which can be expressed as a weighted combination of the article's label vectors.
8) Theme: a summary and generalization of article content. Each theme may include one or more labels (tags, i.e., keywords of articles), and each label has a weight expressing the degree to which the label expresses the theme.
9) Theme vector: the vectorized representation of a theme, namely a weighted combination of the label vectors of the labels included in the theme.
10) Label vector: the vectorized representation of a label, obtained by mapping the label into a semantic vector space (for example, with a word2vec model). Each point in the semantic vector space represents a word, and the distance between any two words in this space is positively correlated with their semantic distance.
11) User portrait: includes the user interest portrait and the user basic portrait, where:
The user interest portrait is a virtual representation of a real user, a target user model built on a series of attribute data. Here it refers to a hierarchical interest model of the user, abstracted from the user's historical behavior data and used to represent the user's interest categories;
The user basic portrait is a labeled overview of the user, abstracted from basic user information such as real gender, age, income, place of residence, and login location.
12) "In response to": denotes the condition or state on which a performed operation depends. When that condition or state is satisfied, the one or more operations may be performed in real time or after a set delay. Unless otherwise specified, there is no restriction on the order in which multiple such operations are performed.
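The attention mechanism in item 3) can be sketched as follows. The glossary only says that query/key similarity weights the values; normalizing those similarities with a softmax over dot products is an assumption (it is the most common choice), and the function name `attention` is illustrative.

```python
import math


def attention(query, keys, values):
    """Weight each value by the softmax of its key's dot product with the query."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


out, w = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]])
print(out, w)
```

The value whose key is most similar to the query dominates the weighted sum; with one-dimensional values the result is a single number, matching the "attention degree (a numerical value)" in the definition.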
During research, the inventors found that a recommender system for article recommendation can be built on a topic model. In some embodiments, a topic model computes the score with which each article belongs to each theme. At recall time, the themes of the articles the user has read in the past are looked up, and similar articles are pulled from those themes for recommendation.
Specifically, a topic model assumes that each article is composed of several themes and that each theme corresponds to a word distribution, i.e., each word belongs to a theme with a certain probability. Relative to articles and the words that compose them, a theme is a latent variable whose descriptive granularity lies between articles and words. Take the Latent Dirichlet Allocation (LDA) topic model as an example. LDA expresses the themes of every article in the article library as a probability distribution. Fig. 1 is a schematic diagram of the article generation process of the LDA model provided by an embodiment of the present invention. Referring to Fig. 1: the theme distribution of an article is sampled from a Dirichlet distribution; the theme of each word in the article is sampled from the multinomial distribution over themes; the word distribution of each theme is sampled from a Dirichlet distribution; and each word is sampled from the multinomial distribution over words.
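The generative process in Fig. 1 can be sketched with the standard library alone. Dirichlet samples are obtained by normalizing independent Gamma draws, a standard construction; the hyperparameters `alpha` and `beta`, the vocabulary, and the function names are illustrative assumptions. This only simulates how LDA assumes articles arise; it does not fit a model.

```python
import random


def dirichlet(alpha, size, rng):
    """Sample a probability vector from a symmetric Dirichlet(alpha)."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0:                       # guard against underflow for tiny alpha
        return [1.0 / size] * size
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_len, n_themes, vocab, alpha=0.5, beta=0.1, seed=0):
    rng = random.Random(seed)
    # Word distribution of each theme, sampled from Dirichlet(beta).
    phi = [dirichlet(beta, len(vocab), rng) for _ in range(n_themes)]
    corpus = []
    for _ in range(n_docs):
        theta = dirichlet(alpha, n_themes, rng)   # theme distribution of the article
        doc = []
        for _ in range(doc_len):
            z = rng.choices(range(n_themes), weights=theta)[0]  # theme of this word
            doc.append(rng.choices(vocab, weights=phi[z])[0])   # the word itself
        corpus.append(doc)
    return corpus


print(generate_corpus(n_docs=2, doc_len=6, n_themes=2,
                      vocab=["rocket", "orbit", "stock", "market"]))
```

Training an LDA model means inverting this process, inferring `theta` and `phi` from observed words, which is the sampling-heavy step whose cost is criticized in the next paragraph.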
The above introduction shows the following problems. 1) Topic models in the related art must be trained by sampling-based learning over a large corpus, which requires substantial training time and storage, so resource consumption is high. 2) Model training and updating are relatively slow, so for highly time-sensitive articles such as news, newly emerging themes cannot be captured in real time and emerging hot news cannot be picked up quickly. 3) Topic models in the related art usually represent a theme by a group of words, each with a sampled weight for belonging to the theme, which makes themes poorly interpretable; in news scenarios, explaining new news with old themes degrades interpretability further. 4) Highly time-sensitive articles are updated quickly, so once many such articles have been generated, personalized recommendation based on an old topic model is inaccurate: many articles the user is not interested in are recalled, and recommendations built on these recalled articles seriously harm the user experience.
To solve at least the above technical problems of the related art, embodiments of the present invention provide an article recommendation method, apparatus, electronic device, and storage medium that can recommend personalized articles to users more accurately. An optional architecture of the article recommender system 100 provided by an embodiment of the present invention is described below.
Referring to Fig. 2, to support an exemplary application, terminals (including terminal 400-1 and terminal 400-2) connect to a server 200 through a network 300. The network 300 may be a wide area network, a local area network, or a combination of the two, and transmits data over wireless or wired links.
The server 200 is configured to assign to each article in an article library 500 the themes it belongs to, i.e., to initialize the themes of the articles in the article library; to determine, according to users' focused data for an article, the influence degree of the users' attention behavior on the themes matching the article; and to update the themes matching the article according to the influence degree.
In practical applications, the server 200 may be a separately configured server supporting various services, or may be configured as a server cluster.
A terminal (such as terminal 400-1) is configured to send a corresponding article request to the server 200 in response to a click operation on an article triggered by a target user.
In practical applications, the terminal may be any of various user terminals such as a smartphone, tablet computer, or laptop, or may be a wearable computing device, personal digital assistant (PDA), desktop computer, cellular phone, media player, navigation device, game console, television, or a combination of any two or more of these or other data processing devices.
The server 200 is further configured to, upon receiving an article request, obtain the target user's article list; determine, among the themes to which articles in the article library belong, the themes matching the articles in the article list; obtain the articles in the article library that belong to those matching themes; and perform a recommendation operation based on the obtained articles.
The terminal (such as terminal 400-1) is further configured to display the articles recommended by the server through a user interface.
In some embodiments, a reading client is installed on the terminal, through which the user can read articles, for example domestic news articles. The reading client has a recommendation function and recommends articles to the user using the recommendation method of the embodiments of the present invention. In practice, the user issues an article request through the reading client; the server receives the request and obtains the corresponding user's article list; determines, among the themes of the articles in the article library, the themes matching the articles in the article list; obtains the articles in the article library that belong to those themes; and performs a recommendation operation based on the obtained articles. Correspondingly, the reading client displays the recommended articles.
The structure of the article recommendation apparatus provided by the embodiments of the present invention is described next. The apparatus may be any of various terminals, such as a mobile phone or computer, or may be the server 200 shown in Fig. 2.
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of an article recommendation apparatus 600 provided by an embodiment of the present invention. The article recommendation apparatus 600 shown in Fig. 3 includes at least one processor 610, a memory 650, at least one network interface 620, and a user interface 630. The components of the apparatus 600 are coupled through a bus system 640, which enables connection and communication between them. In addition to a data bus, the bus system 640 includes a power bus, a control bus, and a status signal bus; for clarity, however, all buses are labeled as bus system 640 in Fig. 3.
The processor 610 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, discrete gate or transistor logic, or discrete hardware components. The general-purpose processor may be a microprocessor or any conventional processor.
The user interface 630 includes one or more output devices 631 that enable media content to be presented, including one or more speakers and/or one or more visual displays. The user interface 630 also includes one or more input devices 632, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touchscreen display, camera, and other input buttons and controls.
The memory 650 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard disk drives, and optical disc drives. The memory 650 optionally includes one or more storage devices physically remote from the processor 610.
The memory 650 includes volatile memory or nonvolatile memory, and may include both. The nonvolatile memory may be read-only memory (ROM), and the volatile memory may be random access memory (RAM). The memory 650 described in the embodiments of the present invention is intended to include any suitable type of memory.
In some embodiments, the memory 650 can store data to support various operations. Examples of such data include programs, modules, and data structures, or subsets or supersets thereof, as illustrated below.
Operating system 651, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, used to implement basic services and handle hardware-based tasks;
Network communication module 652, for reaching other computing devices via one or more (wired or wireless) network interfaces 620; exemplary network interfaces 620 include Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB);
Presentation module 653, for enabling the presentation of information (for example, a user interface for operating peripheral devices and displaying content and information) via one or more output devices 631 (for example, a display screen or speaker) associated with the user interface 630;
Input processing module 654, for detecting one or more user inputs or interactions from one of the one or more input devices 632 and translating the detected inputs or interactions.
In some embodiments, the article recommendation apparatus provided by the embodiments of the present invention may be implemented in software. Fig. 3 shows an article recommendation apparatus 655 stored in the memory 650, which may be software in the form of programs, plug-ins, and the like, and includes the following software modules: a theme distribution module 6551, a first determining module 6552, a theme update module 6553, a list obtaining module 6554, a second determining module 6555, and an article recommending module 6556. These modules are logical, so they can be combined arbitrarily or split further according to the functions implemented. The function of each module is described below.
In other embodiments, the article recommendation apparatus provided by the embodiments of the present invention may be implemented in hardware. As an example, it may be a processor in the form of a hardware decoding processor, programmed to perform the article recommendation method provided by the embodiments of the present invention. For example, such a processor may use one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
Next, the article recommendation system used by the article recommendation apparatus provided in the embodiments of the present invention is described. The system can be used in recommendation scenarios for various types of articles, such as news push, official-account article push, and advertisement article push. In the advertisement scenario, the advertisement may be a book advertisement, in which case the corresponding advertisement article is the text, a text excerpt, or the preface of the book; or it may be a film advertisement, in which case the corresponding advertisement article is a film synopsis, a film review, or the like.
Fig. 4 is an optional structural schematic diagram of the article recommendation system provided in an embodiment of the present invention. Referring to Fig. 4, the system includes a recall module 41, a sorting module 42, a re-ranking module 43, a user profile module 44, and a statistical reporting module 45.
In some embodiments, the recall module 41 is configured to: assign, through the theme distribution module, the themes to which the articles in the article library belong; determine, through the first determining module and according to users' engagement data on an article, the degree to which the users' engagement behavior influences the themes matching that article; and update, through the theme update module, the themes matching the article according to that influence degree;
and is further configured to: when an article request sent by a client is received, obtain, through the list obtaining module, the user's article list based on the profile of the corresponding user provided by the user profile module 44; determine, through the second determining module and among the themes to which articles in the article library belong, the themes matching the articles in the article list; and obtain, through the article recommending module, the articles in the article library that belong to those themes as the article recall result. In this way, as many articles as possible that the user may potentially like are recalled.
The sorting module 42 is configured to compute, based on the profile of the corresponding user provided by the user profile module 44, the matching degree between the user profile and the theme of each article, and to sort the articles obtained by the recall module 41 by matching degree to obtain a first ranking result.
The re-ranking module 43 is configured to adjust the order of the articles in the first ranking result based on the user profile so that the top N articles (N being a positive integer not less than 2) all have different themes, thereby diversifying the themes of the recommended articles. Specifically, all recalled articles are ranked and their positions optimized by combining the user profile, the article content, and current context information (network type, such as Wi-Fi or cellular, and geographic location), and the display style of the articles is adjusted accordingly; for example, a relatively large display style is used in a Wi-Fi environment;
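The diversity constraint on the top N positions can be illustrated with a minimal greedy sketch. It assumes each article carries a single theme label; the function name `diversify_top_n` and the greedy deferral strategy are hypothetical and are not taken from the source.

```python
from typing import List, Tuple


def diversify_top_n(ranked: List[Tuple[str, str]], n: int) -> List[Tuple[str, str]]:
    """Hypothetical greedy re-ranking: make the first n positions carry distinct themes.

    `ranked` is a list of (article_id, theme) pairs in first-ranking order.
    Articles that would repeat a theme already placed in the head are deferred
    to the tail, which keeps their original relative order.
    """
    head: List[Tuple[str, str]] = []
    tail: List[Tuple[str, str]] = []
    seen = set()
    for article_id, theme in ranked:
        if len(head) < n and theme not in seen:
            head.append((article_id, theme))
            seen.add(theme)
        else:
            tail.append((article_id, theme))
    return head + tail


ranked = [("a", "food"), ("b", "food"), ("c", "sport"), ("d", "travel")]
print(diversify_top_n(ranked, 3))
```

If fewer than N distinct themes exist, the head is simply shorter and the remaining articles keep their original order.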
The statistical reporting module 45 is configured to obtain users' engagement data on the recommended articles and report it, so that the recall module 41, the sorting module 42, the re-ranking module 43, and the user profile module 44 update their parameters and data based on the engagement data. This raises the ranking of articles that users engage with and continuously improves the user experience.
Next, the article recommendation method provided in the embodiments of the present invention is described with reference to an exemplary application and implementation of the server provided in the embodiments. Fig. 5 is a flow diagram of the article recommendation method provided in an embodiment of the present invention. In some embodiments, the method may be implemented by a server alone or by a server in cooperation with a terminal. Taking implementation by a server as an example, such as the server 200 in Fig. 2, and referring to Fig. 2 and Fig. 5, the article recommendation method includes:
Step 501: The server assigns the themes to which the articles in the article library belong.
In actual implementation, an article may belong to one or more themes, and each theme may include one or more labels (tags). A label may be a high-frequency word in an article, that is, a keyword of the article. Each label has a weight indicating how strongly the label expresses the theme.
The article library includes multiple articles collected online. The articles may come from at least one of the following sources: 1. submissions by users of various traffic channels (such as official accounts and social networks); 2. crawling from the network (such as various websites and official accounts) using web crawler technology.
In practical applications, before recommending articles, the server needs to initialize the articles in the library, that is, assign the themes to which they belong. In some embodiments, the server may assign the themes as follows:
The server obtains the labels included in the theme to which each article in the library belongs, together with their weights, and constructs a weight matrix; decomposes the weight matrix to obtain an article vector for each article and a label vector for each of the labels contained in each article; and clusters the article vectors of the articles in the library to obtain the theme to which each article belongs.
With this scheme, the server assigns themes to articles by constructing a weight matrix, decomposing it, and clustering the article vectors. Because the weight matrix is built from labels and their weights, and labels are relatively stable and highly interpretable, the assigned themes are also more interpretable. The article vectors obtained by matrix decomposition represent articles effectively, so themes obtained by clustering these vectors are more accurate. Obtaining themes by clustering also has the advantages of fast computation, a small memory footprint, parallelizability, and highly interpretable results. Constructing the weight matrix, matrix decomposition, and article vector clustering are described in turn below.
In some embodiments, the server may obtain the labels included in the theme to which each article in the library belongs, and their weights, as follows:
The server determines a score for each word in each article in the library; selects, in each article, the words whose scores meet a score condition as the labels of that article; sorts the labels contained in the articles of each theme in descending order of weight; and takes the top-ranked labels, up to a set number or proportion, as the labels of the theme, using each label's score in its article as the label's weight.
In actual implementation, the server may determine the score of each word in each article as follows: determine the term frequency of the word in its article and the inverse document frequency of the word in the article library, and take the product of the two as the word's score. In practical applications, this score may be the TF-IDF (term frequency-inverse document frequency) score. Specifically, the TF-IDF score of word t_i in article d_j may be calculated as:
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \times \mathrm{idf}_i \qquad (1)$$
where tf_{i,j} is the term frequency of word t_i in article d_j, and idf_i is the inverse document frequency of word t_i.
Here, tf_{i,j} can be obtained by:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \qquad (2)$$
where n_{i,j} is the number of occurrences of word t_i in article d_j, n_{k,j} is the number of occurrences of word k in d_j, and \sum_k n_{k,j} is the total number of word occurrences in d_j.
IDF measures the general importance of a word. The IDF of a particular word t_i is obtained by dividing the total number of articles D in the library by the number of articles containing t_i, and then taking the logarithm of the quotient:
$$\mathrm{idf}_i = \log \frac{D}{\left|\{\, j : t_i \in d_j \,\}\right|} \qquad (3)$$
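The TF-IDF computation described above (term frequency times the logarithm of D over document frequency) can be sketched as follows. This assumes articles are already tokenized into non-empty word lists and uses the natural logarithm, since the source does not fix the base; `tfidf_scores` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_scores(articles: List[List[str]]) -> List[Dict[str, float]]:
    """Return, for each tokenized article d_j, a {word: tfidf_{i,j}} map."""
    total_articles = len(articles)  # D
    doc_freq: Counter = Counter()  # number of articles containing each word
    for words in articles:
        doc_freq.update(set(words))
    scores = []
    for words in articles:
        counts = Counter(words)  # n_{i,j}
        length = sum(counts.values())  # sum_k n_{k,j}; assumes a non-empty article
        scores.append({
            w: (c / length) * math.log(total_articles / doc_freq[w])
            for w, c in counts.items()
        })
    return scores


print(tfidf_scores([["a", "b", "a"], ["b", "c"]]))
```

A word that appears in every article (here `b`) gets an IDF of log(1) = 0, so it cannot become a label.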
In some embodiments, among the words in each article, the server takes the words meeting at least one of the following conditions as labels of that article: words whose score exceeds a score threshold; words within a set top proportion when sorted by score in descending order; words within a set top number when sorted by score in descending order.
Here, in practical applications, the score threshold, the set proportion, and the set number can be configured according to actual needs.
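The "at least one of the following conditions" rule can be read as a union of the three selections. A minimal sketch under that reading follows; the parameter names `threshold`, `ratio`, and `top_k`, and rounding the proportion up, are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional


def select_labels(scores: Dict[str, float],
                  threshold: Optional[float] = None,
                  ratio: Optional[float] = None,
                  top_k: Optional[int] = None) -> List[str]:
    """Return words meeting at least one enabled condition, best score first."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # descending by score
    selected = set()
    if threshold is not None:  # condition 1: score exceeds the threshold
        selected.update(w for w in ranked if scores[w] > threshold)
    if ratio is not None:  # condition 2: top proportion (rounded up, an assumption)
        selected.update(ranked[:math.ceil(len(ranked) * ratio)])
    if top_k is not None:  # condition 3: top set number
        selected.update(ranked[:top_k])
    return [w for w in ranked if w in selected]


print(select_labels({"a": 0.9, "b": 0.5, "c": 0.1}, threshold=0.6))
```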
Next, matrix decomposition is described. An example of the weight matrix is shown in Table 1. Each article contains multiple words, and each word has a weight. Tag1 to Tagt are the label vectors of different labels, such as food, sports, and weight loss; d1 to dj are the article vectors of different articles. Each entry of the weight matrix is the product of an article vector and a label vector, that is, Weight_jt = d_j^T · Tag_t.
In some embodiments, the article vector of each article and the label vectors of the labels contained in each article, also called latent vectors, can be obtained by ALS (Alternating Least Squares) matrix decomposition.
Here, the prediction model of ALS matrix decomposition is:
$$\hat{W}_{jt} = q_j^{\top} p_t$$
and the loss function is:
$$L = \sum_{j,t} \left( W_{jt} - q_j^{\top} p_t \right)^2 + \lambda \left( \sum_j \lVert q_j \rVert^2 + \sum_t \lVert p_t \rVert^2 \right)$$
where W_{jt} is the weight of label t in article j and λ is a regularization coefficient. Minimizing the loss function yields the parameters q (the article vectors) and p (the label vectors). The latent vectors capture latent similarity between articles and labels; for example, if two articles describe similar content, the distance (such as cosine distance or Euclidean distance) between their vectors will be small.
Article \ Label   Tag1       Tag2       ……   Tagt
d1                Weight11   Weight12   ……   Weight1t
d2                Weight21   Weight22   ……   Weight2t
……                ……         ……         ……   ……
dj                Weightj1   Weightj2   ……   Weightjt
Table 1
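The ALS decomposition above can be sketched with NumPy. This is a simplified dense variant that treats every entry of the weight matrix (including zeros) as observed and includes an L2 regularization term `reg`; both choices, and the names `als_factorize`, `rank`, and `reg`, are assumptions for illustration rather than the patent's exact procedure.

```python
import numpy as np


def als_factorize(W: np.ndarray, rank: int = 2, reg: float = 0.1,
                  iters: int = 20, seed: int = 0):
    """Factor W (articles x labels) as Q @ P.T by alternating ridge solves.

    Rows of Q are article vectors q_j; rows of P are label vectors p_t.
    """
    rng = np.random.default_rng(seed)
    n_articles, n_labels = W.shape
    Q = rng.normal(scale=0.1, size=(n_articles, rank))
    P = rng.normal(scale=0.1, size=(n_labels, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix P and solve all article vectors: Q^T = (P^T P + reg I)^-1 P^T W^T
        Q = np.linalg.solve(P.T @ P + ridge, P.T @ W.T).T
        # Fix Q and solve all label vectors: P^T = (Q^T Q + reg I)^-1 Q^T W
        P = np.linalg.solve(Q.T @ Q + ridge, Q.T @ W).T
    return Q, P


W = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 2.0])  # toy 3 x 3 weight matrix
Q, P = als_factorize(W, rank=2, reg=1e-3, iters=50)
print(np.round(Q @ P.T, 3))
```

Each half-step is a closed-form least-squares solve, which is what makes ALS easy to parallelize across articles or labels.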
Next, article vector clustering is described. In some embodiments, after the article vectors are obtained by matrix decomposition, they can be clustered by k-means to obtain the theme to which each article in the library belongs. Fig. 6 is a schematic diagram of the process of clustering article vectors provided in an embodiment of the present invention. Referring to Fig. 6, the process mainly includes: (a) input the data set and the number of clusters k, and randomly assign the positions of the cluster centers; (b) put each point into the set of its nearest cluster center; (c) move each cluster center to the center of its set; (d) repeat (b) and (c) until convergence.
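Steps (a) to (d) can be sketched as plain k-means in NumPy. In this sketch, step (a) initializes the centers by picking k distinct data points at random, which is one common choice and an assumption here; `kmeans` is a hypothetical helper name.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, iters: int = 100, seed: int = 0):
    """Cluster the rows of X (article vectors) into k themes."""
    rng = np.random.default_rng(seed)
    # (a) random initial centers: k distinct data points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # (b) assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (c) move each center to the mean of its set (empty clusters stay put)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        # (d) stop once the centers no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
print(kmeans(X, 2)[0])
```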
In practical applications, after the articles in the library are clustered into multiple themes, some labels in a theme appear only once (labels exclusive to a single article), while other labels appear twice or more within the same theme. Accordingly, in some embodiments, when at least two articles in a theme contain the same label, the sum of that label's weights across those articles is taken as its new weight. The new weights of the shared labels and the weights of the exclusive labels in each theme are sorted in descending order, and a preset number of labels, for example the top 100, are selected from the ranking as the labels of the theme. This completes the initialization of the themes to which the articles in the library belong.
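The merging rule above (sum the weights of shared labels, keep exclusive labels as they are, then keep the top labels) can be sketched as follows; `theme_labels` and the dictionary representation of article labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def theme_labels(article_labels: List[Dict[str, float]],
                 top_n: int = 100) -> List[Tuple[str, float]]:
    """Merge the labels of all articles in one theme and keep the top_n by weight."""
    merged: Dict[str, float] = defaultdict(float)
    for labels in article_labels:
        for tag, weight in labels.items():
            # Shared labels accumulate their weights; exclusive labels keep their own.
            merged[tag] += weight
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


theme = [{"diet": 0.5, "run": 0.125}, {"diet": 0.25, "yoga": 0.375}]
print(theme_labels(theme, top_n=2))
```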
Step 502: According to users' engagement data on an article, determine the degree to which the users' engagement behavior influences the themes matching the article.
In actual implementation, an article may match one or more themes, which can be determined by computing the similarity between the article and candidate themes. Specifically, the distance between the article vector and a candidate theme vector can represent their similarity; for example, the distance between the article vector and each theme vector obtained by clustering is calculated, and the preset number of themes with the smallest distances are taken as the themes matching the article.
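Selecting the preset number of closest themes can be sketched with cosine similarity, where the largest similarity corresponds to the smallest cosine distance. The helper name `nearest_topics` and the default `x=5` (taken from the later example of 5 themes per article) are illustrative.

```python
import numpy as np


def nearest_topics(article_vec: np.ndarray, topic_vecs: np.ndarray, x: int = 5):
    """Return indices and cosine similarities of the x themes closest to the article."""
    a = article_vec / np.linalg.norm(article_vec)
    t = topic_vecs / np.linalg.norm(topic_vecs, axis=1, keepdims=True)
    sims = t @ a  # cosine similarity with every theme vector
    idx = np.argsort(-sims)[:x]  # highest similarity = smallest cosine distance
    return idx, sims[idx]


topics = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(nearest_topics(np.array([1.0, 0.1]), topics, x=2))
```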
Here, when computing the distance between an article vector and a theme vector, the article vector obtained from the matrix decomposition above can be used. In practice, however, for articles that are updated quickly and are highly time-sensitive (such as news), the matrix decomposition is computationally expensive and is not run in real time. An article may therefore already be published (for news, for example, engagement data on it is already being collected) before its matrix decomposition has completed. In that case, the weighted combination of the article's label vectors can be used as the article vector, so that the matching themes are updated in time and recommendations remain real-time.
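A minimal sketch of the fallback article vector, assuming "weighting" means a weighted average of the article's label vectors with the label weights as coefficients; the normalization by total weight and the name `article_vector_from_labels` are assumptions (normalization does not affect cosine similarity).

```python
from typing import Dict

import numpy as np


def article_vector_from_labels(label_weights: Dict[str, float],
                               label_vecs: Dict[str, np.ndarray]) -> np.ndarray:
    """Approximate a new article's vector from the vectors of its labels."""
    tags = list(label_weights)
    w = np.array([label_weights[t] for t in tags])
    V = np.stack([label_vecs[t] for t in tags])
    return (w[:, None] * V).sum(axis=0) / w.sum()  # weighted average


vecs = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
print(article_vector_from_labels({"a": 1.0, "b": 3.0}, vecs))
```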
In some embodiments, the server may determine the degree to which users' engagement behavior influences the themes matching an article as follows:
According to the engagement data on the article, determine the engagement level with the article represented by the users' engagement behavior, and traverse the themes matching the article to perform the following processing:
Multiply the score with which the article belongs to the currently traversed theme by the similarity between each label of the article and the labels of that theme, and weight these products using the importance of each label within the article as the weighting coefficient. Then multiply the weighted result by the engagement level; the result is the influence of the users' engagement behavior on each label in the currently traversed theme.
Here, the server may determine the engagement level with an article as follows: determine the click-through rate of the article from the click data and exposure data of the article included in the engagement data; or determine the comment rate of the article from the comment data and exposure data of the article included in the engagement data.
Illustratively, clustering yields multiple themes, and a preset number of themes most similar to the article are selected; for example, the 5 themes closest to the current article j are selected, so that each article is assigned to 5 themes. The score with which an article belongs to a theme can be represented by the similarity (such as cosine similarity) between the article (article vector) and the theme (theme vector). In actual implementation, this score can be expressed as a normalized cosine similarity. For example, if article A is assigned to x themes (T_1 to T_x), the score (a quantified likelihood) with which A belongs to theme T_m is:
$$s(A, T_m) = \frac{\cos(\mathbf{a}, \mathbf{T}_m)}{\sum_{l=1}^{x} \cos(\mathbf{a}, \mathbf{T}_l)}, \quad m = 1, \dots, x$$
where a is the article vector of A and T_m is the theme vector of theme T_m.
Illustratively, if the currently traversed theme T_1 contains K labels g_1, …, g_K, the similarity between a label t_i contained in article A and each of these labels is denoted sim(t_i, g_k), k = 1, …, K.
In actual implementation, the importance of a label t_i in article A is the ratio of the weight of t_i to the sum of the weights of all labels contained in A:
$$\alpha_i = \frac{w_{t_i}}{\sum_{l} w_{t_l}}$$
where w_{t_i} is the weight of label t_i.
Taking the click-through rate p of the article as the engagement level, the influence of the users' engagement behavior on the k-th label g_k of the currently traversed theme T can be obtained as:
$$\Delta w_k = p \cdot s(A, T) \cdot \sum_{i} \alpha_i \, \mathrm{sim}(t_i, g_k) \qquad (4)$$
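One reading of the per-label influence computation described above (theme score times label similarity, weighted by label importance, times engagement level) is sketched below. It assumes that label similarity is the cosine similarity between label vectors and that the theme score is the cosine similarity normalized over the article's assigned themes; these choices and the function names are assumptions, not the patent's confirmed formulas.

```python
import numpy as np


def normalized_topic_scores(cos_sims: np.ndarray) -> np.ndarray:
    """Theme scores s(A, T_m): cosine similarities normalized to sum to 1 (assumed positive)."""
    return cos_sims / cos_sims.sum()


def label_influence(p: float, topic_score: float,
                    article_label_weights: np.ndarray,
                    article_label_vecs: np.ndarray,
                    theme_label_vecs: np.ndarray) -> np.ndarray:
    """Influence of engagement level p on each of the K labels of one theme."""
    alpha = article_label_weights / article_label_weights.sum()  # importance of t_i in A
    a = article_label_vecs / np.linalg.norm(article_label_vecs, axis=1, keepdims=True)
    g = theme_label_vecs / np.linalg.norm(theme_label_vecs, axis=1, keepdims=True)
    sim = a @ g.T  # sim[i, k] = cosine similarity of t_i and g_k
    return p * topic_score * (alpha @ sim)  # one value per theme label g_k


print(label_influence(0.1, 0.5, np.array([1.0, 1.0]),
                      np.array([[1.0, 0.0], [0.0, 1.0]]),
                      np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])))
```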
In practical applications, if an article has very high exposure, the click-through rate computed from that exposure is more reliable than one computed from low exposure; that is, engagement data with a large volume is more reliable than data with a small volume. Accordingly, in some embodiments, after the engagement level of an article is determined from its engagement data, the engagement level is adjusted to reflect the confidence of the engagement data. Specifically, taking the click-through rate p as the engagement level, the center of the Wilson score confidence interval can be used as the adjusted value, giving the adjusted click-through rate p_w:
$$p_w = \frac{p + \frac{z^2}{2n}}{1 + \frac{z^2}{n}} \qquad (5)$$
where n is the number of exposures of the article, p is the original click-through rate before adjustment, and z is a constant that sets the confidence level; for example, z = 1.96 corresponds to a 95% confidence level.
Correspondingly, the influence on each label in the currently traversed theme is obtained from the adjusted click-through rate as:
$$\Delta w_k = p_w \cdot s(A, T) \cdot \sum_{i} \alpha_i \, \mathrm{sim}(t_i, g_k) \qquad (6)$$
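The Wilson adjustment can be sketched as follows, assuming the adjusted value is the center (midpoint) of the Wilson score interval, (p + z²/2n) / (1 + z²/n); `wilson_center` is a hypothetical helper name.

```python
def wilson_center(p: float, n: int, z: float = 1.96) -> float:
    """Center of the Wilson score interval for an observed rate p over n exposures."""
    if n <= 0:
        raise ValueError("exposure count n must be positive")
    z2 = z * z
    return (p + z2 / (2 * n)) / (1 + z2 / n)


# A perfect CTR from one exposure is pulled strongly toward 0.5;
# the same CTR over 1000 exposures is barely adjusted.
print(wilson_center(1.0, 1), wilson_center(1.0, 1000))
```

The adjusted rate shrinks toward 0.5 when n is small and converges to p as exposures grow, which is exactly the confidence effect the paragraph describes.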
Step 503: Update the themes matching the article according to the influence degree.
In some embodiments, the server may update the themes matching an article as follows:
For each theme matching the article, the server adds the influence determined for that theme to the weight of each label in the theme, obtaining the updated weight of each label. Specifically, if the initial weight of the k-th label of a theme is w_{1,k}, then in practical applications, based on formula (6), the updated weight w'_{1,k} of the k-th label is obtained as:
$$w'_{1,k} = w_{1,k} + \Delta w_k \qquad (7)$$
where w_{1,k} is the weight of the k-th label before the update, which can be represented by its TF-IDF score, and Δw_k is the influence given by formula (6). That is, an increment representing the influence of user engagement on the original weight is computed from the click-through rate and superimposed on the original weight.
Here, when a theme matching the article does not include all of the article's labels, the missing labels and their weights are added to that theme; the labels of the theme are then sorted in descending order of weight, and some labels are deleted according to the ranking. In this way the label weights are updated: after the update, the labels are re-sorted by weight, and the top N (for example, 100) labels are taken to represent the theme.
To delete labels, depending on actual needs, the server may delete the labels whose scores are below a score threshold; or the labels in a set bottom proportion of the descending score ranking; or a set number of the lowest-ranked labels in the descending score ranking.
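The update in Step 503 can be sketched as follows: add each label's influence to its weight as in formula (7), add the article's missing labels with their own weights, re-sort, and keep the top N. The dictionary representation, the name `update_theme`, and pruning by a top-N cut (one of the three deletion options above) are illustrative choices.

```python
from typing import Dict


def update_theme(theme_weights: Dict[str, float],
                 influence: Dict[str, float],
                 article_labels: Dict[str, float],
                 top_n: int = 100) -> Dict[str, float]:
    """Return the updated label weights of one theme matching the article."""
    updated = dict(theme_weights)
    for tag, delta in influence.items():
        if tag in updated:
            updated[tag] += delta  # w' = w + influence
    for tag, weight in article_labels.items():
        if tag not in updated:
            updated[tag] = weight  # add labels the theme did not yet contain
    ranked = sorted(updated.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])  # prune to the top N labels


print(update_theme({"a": 0.5, "b": 0.25}, {"a": 0.125, "b": 0.0},
                   {"a": 1.0, "c": 0.375}, top_n=2))
```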
Step 504: Obtain the user's article list.
Here, when the server receives an article request sent by a target user through a reading client, it obtains the article list of the target user. For example, the reading client provides a "Recommended" function; when the user taps the "Recommended" entry presented by the client, the client sends an article request to the server, and the server obtains the articles that the target user has previously read (clicked) to form the target user's article list.
Step 505: Among the themes to which articles in the article library belong, determine the themes matching the articles in the article list.
In some embodiments, the server may determine the themes matching the articles in the article list as follows: the server computes the similarity between the article vector of each article in the list and the theme vectors of the themes to which articles in the library belong, and takes the themes whose similarity meets a similarity condition as the themes matching the articles in the list.
Here, in actual implementation, the theme vector may be a weighted combination of the vectors of the labels included in the theme, and the similarity condition may be that the similarity reaches a preset similarity threshold.
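A minimal sketch of Step 505, assuming cosine similarity and a fixed threshold; `matching_themes` and the threshold value are hypothetical.

```python
from typing import Dict, List

import numpy as np


def matching_themes(list_vecs: np.ndarray, theme_vecs: np.ndarray,
                    threshold: float = 0.8) -> Dict[int, List[int]]:
    """Map each article-list index to the theme indices whose similarity reaches the threshold."""
    L = list_vecs / np.linalg.norm(list_vecs, axis=1, keepdims=True)
    T = theme_vecs / np.linalg.norm(theme_vecs, axis=1, keepdims=True)
    sims = L @ T.T  # sims[i, m] = cosine(article i, theme m)
    return {i: np.flatnonzero(row >= threshold).tolist() for i, row in enumerate(sims)}


print(matching_themes(np.array([[1.0, 0.0], [0.0, 1.0]]),
                      np.array([[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]), threshold=0.9))
```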
Step 506: Obtain the articles in the article library that belong to the themes matching the articles in the article list, and perform a recommendation operation based on the obtained articles.
In some embodiments, the server may perform the recommendation operation based on the obtained articles as follows: the server queries the article library to obtain the valid articles that belong to the themes matching the articles in the article list and meet a timeliness condition; multiplies the score with which a list article belongs to the corresponding theme by the score with which the valid article belongs to that theme, obtaining a new score for the valid article; sorts the valid articles in descending order of new score; and selects a portion of the top-ranked valid articles to perform the recommendation operation.
Here, performing the recommendation operation means using the selected valid articles as the recall result of the recommendation system for subsequent sorting and filtering: the recall result is passed to the sorting module, which sorts and screens the articles in combination with the user profile; the result is then passed to the re-ranking module for display-style-based order adjustment, producing a re-ranking result. Finally, a preset number of articles are selected based on the re-ranking result and sent to the reading client on the user side.
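The recall scoring in Step 506 can be sketched as follows. The timeliness condition is modeled as a maximum article age, and when a candidate appears under several matching themes its best score is kept; both are assumptions, since the source does not specify them. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def recall(list_topic_scores: Dict[str, float],
           topic_members: Dict[str, Dict[str, float]],
           publish_time: Dict[str, float],
           now: float, max_age: float, top_k: int) -> List[Tuple[str, float]]:
    """Score valid candidates by (list article's theme score) x (candidate's theme score)."""
    best: Dict[str, float] = {}
    for topic, s_list in list_topic_scores.items():
        for art, s_art in topic_members.get(topic, {}).items():
            if now - publish_time[art] > max_age:
                continue  # fails the (assumed) timeliness condition
            score = s_list * s_art
            if score > best.get(art, float("-inf")):
                best[art] = score  # keep the best score across themes (assumption)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


members = {"T1": {"x": 0.5, "y": 1.0}, "T2": {"y": 1.0, "z": 0.5}}
times = {"x": 0.0, "y": 90.0, "z": 95.0}
print(recall({"T1": 0.5, "T2": 0.25}, members, times, now=100.0, max_age=50.0, top_k=2))
```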
In actual implementation, to ensure online efficiency and allow fast lookup when articles need to be recommended to a user, the valid articles belonging to the themes matching each article can be cached, for example as key-value pairs. The keys cover all articles in the library, both valid and expired, and each value is the set of valid articles belonging to the themes matching the key article. The key serves as the index and covers the IDs of all articles in the library (the IDs of both valid and expired articles). This ensures that the recommended articles are all currently valid.
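A minimal in-memory sketch of the key-value cache: every article ID, valid or expired, is a key, and each value lists only valid articles from the key's matching themes. A plain dictionary stands in for whatever key-value store is used in production, and all names are hypothetical.

```python
from typing import Callable, Dict, List


def build_related_cache(all_article_ids: List[str],
                        article_themes: Dict[str, List[str]],
                        theme_members: Dict[str, List[str]],
                        is_valid: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Key: every article ID. Value: valid articles in the key article's matching themes."""
    cache: Dict[str, List[str]] = {}
    for key in all_article_ids:
        related: List[str] = []
        for theme in article_themes.get(key, []):
            for candidate in theme_members.get(theme, []):
                if is_valid(candidate) and candidate not in related:
                    related.append(candidate)
        cache[key] = related
    return cache


cache = build_related_cache(["a", "b", "c"], {"a": ["T1"], "b": ["T1"]},
                            {"T1": ["a", "b", "c"]}, lambda art: art != "b")
print(cache["b"])  # an expired key still indexes valid related articles
```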
In practical applications, the articles related to a user's clicked article (articles belonging to the themes matching the clicked article) may be too numerous, which is unfavorable for storage and online recommendation. Therefore, the related articles can be sorted, with the sort key being the score with which the key article belongs to a theme multiplied by the score with which the value article belongs to that theme. That is, for any article clicked by a user, all articles in its preset number of themes (for example, 5) are taken; for each candidate, the score with which the clicked article belongs to the theme is multiplied by the score with which the candidate belongs to the same theme; the candidates are sorted by this product from high to low, and a preset number of top-ranked articles (for example, 50) are taken as the final related articles of the clicked article.
Illustratively, a user clicks article A, whose matching themes are T1 to T5, each with 100 similar articles. For similar articles 1 to 100 (belonging to T1), the score with which A belongs to T1 is multiplied by the score with which each similar article belongs to T1; the same is done for the other themes, yielding scores for 500 similar articles, which are sorted by score, and the top 50 are taken.
The article recommendation method provided in the embodiments of the present invention is described further. Fig. 7 is a flow diagram of the article recommendation method provided in an embodiment of the present invention. In some embodiments, the method may be implemented by a server alone or by a server in cooperation with a terminal. Taking cooperation between a terminal and a server as an example, such as the terminal 400-1 and the server 200 in Fig. 2, where a reading client is installed on the terminal 400-1, and referring to Fig. 2 and Fig. 7, the article recommendation method includes:
Step 701: The server constructs a weight matrix.
Here, an article may belong to one or more themes, and each theme may include one or more labels. A label may be a high-frequency word in an article, that is, a keyword of the article. Each label has a weight indicating how strongly the label expresses the theme.
In actual implementation, the server obtains the labels included in the theme to which each article in the library belongs, together with their weights, and constructs the weight matrix.
Illustratively, the server computes the TF-IDF scores of the words in each article in the library and takes the words whose TF-IDF score reaches a preset score threshold as labels of the corresponding article; it then sorts the labels of the articles in each theme in descending order of TF-IDF score and takes the top set number of labels as the labels of the theme, with each label's TF-IDF score as its weight.
Step 702: The server decomposes the weight matrix.
In actual implementation, the server decomposes the weight matrix to obtain an article vector for each article and label vectors, also called latent vectors, for the labels contained in each article.
Specifically, Fig. 8 is a flow diagram of obtaining latent vectors provided in an embodiment of the present invention. Referring to Fig. 8, article data is first obtained from the article library, including the words contained in each article, the number of times each word appears in the article, and the total number of words in the article. Label data of the articles, such as the labels contained in each article and their weights, is then derived from the article data. Finally, ALS decomposition is performed on the label weights of the articles to obtain the latent vectors, that is, the label vectors (label encodings) and the article vectors (article encodings).
Step 703: The server clusters the articles to obtain the theme to which each article in the library belongs.
In actual implementation, the server clusters the article vectors of the articles in the library to obtain the theme to which each article belongs.
Illustratively, the server clusters the article vectors by k-means to obtain the theme to which each article in the library belongs.
In practical applications, after the articles in the library are clustered into multiple themes, some labels in a theme appear only once (labels exclusive to a single article), while other labels appear twice or more within the same theme. Accordingly, when at least two articles in a theme contain the same label, the sum of that label's weights across those articles is taken as its new weight. The new weights of the shared labels and the weights of the exclusive labels in each theme are sorted in descending order, and a preset number of labels, for example the top 100, are selected from the ranking as the labels of the theme, completing the initialization of the themes to which the articles in the library belong.
Step 704: The server obtains the user click-through rate of each article in the library and updates the themes matching each article according to the click-through rate.
Here, an article may match one or more themes, which can be determined by computing the similarity between the article and candidate themes. Specifically, the distance between the article vector and a candidate theme vector can represent their similarity; for example, the distance between the article vector and each theme vector obtained by clustering is calculated, and the preset number (for example, 5) of themes with the smallest distances are taken as the themes matching the article.
In practical applications, if an article has very high exposure, the click-through rate computed from that exposure is more reliable than one computed from low exposure; that is, large-volume engagement data is more reliable than small-volume data. Accordingly, in actual implementation, the user click-through rate can be adjusted using the center of the Wilson confidence interval according to formula (5), and the weight of each label in the corresponding theme can be updated based on the adjusted click-through rate, specifically according to formula (7), thereby updating the themes.
Step 705: The terminal sends an article request to the server.
Fig. 9 is a schematic diagram of an interface in which a user triggers an article request provided in an embodiment of the present invention. Referring to Fig. 9, in the interface presented by the reading client, when the user taps the "Recommended" button, the reading client is triggered to send an article request to the server through the terminal to obtain recommended articles.
Step 706: The server obtains the article list of the target user.
In actual implementation, the server parses the received article request to obtain the target user identifier it carries, obtains the articles that the target user has clicked historically based on that identifier, and forms the corresponding article list.
Step 707: The server determines the themes matching the articles in the article list.
In actual implementation, the server computes the similarity between the article vector of each article in the list and the theme vectors of the themes to which articles in the library belong, and takes the themes whose similarity reaches a similarity threshold as the themes matching the articles in the list.
Step 708: According to the topics, the server screens articles from the article library to obtain the articles to be recommended.
In actual implementation, the server can screen articles as follows:
The server queries the article library to obtain valid articles that belong to the multiple topics matching the articles in the article list and that satisfy a timeliness condition; multiplies the score with which the article in the list belongs to the corresponding topic by the score with which the valid article belongs to that topic, obtaining a new score for the valid article; sorts the valid articles in descending order of their new scores; and selects a preset number of top-ranked valid articles as the articles to be recommended.
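The screening in step 708 can be sketched as below. This is a hedged illustration, not the patented implementation: the timeliness condition is assumed to be a maximum article age, the `LibraryArticle` record and its fields are hypothetical, and when a valid article is reachable through several matching topics it is assumed to keep its highest product score.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryArticle:
    article_id: str
    publish_ts: float  # publication time in seconds (hypothetical field)
    topic_scores: dict = field(default_factory=dict)  # topic_id -> membership score


def screen_articles(list_topic_scores, library, now, max_age_s, top_n):
    """Rank valid library articles by (list-article score x library-article score).

    list_topic_scores: topic_id -> score with which an article in the user's
    article list belongs to that topic.
    """
    best = {}
    for art in library:
        # Timeliness condition, assumed here to be a maximum article age.
        if now - art.publish_ts > max_age_s:
            continue
        for topic_id, list_score in list_topic_scores.items():
            if topic_id not in art.topic_scores:
                continue
            new_score = list_score * art.topic_scores[topic_id]
            # Assumption: an article reached via several topics keeps its best score.
            if new_score > best.get(art.article_id, 0.0):
                best[art.article_id] = new_score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [article_id for article_id, _ in ranked[:top_n]]
```

Taking the maximum rather than the sum keeps an article that is loosely related to many topics from outranking one that is strongly related to a single topic; summing would be an equally plausible reading of the text.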
Step 709: The server sends the articles to be recommended to the terminal.
Step 710: The terminal displays the articles recommended by the server.
Here, referring to Fig. 10, Fig. 10 is a schematic diagram of an interface in which the terminal displays recommended articles according to an embodiment of the present invention. In response to the article request triggered by the user, the terminal presents the articles recommended by the server; as can be seen from Fig. 10, the recommended articles belong to different categories.
The article recommendation method provided by the embodiments of the present invention is further described below.
In practical applications, an article may contain one or more topics, and each topic exhibits a certain distribution over words; the distribution relationships among articles, topics, and words constitute a topic model. The topic model is an important recall algorithm used to recall articles related to the articles a user has clicked. For example, if a user has read an article on the topic of Dream of the Red Chamber, other articles under that topic, such as articles related to Jia Baoyu or to Chen Xiaoxu, are potential articles of interest to the user. To improve the accuracy and timeliness of the articles recalled by the topic model, and to improve user experience and online efficiency, the embodiments of the present invention propose a dynamic topic model based on an attention mechanism, implemented in the recall module of the article recommendation system provided by the embodiments of the present invention, as described below.
Referring to Fig. 4, the dynamic topic model is located in the recall module, and the user's focus data (clicks, comments) serves as feedback information for adjusting the parameters of all modules in the recommendation system. The user's focus data reflects what the current user population is paying attention to in the news; here, an attention mechanism is used to address the challenge of distributing (i.e., dynamically updating) label weights within topics. When a user makes an online article request, the recall module recalls as many articles the user may like as possible; the sorting module and the re-ranking module then optimize the article order and display positions, and the articles are finally recommended to the user, who clicks and reads the articles of interest. The actual click data is reported back to the recommendation system, and the different modules are optimized and updated based on the user's focus data so that the articles users care about rank higher, continuously improving the user experience.
Fig. 8 illustrates the process of obtaining article vectors and label vectors (i.e., latent vectors). As shown in Fig. 8, in practice each article includes multiple words, each with a weight value; the weight matrix is shown in Table 1 above, where Tag_t denotes the t-th label. Taking an article that includes three labels Tag1 to Tag3 as an example, Tag1 to Tag3 are different labels, such as food, sports, and weight loss. The labels of an article can be words with relatively high term frequency, and the TF-IDF score of each label in the article is computed as the label's initial weight. The weights are then decomposed by ALS (alternating least squares) matrix factorization to obtain the article vectors and the vectors of the labels each article contains. The label vectors represent latent similarity relationships between articles and labels; for example, if two articles describe similar content, the distance (e.g., cosine distance or Euclidean distance) between their corresponding vectors will be small.
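The weighting and factorization steps of Fig. 8 can be sketched as follows, assuming pre-tokenized articles, TF computed as count divided by article length, IDF as the natural log of (number of articles / document frequency), and a dense, fully observed matrix. The ALS here is the plain regularized least-squares variant; production recommender ALS (e.g., implicit-feedback ALS) differs, and the rank, regularization, and iteration count are illustrative assumptions.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Article x word TF-IDF matrix from pre-tokenized articles."""
    vocab = sorted({w for doc in docs for w in doc})
    col = {w: j for j, w in enumerate(vocab)}
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    weights = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for w, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(len(docs) / df[w])
            weights[i, col[w]] = tf * idf
    return weights, vocab


def als_factorize(weights, rank=2, reg=0.1, iters=30, seed=0):
    """Dense ALS: weights ~= article_vecs @ tag_vecs.T with L2 regularization."""
    rng = np.random.default_rng(seed)
    n_articles, n_tags = weights.shape
    article_vecs = rng.normal(scale=0.1, size=(n_articles, rank))
    tag_vecs = rng.normal(scale=0.1, size=(n_tags, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Fix tag vectors and solve the ridge problem for all article vectors.
        article_vecs = np.linalg.solve(tag_vecs.T @ tag_vecs + ridge, tag_vecs.T @ weights.T).T
        # Fix article vectors and solve the ridge problem for all tag vectors.
        tag_vecs = np.linalg.solve(article_vecs.T @ article_vecs + ridge, article_vecs.T @ weights).T
    return article_vecs, tag_vecs
```

Each half-step has a closed-form solution, which is why ALS alternates instead of optimizing both factors at once.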
Fig. 11 is a schematic flowchart of the implementation of the dynamic topic model according to an embodiment of the present invention, in which topics are dynamically updated based on the attention mechanism. Referring to Fig. 11, initial topics are obtained by k-means clustering of the article encodings (i.e., article vectors) produced by matrix factorization; articles in the same cluster belong to the same topic, and each topic contains multiple articles. Although topics formed from article vectors are fairly representative, highly time-sensitive articles (such as news) have short life cycles, which reduces topic stability. To improve the scalability of article topics, multiple label vectors can be used to represent a topic.
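The clustering step can be sketched with plain Lloyd's k-means over the article vectors. This is an assumption-labelled illustration: the patent does not specify initialization, distance, or stopping rule, so random initialization from k distinct articles, Euclidean distance, and a convergence check on the centers are used here.

```python
import numpy as np


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's k-means: returns (labels, centers)."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct article vectors chosen at random.
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        # Assign each article vector to its nearest center (Euclidean distance).
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = np.array([
            vectors[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The returned `labels` give each article's initial topic; articles sharing a label form one topic.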
Illustratively, after the articles in the article pool are clustered into multiple topics, the tags of each article belonging to each topic are extracted. When a label occurs multiple times within a topic (i.e., appears in multiple articles), its weights (the weight of the word in each article; the initial weight of a label can be its TF-IDF score) are summed. The words are then sorted by their summed weights, and the top 100 words are taken as the topic's labels.
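The label-aggregation step can be sketched as follows, assuming each article's tags and TF-IDF weights are already available as a dictionary and the clustering result maps each article to one topic; the function name `topic_labels` is hypothetical.

```python
from collections import defaultdict


def topic_labels(article_tags, article_topic, top_n=100):
    """Sum each tag's per-article weight within a topic and keep the top_n tags.

    article_tags:  {article_id: {tag: tfidf_weight}}
    article_topic: {article_id: topic_id} from the clustering step
    """
    summed = defaultdict(lambda: defaultdict(float))
    for article_id, tags in article_tags.items():
        topic_id = article_topic[article_id]
        for tag, weight in tags.items():
            # A tag shared by several articles in the topic accumulates their weights.
            summed[topic_id][tag] += weight
    return {
        topic_id: dict(sorted(tags.items(), key=lambda kv: kv[1], reverse=True)[:top_n])
        for topic_id, tags in summed.items()
    }
```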
To adjust the topic model dynamically, an attention-based topic update method can be used in actual implementation. First, the article's CTR is corrected using adjustment information based on the midpoint of the Wilson confidence interval, as shown in formula (5), where p denotes the initial CTR, p_w the corrected CTR, n the number of exposures, and z a parameter that sets the confidence level (z = 1.96 gives a 95% confidence level). The full confidence interval is not used here; only its midpoint is taken as the CTR. As the formula shows, if an article's exposure is very high, the corrected CTR is approximately equal to the raw CTR; if exposure is lower, the correction lowers the CTR.
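Formula (5) is not reproduced in this section, so the sketch below uses the standard Wilson score interval with the parameters named above (p, n, z). One caveat, stated as a property of the standard formula rather than of the patent: the standard midpoint moves a low-exposure rate toward 0.5, so for typical CTRs below 0.5 it is the lower bound that pulls a low-exposure CTR down. The function therefore returns all three values, and which one plays the role of p_w is left as an assumption.

```python
import math


def wilson_bounds(p, n, z=1.96):
    """Return (lower, center, upper) of the Wilson score interval for rate p over n exposures."""
    if n <= 0:
        raise ValueError("exposure count n must be positive")
    z2 = z * z
    denom = 1.0 + z2 / n
    # Center (midpoint) of the interval.
    center = (p + z2 / (2.0 * n)) / denom
    # Half-width of the interval.
    half_width = z * math.sqrt(p * (1.0 - p) / n + z2 / (4.0 * n * n)) / denom
    return center - half_width, center, center + half_width
```

With very large n both bounds and the center converge to p, which matches the text's observation that high-exposure CTRs are essentially unchanged.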
Illustratively, assume that each article belongs to 5 topics, so the 5 closest topics must be matched for each document: the similarity between the article vector and the topic vectors of multiple candidate topics is computed, and the 5 most similar topics are found, so that the label weights in these nearby topics can be updated according to the article's CTR. A topic vector is a weighted combination of the label vectors in the topic. To keep the vectors comparable, the article vector is likewise obtained by weighting the label vectors belonging to the article. The article vector obtained by the matrix factorization above could also be used, but in practice articles are produced quickly, while matrix factorization is computationally heavy and does not run in real time, so an article may already be published (e.g., highly time-sensitive news whose attention data is already being collected) before its matrix factorization has finished. In that case, the weighted label vectors of the article are used as the article vector, so that nearby topics can be updated in time and article recommendation stays real-time. By computing the similarity (e.g., cosine distance) between the article vector and the topic vectors, each article is assigned to 5 topics, and its degree of membership in each topic is expressed by a normalized cosine distance. For example, if the 5 topics closest to article A are T1, T2, T3, T4 and T5, the degree to which A belongs to topic T1 is obtained by normalizing A's distance to T1 over its distances to these 5 topics.
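The article-vector fallback and the topic assignment can be sketched as follows. The exact normalization is not legible in this text, so the sketch assumes membership is the cosine similarity to each of the k nearest topics, clipped at zero and normalized to sum to 1; `article_vector` and `topic_memberships` are hypothetical names.

```python
import numpy as np


def article_vector(tag_weights, tag_vectors):
    """Weighted sum of an article's own tag vectors (fallback when ALS output is not ready)."""
    return sum(w * np.asarray(tag_vectors[t], dtype=float) for t, w in tag_weights.items())


def topic_memberships(article_vec, topic_vecs, k=5):
    """Assign an article to its k nearest topics with normalized membership scores.

    Assumption: cosine similarity is clipped at 0 and normalized to sum to 1
    over the k nearest topics.
    """
    topics = np.asarray(topic_vecs, dtype=float)
    a = article_vec / np.linalg.norm(article_vec)
    sims = (topics / np.linalg.norm(topics, axis=1, keepdims=True)) @ a
    nearest = np.argsort(-sims)[:k]
    kept = np.clip(sims[nearest], 0.0, None)
    if kept.sum() == 0.0:
        kept = np.ones_like(kept)  # degenerate case: fall back to uniform membership
    kept = kept / kept.sum()
    return {int(j): float(s) for j, s in zip(nearest, kept)}
```

Because the article vector is built from the same tag vectors as the topic vectors, the two live in one space and can be compared directly, which is the comparability the text asks for.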
If an article is exposed and clicked online, then, so that click behavior can update topics dynamically, the label weights in the topics to which the article belongs are updated using the attention mechanism. The weight update is illustrated below using article A as an example.
Illustratively, the exposed and clicked article A includes 3 tags, denoted t_i (i = 1 to 3); that is, article A includes labels t1, t2 and t3, with weights w1, w2 and w3 respectively. The corrected CTR of the article is p_w. The 5 topics closest to article A are T_j (j = 1 to 5), i.e., T1, T2, T3, T4 and T5, each at its own distance from A. Taking topic T1 as an example, T1 contains 100 tags t_{1-k} (k = 1 to 100), namely t_{1-1}, t_{1-2}, ..., t_{1-100}, with weights w_{1-k} (k = 1 to 100), namely w_{1-1}, w_{1-2}, ..., w_{1-100}; the cosine distance between each article label t_i and each of these 100 tags is computed. Based on the corrected CTR of article A, the weight of the k-th label t_{1-k} (k from 1 to 100) in the topic is updated by the aforementioned formula (7).
As formula (7) shows, the updated weight reflects both the label's original weight and the feedback from users' attention to the topic's articles in the current period: the article's CTR is fed back into the label weights of the topic through the product of the weight of each tag contained in the article, the degree to which the article belongs to the topic, and the similarity between that tag and the tags in the topic.
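Formula (7) itself is not reproduced in this section. Reading claims 6 and 9 together, one plausible form is w'_{T,k} = w_{T,k} + p_w * sum_i (w_i * s_{A,T} * cos(t_i, t_{T,k})), where s_{A,T} is the degree to which article A belongs to topic T. The sketch below implements that reading under this assumption; the function and argument names are hypothetical.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two tag vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def update_topic_weights(topic_tags, article_tags, tag_vectors, membership, ctr):
    """Update every tag weight in one topic after a clicked article.

    topic_tags:   {tag: weight} currently representing topic T
    article_tags: {tag: weight} of the clicked article A
    membership:   degree to which A belongs to T
    ctr:          Wilson-corrected click-through rate p_w of A
    """
    updated = {}
    for topic_tag, old_weight in topic_tags.items():
        # Attention-style weighting: article tag weight x membership x tag similarity.
        attention = sum(
            w_i * membership * cosine(tag_vectors[t_i], tag_vectors[topic_tag])
            for t_i, w_i in article_tags.items()
        )
        # Claim 9: new weight = old weight + influence (influence scaled by p_w, claim 6).
        updated[topic_tag] = old_weight + ctr * attention
    return updated
```

Topic tags that are semantically unrelated to the clicked article (cosine near 0) keep their old weight, so a click only reinforces the part of the topic it actually touches.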
If the topic to which the article belongs does not include all of the article's tags, the missing tags are added to the topic, with their weights in the article used as the weights of the corresponding tags in the topic, and they take part in the final ranking of all tag weights in the topic. After the tag weights are updated, the tags are re-sorted by weight and the top 100 tags are taken to represent the topic.
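The merge-and-prune step can be sketched as follows, assuming topic and article tags are plain `{tag: weight}` dictionaries and that existing topic tags keep their (already updated) weights; `merge_and_prune` is a hypothetical name.

```python
def merge_and_prune(topic_tags, article_tags, top_n=100):
    """Add the article's missing tags to the topic, then keep the top_n tags by weight."""
    merged = dict(topic_tags)
    for tag, weight in article_tags.items():
        # A tag new to the topic enters with its weight from the article;
        # tags already in the topic keep their current weight.
        merged.setdefault(tag, weight)
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])
```

Pruning back to a fixed size keeps each topic's representation bounded while letting new, heavily weighted tags displace stale ones.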
The above is the topic update method for articles: the topics of articles are recorded in the topic model and updated in real time or periodically according to the articles' focus data. In actual recommendation applications, the clicked articles and the articles of nearby topics can be cached for fast lookup when articles need to be recommended to users.
To ensure efficient online operation, the articles of the multiple topics close to an article can be cached in a Redis database; Redis is a storage system designed for fast reads of key-value pairs. Specifically, all articles in the current article pool (including valid articles and articles that are no longer valid) are first mapped to their multiple nearby topics, and the degree of similarity between each article and its nearby topics is obtained. For Redis storage, the key part is the index of every article in the article pool (both expired and valid), while the value part stores only the non-expired articles of the nearby topics (e.g., 5) of the indexed article. In this key-value relationship, the value holds the articles of the multiple topics close to the article represented by the key, and only articles that have not expired. This ensures that all recommended articles are currently valid.
In actual implementation, because too many articles may be related to the article a user clicked, which is unfavorable for storage and online recommendation, the articles related to an article can be sorted; the sorting criterion is the score with which the key article belongs to a topic multiplied by the score with which the value article belongs to that topic.
Illustratively, for any article clicked by a user, all articles in its 5 corresponding topics are selected; the score with which the clicked article belongs to a topic is multiplied by the score with which each article in that topic belongs to it; the articles are sorted by this score from high to low, and a preset number (e.g., 50) of top-ranked articles are taken as the related articles of this article.
Illustratively, a user clicks article A, whose nearby topics are T1 to T5, and each topic has 100 similar articles. For similar articles 1 to 100 (belonging to T1), the score with which article A belongs to T1 is multiplied by the score with which similar article 1 belongs to T1, and so on, yielding scores for 500 similar articles; these are sorted by score and the top 50 are taken.
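The cache layout and ranking described above can be sketched as follows. A plain dictionary stands in for Redis here; in a deployment each list could be serialized and stored under the article's key, but no particular Redis client call is implied. Two details are assumptions not stated in the text: an article is never listed as related to itself, and an article reachable through several topics keeps its highest product score.

```python
from collections import defaultdict


def build_related_cache(memberships, valid_ids, top_n=50):
    """Precompute key -> related-article lists, mimicking the Redis layout.

    memberships: {article_id: {topic_id: membership score}} for every article
                 in the pool, valid or expired.
    valid_ids:   set of article ids that have not expired.
    """
    # Invert to topic -> [(article_id, score)] over valid articles only.
    members = defaultdict(list)
    for article_id, topics in memberships.items():
        if article_id in valid_ids:
            for topic_id, score in topics.items():
                members[topic_id].append((article_id, score))

    cache = {}
    for key_id, key_topics in memberships.items():  # keys include expired articles
        best = {}
        for topic_id, key_score in key_topics.items():
            for other_id, other_score in members[topic_id]:
                if other_id == key_id:
                    continue  # assumption: an article is not related to itself
                score = key_score * other_score
                # Assumption: an article reached via several topics keeps its best score.
                best[other_id] = max(best.get(other_id, 0.0), score)
        ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
        cache[key_id] = [other_id for other_id, _ in ranked[:top_n]]
    return cache
```

Keying every article, even expired ones, means a user's old clicks still resolve to fresh recommendations, while filtering values to valid articles means nothing expired is ever returned.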
When a user request is received, the articles the user clicked in the past are pulled first. During online recommendation, each clicked article is then looked up in the key part of Redis, and the preset number of related articles is read from the corresponding value part. These related articles are returned to the recommendation system as recalled articles, i.e., articles the user is potentially interested in. The sorting module and re-ranking module of the recommendation system combine the user profile, article content, and current context information (such as the network, Wi-Fi or cellular, and the geographic location; the article display style is also adjusted, e.g., a relatively large style is used in a Wi-Fi environment) to rank all recalled articles and optimize their positions, and the resulting article list is finally recommended to the user.
The article recommendation apparatus provided by the embodiments of the present invention is further described below. Referring to Fig. 3, the article recommendation apparatus 655 provided by an embodiment of the present invention includes:
A topic assignment module 6551, configured to assign the topics to which articles in the article library belong;
A first determining module 6552, configured to determine, according to focus data of a user corresponding to the article, the degree of influence of the user's attention behavior on the topics matching the article;
A topic update module 6553, configured to update the topics matching the article according to the degree of influence;
A list obtaining module 6554, configured to obtain the article list of a user;
A second determining module 6555, configured to determine, among the topics to which articles in the article library belong, the topics matching the articles in the article list;
An article recommending module 6556, configured to obtain articles in the article library that belong to the topics matching the articles in the article list, and to perform a recommendation operation according to the obtained articles.
In some embodiments, the topic assignment module includes:
A construction unit, configured to obtain the multiple labels included in the topic to which each article in the article library belongs and the weights corresponding to the labels, and to construct a weight matrix;
A decomposition unit, configured to decompose the weight matrix to obtain the article vector corresponding to each article and the label vectors corresponding to the multiple labels included in each article;
A clustering unit, configured to perform clustering based on the article vectors of the articles in the article library to obtain the topics to which the articles in the article library respectively belong.
In some embodiments, the construction unit is further configured to determine the scores corresponding to the words included in each article in the article library;
select, according to the scores of the words, the words in each article that satisfy a score condition as the labels included in the corresponding article;
sort the labels included in the articles of each topic in descending order of the weights of the corresponding labels;
determine the top-ranked labels, by a set number or proportion, as the labels included in the topic, and use the score of each label in its corresponding article as the weight of the label.
In some embodiments, the construction unit is further configured to determine the term frequency of each word included in each article of the article library within the article to which it belongs, and the inverse document frequency of the word in the article library;
determine the product of the term frequency and the inverse document frequency as the score corresponding to the word.
In some embodiments, the construction unit is further configured to, when at least two articles in a topic include the same label, determine the sum of the weights of that label in the at least two articles as a new weight;
sort, in descending order, the new weights corresponding to the shared labels together with the weights corresponding to labels exclusive to individual articles in each topic.
In some embodiments, the first determining module is further configured to determine, according to the focus data of the article, the degree of attention to the article characterized by the user's attention behavior;
traverse the topics matching the article and perform the following processing:
multiply the current score with which the article belongs to the currently traversed topic (a quantified representation of the likelihood) by the similarity between the label and the labels included in the currently traversed topic, and weight the product using the importance of the label within the article as the weighting coefficient;
use the product of the weighted result and the degree of attention as the degree of influence of the user's attention behavior on each label in the currently traversed topic.
In some embodiments, the first determining module is further configured to determine the CTR of the article according to the click data and exposure data of the corresponding article included in the focus data; or
determine the evaluation rate of the article according to the comment data and exposure data of the corresponding article included in the focus data.
In some embodiments, the apparatus further includes:
An adjustment module, configured to, after the degree of attention to the article is determined according to the focus data of the article, adjust the value of the degree of attention so that it matches the confidence of the focus data.
In some embodiments, the topic update module is further configured to sum the weight of each label in the topic matching the article and the correspondingly determined degree of influence for that topic, obtaining the updated weight of each label.
In some embodiments, the topic update module is further configured to, when the topic matching the article does not include all labels of the article, add the missing labels and their corresponding weights to the topic matching the article;
sort the labels included in the topic matching the article in descending order of their weights, and delete some labels according to the sorting result.
In some embodiments, the second determining module is further configured to determine the similarity between the article vector of an article in the article list and the topic vectors corresponding to the multiple topics to which articles in the article library belong;
determine the topics whose similarity satisfies a similarity condition as the topics matching the articles in the article list.
In some embodiments, the article recommending module includes:
A query unit, configured to query the article library to obtain valid articles that belong to the multiple topics matching the articles in the article list and that satisfy a timeliness condition;
A computing unit, configured to multiply the score with which the article in the article list belongs to the corresponding topic by the score with which the valid article belongs to that topic, obtaining a new score for the valid article;
A sorting unit, configured to sort the valid articles in descending order of their new scores, and to select a top-ranked portion of the valid articles according to the sorting result to perform the recommendation operation.
It should be noted that the above description of the apparatus is similar to the description of the method and has the same beneficial effects, which are not repeated here. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments of the present invention.
An embodiment of the present invention further provides an electronic device, including:
A memory, configured to store an executable program;
A processor, configured to implement the above article recommendation method provided by the embodiments of the present invention when executing the executable program stored in the memory.
An embodiment of the present invention further provides a storage medium storing executable instructions which, when executed by a processor, cause the processor to perform the article recommendation method provided by the embodiments of the present invention.
The embodiments of the present invention have the following beneficial effects:
1) In the attention-based dynamic topic model, the articles in the current article pool and their tags are first pulled online, and article vectors and tag vectors are obtained by matrix factorization. Topics are then divided by k-means clustering based on the article vectors, the article tags in each topic are extracted and counted, and each topic is represented by its higher-frequency tags (e.g., a set number (100) or proportion of the tags ranked by frequency), achieving an accurate representation of articles and labels.
2) Adaptive adjustment of topics is achieved: based on actual online click data (e.g., click volume or CTR), the tag weights in topics are updated in an attention-based manner. The likelihood that an article belongs to a topic is represented by the similarity (e.g., cosine distance) between the article's own tag vectors and each topic's center vector.
3) When articles are recommended online, recommendations are drawn from the articles within the topics (e.g., 5) close to the topics of the articles the user has read (or followed) in the past. Because the topic model updates adaptively, the topic distribution of the current article pool can be obtained quickly and used for article recommendation. This makes the recommended articles better match the user's interests, improves the accuracy of topic-model recall, and improves the user experience.
A person of ordinary skill in the art may understand that all or part of the steps of the above method embodiments can be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, random access memory (RAM), read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated unit of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, RAM, ROM, a magnetic disk, or an optical disc.
The above descriptions are merely embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the protection scope of the present invention.

Claims (15)

1. An article recommendation method, characterized in that the method comprises:
Assigning the topics to which articles in an article library belong;
Determining, according to focus data of a user corresponding to an article, the degree of influence of the user's attention behavior on the topics matching the article;
Updating the topics matching the article according to the degree of influence;
Obtaining an article list of a user;
Determining, among the topics to which articles in the article library belong, the topics matching the articles in the article list;
Obtaining articles in the article library that belong to the topics matching the articles in the article list, and performing a recommendation operation according to the obtained articles.
2. The method according to claim 1, wherein assigning the topics to which articles in the article library belong comprises:
Obtaining the multiple labels included in the topic to which each article in the article library belongs and the weights corresponding to the labels, and constructing a weight matrix;
Decomposing the weight matrix to obtain the article vector corresponding to each article and the label vectors corresponding to the multiple labels included in each article;
Clustering based on the article vectors of the articles in the article library to obtain the topics to which the articles in the article library respectively belong.
3. The method according to claim 2, wherein obtaining the multiple labels included in the topic to which each article in the article library belongs and the weights corresponding to the labels comprises:
Determining the scores corresponding to the words included in each article in the article library;
Selecting, according to the scores of the words, the words in each article that satisfy a score condition as the labels included in the corresponding article;
Sorting the labels included in the articles in each topic in descending order of the weights of the corresponding labels;
Determining the top-ranked labels, by a set number or proportion, as the labels included in the topic, and using the score of each label in its corresponding article as the weight of the label.
4. The method according to claim 3, wherein determining the scores corresponding to the words included in each article in the article library comprises:
Determining the term frequency of each word included in each article of the article library within the article to which it belongs, and the inverse document frequency of the word in the article library;
Determining the product of the term frequency and the inverse document frequency as the score corresponding to the word.
5. The method according to claim 3, wherein the method further comprises:
When at least two articles in a topic include the same label, determining the sum of the weights of that label in the at least two articles as a new weight;
Sorting, in descending order, the new weights corresponding to the shared labels together with the weights corresponding to labels exclusive to individual articles in each topic.
6. The method according to claim 1, wherein determining, according to the focus data of the user corresponding to the article, the degree of influence of the user's attention behavior on the topics matching the article comprises:
Determining, according to the focus data of the article, the degree of attention to the article characterized by the user's attention behavior;
Traversing the topics matching the article and performing the following processing:
Multiplying the current score with which the article belongs to the currently traversed topic by the similarity between the label of the article and the labels included in the currently traversed topic, and weighting the product using the importance of the label within the article as the weighting coefficient;
Using the product of the weighted result and the degree of attention as the degree of influence of the user's attention behavior on each label in the currently traversed topic.
7. The method according to claim 6, wherein determining, according to the focus data of the article, the degree of attention to the article characterized by the user's attention behavior comprises:
Determining the click-through rate of the article according to the click data and exposure data of the corresponding article included in the focus data; or
Determining the evaluation rate of the article according to the comment data and exposure data of the corresponding article included in the focus data.
8. The method according to claim 6, wherein the method further comprises:
After determining the degree of attention to the article according to the focus data of the article, adjusting the value of the degree of attention so that it matches the confidence of the focus data.
9. The method according to claim 1, wherein updating the topics matching the article according to the degree of influence comprises:
Summing the weight of each label in the topic matching the article and the correspondingly determined degree of influence for that topic, to obtain the updated weight of each label.
10. The method according to claim 9, wherein the method further comprises:
When the topic matching the article does not include all labels of the article, adding the missing labels and their corresponding weights to the topic matching the article;
Sorting the labels included in the topic matching the article in descending order of their weights, and deleting some labels according to the sorting result.
11. The method according to claim 1, wherein determining, among the topics to which articles in the article library belong, the topics matching the articles in the article list comprises:
Determining the similarity between the article vector of an article in the article list and the topic vectors corresponding to the multiple topics to which articles in the article library belong;
Determining the topics whose similarity satisfies a similarity condition as the topics matching the articles in the article list.
12. The method according to claim 1, wherein obtaining the articles in the article library that belong to the matching topics, and performing a recommendation operation according to the obtained articles of the matching topics, comprises:
Querying the article library to obtain valid articles that belong to the multiple topics matching the articles in the article list and that satisfy a timeliness condition;
Multiplying the score with which the article in the article list belongs to the corresponding topic by the score with which the valid article belongs to that topic, to obtain a new score for the valid article;
Sorting the valid articles in descending order of their new scores, and selecting a top-ranked portion of the valid articles according to the sorting result to perform the recommendation operation.
13. An article recommendation apparatus, wherein the apparatus comprises:
a theme distribution module, configured to assign the themes to which the articles in an article library belong;
a first determining module, configured to determine, according to a user's attention data corresponding to an article, the influence degree of the user's attention behavior on the theme consistent with the article;
a theme update module, configured to update the theme consistent with the article according to the influence degree;
a list obtaining module, configured to obtain an article list of the user;
a second determining module, configured to determine, among the themes to which the articles in the article library belong, the theme consistent with an article in the article list; and
an article recommending module, configured to obtain, from the article library, articles that belong to the theme consistent with the article in the article list, and to perform a recommendation operation according to the obtained articles.
14. An electronic device, wherein the electronic device comprises:
a memory, configured to store executable instructions; and
a processor, configured to implement the article recommendation method according to any one of claims 1 to 12 when executing the executable instructions stored in the memory.
15. A storage medium, wherein the storage medium stores executable instructions which, when executed by a processor, cause the processor to implement the article recommendation method according to any one of claims 1 to 12.
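Claim 9's update rule adds the influence degree determined for a theme to the weight of every label in that theme. The sketch below assumes, purely for illustration, that a theme is a Python dict mapping each label to a float weight and that one influence degree is computed per theme; the function name `update_theme_weights` is hypothetical.

```python
from typing import Dict


def update_theme_weights(theme: Dict[str, float], influence: float) -> Dict[str, float]:
    """Sum each label weight with the theme's influence degree (claim 9).

    Assumed representation: `theme` maps label -> weight, and `influence`
    is the influence degree determined for this theme. A new dict is
    returned so the caller's theme is left unchanged.
    """
    return {label: weight + influence for label, weight in theme.items()}
```

For example, a theme `{"football": 0.5, "transfer": 0.25}` with an influence degree of 0.5 becomes `{"football": 1.0, "transfer": 0.75}`. Because the same amount is added to every label, this step raises the theme as a whole without changing the relative order of its existing labels.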
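Claim 10 merges the labels a theme lacks, then sorts by weight and deletes part of the labels. The claim does not say how many labels are deleted; this sketch assumes a hypothetical cap `max_labels` and keeps only the highest-weighted labels up to that cap. Weights for newly added labels come from a hypothetical `article_labels` dict.

```python
from typing import Dict


def merge_and_prune(theme: Dict[str, float],
                    article_labels: Dict[str, float],
                    max_labels: int) -> Dict[str, float]:
    """Add the article's missing labels to the theme, then keep the top `max_labels`.

    Assumptions: weights are floats, and "deleting part of the labels" is
    modeled as truncating the descending-sorted label list to `max_labels`.
    """
    merged = dict(theme)
    for label, weight in article_labels.items():
        # Only labels the theme does not already contain are added.
        if label not in merged:
            merged[label] = weight
    # Sort labels by weight, highest first; ties keep insertion order.
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:max_labels])
```

With a theme `{"football": 1.0, "transfer": 0.75}`, article labels `{"football": 0.5, "coach": 0.9, "weather": 0.1}` and `max_labels=3`, the new labels `coach` and `weather` are added, and after sorting `weather` (the lowest weight) is the one dropped. A cap of this kind is one plausible way to keep theme size bounded as more articles are merged in.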
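Claim 11 matches an article to themes by comparing its article vector with each theme vector. The claim does not name the similarity measure or the similarity condition; this sketch assumes cosine similarity and a fixed threshold, and returns the matching theme names ordered from most to least similar. Vectors are plain lists of floats, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matching_themes(article_vec: Sequence[float],
                    theme_vecs: Dict[str, Sequence[float]],
                    threshold: float) -> List[str]:
    """Return themes whose similarity to the article meets `threshold` (assumed condition)."""
    hits = [(name, cosine(article_vec, vec)) for name, vec in theme_vecs.items()]
    hits = [(name, sim) for name, sim in hits if sim >= threshold]
    # Most similar theme first.
    hits.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in hits]
```

For an article vector `[1.0, 0.0]` and themes `sports = [1.0, 0.0]`, `mixed = [1.0, 1.0]` and `finance = [0.0, 1.0]`, a threshold of 0.5 selects `sports` (similarity 1.0) and `mixed` (about 0.71) but not `finance` (0.0). Keeping the top k themes would be an equally valid reading of "similarity condition".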
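Claim 12 filters candidate articles by a timeliness condition, rescores each remaining article as the product of two scores, and recommends the top of the ranking. In this sketch the timeliness condition is assumed to be a maximum age in seconds, the theme score is assumed to come from a hypothetical `theme_scores` dict (for example, how strongly each theme matched the user's article list), and an article reached through several themes keeps only its best product score. None of these specifics come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    article_id: str
    theme: str
    membership: float    # score with which the article belongs to `theme`
    published_at: float  # publication time, Unix seconds


def recommend(candidates: List[Candidate],
              theme_scores: Dict[str, float],
              now: float,
              max_age_s: float,
              top_k: int) -> List[str]:
    """Keep fresh articles in matched themes, score by product, return the top_k ids."""
    best: Dict[str, float] = {}
    for c in candidates:
        # Skip articles outside the matched themes or older than the assumed age limit.
        if c.theme not in theme_scores or now - c.published_at > max_age_s:
            continue
        # New score = theme score x the article's membership score for that theme.
        score = theme_scores[c.theme] * c.membership
        if score > best.get(c.article_id, float("-inf")):
            best[c.article_id] = score
    # Sort by new score, highest first, and keep the leading top_k articles.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [article_id for article_id, _ in ranked[:top_k]]
```

Multiplying the two scores means an article ranks highly only when it both belongs strongly to its theme and that theme itself scores well; a weak value in either factor pulls the product down.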
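Claim 13 restates the method as six cooperating modules. One way to picture the data flow between them is a class whose fields are the modules, each injected as a callable. The field names, type aliases, the per-theme `influence_of(user, theme_name)` signature and the order of calls below are assumptions chosen for illustration, not an interface defined by the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Theme = Dict[str, float]   # assumed: label -> weight
Themes = Dict[str, Theme]  # assumed: theme name -> theme


@dataclass
class ArticleRecommender:
    assign_themes: Callable[[List[str]], Themes]             # theme distribution module
    influence_of: Callable[[str, str], float]                # first determining module
    update_theme: Callable[[Theme, float], Theme]            # theme update module
    get_article_list: Callable[[str], List[str]]             # list obtaining module
    match_themes: Callable[[List[str], Themes], List[str]]   # second determining module
    fetch_articles: Callable[[List[str]], List[str]]         # article recommending module

    def recommend(self, library: List[str], user: str) -> List[str]:
        themes = self.assign_themes(library)
        # Update every theme with the user's influence degree on it.
        themes = {name: self.update_theme(theme, self.influence_of(user, name))
                  for name, theme in themes.items()}
        article_list = self.get_article_list(user)
        matched = self.match_themes(article_list, themes)
        return self.fetch_articles(matched)
```

Injecting each module as a callable keeps the pipeline order fixed while letting any single step (for instance the similarity condition or the timeliness filter) be swapped independently.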
CN201910759959.6A 2019-08-16 2019-08-16 Article recommendation method and device, electronic equipment and storage medium Active CN110472016B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910759959.6A CN110472016B (en) 2019-08-16 2019-08-16 Article recommendation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910759959.6A CN110472016B (en) 2019-08-16 2019-08-16 Article recommendation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110472016A (en) 2019-11-19
CN110472016B (en) 2024-04-12

Family

ID=68510879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910759959.6A Active CN110472016B (en) 2019-08-16 2019-08-16 Article recommendation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110472016B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130218960A1 (en) * 2012-02-20 2013-08-22 Yahoo! Inc. Method and system for providing a structured topic drift for a displayed set of user comments on an article
CN107103049A (en) * 2017-03-31 2017-08-29 努比亚技术有限公司 A kind of recommendation method and the network equipment
CN107992542A (en) * 2017-11-27 2018-05-04 中山大学 A kind of similar article based on topic model recommends method
CN109885773A (en) * 2019-02-28 2019-06-14 广州寄锦教育科技有限公司 A kind of article personalized recommendation method, system, medium and equipment
CN109885674A (en) * 2019-02-14 2019-06-14 腾讯科技(深圳)有限公司 A kind of determination of theme label, information recommendation method and device

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113032556A (en) * 2019-12-25 2021-06-25 厦门铠甲网络股份有限公司 Method for forming user portrait based on natural language processing
CN111651584A (en) * 2020-04-17 2020-09-11 世纪保众(北京)网络科技有限公司 Insurance article recommendation method based on user behavior characteristics and article attributes
CN111753151A (en) * 2020-06-24 2020-10-09 广东科杰通信息科技有限公司 Service recommendation method based on internet user behaviors
CN111753151B (en) * 2020-06-24 2023-09-15 广东科杰通信息科技有限公司 Service recommendation method based on Internet user behavior
CN112182414A (en) * 2020-08-13 2021-01-05 亿存(北京)信息科技有限公司 Article recommendation method and device and electronic equipment

Also Published As

Publication number Publication date
CN110472016B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
Reddy et al. Content-based movie recommendation system using genre correlation
WO2021159776A1 (en) Artificial intelligence-based recommendation method and apparatus, electronic device, and storage medium
CN110472016A (en) Article recommended method, device, electronic equipment and storage medium
US9552555B1 (en) Methods, systems, and media for recommending content items based on topics
Balakrishnan et al. Collaborative ranking
CN109902708A (en) A kind of recommended models training method and relevant apparatus
CN102591942B (en) Method and device for automatic application recommendation
CN110532479A (en) A kind of information recommendation method, device and equipment
CN110321422A (en) Method, method for pushing, device and the equipment of on-line training model
CN108804567A (en) Improve method, equipment, storage medium and the device of intelligent customer service response rate
CN110378434A (en) Training method, recommended method, device and the electronic equipment of clicking rate prediction model
CN105723402A (en) Systems and methods for determining influencers in a social data network
CN110909182A (en) Multimedia resource searching method and device, computer equipment and storage medium
CN107038184B (en) A kind of news recommended method based on layering latent variable model
CN112052387B (en) Content recommendation method, device and computer readable storage medium
Tsai et al. Mobile social media networks caching with convolutional neural network
CN106815310A (en) A kind of hierarchy clustering method and system to magnanimity document sets
CN110008397A (en) A kind of recommended models training method and device
CN109690581A (en) User guided system and method
CN106484889A (en) The flooding method and apparatus of Internet resources
Huang et al. Personalized micro-video recommendation via hierarchical user interest modeling
Tran et al. CupMar: A deep learning model for personalized news recommendation based on contextual user-profile and multi-aspect article representation
CN114254615A (en) Volume assembling method and device, electronic equipment and storage medium
CN106844365A (en) The application message method for pushing and device of a kind of application distribution platform
Kang et al. Friend relationships recommendation algorithm in online education platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221118

Address after: 1402, Floor 14, Block A, Haina Baichuan Headquarters Building, No. 6, Baoxing Road, Haibin Community, Xin'an Street, Bao'an District, Shenzhen, Guangdong 518133

Applicant after: Shenzhen Yayue Technology Co.,Ltd.

Address before: Room 1601-1608, Floor 16, Yinke Building, 38 Haidian Street, Haidian District, Beijing

Applicant before: Tencent Technology (Beijing) Co.,Ltd.

GR01 Patent grant