CN112801760A - Sequencing optimization method and system of content personalized recommendation system - Google Patents

Sequencing optimization method and system of content personalized recommendation system

Info

Publication number
CN112801760A
Authority
CN
China
Prior art keywords
content
user
vector
sorting
recall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110338178.7A
Other languages
Chinese (zh)
Inventor
崔成龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Lanjingren Network Technology Co ltd
Original Assignee
Nanjing Lanjingren Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Lanjingren Network Technology Co ltd filed Critical Nanjing Lanjingren Network Technology Co ltd
Priority to CN202110338178.7A priority Critical patent/CN112801760A/en
Publication of CN112801760A publication Critical patent/CN112801760A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a sorting optimization method and system for a content personalized recommendation system. The method comprises the following steps: (I) acquiring a user click operation and, through recall, generating a preliminarily screened list of contents to be sorted; (II) scoring the preliminarily screened list of contents to be sorted according to a sorting model to generate an initial content-sorting score association vector; and (III) performing secondary sorting on the initial content-sorting score association vector based on an adaptive strategy to obtain a final sorting result. The method addresses the accuracy of user-content stickiness: it adaptively samples and aggregates the pushed content lists of multiple upstream recall strategies to generate a content push list that is more varied and more accurately personalized, thereby improving the accuracy of the product's recommendation system.

Description

Sequencing optimization method and system of content personalized recommendation system
Technical Field
The invention relates to a recommendation system sorting method and a recommendation system sorting system, in particular to a sorting optimization method and a sorting optimization system of a content personalized recommendation system.
Background
At present, in Internet content community platform products, the personalized recommendation system is the technical core of the product. To improve the experience of existing users in the community, the platform needs to push content that users will respond to positively. This goal is achieved through the cooperation of the key stages of a recommendation system: recall, coarse ranking, fine ranking, and re-ranking.
In the content sorting stage, a list of content that each user is genuinely interested in must be pushed even though the user has shown no explicit behavior. Personalized and accurate pushing requires meeting three service requirements: first, the patterns in the user's historical click behavior must be taken into account; second, pushing clickbait content or results from a single category must be avoided; third, the diversity of recommended content must be ensured, so that pushed content feels both familiar and novel to the user.
Most current sorting methods have the following problems: 1. they consider only user behavior data (such as reading duration, reading completion rate, and likes or appreciation); 2. they obtain a relatively coarse user-content association vector from a single deep model; 3. the upstream recall strategy is too uniform, so the sorting stage cannot adjust its strategy for content diversity.
Disclosure of Invention
Purpose of the invention: the invention provides a sorting optimization method for a recommendation system with highly accurate user-content stickiness. The invention also aims to provide a sorting optimization system based on this sorting optimization method.
The technical scheme is as follows: the sequencing optimization method of the content personalized recommendation system comprises the following steps:
(I) acquiring a user click operation and, through recall, generating a preliminarily screened list of contents to be sorted;
(II) scoring the preliminarily screened list of contents to be sorted according to a sorting model to generate an initial content-sorting score association vector;
(III) performing secondary sorting on the initial content-sorting score association vector based on an adaptive strategy to obtain a final sorting result.
Further, in step (I), the preliminarily screened list of contents to be sorted is a list of content ids related to the historical click data of the user.
Further, in step (II), the sorting model includes a double-tower model.
Preferably, the step (two) includes:
(21) extracting user characteristic information and content characteristic information according to the list of the contents to be sorted which are preliminarily screened;
(22) evaluating different sorting models on the raw data, or on the combined user characteristic information and content characteristic information, and selecting the sorting model with the highest score as the actual sorting model;
(23) in an off-line training stage of the recommendation system, the user characteristic information and the content characteristic information are respectively input into the actual sequencing model to obtain a user embedded vector and a content embedded vector with the same dimension;
(24) performing dot product calculation on the user embedded vector and the content embedded vector, performing cross entropy loss calculation on a dot product value and a sample label value clicked by the user, and performing backward propagation to optimize network parameters of an actual sequencing model;
(25) inputting the user characteristic information and the content characteristic information to be sorted into the optimized actual sorting model, and taking the dot product result of the model output vector as a sorting score to obtain an initial content-sorting score association vector.
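Steps (23) to (25) can be illustrated with a minimal sketch in plain Python. It assumes that each tower has already produced an embedded vector (the tower networks themselves are omitted) and that the dot product is treated as a logit passed through a sigmoid before the binary cross-entropy is taken, which the text does not state explicitly. The function names and sample vectors are hypothetical.

```python
import math

def dot(u, v):
    """Dot product of a user embedding and a content embedding of equal dimension."""
    if len(u) != len(v):
        raise ValueError("user and content embeddings must have the same dimension")
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def click_loss(user_emb, content_emb, label):
    """Binary cross-entropy between sigmoid(dot product) and the click label (0.0 or 1.0).

    Uses the stable form max(z, 0) - z * label + log(1 + exp(-|z|)).
    """
    z = dot(user_emb, content_emb)
    return max(z, 0.0) - z * label + math.log1p(math.exp(-abs(z)))

def loss_grad_wrt_user(user_emb, content_emb, label):
    """Gradient of the loss with respect to the user embedding: (sigmoid(z) - label) * content_emb."""
    residual = sigmoid(dot(user_emb, content_emb)) - label
    return [residual * c for c in content_emb]

# Hypothetical 3-dimensional embeddings for one (user, content) pair.
user_emb = [0.5, -1.0, 2.0]
content_emb = [1.0, 0.0, 1.0]
loss_clicked = click_loss(user_emb, content_emb, 1.0)   # sample label: clicked
loss_skipped = click_loss(user_emb, content_emb, 0.0)   # sample label: not clicked
```

A large positive dot product gives a small loss when the sample was clicked and a large loss when it was not; in a real model the gradient would be propagated further back into the parameters of both towers.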
Further, the user feature information includes: the content feature vector of the user click sequence, the feature vector of the user portrait indicators, and the content feature vector of the user like sequence.
Further, the content embedded vector is calculated by repeatedly calling the content-side deep network of the actual sorting model, which outputs an embedding layer; the actual sorting model is then updated and saved so that it can be queried during online prediction for new content sequences.
Further, when the content embedding vector is predicted on line, calculation is performed by calling a deep network on the user side of the actual sequencing model.
Further, the step (three) includes:
(31) acquiring the initial content-sorting score association vector, counting all vector sources, and classifying each vector into a corresponding recall group;
(32) calculating the adaptive sampling weight according to formula (1):
[Formula (1): equation image not reproduced in the extracted text]
where the symbols in formula (1) denote the sampling coefficient of recall group i, the number of recall groups n, and the click rate of the i-th recall group;
(33) generating a Top-K recommended content vector list according to formula (2), which gives the actual number of recommended content vectors of the i-th recall group:
[Formula (2): equation image not reproduced in the extracted text]
where the symbols in formula (2) denote the number of recalls configured for the i-th recall group and the number of recalls of the i-th recall group after weighting, which satisfies condition (3):
[Formula (3): equation image not reproduced in the extracted text]
where m is the total number of recalled content ids;
(34) executing a sample balancing strategy that evens out the content samples according to the deviation between each recall group's weighted number of recalls and its actual number of recalls;
(35) performing secondary sorting on the Top-K recommended content vector list in combination with business logic to obtain the final sorting result.
Further, the sample balancing strategy is: when a recall group supplies fewer items than its weighted number of recalls, the recommendation system recalculates the sampling coefficients according to the shortfall and, preferentially from the recall groups with the larger sampling coefficients, extracts a corresponding amount of content to make up for the content missing from the other recall groups.
The sequencing optimization system of the content personalized recommendation system comprises the following components:
the rough screening module is used for acquiring the clicking operation of the user, recalling and generating a list of contents to be sorted of the primary screening;
the first sequencing module is used for scoring the list of the contents to be sequenced by the sequencing model to generate an initial content-sequencing score association vector;
and the second sequencing module is used for carrying out secondary sequencing on the initial content-sequencing score association vector output by the first sequencing module based on a self-adaptive strategy to obtain a final sequencing result.
Advantageous effects: the invention has the following advantages:
1. the stickiness between each content item in the product list and the user is calculated for the initial sorting, so that real-time recommendation personalized to each user is achieved accurately;
2. adaptive sampling is performed based on the diversity of the recall groups, so that the products recommended to a single user come from different categories, making the recommendation system more intelligent;
3. a like sequence is added to the user behavior data to ensure the robustness of the user behavior vector features;
4. the introduction of user portrait information increases the richness and representativeness of features.
Drawings
FIG. 1 is a flow chart of a ranking optimization method of the content personalized recommendation system of the present invention;
FIG. 2 is a first ranking module framework diagram of the ranking optimization system of the content personalized recommendation system of the present invention;
FIG. 3 is a framework diagram of the second ranking module of the ranking optimization system of the content personalized recommendation system of the present invention.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Referring to fig. 1, a flowchart of a ranking optimization method of a content personalized recommendation system according to the present invention is shown, where the method includes:
(I) Acquiring a user click operation and, through upstream recall, generating a preliminarily screened list of contents to be sorted, specifically:
each time a single click operation by a user is captured, m pieces of content information data provided by the recall system are acquired, and the user id and the content id corresponding to each piece of content information data are combined into a pair recorded as (user id, content id).
And (II) scoring the list of the contents to be sorted of the initial screening according to a sorting model to generate an initial content-sorting score association vector.
Extracting user characteristic information and content characteristic information according to the list of the contents to be sorted which are preliminarily screened; wherein,
The user characteristic information comprises three types of data: 1. preprocessed data of the feature vectors of the user click sequence; 2. the basic attributes of the user, comprising multi-dimensional user portrait indicators such as gender, age, and consumption level; 3. preprocessed data of the feature vectors of the user like sequence, processed in the same way as the type 1 data. The preprocessed data is an average or a weighted average.
The content characteristic information comprises information such as primary classification, secondary classification and content author based on the content.
To evaluate whether pushed content is accurate, the feature engineering of other current recommendation systems generally measures the user's preference for content through data such as clicks, likes, comments, forwards, favorites, and browsing or playing time on the recommended content; alternatively, the user's satisfaction with the app is measured through the number of times the user opens the app, the interval before the user returns to it, the length of a single stay, and so on, which also reflects to some extent the user's satisfaction with the recommended content.
Therefore, building on this industry experience, the user portrait is also taken into account. The user feature information contains the click sequence of the user's most recent 50 contents, the like sequence of the most recent 50 contents, and fixed attributes of the user such as gender, age, and consumption level. After the content feature vectors of the most recent 50 clicks are obtained, the resulting 50 x 32 matrix is reduced in dimension: it is averaged and compressed into a 32-dimensional first group of user vectors. The most recent 50 likes are reduced in the same way to generate a 32-dimensional second group of user vectors. The basic attributes of the user, comprising multi-dimensional user portrait indicators such as gender, age, and consumption level, are discrete features and are one-hot encoded to generate a third group of user vectors. The three groups of user vectors are concatenated and used as the input to the user-side deep neural network. Similarly, the content feature information comprises content-based first-level and second-level classification information; after being combined into vectors, it is averaged in the same way as the user vectors to generate a one-dimensional concatenated vector that is fed into the content-side deep network.
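The construction of the user-side input described above can be sketched as follows. This is only an illustration under stated assumptions: plain (unweighted) averaging is used, and the attribute vocabularies, bucket values, and function names are hypothetical, since the text does not specify them.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors (e.g. 50 x 32) into a single vector (32)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def one_hot(value, vocabulary):
    """One-hot encode a discrete attribute against a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def build_user_input(click_vecs, like_vecs, profile, vocabularies):
    """Concatenate the three groups of user vectors into the user-side network input."""
    first = mean_pool(click_vecs)    # first group: recent click sequence
    second = mean_pool(like_vecs)    # second group: recent like sequence
    third = []                       # third group: one-hot user portrait attributes
    for field, vocabulary in vocabularies.items():
        third.extend(one_hot(profile[field], vocabulary))
    return first + second + third

# Hypothetical vocabularies and data; the bucket values are assumptions.
vocabularies = {
    "gender": ["female", "male", "unknown"],
    "age": ["<18", "18-30", "31-45", ">45"],
    "consumption": ["low", "medium", "high"],
}
clicks = [[float(i)] * 32 for i in range(50)]   # 50 clicked contents, 32 dimensions each
likes = [[1.0] * 32 for _ in range(50)]         # 50 liked contents, 32 dimensions each
profile = {"gender": "male", "age": "18-30", "consumption": "high"}
user_input = build_user_input(clicks, likes, profile, vocabularies)
```

With these vocabularies the concatenated input has 32 + 32 + 10 = 74 dimensions. A user with fewer than 50 clicks or likes would need padding or a shorter average, which this sketch does not handle.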
As shown in FIG. 2, model training of the recommendation system is performed in an offline stage, and the combined user features and combined content features are fed separately into the selected deep neural network model. The method tested classical sorting models such as the double-tower model, Google's Wide & Deep model, and Alibaba's DIN and DIEN models. Based on metrics on the validation set such as accuracy and F1, the model with the highest accuracy and F1, namely the double-tower model, was selected as the baseline sorting model. The combined feature vectors of the user and the content are passed through the model to obtain two outputs, a user embedded vector and a content embedded vector, which serve as low-dimensional semantic representations of the user and the content respectively.
The dot product of the two embedded vectors and the sample label value are used to calculate the cross-entropy loss, which is back-propagated to optimize the network parameters. In addition, the content embedded vectors are calculated by calling the content-side deep tower network of the model, and the model is stored in the online environment so that it can be queried for new content feature sequences during online prediction.
Meanwhile, in the online prediction stage, the combined feature vector obtained by combining the user feature information and content feature information of a new user must likewise be passed through the user-side deep network of the model. After the user embedded vector is generated, its dot product with the stored content embedded vector of each content is computed, and the resulting logit is taken as the score of the content-sorting score association vector. The output format of this process is (user id, content id, ranking score).
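The online prediction stage can be sketched as a lookup over precomputed content embedded vectors. In this hedged example the dictionary `content_index` stands in for whatever storage the saved model uses, and the ids, vectors, and function name are hypothetical.

```python
def score_candidates(user_id, user_emb, content_index, candidate_ids):
    """Return (user id, content id, ranking score) triples, highest score first."""
    triples = []
    for content_id in candidate_ids:
        content_emb = content_index.get(content_id)
        if content_emb is None:
            continue  # no stored embedding for this content; skipped in this sketch
        logit = sum(u * c for u, c in zip(user_emb, content_emb))
        triples.append((user_id, content_id, logit))
    return sorted(triples, key=lambda t: t[2], reverse=True)

# Hypothetical content embeddings computed offline by the content-side network.
content_index = {
    "c1": [0.5, 0.25],
    "c2": [1.0, -0.5],
    "c3": [-0.5, 1.0],
}
user_emb = [1.0, 2.0]   # would be produced by the user-side network at request time
ranked = score_candidates("u42", user_emb, content_index, ["c1", "c2", "c3", "c9"])
```

Because the content embedded vectors are computed ahead of time, only the user-side network has to run per request; the remaining work is one dot product per candidate.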
(III) As shown in FIG. 3, the second sorting module obtains the set of (user id, content id, ranking score) triples predicted online by the double-tower sorting model. The content sources of all triples are counted, that is, it is determined which recall strategy pushed each content; each triple is then classified according to the counting result and tagged with the identifier of the corresponding recall strategy group.
Here, there are five recall groups in total, as shown in Table 1 below.
TABLE 1
Recall group name | Grouping principle
i2i | Recall based on similarity between contents
u2i | Recall based on preference between user and content
up | Recall based on the user portrait
hot | Recall based on hot content
u2u | Recall based on similarity between users
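Classifying the triples into the recall groups of Table 1 can be sketched as below. The mapping `source_of` (content id to recall group) is a hypothetical stand-in for however the recall system reports its sources, and the sketch assumes each content is attributed to exactly one recall group.

```python
from collections import defaultdict

RECALL_GROUPS = ("i2i", "u2i", "up", "hot", "u2u")

def group_by_recall(triples, source_of):
    """Bucket (user id, content id, score) triples by the recall group that pushed them."""
    groups = defaultdict(list)
    for user_id, content_id, score in triples:
        group = source_of[content_id]
        if group not in RECALL_GROUPS:
            raise ValueError(f"unknown recall group: {group}")
        groups[group].append((user_id, content_id, score))
    # Sort each bucket by score so that the best items are sampled first later on.
    for items in groups.values():
        items.sort(key=lambda t: t[2], reverse=True)
    return dict(groups)

# Hypothetical triples and sources.
triples = [("u1", "c1", 0.7), ("u1", "c2", 0.4), ("u1", "c3", 0.9), ("u1", "c4", 0.1)]
source_of = {"c1": "i2i", "c2": "hot", "c3": "i2i", "c4": "u2u"}
grouped = group_by_recall(triples, source_of)
```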
(IV) According to each group's click rate over the latest 30 days, the adaptive sampling weight is calculated by formula (1) to obtain the sampling coefficient of each recall group:
[Formula (1): equation image not reproduced in the extracted text]
The symbols in formula (1) denote the sampling coefficient of recall group i, the number of recall groups n, and the click rate of the i-th recall group.
The calculated sampling weight of each recall group and the total number of content items pushed by that recall group are substituted into formula (2) to generate the Top-K recommended content vector list:
[Formula (2): equation image not reproduced in the extracted text]
The symbols in formula (2) denote the number of recalls configured for the i-th recall group and the number of recalls of the i-th recall group after weighting (per Table 1, there are five recall groups), which satisfies condition (3):
[Formula (3): equation image not reproduced in the extracted text]
where m is the total number of recalled content ids.
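Because the formula images are not available in this text, the following sketch only illustrates one plausible reading and should not be taken as the patented formulas. It assumes that the sampling coefficient of a group is its click rate divided by the sum of all groups' click rates, and that the K slots are split in proportion to those coefficients with largest-remainder rounding. The click rates and function names are hypothetical.

```python
def sampling_weights(click_rates):
    """Assumed form of the sampling coefficient: click rate normalized over all groups."""
    total = sum(click_rates.values())
    if total <= 0:
        # No click signal at all: fall back to uniform weights.
        return {g: 1.0 / len(click_rates) for g in click_rates}
    return {g: rate / total for g, rate in click_rates.items()}

def group_quotas(weights, k):
    """Split k slots across recall groups in proportion to weight (largest-remainder rounding)."""
    raw = {g: w * k for g, w in weights.items()}
    quotas = {g: int(r) for g, r in raw.items()}
    leftover = k - sum(quotas.values())
    # Give the remaining slots to the groups with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda g: raw[g] - quotas[g], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas

# Hypothetical 30-day click rates for the five recall groups of Table 1.
ctr_30d = {"i2i": 0.08, "u2i": 0.06, "up": 0.03, "hot": 0.02, "u2u": 0.01}
weights = sampling_weights(ctr_30d)
quotas = group_quotas(weights, k=20)
```

Under these assumptions, groups with a higher recent click rate receive more of the Top-K slots while every group with a non-zero click rate can still contribute, which matches the stated goal of combining accuracy with diversity.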
Ideally, the Top-K content sequence pushed downstream can be extracted exactly according to the above strategy. In practice, however, the content lists pushed by the upstream recall system are often uneven; for example, a recall group may supply fewer items than required. A recall balance evaluation is therefore needed, followed by a corresponding processing strategy. When a recall group supplies too few items, the recommendation system recalculates the sampling coefficients according to the shortfall and, preferentially from the recall groups with the larger sampling coefficients, extracts a corresponding amount of content to make up for the content missing from the other recall groups. Finally, the business logic of the product is incorporated, and the Top-K recommended content list is sent downstream after secondary sorting.
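The balancing step can be sketched as follows. This is a simplification under stated assumptions: instead of recalculating the sampling coefficients, it reuses the existing coefficients as a priority order and hands the shortfall to the groups with the largest coefficient that still have spare content. All names and numbers are hypothetical.

```python
def rebalance(quotas, available, weights):
    """Cap each group's quota at what it actually recalled, then give the
    shortfall to groups with spare content, largest sampling weight first."""
    final = {g: min(q, available.get(g, 0)) for g, q in quotas.items()}
    shortfall = sum(quotas.values()) - sum(final.values())
    for g in sorted(quotas, key=lambda name: weights[name], reverse=True):
        if shortfall == 0:
            break
        spare = available.get(g, 0) - final[g]
        extra = min(spare, shortfall)
        final[g] += extra
        shortfall -= extra
    return final

# Hypothetical quotas, actual recall counts, and sampling weights.
quotas = {"i2i": 8, "u2i": 6, "up": 3, "hot": 2, "u2u": 1}
available = {"i2i": 12, "u2i": 6, "up": 1, "hot": 10, "u2u": 0}
weights = {"i2i": 0.4, "u2i": 0.3, "up": 0.15, "hot": 0.1, "u2u": 0.05}
final = rebalance(quotas, available, weights)
```

Here the groups "up" and "u2u" fall short by three items in total, and "i2i", which has the largest weight and four spare items, covers the gap. If no group has spare content, the list simply ends up shorter than K.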
The sequencing optimization system of the content personalized recommendation system comprises the following components:
the rough screening module is used for acquiring the clicking operation of the user, recalling and generating a list of contents to be sorted of the primary screening;
the first sequencing module is used for scoring the list of the contents to be sequenced by the sequencing model to generate an initial content-sequencing score association vector;
and the second sequencing module is used for carrying out secondary sequencing on the initial content-sequencing score association vector output by the first sequencing module based on a self-adaptive strategy to obtain a final sequencing result.
And the list of the contents to be sorted in the preliminary screening is a content id list related to the historical click data of the user.
The first sequencing module further comprises:
the preprocessing subunit is used for extracting user characteristic information and content characteristic information according to the preliminarily screened list of contents to be sorted, evaluating different sorting models on the raw data or on the combined user characteristic information and content characteristic information, and selecting the sorting model with the highest score as the actual sorting model;
the calculation subunit is used for respectively inputting the user characteristic information and the content characteristic information into the actual sequencing model in an offline training stage of the recommendation system to obtain a user embedded vector and a content embedded vector with the same dimension; performing dot product calculation on the user embedded vector and the content embedded vector, performing cross entropy loss calculation on a dot product value and a sample label value clicked by the user, and performing backward propagation to optimize network parameters of an actual sequencing model; inputting the user characteristic information and the content characteristic information to be sorted into the optimized actual sorting model, and taking the dot product result of the model output vector as a sorting score to obtain an initial content-sorting score association vector.
Further, the user feature information includes: the content feature vector of the user click sequence, the user portrait indicators, and the content feature vector of the user like sequence.
Further, the content embedded vector is calculated by repeatedly calling the content-side deep network of the actual sorting model, which outputs an embedding layer; the actual sorting model is then updated and saved so that it can be queried during online prediction for new content sequences.
Further, when the content embedding vector is predicted on line, calculation is performed by calling a deep network on the user side of the actual sequencing model.
The second sorting module is specifically configured to obtain the initial content-sorting score association vectors, count all vector sources, and classify each vector into a corresponding recall group; the adaptive sampling weight is calculated according to formula (1):
[Formula (1): equation image not reproduced in the extracted text]
where the symbols in formula (1) denote the sampling coefficient of recall group i, the number of recall groups n, and the click rate of the i-th recall group;
a Top-K recommended content vector list is generated according to formula (2), which gives the actual number of recommended content vectors of the i-th recall group:
[Formula (2): equation image not reproduced in the extracted text]
where the symbols in formula (2) denote the number of recalls configured for the i-th recall group and the number of recalls of the i-th recall group after weighting, which satisfies condition (3):
[Formula (3): equation image not reproduced in the extracted text]
where m is the total number of recalled content ids;
a sample balancing strategy is executed that evens out the content samples according to the deviation between each recall group's weighted number of recalls and its actual number of recalls; and secondary sorting is performed on the Top-K recommended content vector list in combination with business logic to obtain the final sorting result.
Further, the sample balancing strategy is: when a recall group supplies fewer items than its weighted number of recalls, the recommendation system recalculates the sampling coefficients according to the shortfall and, preferentially from the recall groups with the larger sampling coefficients, extracts a corresponding amount of content to make up for the content missing from the other recall groups.

Claims (10)

1. A sequencing optimization method of a content personalized recommendation system is characterized by comprising the following steps:
(I) acquiring a user click operation and, through recall, generating a preliminarily screened list of contents to be sorted;
(II) scoring the preliminarily screened list of contents to be sorted according to a sorting model to generate an initial content-sorting score association vector;
(III) performing secondary sorting on the initial content-sorting score association vector based on an adaptive strategy to obtain a final sorting result.
2. The ranking optimization method of the content personalized recommendation system according to claim 1, wherein in step (I), the preliminarily screened list of contents to be sorted is a list of content ids related to the historical click data of the user.
3. The ranking optimization method of the content personalized recommendation system according to claim 1, wherein in the step (two), the ranking model comprises a double tower model.
4. The ranking optimization method of the content personalized recommendation system according to claim 1, wherein the step (two) includes:
(21) extracting user feature information and content feature information according to the preliminarily screened list of content to be sorted;
(22) for each of several candidate sorting models, evaluating the metadata formed from the user feature information and the content feature information, either in combination or separately, and selecting the sorting model with the highest score as the actual sorting model;
(23) in an offline training stage of the recommendation system, inputting the user feature information and the content feature information separately into the actual sorting model to obtain a user embedding vector and a content embedding vector of the same dimension;
(24) computing the dot product of the user embedding vector and the content embedding vector, computing the cross-entropy loss between the dot-product value and the sample label indicating whether the user clicked, and back-propagating to optimize the network parameters of the actual sorting model;
(25) inputting the user feature information and the content feature information to be sorted into the optimized actual sorting model, and taking the dot product of the model's output vectors as the sorting score, to obtain the initial content-sorting score association vector.
5. The ranking optimization method of the content personalized recommendation system according to claim 4, wherein in step (21), the user feature information includes a feature vector of the user's click sequence, a feature vector of the user's profile indicators, and a feature vector of the user's favorites sequence.
6. The ranking optimization method of the content personalized recommendation system according to claim 4, wherein the content embedding vector is calculated by continuously calling the content-side deep network of the actual sorting model, and the actual sorting model is updated and saved for use in querying new content sequences during online prediction.
7. The ranking optimization method of the content personalized recommendation system according to claim 4, wherein the user embedding vector is calculated by calling the user-side deep network of the actual sorting model during online prediction.
8. The ranking optimization method of the content personalized recommendation system according to claim 1, wherein the step (three) includes:
(31) acquiring the initial content-sorting score association vector, counting all vector sources, and classifying each vector into a corresponding recall group;
(32) calculating the adaptive sampling weight according to formula (1):
[Formula (1): rendered as an image in the source; not recoverable from this text]
wherein the symbols in formula (1) denote the sampling coefficient of recall group i, the number of recall groups n, and the click rate of the ith recall group;
(33) generating a Top-K recommended content vector list, wherein the number of content vectors actually recommended from the ith recall group is given by formula (2):
[Formula (2): rendered as an image in the source; not recoverable from this text]
wherein the symbols in formula (2) denote the number of recalls configured for the ith recall group and the number of recalls of the ith recall group after weighting, the latter satisfying the condition of formula (3):
[Formula (3): rendered as an image in the source; not recoverable from this text]
wherein m is the total number of recalled content ids;
(34) executing a sample balancing processing strategy according to the deviation between each recall group's weighted number of recalls and its actual number of recalls, so as to balance the content samples;
(35) performing secondary sorting on the Top-K recommended content vector list in combination with business logic to obtain the final sorting result.
9. The ranking optimization method of the content personalized recommendation system according to claim 8, wherein the sample balancing processing strategy is: when a recall group contains fewer items than its weighted number of recalls, the recommendation system recalculates the sampling coefficients according to the size of the shortfall and, following the recalculated sampling coefficients, preferentially extracts a corresponding amount of content from the recall groups with larger coefficients to supplement the content missing from the other recall groups.
10. A ranking optimization system for a content personalized recommendation system, the system comprising:
a coarse screening module, configured to acquire a user click operation and perform recall to generate a preliminarily screened list of content to be sorted;
a first sorting module, configured to score the list of content to be sorted with a sorting model to generate an initial content-sorting score association vector;
a second sorting module, configured to perform secondary sorting, based on an adaptive strategy, on the initial content-sorting score association vector output by the first sorting module to obtain a final sorting result.
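The scoring steps of claim 4 (embedding both sides, taking the dot product, training with cross-entropy, then ranking by the dot product) can be illustrated with a small sketch. It rests on several assumptions that are not in the claims: a single linear layer stands in for each deep tower, a sigmoid is applied to the dot product before the cross-entropy is computed, the update is plain gradient descent on one sample, and every name (`tower`, `train_step`, `rank`) is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def tower(weights: Matrix, features: Vector) -> Vector:
    """One linear layer standing in for a deep tower (assumption)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(score: float, label: float) -> float:
    """Binary cross-entropy between sigmoid(score) and a 0/1 click label."""
    p = min(max(sigmoid(score), 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def train_step(
    user_w: Matrix, item_w: Matrix,
    user_x: Vector, item_x: Vector,
    label: float, lr: float = 0.1,
) -> Tuple[Matrix, Matrix, float]:
    """One gradient-descent update on a single (user, content, click) sample."""
    u = tower(user_w, user_x)   # user embedding
    v = tower(item_w, item_x)   # content embedding, same dimension as u
    score = dot(u, v)
    grad = sigmoid(score) - label  # d(loss)/d(score) for sigmoid + cross-entropy
    # d(score)/d(user_w[k][j]) = v[k] * user_x[j], and symmetrically for item_w.
    new_user_w = [[w - lr * grad * v[k] * x for w, x in zip(row, user_x)]
                  for k, row in enumerate(user_w)]
    new_item_w = [[w - lr * grad * u[k] * x for w, x in zip(row, item_x)]
                  for k, row in enumerate(item_w)]
    return new_user_w, new_item_w, bce_loss(score, label)


def rank(
    user_w: Matrix, item_w: Matrix,
    user_x: Vector, items: Dict[str, Vector],
) -> List[Tuple[str, float]]:
    """Score each content id by the dot product and sort best first."""
    u = tower(user_w, user_x)
    scored = [(cid, dot(u, tower(item_w, x))) for cid, x in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The list returned by `rank` pairs each content id with its score, which corresponds loosely to the initial content-sorting score association vector that the second sorting module would then reorder.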
CN202110338178.7A 2021-03-30 2021-03-30 Sequencing optimization method and system of content personalized recommendation system Withdrawn CN112801760A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110338178.7A CN112801760A (en) 2021-03-30 2021-03-30 Sequencing optimization method and system of content personalized recommendation system

Publications (1)

Publication Number Publication Date
CN112801760A true CN112801760A (en) 2021-05-14

Family

ID=75815855

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110338178.7A Withdrawn CN112801760A (en) 2021-03-30 2021-03-30 Sequencing optimization method and system of content personalized recommendation system

Country Status (1)

Country Link
CN (1) CN112801760A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722537A (en) * 2021-08-11 2021-11-30 北京奇艺世纪科技有限公司 Short video sequencing and model training method and device, electronic equipment and storage medium
CN113868466A (en) * 2021-12-06 2021-12-31 北京搜狐新媒体信息技术有限公司 Video recommendation method, device, equipment and storage medium
CN113889209A (en) * 2021-09-26 2022-01-04 浙江禾连网络科技有限公司 Recommendation system and storage medium for health management service products
CN114139046A (en) * 2021-10-29 2022-03-04 北京达佳互联信息技术有限公司 Object recommendation method and device, electronic equipment and storage medium
CN114547417A (en) * 2022-02-25 2022-05-27 北京百度网讯科技有限公司 Media resource ordering method and electronic equipment
CN114997532A (en) * 2022-07-29 2022-09-02 江苏新视云科技股份有限公司 Civil telephone delivery scheduling method under uncertain environment, terminal and storage medium
CN117290608A (en) * 2023-11-23 2023-12-26 深圳数拓科技有限公司 Marketing scheme intelligent pushing method, system and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110825974A (en) * 2019-11-22 2020-02-21 厦门美柚股份有限公司 Recommendation system content ordering method and device
WO2020044098A2 (en) * 2018-08-30 2020-03-05 优视科技新加坡有限公司 Method and apparatus for sorting in information stream, and device/terminal/server

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020044098A2 (en) * 2018-08-30 2020-03-05 优视科技新加坡有限公司 Method and apparatus for sorting in information stream, and device/terminal/server
CN110825974A (en) * 2019-11-22 2020-02-21 厦门美柚股份有限公司 Recommendation system content ordering method and device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722537A (en) * 2021-08-11 2021-11-30 北京奇艺世纪科技有限公司 Short video sequencing and model training method and device, electronic equipment and storage medium
CN113722537B (en) * 2021-08-11 2024-04-26 北京奇艺世纪科技有限公司 Short video ordering and model training method and device, electronic equipment and storage medium
CN113889209A (en) * 2021-09-26 2022-01-04 浙江禾连网络科技有限公司 Recommendation system and storage medium for health management service products
CN114139046A (en) * 2021-10-29 2022-03-04 北京达佳互联信息技术有限公司 Object recommendation method and device, electronic equipment and storage medium
CN113868466A (en) * 2021-12-06 2021-12-31 北京搜狐新媒体信息技术有限公司 Video recommendation method, device, equipment and storage medium
CN114547417A (en) * 2022-02-25 2022-05-27 北京百度网讯科技有限公司 Media resource ordering method and electronic equipment
CN114997532A (en) * 2022-07-29 2022-09-02 江苏新视云科技股份有限公司 Civil telephone delivery scheduling method under uncertain environment, terminal and storage medium
CN114997532B (en) * 2022-07-29 2023-02-03 江苏新视云科技股份有限公司 Civil telephone delivery scheduling method under uncertain environment, terminal and storage medium
CN117290608A (en) * 2023-11-23 2023-12-26 深圳数拓科技有限公司 Marketing scheme intelligent pushing method, system and storage medium

Similar Documents

Publication Publication Date Title
CN112801760A (en) Sequencing optimization method and system of content personalized recommendation system
CN110162693B (en) Information recommendation method and server
CN104281622B (en) Information recommendation method and device in a kind of social media
TWI591556B (en) Search engine results sorting method and system
CN111523055B (en) Collaborative recommendation method and system based on agricultural product characteristic attribute comment tendency
CN110619540A (en) Click stream estimation method of neural network
CN117522479B (en) Accurate Internet advertisement delivery method and system
CN112749330A (en) Information pushing method and device, computer equipment and storage medium
CN117874347A (en) Content recommendation technology based on business characteristics
CN113326432A (en) Model optimization method based on decision tree and recommendation method
CN112148994A (en) Information push effect evaluation method and device, electronic equipment and storage medium
CN115829683A (en) Power integration commodity recommendation method and system based on inverse reward learning optimization
CN115438787A (en) Training method and device of behavior prediction system
CN117541322B (en) Advertisement content intelligent generation method and system based on big data analysis
CN104572915A (en) User event relevance calculation method based on content environment enhancement
CN112651790B (en) OCPX self-adaptive learning method and system based on user touch in quick-elimination industry
CN111339428B (en) Interactive personalized search method based on limited Boltzmann machine drive
CN117726412A (en) AI recommendation system and method based on big data
CN112541010A (en) User gender prediction method based on logistic regression
CN116362810A (en) Advertisement putting effect evaluation method
CN114399352B (en) Information recommendation method and device, electronic equipment and storage medium
CN114358813B (en) Improved advertisement putting method and system based on field matrix factorization machine
CN112215629A (en) Multi-target advertisement generation system and method based on construction countermeasure sample
CN114912031A (en) Mixed recommendation method and system based on clustering and collaborative filtering
CN110147497B (en) Individual content recommendation method for teenager group

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210514