CN115795173A

CN115795173A - Method for improving recommendation system to calculate related recommendations

Info

Publication number: CN115795173A
Application number: CN202310076252.1A
Authority: CN
Inventors: 张鹏; 朱国晓; 李鑫斌; 彭渝; 隆龙; 王光永
Original assignee: Haikan Network Technology Shandong Co ltd
Current assignee: Haikan Network Technology Shandong Co ltd
Priority date: 2023-02-08
Filing date: 2023-02-08
Publication date: 2023-03-14

Abstract

The invention belongs to the technical field of recommendation systems based on specific calculation models, and particularly relates to a method for improving a recommendation system to calculate related recommendations. According to the method, weights are added to the tags, so that highly weighted terms better reflect the characteristics of a recommended item. Once weights have been calculated for all tags in the tag library, the tags better reflect the characteristics of each program, and the content similarity results in turn better reflect the characteristics of the recommended items.

Description

Method for improving recommendation system to calculate related recommendations

Technical Field

The invention relates to the technical field of recommendation systems based on specific calculation models, in particular to a method for improving a recommendation system to calculate relevant recommendations.

Background

With the rapid iteration of internet products, personalized recommendation sections such as 'guess you like' or 'recommended for you' are gradually replacing manual layout in internet fields such as short video, video on demand, live streaming, news clients, and e-commerce, and personalized recommendation is quietly changing how people work and live. It also promotes the development of niche industries across the internet, increases product stickiness, and lets people quickly find or discover the information they want in fragmented spare time, which raises the value of the product.

E-commerce, video websites, news clients, and the like are all application scenarios for recommendation systems, and they place high demands on the accuracy of related recommendations. Related recommendation generates a one-hot code from the tags and then calculates similarity over the vectors produced by that encoding. However, when a program profile is built from tags using 0-1 encoding, the content similarity result leans toward broad terms with high frequency, such as [China], [drama], or [comedy] for films, or place-of-origin tags in e-commerce. After 0-1 encoding, all tags carry the same weight, so the importance of a given tag to the program cannot be reflected. The similarity of two programs is calculated with the cosine similarity formula, which is as follows:

$$\cos(J_A, J_B) = \frac{\sum_{i=1}^{n} J_{A,i}\, J_{B,i}}{\sqrt{\sum_{i=1}^{n} J_{A,i}^{2}}\;\sqrt{\sum_{i=1}^{n} J_{B,i}^{2}}}$$

where $\cos(J_A, J_B)$ is the cosine similarity result for the two programs with program numbers JA and JB, $J_{A,i}$ is the value of the i-th bit of the one-hot code of the program with program number JA, $J_{B,i}$ is the value of the i-th bit of the one-hot code of the program with program number JB, and n is the number of tags in the corresponding programs, namely the number of bits of the one-hot code in the cosine similarity formula.

For example, the labels of the three programs with program numbers J1, J2, and J3 and the results of One-hot encoding of the programs are shown in table 1.

TABLE 1

The similarity calculation formula of two programs with program numbers J1 and J2 is as follows:

$$\cos(J_1, J_2) = \frac{\sum_{i=1}^{n} J_{1,i}\, J_{2,i}}{\sqrt{\sum_{i=1}^{n} J_{1,i}^{2}}\;\sqrt{\sum_{i=1}^{n} J_{2,i}^{2}}}$$

where $\cos(J_1, J_2)$ is the cosine similarity result for the two programs with program numbers J1 and J2, $J_{1,i}$ is the value of the i-th bit of the one-hot code of the program with program number J1, $J_{2,i}$ is the value of the i-th bit of the one-hot code of the program with program number J2, and n is the number of tags in the corresponding programs, namely the number of bits of the one-hot code in the cosine similarity formula.

The similarity calculation formula of two programs with program numbers J1 and J3 is as follows:

$$\cos(J_1, J_3) = \frac{\sum_{i=1}^{n} J_{1,i}\, J_{3,i}}{\sqrt{\sum_{i=1}^{n} J_{1,i}^{2}}\;\sqrt{\sum_{i=1}^{n} J_{3,i}^{2}}}$$

where $\cos(J_1, J_3)$ is the cosine similarity result for the two programs with program numbers J1 and J3, $J_{1,i}$ is the value of the i-th bit of the one-hot code of the program with program number J1, $J_{3,i}$ is the value of the i-th bit of the one-hot code of the program with program number J3, and n is the number of tags in the corresponding programs, namely the number of bits of the one-hot code in the cosine similarity formula.

Comparing the cosine similarity of J1 and J2 with the cosine similarity of J1 and J3, it can be seen that the two programs with program numbers J1 and J2 are the more similar pair: if a user has watched or likes J1, the system is more likely to recommend J2 than J3. Under the cosine formula, therefore, similarity-based recommendation tends to favor programs with fewer tags, and over time the related recommendations of the system become particularly homogeneous. Although some recommendation systems apply corrections to tag selection and tag weights, their choice of tag weights is not sufficiently objective or principled, so the related recommendations remain skewed.
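The bias described above can be reproduced with a small sketch. Table 1 is not available here, so the tag vocabulary and the three tag sets below are hypothetical and are not the values used in the original example; the helper names `one_hot` and `cosine_similarity` are likewise illustrative.

```python
import math

def one_hot(tags, vocabulary):
    """Encode a set of tags as a 0-1 vector over a fixed vocabulary order."""
    return [1 if term in tags else 0 for term in vocabulary]

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical data: J2 has only two broad tags, J3 shares three tags with J1.
vocabulary = ["china", "drama", "comedy", "family", "youth", "campus"]
programs = {
    "J1": {"china", "drama", "comedy", "family"},
    "J2": {"china", "drama"},
    "J3": {"china", "drama", "comedy", "youth", "campus"},
}
vectors = {name: one_hot(tags, vocabulary) for name, tags in programs.items()}

sim_12 = cosine_similarity(vectors["J1"], vectors["J2"])
sim_13 = cosine_similarity(vectors["J1"], vectors["J3"])
print(round(sim_12, 4))  # 0.7071
print(round(sim_13, 4))  # 0.6708
```

In this hypothetical data J3 shares more tags with J1 than J2 does, yet J2 scores higher because its vector is shorter, which is the preference for programs with few tags that the method sets out to correct.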

Disclosure of Invention

The invention aims to provide a method for improving a recommendation system to calculate related recommendations which, combined with the cosine similarity algorithm, solves the problem that items with few tags are recommended disproportionately often by the recommendation system.

The technical scheme adopted by the invention to solve the technical problem is as follows: a method for improving a recommendation system to calculate related recommendations, in which a mobile-terminal directing application serves as the base platform for development, the directing, live stream pushing, and live viewing functions are integrated into a single application installed on a plurality of mobile terminals, and the specific implementation, configuration, and terms are explained according to actual use. The specific implementation is as follows:

S1: collecting the tags of media asset content on major websites using crawler technology, taking them as original tags, and forming a media asset library;

S2: merging all original tags in the media asset library according to an existing synonym lexicon to form a tag library of unified tags;

S3: manually classifying and grading each unified tag in the tag library according to its category to obtain a tag level. Grading proceeds in reverse order: the position of a tag node is determined by the containment relationships among the unified tags, so that the broader a tag's scope, the lower its tag level and the smaller its tag weight. The classification is multi-level, and each additional level of subcategories beneath a unified tag lowers its tag level by 1, down to a minimum of 1, after which the tag is not graded further. For example, with three levels: a unified tag with no subcategories has tag level 3; a unified tag that contains first-level subcategories has tag level 2; and a unified tag whose first-level subcategories in turn contain second-level subcategories has tag level 1. The tag level of every unified tag is therefore at least 1 and at most 3;

S4: converting each original tag of each program into a unified tag in the tag library, which completes the generation of unified tags for each program; the replaced or modified unified tags are stored in the media asset library in real time and become the new original tags in the media asset library;

S5: counting the number of exposures of each unified tag in the tag library and the total number of programs, the number of programs containing a given unified tag being counted from that tag's exposures;

S6: calculating the weight of each unified tag in the tag library, with the calculation formula as follows:

where $W_i$ is the weight of unified tag i;

S7: after the weights of all unified tags in the tag library have been calculated in turn, generating a unified tag weight vector W in a fixed order, namely:

$$W = [W_1, W_2, W_3, \ldots, W_n]$$

where W is the unified tag weight vector and $W_1, W_2, W_3, \ldots, W_n$ are the weights of the 1st through n-th unified tags in the tag library;

S8: calculating a new item vector, in which each tag entry is multiplied in turn by the corresponding tag weight in W and the result is taken as the new item vector.

The invention has the following beneficial effects: by using an algorithm combined with cosine similarity calculation, it solves the problem that the related recommendation function of a recommendation system frequently recommends items with few tags. Weights are added to the tags, so that highly weighted terms better reflect the characteristics of a recommended item. Once weights have been calculated for all tags in the tag library, the tags better reflect the characteristics of each program, and the content similarity results in turn better reflect the characteristics of the recommended items.

Drawings

FIG. 1 is a schematic flow chart of the present application.

FIG. 2 is a timing diagram of one embodiment of the present application.

FIG. 3 is a schematic diagram of the tag classification hierarchy of the present application.

FIG. 4 is a schematic diagram of a first example of the unified tag classification hierarchy of the present application.

FIG. 5 is a schematic diagram of a second example of the unified tag classification hierarchy of the present application.

Detailed Description

The present invention will now be described in further detail with reference to the accompanying drawings.

In the method for improving a recommendation system to calculate related recommendations shown in FIG. 1, FIG. 2, FIG. 3, FIG. 4, and FIG. 5, a mobile-terminal directing application serves as the base platform for development, the directing, live stream pushing, and live viewing functions are integrated into a single application installed on a plurality of mobile terminals, and the specific implementation, configuration, and terms are explained according to actual use. The specific implementation is as follows:

S1: collecting the tags of media asset content on major websites using crawler technology, taking them as original tags, and forming a media asset library. In the specific implementation, crawling uses the Python-based pyrapy framework, and user-generated (UGC) tags are generally preferred during crawling because they are more representative;

S2: merging all original tags in the media asset library according to an existing synonym lexicon to form a tag library of unified tags, for example scenery/landscape, or Huangmei opera/opera film/Peking opera/Qinqiang. Tags written in traditional Chinese characters are merged with their simplified-character equivalents into unified tags, finally forming a unified tag library. The tag library and the synonym lexicon are shown in Table 2.

TABLE 2
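As a minimal sketch of the merging in step S2, the following maps variant spellings and synonyms onto a single unified tag. The lexicon contents, the choice of which variant becomes the unified tag, and the function names `build_variant_index` and `unify_tags` are assumptions made for illustration; the actual contents of Table 2 are not reproduced here.

```python
# Hypothetical synonym lexicon: each key is the chosen unified tag,
# each value lists the variants that collapse into it.
synonym_lexicon = {
    "landscape": ["scenery", "landscape"],
    "opera": ["Huangmei opera", "opera film", "Peking opera", "Qinqiang"],
}

def build_variant_index(lexicon):
    """Invert the lexicon so every variant points at its unified tag."""
    index = {}
    for unified, variants in lexicon.items():
        for variant in variants:
            index[variant.strip().lower()] = unified
    return index

def unify_tags(original_tags, index):
    """Replace each original tag by its unified tag, dropping duplicates in order."""
    unified = []
    for tag in original_tags:
        mapped = index.get(tag.strip().lower(), tag.strip())
        if mapped not in unified:
            unified.append(mapped)
    return unified

index = build_variant_index(synonym_lexicon)
print(unify_tags(["Scenery", "Peking opera", "Qinqiang", "comedy"], index))
# ['landscape', 'opera', 'comedy']
```

Tags with no entry in the lexicon pass through unchanged, so the lexicon can be extended incrementally without reprocessing the whole library. The same function could serve the per-program conversion of step S4.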

S3: manually classifying and grading each unified tag in the tag library according to its category to obtain a tag level. Grading proceeds in reverse order: the position of a tag node is determined by the containment relationships among the unified tags, so that the broader a tag's scope, the lower its tag level and the smaller its tag weight. The unified tags are graded into three levels: a unified tag with no subcategories has tag level 3; a unified tag that contains first-level subcategories has tag level 2; and a unified tag whose first-level subcategories in turn contain second-level subcategories has tag level 1. In other words, each additional level of subcategories beneath a unified tag lowers its tag level by 1, down to a minimum of 1, after which the tag is not graded further, so the tag level of every unified tag is at least 1 and at most 3. There are two reasons for grading. First, it further reduces the weight contributed by broad unified tags (see the formula of step S6). Second, once the unified tags are graded, the similarity calculation brings out the user's long-tail preferences more clearly, which makes the recommendation results more accurate.

For example, suppose the tag library contains 9 unified tags: [basketball], [football], [Slam Dunk], [sports], [drama], [online variety show], [education], [college entrance exam], and [parenting]. As shown in FIG. 3, before classification and grading every unified tag has tag level 3. The unified tags are then classified and graded. [Sports] contains the first-level subcategories [basketball], [football], and [tennis]; [basketball] in turn contains the second-level subcategory [Slam Dunk], while [football] and [tennis] have no subcategories. This relationship is shown in FIG. 4. Because there are two levels of subcategories under [sports], its tag level is 3-2=1; there is one level of subcategories under [basketball], so its tag level is 3-1=2; [football] and [tennis] have no subcategories, so their tag level is 3-0=3. Likewise, [education] contains the first-level subcategories [college entrance exam] and [parenting], neither of which has subcategories, so the tag level of [education] is 3-1=2 and that of [college entrance exam] and [parenting] is 3-0=3.
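The grading rule in this example can be sketched as a depth calculation over the tag hierarchy. The text describes the grading as manual, so the code below is only an illustration of the arithmetic; the `children` mapping, the English tag names, and the function names are assumptions chosen for this sketch.

```python
MAX_LEVEL = 3  # three-level grading, as in the example

# Hypothetical parent -> subcategories mapping for the example hierarchy.
children = {
    "sports": ["basketball", "football", "tennis"],
    "basketball": ["Slam Dunk"],
    "education": ["college entrance exam", "parenting"],
}

def subcategory_depth(tag, children):
    """Number of subcategory levels beneath a tag (0 for a tag with none)."""
    subcategories = children.get(tag, [])
    if not subcategories:
        return 0
    return 1 + max(subcategory_depth(sub, children) for sub in subcategories)

def tag_level(tag, children, max_level=MAX_LEVEL):
    """Lower the level by 1 per subcategory level, never going below 1."""
    return max(1, max_level - subcategory_depth(tag, children))

for name in ["sports", "basketball", "football", "education", "parenting"]:
    print(name, tag_level(name, children))
# sports 1
# basketball 2
# football 3
# education 2
# parenting 3
```

Clamping at 1 reflects the rule that a unified tag is not graded further once its level reaches 1, even if its hierarchy is deeper than two levels.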

S4: converting each original tag of each program into a unified tag in the tag library, which completes the generation of unified tags for each program; the replaced or modified unified tags are stored in the media asset library in real time and become the new original tags in the media asset library;

where, in the tag weight formula of step S6, $W_i$ is the weight of unified tag i;

As can be seen from the formula, the more often unified tag i is applied in the tag library and the broader its category (that is, the lower its tag level), the smaller its weight. Conversely, the less often unified tag i is applied and the narrower its category (that is, the higher its tag level), the larger its weight. Through the tag level, the weight of a broad unified tag that is applied only rarely can be smoothly reduced, which highlights the user's personalized preferences;

When choosing how strongly tag grading should influence the weight, a situation often met in actual production is a tag with little exposure that nevertheless sits high in the classification hierarchy, such as [sports]. From the exposure calculation alone, the weight of this unified tag is 5.51513; but [sports] has tag level 1, being a broad tag that contains two levels of subcategories. After multiplication by the tag grading factor ln(tag level of unified tag i + 1), the weight of the unified tag falls to 3.822537. The factor ln(tag level of unified tag i + 1) thus expresses the influence of the tag level on the unified tag weight. Some results calculated with the tag weight formula are listed in Table 3.

TABLE 3
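The effect of the grading factor can be checked numerically. The sketch below takes the exposure-based value 5.51513 quoted for [sports] and applies the factor ln(tag level + 1). The full weight formula is not reproduced in this text, so `exposure_term` is only an assumed IDF-style stand-in (including its +1 smoothing), and all function names and the program counts in the last line are hypothetical.

```python
import math

def exposure_term(total_programs, programs_with_tag):
    """Assumed IDF-style term: tags attached to fewer programs score higher.

    The +1 in the denominator is an assumption to avoid division by zero.
    """
    return math.log(total_programs / (programs_with_tag + 1))

def level_factor(tag_level):
    """Grading factor ln(tag level + 1) described in the text."""
    return math.log(tag_level + 1)

def tag_weight(exposure_value, tag_level):
    """Exposure-based value scaled by the grading factor."""
    return exposure_value * level_factor(tag_level)

# [sports]: exposure-based value 5.51513 and tag level 1, as quoted in the text.
sports_weight = tag_weight(5.51513, 1)
print(round(sports_weight, 4))  # 3.8228

# The same exposure value for a leaf tag (level 3) would be damped far less.
leaf_weight = tag_weight(5.51513, 3)
print(round(leaf_weight, 4))  # 7.6456

# Hypothetical counts, only to show how the assumed exposure term would be used.
example_weight = tag_weight(exposure_term(10000, 39), 2)
```

The computed 3.8228 agrees with the stated 3.822537 to about three decimal places; the small difference presumably comes from rounding of the quoted exposure value.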

where W is the unified tag weight vector and $W_1, W_2, W_3, \ldots, W_n$ are the weights of the 1st through n-th unified tags in the tag library;

S8: calculating a new item vector, in which each tag entry is multiplied in turn by the corresponding tag weight in W and the result is taken as the new item vector.
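Step S8 can be sketched as an element-wise product of each program's 0-1 vector with the weight vector W, followed by the usual cosine similarity. The vocabulary, the program tag sets, and the weight values below are hypothetical and chosen only to show how weighting can change the ranking; they are not taken from the tables of this document.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def weighted_vector(one_hot_vector, weights):
    """New item vector: each 0-1 entry multiplied by the weight in the same position."""
    return [bit * w for bit, w in zip(one_hot_vector, weights)]

# Fixed tag order: china, drama, comedy, family, youth, campus (hypothetical).
# Broad tags get small weights, narrower tags larger ones (hypothetical values).
W = [0.5, 0.5, 3.0, 2.0, 2.0, 2.0]

one_hot_vectors = {
    "J1": [1, 1, 1, 1, 0, 0],
    "J2": [1, 1, 0, 0, 0, 0],
    "J3": [1, 1, 1, 0, 1, 1],
}
weighted = {name: weighted_vector(v, W) for name, v in one_hot_vectors.items()}

sim_12 = cosine_similarity(weighted["J1"], weighted["J2"])
sim_13 = cosine_similarity(weighted["J1"], weighted["J3"])
print(sim_12 < sim_13)  # True
```

With unweighted 0-1 vectors the same three programs rank J2 above J3 relative to J1; after weighting, the shared narrower tag dominates and J3 ranks higher, which is the intended correction.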

The present invention is not limited to the above embodiments. Any structural change made under the teaching of the present invention that yields a technical solution identical or similar to that of the present invention falls within the protection scope of the present invention.

The techniques, shapes, and configurations not described in detail in the present invention are all known techniques.

Claims

1. A method for improving a recommendation system to calculate related recommendations, characterized by comprising the following steps:

S1: collecting the tags of media asset content on major websites using crawler technology, taking them as original tags, and forming a media asset library;

S2: merging all original tags in the media asset library according to an existing synonym lexicon to form a tag library of unified tags;

S3: manually classifying and grading each unified tag in the tag library according to its category to obtain a tag level, wherein grading proceeds in reverse order, the classification of unified tags is multi-level, and each additional level of subcategories beneath a unified tag lowers its tag level by 1, down to a minimum of 1;

S4: converting each original tag of each program into a unified tag in the tag library;

S5: counting the number of exposures of each unified tag in the tag library and the total number of programs;

S6: calculating the weight of each unified tag in the tag library;

S7: after the weights of all unified tags in the tag library have been calculated in turn, generating a unified tag weight vector W in a fixed order;

S8: calculating a new item vector.

2. The method of claim 1, wherein the unified tags are graded into three levels: a unified tag with no subcategories has tag level 3; a unified tag that contains first-level subcategories has tag level 2; and a unified tag whose first-level subcategories in turn contain second-level subcategories has tag level 1.

3. The method of claim 1, wherein in step S4 each original tag is converted into a unified tag in the tag library, and the converted unified tag is stored in the media asset library in real time and becomes an original tag in the media asset library.

4. The method of claim 1, wherein the method comprises: in step S6, the formula for calculating the weight of the uniform label is:

where $W_i$ is the weight of unified tag i.

5. The method of claim 1, wherein in step S8, in calculating the new item vector, each tag entry is multiplied in turn by the corresponding tag weight in W and the result is taken as the new item vector.