CN115795173A - Method for improving recommendation system to calculate related recommendations - Google Patents

Info

Publication number: CN115795173A
Authority: CN (China)
Prior art keywords: label, unified, tag, library, classification
Legal status: Pending
Application number: CN202310076252.1A
Other languages: Chinese (zh)
Inventors: 张鹏, 朱国晓, 李鑫斌, 彭渝, 隆龙, 王光永
Current assignee: Haikan Network Technology Shandong Co ltd
Original assignee: Haikan Network Technology Shandong Co ltd
Priority date: 2023-02-08
Filing date: 2023-02-08
Publication date: 2023-03-14
Application filed by Haikan Network Technology Shandong Co ltd
Priority to CN202310076252.1A
Publication of CN115795173A
Current legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of recommendation systems based on specific calculation models, and in particular relates to a method for improving the computation of related recommendations in a recommendation system. The method assigns weights to labels so that high-weight words better reflect the characteristics of the recommended item. After weights are calculated for all labels in the tag library, the labels better reflect the characteristics of each program, so the content-similarity results better reflect the characteristics of the recommended items.

Description

Method for improving recommendation system to calculate related recommendations
Technical Field
The invention relates to the technical field of recommendation systems based on specific calculation models, in particular to a method for improving a recommendation system to calculate relevant recommendations.
Background
With the rapid iteration of internet products, personalized recommendation sections such as "Guess you like" or "Recommended for you" are gradually replacing manual curation in short video, video on demand, live streaming, news clients, e-commerce and other internet fields, and personalized recommendation is quietly changing how people work and live. It also promotes the development of niche segments across internet fields, increases product stickiness, and helps people quickly find the information they want in fragmented time, which raises the value of the product.
E-commerce, video websites and news clients are all scenarios in which recommendation systems are applied, and they demand highly accurate related recommendations. Related recommendation generates one-hot codes from tags and then computes the similarity between the resulting vectors. However, when tags are 0-1 encoded to build a program portrait (profile), the content-similarity results are skewed toward broad, high-frequency words such as "China", "movie", "drama" and "comedy" in video, or place-of-origin tags in e-commerce. After 0-1 encoding, all tags carry the same weight, so the encoding cannot reflect how important a given tag is to a program. The similarity of two programs is calculated with the cosine similarity formula:
$$\cos(J_A, J_B) = \frac{\sum_{i=1}^{n} x_{A,i}\, x_{B,i}}{\sqrt{\sum_{i=1}^{n} x_{A,i}^{2}}\,\sqrt{\sum_{i=1}^{n} x_{B,i}^{2}}}$$
where $\cos(J_A, J_B)$ is the cosine similarity of the two programs with program numbers $J_A$ and $J_B$, $x_{A,i}$ is the value of the i-th bit of the one-hot encoding of program $J_A$, $x_{B,i}$ is the value of the i-th bit of the one-hot encoding of program $J_B$, and $n$ is the number of labels, i.e., the number of bits in the one-hot encoding.
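Because every bit of a 0-1 (one-hot) tag vector is either 0 or 1, the cosine formula reduces to counting shared tags. A short derivation, writing $T_A$ and $T_B$ for the tag sets of programs $J_A$ and $J_B$, and $x_{A,i}$, $x_{B,i}$ for the i-th bits of their one-hot vectors (notation introduced here for the derivation):

$$\sum_{i=1}^{n} x_{A,i}\,x_{B,i} = |T_A \cap T_B|, \qquad \sum_{i=1}^{n} x_{A,i}^{2} = |T_A|, \qquad \sum_{i=1}^{n} x_{B,i}^{2} = |T_B|$$

$$\cos(J_A, J_B) = \frac{|T_A \cap T_B|}{\sqrt{|T_A|\,|T_B|}}$$

For a fixed number of shared tags, the similarity grows as the other program carries fewer tags, which is the mechanism behind the bias toward programs with few labels.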
For example, the labels of the three programs with program numbers J1, J2, and J3 and the results of One-hot encoding of the programs are shown in table 1.
Table 1 (image not reproduced): labels and one-hot encodings of programs J1, J2 and J3
The cosine similarity of the two programs with program numbers J1 and J2 is:
$$\cos(J_1, J_2) = \frac{\sum_{i=1}^{n} x_{1,i}\, x_{2,i}}{\sqrt{\sum_{i=1}^{n} x_{1,i}^{2}}\,\sqrt{\sum_{i=1}^{n} x_{2,i}^{2}}}$$
where $\cos(J_1, J_2)$ is the cosine similarity of programs J1 and J2, $x_{1,i}$ is the value of the i-th bit of the one-hot encoding of program J1, $x_{2,i}$ is the value of the i-th bit of the one-hot encoding of program J2, and $n$ is the number of labels, i.e., the number of bits in the one-hot encoding.
The cosine similarity of the two programs with program numbers J1 and J3 is:
$$\cos(J_1, J_3) = \frac{\sum_{i=1}^{n} x_{1,i}\, x_{3,i}}{\sqrt{\sum_{i=1}^{n} x_{1,i}^{2}}\,\sqrt{\sum_{i=1}^{n} x_{3,i}^{2}}}$$
where $\cos(J_1, J_3)$ is the cosine similarity of programs J1 and J3, $x_{1,i}$ is the value of the i-th bit of the one-hot encoding of program J1, $x_{3,i}$ is the value of the i-th bit of the one-hot encoding of program J3, and $n$ is the number of labels, i.e., the number of bits in the one-hot encoding.
Comparing $\cos(J_1, J_2)$ with $\cos(J_1, J_3)$ shows that programs J1 and J2 are more similar, so a user who has watched or likes J1 is more likely to be recommended J2 than J3. Under the cosine formula, similarity-based recommendation therefore tends to favor programs with fewer labels, and over time the related-recommendation results of the system become monotonous. Although some recommendation systems adjust tag selection and tag weights, the weights are not chosen objectively or scientifically enough, so the problem with related recommendations persists.
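The effect can be reproduced numerically. The sketch below computes the cosine similarity of 0-1 tag vectors directly from tag sets, using the identity cos = |A ∩ B| / sqrt(|A| · |B|) that holds for 0-1 vectors. The three tag sets are hypothetical and are not the data of Table 1: J3 shares both of J1's tags, yet J2, which shares only the broad tag "China" but carries a single label, scores higher.

```python
import math


def binary_cosine(tags_a: set[str], tags_b: set[str]) -> float:
    """Cosine similarity of two 0-1 (one-hot) tag vectors, computed from tag sets.

    For 0-1 vectors the dot product is the number of shared tags and each norm
    is the square root of the number of tags, so cos = |A & B| / sqrt(|A| * |B|).
    """
    if not tags_a or not tags_b:
        return 0.0
    shared = len(tags_a & tags_b)
    return shared / math.sqrt(len(tags_a) * len(tags_b))


# Hypothetical programs (illustrative only, not the contents of Table 1).
j1 = {"China", "comedy"}
j2 = {"China"}  # one broad tag, shared with J1
j3 = {"China", "comedy", "family", "urban", "romance", "youth", "2020s"}  # shares both of J1's tags

print(round(binary_cosine(j1, j2), 4))  # 0.7071
print(round(binary_cosine(j1, j3), 4))  # 0.5345
```

The program with fewer labels wins even though it overlaps less with J1, because the denominator grows with the number of tags on the candidate program.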
Disclosure of Invention
The invention aims to provide a method for improving the computation of related recommendations in a recommendation system which, combined with the cosine similarity algorithm, solves the problem that items with few labels are recommended too often.
The technical solution adopted by the invention is as follows: a method for improving a recommendation system to calculate related recommendations, characterized in that a mobile directing platform is used as the base platform for development; directing, live-stream pushing and live viewing functions are integrated into a single application installed on multiple mobile terminals; and the specific implementation, configuration and terms are explained according to actual use. The specific steps are as follows:
S1. Use crawler technology to collect the tags of media asset content from major websites as original tags, forming a media asset library.
S2. Merge all original labels in the media asset library according to an existing synonym library to form a tag library of unified labels.
S3. Manually classify and grade each unified label in the tag library by category to obtain its label level. Grading is done in reverse order: the position of a label node is determined by the inclusion relations among unified labels, and the broader a label's scope, the lower its label level and the smaller its weight. Unified labels are graded on multiple levels, and each level of subcategories beneath a unified label lowers its level by 1, down to a minimum of 1, after which it is not graded further. For example, with three levels: a unified label with no subcategories has label level 3; one that contains a first-level subcategory has level 2; and one whose first-level subcategory also contains a second-level subcategory has level 1. Hence every unified label's level satisfies 1 ≤ level ≤ 3.
S4. Convert each original label of each program into a unified label in the tag library, completing the generation of each program's unified labels; the replaced or modified unified labels are stored in the media asset library in real time and become new original labels there.
S5. Count the exposure of each unified label in the tag library, i.e., the number of programs that contain it, together with the total number of programs.
S6. Calculate the weight of each unified label in the tag library using the following formula:
[Formula image not reproduced]
where $W_i$ is the weight of unified label i.
S7. After the weights of all unified labels in the tag library have been calculated in turn, generate the unified label weight vector W in a fixed order:
$$W = (W_1, W_2, W_3, \ldots, W_n)$$
where W is the unified label weight vector and $W_1, W_2, W_3, \ldots, W_n$ are the weights of the 1st through n-th unified labels in the tag library.
S8. Calculate the new item vectors: while computing each item vector, multiply each label's value by the corresponding weight in W, in order, and use the result as the new item vector.
The invention has the following beneficial effects: by combining label weighting with cosine similarity calculation, it addresses the problem that the related-recommendation function of a recommendation system recommends items with few labels too often. Weighting the labels lets high-weight words better represent the characteristics of the recommended items; once weights are calculated for all labels in the tag library, the labels better characterize each program, so the content-similarity results better reflect the characteristics of the recommended items.
Drawings
FIG. 1 is a schematic flow chart of the present application.
FIG. 2 is a timing diagram of one embodiment of the present application.
FIG. 3 is a schematic diagram of the tag categorization hierarchy of the present application.
FIG. 4 is a schematic diagram of example one of the unified tag categorization hierarchy of the present application.
FIG. 5 is a schematic diagram of example two of the unified tag categorization hierarchy of the present application.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1 to FIG. 5, the method for improving the computation of related recommendations in a recommendation system uses a mobile directing platform as the base platform for development: directing, live-stream pushing and live viewing functions are integrated into a single application installed on multiple mobile terminals, and the specific implementation, configuration and terms are explained according to actual use. The specific steps are as follows:
S1. Use crawler technology to collect the tags of media asset content from major websites as original tags and form a media asset library. In the specific implementation, crawling is done with the Python-based Scrapy framework, and user-annotated UGC (user-generated content) tags are preferred during crawling because they are more representative.
S2. Merge all original labels in the media asset library according to an existing synonym library to form a tag library of unified labels, for example scenery/landscape, and Huangmei opera/opera film/Peking opera/Qunqiang. Labels written in traditional Chinese characters are also merged with their simplified-character counterparts into unified labels, finally forming the unified tag library. The tag library and the synonym library are shown in Table 2.
Table 2 (image not reproduced): unified tag library and synonym library
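A minimal sketch of the merge in S2, assuming the synonym library is available as groups of equivalent tags and that the first entry of each group is taken as the unified tag. Both are assumptions; the patent does not specify the data format or how the unified name is chosen. The groups reuse the examples above, plus a simplified/traditional pair (风景/風景) for "scenery".

```python
def build_unified_map(synonym_groups: list[list[str]]) -> dict[str, str]:
    """Map every variant tag to one unified tag.

    ASSUMPTION: the first entry of each synonym group is the unified tag.
    """
    unified: dict[str, str] = {}
    for group in synonym_groups:
        canonical = group[0]
        for variant in group:
            # setdefault keeps the first assignment if a variant appears in two groups
            unified.setdefault(variant, canonical)
    return unified


# Hypothetical synonym groups based on the S2 examples; ordering (and thus the
# chosen unified tag) is illustrative only.
groups = [
    ["scenery", "landscape", "风景", "風景"],  # includes simplified/traditional forms
    ["opera film", "Huangmei opera", "Peking opera", "Qunqiang"],
]
tag_map = build_unified_map(groups)
unified_library = sorted(set(tag_map.values()))
print(unified_library)  # ['opera film', 'scenery']
```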
S3. Manually classify and grade each unified label in the tag library by category to obtain its label level. Grading is done in reverse order: the position of a label node is determined by the inclusion relations among unified labels, and the broader a label's scope, the lower its label level and the smaller its weight. Unified labels are graded on three levels: a unified label with no subcategories has label level 3; one that contains first-level subcategories has level 2; and one whose first-level subcategories themselves contain second-level subcategories has level 1. In other words, each additional level of subcategories lowers the level by 1, down to a minimum of 1, after which the label is not graded further, so every label level satisfies 1 ≤ level ≤ 3. There are two reasons for grading: first, it further reduces the weight of broad unified labels (see the formula in step S6); second, once the unified labels are graded, similarity calculation better highlights the user's long-tail preferences, making the recommendation results more accurate.
For example, suppose the tag library contains 9 unified tags: [basketball], [football], [Slam Dunk], [sports], [drama], [network art], [education], [college] and [child care]. As shown in FIG. 3, before classification and grading every unified tag has label level 3. After classification and grading, [sports] contains the first-level subcategories [basketball], [football] and [tennis]; [basketball] in turn contains the second-level subcategory [Slam Dunk], while [football] and [tennis] have no subcategories. These relationships are shown in FIG. 4. Since [sports] has two levels of subcategories beneath it, its label level is 3-2=1; [basketball] has one level of subcategories, so its label level is 3-1=2; [football] and [tennis] have no subcategories, so their label level is 3-0=3. Likewise, [education] contains the first-level subcategories [college] and [child care], which have no subcategories of their own, so the label level of [education] is 3-1=2 and that of [college] and [child care] is 3-0=3.
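The grading rule can be read as level = max(1, 3 - number of subcategory levels beneath the tag). A minimal sketch of that reading, encoding the example hierarchy as a parent-to-children dictionary; the dictionary layout and function names are illustrative, and the subcategory of [basketball] is written here as "Slam Dunk".

```python
# Example hierarchy from S3: parent tag -> direct subcategories (illustrative layout).
HIERARCHY: dict[str, list[str]] = {
    "sports": ["basketball", "football", "tennis"],
    "basketball": ["Slam Dunk"],
    "education": ["college", "child care"],
}

MAX_LEVEL = 3  # three-level grading, as in the embodiment


def subcategory_depth(tag: str) -> int:
    """Number of subcategory levels beneath a tag (0 for a tag with no subcategories)."""
    children = HIERARCHY.get(tag, [])
    if not children:
        return 0
    return 1 + max(subcategory_depth(child) for child in children)


def label_level(tag: str) -> int:
    """Each level of subcategories lowers the level by 1, but never below 1."""
    return max(1, MAX_LEVEL - subcategory_depth(tag))


for tag in ["sports", "basketball", "football", "tennis", "Slam Dunk", "education", "college"]:
    print(f"{tag}: {label_level(tag)}")
```

This prints levels 1, 2, 3, 3, 3, 2 and 3, matching the worked example; a tag absent from the hierarchy, such as [drama], gets level 3, as it would before grading.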
S4. Convert each original label of each program into a unified label in the tag library, completing the generation of each program's unified labels; the replaced or modified unified labels are stored in the media asset library in real time and become the new original labels in the media asset library.
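A minimal sketch of S4, assuming the unified mapping from S2 is a plain dictionary and the media asset library is a dictionary from program ID to tag list; both structures, and the program ID, are hypothetical. Tags missing from the mapping are skipped here, which is also an assumption; the patent does not say how they are handled.

```python
def convert_program_tags(original_tags: list[str], unified_map: dict[str, str]) -> list[str]:
    """Replace each original tag with its unified tag, dropping duplicates.

    ASSUMPTION: tags missing from unified_map are skipped rather than kept.
    Order of first appearance is preserved.
    """
    seen: set[str] = set()
    unified_tags: list[str] = []
    for tag in original_tags:
        unified = unified_map.get(tag)
        if unified is None or unified in seen:
            continue
        seen.add(unified)
        unified_tags.append(unified)
    return unified_tags


# Hypothetical mapping and media asset record.
unified_map = {
    "scenery": "scenery",
    "landscape": "scenery",
    "Peking opera": "opera film",
    "opera film": "opera film",
}
media_assets = {"P001": ["landscape", "Peking opera", "scenery"]}

# Write the unified tags back so they become the program's new original tags.
for program_id, tags in media_assets.items():
    media_assets[program_id] = convert_program_tags(tags, unified_map)

print(media_assets)  # {'P001': ['scenery', 'opera film']}
```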
S5. Count the exposure of each unified label in the tag library, i.e., the number of programs that contain it, together with the total number of programs.
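A sketch of the S5 counts, assuming, as the step's wording suggests, that a tag's exposure is the number of programs carrying it. The program records are hypothetical.

```python
from collections import Counter


def count_exposure(program_tags: dict[str, list[str]]) -> tuple[Counter, int]:
    """Return (exposure per unified tag, total number of programs).

    Exposure is taken as the number of programs carrying the tag, so a tag
    listed twice on the same program counts once.
    """
    exposure: Counter = Counter()
    for tags in program_tags.values():
        exposure.update(set(tags))
    return exposure, len(program_tags)


# Hypothetical programs with unified tags.
programs = {
    "P1": ["sports", "basketball"],
    "P2": ["sports", "football"],
    "P3": ["education", "college"],
}
exposure, total = count_exposure(programs)
print(exposure["sports"], exposure["college"], total)  # 2 1 3
```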
S6. Calculate the weight of each unified label in the tag library using the following formula:
[Formula image not reproduced]
where $W_i$ is the weight of unified label i.
As the formula shows, the more often unified label i is applied in the tag library, and the higher it sits in the category hierarchy (that is, the broader it is and the lower its label level), the smaller its weight. Conversely, the less often unified label i is applied and the narrower it is (the higher its label level), the larger its weight. Through the label level, the weight of broad unified labels that are applied only a few times can be smoothly reduced, which highlights the user's personalized preferences.
Regarding how strongly the label grading should influence the weight: in actual production one often encounters labels with low exposure that are nevertheless broad categories, such as [sports]. From the exposure term of the weight formula alone, [sports] would receive a unified tag weight of 5.51513. However, [sports] has label level 1: it is a broad category with two levels of subcategories. After multiplying by the label-level factor ln(label level of unified label i + 1), its weight drops to 3.822537. The factor ln(label level of unified label i + 1) thus expresses the influence of the label level on the unified tag weight. Partial results computed with the tag weight formula are listed in Table 3.
Table 3 (image not reproduced): partial results of the tag weight calculation
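The exact S6 formula is only available as an image, but the text states that an exposure-based term is multiplied by the factor ln(level + 1). A minimal sketch under that reading, with the exposure term ASSUMED to be the IDF-style ln(N / n_i) (N programs in total, n_i programs carrying tag i); the function name and the counts in the example are hypothetical.

```python
import math


def tag_weight(total_programs: int, exposure: int, level: int) -> float:
    """Sketch of the S6 weight W_i for one unified tag.

    ASSUMPTION: the exposure term is ln(N / n_i); the patent's exact exposure
    term is given only as a formula image. The ln(level + 1) factor is the
    label-level factor described in the text.
    """
    if total_programs <= 0 or exposure <= 0:
        raise ValueError("total_programs and exposure must be positive")
    exposure_term = math.log(total_programs / exposure)  # rarer tag -> larger term
    level_factor = math.log(level + 1)  # broader tag (level 1) -> smaller factor
    return exposure_term * level_factor


# Hypothetical counts: 1000 programs, two tags each applied to 10 programs.
broad = tag_weight(1000, 10, level=1)   # a top-level category such as [sports]
narrow = tag_weight(1000, 10, level=3)  # a tag with no subcategories
print(round(broad, 3), round(narrow, 3))

# The level-1 factor applied to the exposure weight quoted above for [sports]:
print(round(5.51513 * math.log(2), 2))  # 3.82
```

Under this reading a tag with no subcategories (level 3) gets exactly twice the weight of a level-1 tag with the same exposure, since ln 4 = 2 ln 2. Multiplying the quoted exposure weight 5.51513 by ln 2 gives about 3.82, in line with the reduced weight reported for [sports].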
S7. After the weights of all unified labels in the tag library have been calculated in turn, generate the unified label weight vector W in a fixed order:
$$W = (W_1, W_2, W_3, \ldots, W_n)$$
where W is the unified label weight vector and $W_1, W_2, W_3, \ldots, W_n$ are the weights of the 1st through n-th unified labels in the tag library.
S8. Calculate the new item vectors: while computing each item vector, multiply each label's value by the corresponding weight in W, in order, to obtain the new item vector.
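Steps S7 and S8 together can be sketched as follows: fix a tag order, hold the weights in a vector W in that order, multiply each one-hot bit by its weight, and compare items by cosine similarity as before. The tag order and weight values are hypothetical (a low weight for the broad tag "China", higher weights for narrower tags), chosen only to show how weighting changes the ranking.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# S7: fixed tag order and hypothetical weight vector W (illustrative values only).
TAG_ORDER = ["China", "comedy", "family", "romance", "youth"]
W = [0.5, 2.0, 2.5, 2.5, 2.5]


def one_hot(tags: set[str]) -> list[float]:
    """Plain 0-1 encoding in TAG_ORDER."""
    return [1.0 if tag in tags else 0.0 for tag in TAG_ORDER]


def weighted_vector(tags: set[str]) -> list[float]:
    """S8: one-hot encode in TAG_ORDER, then multiply each bit by its weight in W."""
    return [w if tag in tags else 0.0 for tag, w in zip(TAG_ORDER, W)]


j1 = {"China", "comedy"}
j2 = {"China"}
j3 = {"China", "comedy", "family", "romance", "youth"}

for name, other in [("J2", j2), ("J3", j3)]:
    plain = cosine(one_hot(j1), one_hot(other))
    weighted = cosine(weighted_vector(j1), weighted_vector(other))
    print(name, round(plain, 3), round(weighted, 3))
```

With plain one-hot vectors J2 scores higher than J3 (about 0.707 vs 0.632); with the weighted vectors J3 scores higher (about 0.430 vs 0.243), because the shared narrow tag "comedy" now outweighs the broad tag "China".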
The present invention is not limited to the above embodiments; any structural change made under the teaching of the present invention whose technical solution is identical or similar to that of the present invention falls within its protection scope.
The techniques, shapes, and configurations not described in detail in the present invention are all known techniques.

Claims (5)

1. A method for improving the computation of related recommendations in a recommendation system, comprising the following steps:
S1. collecting tags of media asset content from major websites using crawler technology as original tags, and forming a media asset library;
S2. merging all original labels in the media asset library according to an existing synonym library to form a tag library of unified labels;
S3. manually classifying and grading each unified label in the tag library by category to obtain its label level, wherein grading is done in reverse order, the unified labels are graded on multiple levels, and each level of subcategories beneath a unified label lowers its level by 1, down to a minimum of 1;
S4. converting each original label of each program into a unified label in the tag library;
S5. counting the exposure of each unified label in the tag library and the total number of programs;
S6. calculating the weight of each unified label in the tag library;
S7. after calculating the weights of all unified labels in the tag library in turn, generating a unified label weight vector W in a fixed order; and
S8. calculating new item vectors.
2. The method of claim 1, wherein the unified labels are graded on three levels: if a unified label has no subcategory, its label level is 3; if it contains a first-level subcategory, its label level is 2; and if a first-level subcategory under it further contains a second-level subcategory, its label level is 1.
3. The method of claim 1, wherein the unified labels obtained by converting the original labels in step S4 are stored in the media asset library in real time and become the original labels in the media asset library.
4. The method of claim 1, wherein in step S6 the weight of a unified label is calculated by:
[Formula image not reproduced]
where $W_i$ is the weight of unified label i.
5. The method of claim 1, wherein in step S8, while calculating a new item vector, each label is multiplied in turn by its label weight from W to form the new item vector.
CN202310076252.1A 2023-02-08 2023-02-08 Method for improving recommendation system to calculate related recommendations Pending CN115795173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310076252.1A CN115795173A (en) 2023-02-08 2023-02-08 Method for improving recommendation system to calculate related recommendations

Publications (1)

Publication Number Publication Date
CN115795173A true CN115795173A (en) 2023-03-14

Family

ID=85430276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310076252.1A Pending CN115795173A (en) 2023-02-08 2023-02-08 Method for improving recommendation system to calculate related recommendations

Country Status (1)

Country Link
CN (1) CN115795173A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116204737A (en) * 2023-05-04 2023-06-02 海看网络科技(山东)股份有限公司 Recommendation method, system, equipment and medium based on user behavior codes

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103389988A (en) * 2012-05-10 2013-11-13 腾讯科技(深圳)有限公司 Method and device for guiding user to carry out information search
CN105045907A (en) * 2015-08-10 2015-11-11 北京工业大学 Method for constructing visual attention-label-user interest tree for personalized social image recommendation
CN112232524A (en) * 2020-12-14 2021-01-15 北京沃东天骏信息技术有限公司 Multi-label information identification method and device, electronic equipment and readable storage medium
CN113873333A (en) * 2021-09-30 2021-12-31 海看网络科技(山东)股份有限公司 Method for calculating program portrait on IPTV

Non-Patent Citations (1)

Title
熊回香 et al.: "Research on Personalized Recommendation Based on a User Interest Topic Model" (基于用户兴趣主题模型的个性化推荐研究) *

Similar Documents

Publication Publication Date Title
Reddy et al. Content-based movie recommendation system using genre correlation
CN107220365B (en) Accurate recommendation system and method based on collaborative filtering and association rule parallel processing
CN106028071A (en) Video recommendation method and system
CN104008139B (en) The creation method and device of video index table, the recommendation method and apparatus of video
CN108965938B (en) Method and system for predicting potential pay users in smart television
CN104063481A (en) Film individuation recommendation method based on user real-time interest vectors
CN112364204B (en) Video searching method, device, computer equipment and storage medium
CN104063383A (en) Information recommendation method and device
CN103092958A (en) Display method and device for search result
CN115168744A (en) Radio and television technology knowledge recommendation method based on user portrait and knowledge graph
CN115795173A (en) Method for improving recommendation system to calculate related recommendations
CN104933135A (en) Method and device for clustering multimedia data
CN107493467A (en) A kind of video quality evaluation method and device
CN104854588B (en) System and method for searching for the predominantly non-textual project of label
CN112749330A (en) Information pushing method and device, computer equipment and storage medium
CN108109058A (en) A kind of single classification collaborative filtering method for merging personal traits and article tag
CN108540860B (en) Video recall method and device
Lee et al. Dynamic item recommendation by topic modeling for social networks
Kowald et al. Popularity bias in collaborative filtering-based multimedia recommender systems
CN113873333A (en) Method for calculating program portrait on IPTV
Chen et al. Research and implementation of movie recommendation system based on deep learning
CN114912031A (en) Mixed recommendation method and system based on clustering and collaborative filtering
Tewari et al. Efficient tag based personalised collaborative movie reccommendation system
Ye et al. A collaborative neural model for rating prediction by leveraging user reviews and product images
CN108805628B (en) Electronic commerce recommendation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230314