CN115795173A - Method for improving recommendation system to calculate related recommendations - Google Patents
- Publication number
- CN115795173A
- Authority
- CN
- China
- Prior art keywords
- label
- unified
- tag
- library
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of recommendation systems based on specific computational models, and in particular relates to a method for improving a recommendation system to calculate related recommendations. The method assigns a weight to each tag, so that highly weighted words better reflect the characteristics of a recommended item. Once weights have been calculated for all tags in the tag library, the tags better reflect the characteristics of each program, and the content-similarity results in turn better reflect the characteristics of the recommended items.
Description
Technical Field
The invention relates to the technical field of recommendation systems based on specific computational models, and in particular to a method for improving a recommendation system to calculate related recommendations.
Background
With the rapid iteration of internet products, personalized recommendation sections such as "Guess you like" or "Recommended for you" are gradually replacing manual layout in internet fields such as short video, video on demand, live streaming, news clients and e-commerce, and personalized recommendation is quietly changing how people work and live. It has also promoted the development of niche industries across the internet and increased user stickiness: people can quickly find or discover the information they want in fragmented spare time, which raises the value of the product.
E-commerce, video websites, news clients and the like are all application scenarios for recommendation systems, and they demand highly accurate related recommendations. Related recommendation generates one-hot codes from tags and then calculates similarity between the vectors formed by those codes. However, when a program profile is built from tags with 0-1 encoding, the content-similarity result leans toward broad, high-frequency terms, such as "China", "drama" or "comedy" for films, or place-of-origin tags in e-commerce. After 0-1 encoding all tag weights are equal, so the importance of a given tag to the program cannot be expressed. The similarity of two programs is calculated with the cosine similarity formula:

$$\cos(J_A, J_B) = \frac{\sum_{i=1}^{n} x_i^{(J_A)}\, x_i^{(J_B)}}{\sqrt{\sum_{i=1}^{n} \left(x_i^{(J_A)}\right)^2}\,\sqrt{\sum_{i=1}^{n} \left(x_i^{(J_B)}\right)^2}}$$

where $\cos(J_A, J_B)$ is the cosine similarity of the two programs numbered JA and JB, $x_i^{(J_A)}$ is the value of the i-th bit of the one-hot code of program JA, $x_i^{(J_B)}$ is the value of the i-th bit of the one-hot code of program JB, and n is the number of tags, i.e. the number of bits of the one-hot code in the cosine similarity formula.
For example, the tags of three programs numbered J1, J2 and J3, together with their one-hot encodings, are shown in Table 1.
TABLE 1
The similarity of the two programs numbered J1 and J2 is calculated as:

$$\cos(J_1, J_2) = \frac{\sum_{i=1}^{n} x_i^{(J_1)}\, x_i^{(J_2)}}{\sqrt{\sum_{i=1}^{n} \left(x_i^{(J_1)}\right)^2}\,\sqrt{\sum_{i=1}^{n} \left(x_i^{(J_2)}\right)^2}}$$

where $\cos(J_1, J_2)$ is the cosine similarity of the two programs numbered J1 and J2, $x_i^{(J_1)}$ is the value of the i-th bit of the one-hot code of program J1, $x_i^{(J_2)}$ is the value of the i-th bit of the one-hot code of program J2, and n is the number of tags, i.e. the number of bits of the one-hot code in the cosine similarity formula.
The similarity of the two programs numbered J1 and J3 is calculated as:

$$\cos(J_1, J_3) = \frac{\sum_{i=1}^{n} x_i^{(J_1)}\, x_i^{(J_3)}}{\sqrt{\sum_{i=1}^{n} \left(x_i^{(J_1)}\right)^2}\,\sqrt{\sum_{i=1}^{n} \left(x_i^{(J_3)}\right)^2}}$$

where $\cos(J_1, J_3)$ is the cosine similarity of the two programs numbered J1 and J3, $x_i^{(J_1)}$ is the value of the i-th bit of the one-hot code of program J1, $x_i^{(J_3)}$ is the value of the i-th bit of the one-hot code of program J3, and n is the number of tags, i.e. the number of bits of the one-hot code in the cosine similarity formula.
Comparing $\cos(J_1, J_2)$ and $\cos(J_1, J_3)$ shows that the programs numbered J1 and J2 are more similar, so a user who has watched or likes J1 is more likely to be recommended J2 than J3. Under the cosine formula, similarity-based recommendation therefore tends to favor programs with fewer tags, and over time the related recommendations of the system become markedly homogeneous. Although some recommendation systems adjust tag selection and tag weights, the weights are not chosen objectively or rigorously enough, so the quality of the related recommendations suffers.
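The bias toward programs with few tags can be reproduced with a minimal Python sketch. Only the cosine similarity formula comes from the text; the tag vocabulary and the three programs below are hypothetical stand-ins, since the contents of Table 1 are not available here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a program without tags is similar to nothing
    return dot / (norm_a * norm_b)


def one_hot(tags, vocabulary):
    """0-1 encode a set of tags against a fixed tag order."""
    return [1 if tag in tags else 0 for tag in vocabulary]


# Hypothetical vocabulary and programs (not the patent's Table 1).
VOCAB = ["China", "comedy", "drama", "action", "romance", "war"]
j1 = one_hot({"China", "comedy", "drama"}, VOCAB)
j2 = one_hot({"China"}, VOCAB)  # shares only the broad tag
j3 = one_hot({"China", "comedy", "action", "romance", "war"}, VOCAB)  # shares two tags

sim_12 = cosine_similarity(j1, j2)
sim_13 = cosine_similarity(j1, j3)
print(f"cos(J1, J2) = {sim_12:.3f}, cos(J1, J3) = {sim_13:.3f}")
```

In this toy data J2 scores higher than J3 even though J3 shares more tags with J1, because J2's single tag gives it a small vector norm.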
Disclosure of Invention
The invention aims to provide a method for improving a recommendation system to calculate related recommendations. Combined with the cosine similarity algorithm, it addresses the problem that items with few tags are recommended disproportionately often by the recommendation system.
The technical solution adopted by the invention to solve this problem is as follows: a method for improving a recommendation system to calculate related recommendations, in which a mobile-terminal director application serves as the base platform for research, development and construction; the director, live-stream pushing and live-viewing functions are integrated into the same application, which is installed on multiple mobile terminals; specific implementations, configurations and terms are explained according to actual use. The specific implementation is as follows:
S1, collecting the tags of media asset content on major websites by using crawler technology, taking them as original tags, and forming a media asset library;
S2, merging all original tags in the media asset library according to an existing word library to form a tag library of unified tags;
S3, manually classifying and grading each unified tag in the tag library according to its category to obtain a tag level. Grading is done in reverse order: the position of a tag node is determined by the containment relationships among unified tags, and the broader a tag's scope, the lower its tag level and the smaller its weight. The classification is multi-level, and each additional level of sub-classification under a unified tag lowers its tag level by 1, down to a minimum of 1. For example, with three levels: a unified tag with no sub-classification has tag level 3; a unified tag that contains a first-level sub-classification has tag level 2; and a unified tag whose first-level sub-classification itself contains a second-level sub-classification has tag level 1, after which no further grading is done. The tag level of every unified tag is therefore between 1 and 3 inclusive;
S4, converting each original tag of each program into a unified tag in the tag library, which completes the generation of unified tags for each program; the replaced or modified unified tags are stored in the media asset library in real time and become the new original tags there;
S5, counting, for each unified tag in the tag library, its number of exposures and the total number of programs, the number of programs containing the unified tag being counted from its exposures;
S6, calculating the weight of each unified tag in the tag library, using the following formula:
where $W_i$ is the weight of unified tag i;
S7, after weights have been calculated in turn for all unified tags in the tag library, generating a unified tag weight vector W in a fixed order, namely:

$$W = (W_1, W_2, W_3, \ldots, W_n)$$

where W is the unified tag weight vector and $W_1, W_2, W_3, \ldots, W_n$ are the weights of the 1st to n-th unified tags in the tag library;
S8, calculating a new item vector: when the item vector is computed, the value of each tag is multiplied in turn by the corresponding tag weight in W, and the result is taken as the new item vector.
The invention has the following beneficial effects: by combining tag weights with the cosine similarity algorithm, it addresses the problem that the related-recommendation function of a recommendation system frequently recommends items with few tags. Each tag is given a weight, so that highly weighted words better reflect the characteristics of a recommended item. Once weights have been calculated for all tags in the tag library, the tags better reflect the characteristics of each program, and the content-similarity results in turn better reflect the characteristics of the recommended items.
Drawings
FIG. 1 is a schematic flow chart of the present application.
FIG. 2 is a timing diagram of one embodiment of the present application.
FIG. 3 is a schematic diagram of the tag classification hierarchy of the present application.
FIG. 4 is a schematic diagram of a first example of the unified tag classification hierarchy of the present application.
FIG. 5 is a schematic diagram of a second example of the unified tag classification hierarchy of the present application.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
In the method for improving a recommendation system to calculate related recommendations shown in FIG. 1 to FIG. 5, a mobile-terminal director application serves as the base platform for research, development and construction; the director, live-stream pushing and live-viewing functions are integrated into the same application, which is installed on multiple mobile terminals; specific implementations, configurations and terms are explained according to actual use. The specific implementation is as follows:
S1, collecting the tags of media asset content on major websites by using crawler technology, taking them as original tags, and forming a media asset library. In this embodiment the crawling uses the Python-based Scrapy framework, and it generally favors user-generated (UGC) tags, which are more representative;
S2, merging all original tags in the media asset library according to an existing synonym library to form a tag library of unified tags, for example scenery/landscape, or Huangmei opera/opera film/Peking opera/Qinqiang. Tags written in traditional Chinese characters are likewise merged with their simplified counterparts into unified tags, finally forming a unified tag library. The tag library and the synonym library are shown in Table 2.
TABLE 2
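As a rough illustration of steps S2 and S4, the sketch below builds a lookup from a synonym library and uses it to convert a program's original tags into unified tags. The synonym groups, the unified tag names and the function names are all hypothetical; the text does not say which member of a synonym group becomes the unified tag, nor how unknown tags are handled (here they are simply kept).

```python
# Hypothetical synonym library: unified tag -> variants that map to it.
SYNONYM_LIBRARY = {
    "scenery": {"scene", "landscape"},
    "Peking opera": {"Beijing opera"},
}


def build_lookup(synonym_library):
    """Flatten the synonym library into a variant -> unified tag mapping."""
    lookup = {}
    for unified, variants in synonym_library.items():
        lookup[unified.casefold()] = unified
        for variant in variants:
            lookup[variant.casefold()] = unified
    return lookup


def unify_tags(original_tags, lookup):
    """Replace each original tag by its unified tag, dropping duplicates.

    Tags missing from the lookup are kept unchanged (an assumption).
    """
    unified = []
    for tag in original_tags:
        cleaned = tag.strip()
        mapped = lookup.get(cleaned.casefold(), cleaned)
        if mapped not in unified:
            unified.append(mapped)
    return unified


lookup = build_lookup(SYNONYM_LIBRARY)
print(unify_tags(["Landscape", "scene", "Beijing opera", "comedy"], lookup))
```

Matching is case-insensitive here purely for convenience; merging traditional and simplified Chinese variants would need a character-conversion table that this sketch does not include.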
S3, manually classifying and grading each unified tag in the tag library according to its category to obtain a tag level. Grading is done in reverse order: the position of a tag node is determined by the containment relationships among unified tags, and the broader a tag's scope, the lower its tag level and the smaller its weight. The unified tags are graded into three levels: a unified tag with no subcategories has tag level 3; a unified tag with one level of subcategories has tag level 2; and a unified tag whose first-level subcategories themselves contain second-level subcategories has tag level 1. In other words, each additional level of subcategories lowers the tag level by 1 until it reaches 1, after which no further grading is done, so the tag level of every unified tag is between 1 and 3 inclusive. There are two reasons for grading. First, it further mitigates the weighting problem caused by broad unified tags (see the formula in step S6). Second, once the unified tags are graded, the similarity calculation brings out the user's long-tail preferences more clearly, making the recommendations more accurate.
For example, suppose the tag library contains 9 unified tags: [basketball], [football], [Slam Dunk], [sports], [drama], [online variety show], [education], [college entrance exam] and [parenting]. As shown in FIG. 3, before classification and grading every unified tag has tag level 3. The unified tags are then classified and graded. [sports] contains the first-level subcategories [basketball], [football] and [tennis]; of these, [basketball] contains the second-level subcategory [Slam Dunk], while [football] and [tennis] have no subcategories. The relationship is shown in FIG. 4. [sports] therefore has two levels of subcategories, so its tag level is 3-2=1; [basketball] has one level of subcategories, so its tag level is 3-1=2; [football] and [tennis] have no subcategories, so their tag level is 3-0=3. Similarly, if [education] contains the first-level subcategories [college entrance exam] and [parenting], neither of which has subcategories, then the tag level of [education] is 3-1=2 and the tag level of [college entrance exam] and [parenting] is 3-0=3.
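The grading rule in this example can be sketched in Python as "maximum level minus the number of sub-classification levels below the tag, never less than 1". The hierarchy below uses English renderings of the tags in the example; the dictionary layout and function names are illustrative assumptions, and the hierarchy is assumed to contain no cycles.

```python
MAX_LEVEL = 3  # three-level grading, as in the example

# Parent tag -> direct subcategories, following the [sports] and [education] example.
HIERARCHY = {
    "sports": ["basketball", "football", "tennis"],
    "basketball": ["Slam Dunk"],
    "education": ["college entrance exam", "parenting"],
}


def sub_levels(tag, hierarchy):
    """Number of sub-classification levels below a tag (0 for a leaf)."""
    children = hierarchy.get(tag, [])
    if not children:
        return 0
    return 1 + max(sub_levels(child, hierarchy) for child in children)


def tag_level(tag, hierarchy, max_level=MAX_LEVEL):
    """Tag level: max_level minus the depth below the tag, floored at 1."""
    return max(1, max_level - sub_levels(tag, hierarchy))


levels = {
    tag: tag_level(tag, HIERARCHY)
    for tag in ["sports", "basketball", "football", "tennis", "Slam Dunk", "education"]
}
print(levels)
```

Flooring at 1 mirrors the statement that grading stops once the level reaches 1, so a deeper hierarchy would not produce levels of 0 or below.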
S4, converting each original tag of each program into a unified tag in the tag library, which completes the generation of unified tags for each program; the replaced or modified unified tags are stored in the media asset library in real time and become the new original tags there;
S5, counting, for each unified tag in the tag library, its number of exposures and the total number of programs, the number of programs containing the unified tag being counted from its exposures;
S6, calculating the weight of each unified tag in the tag library, using the following formula:
where $W_i$ is the weight of unified tag i;
As the formula shows, the more often unified tag i is applied in the tag library and the broader it is (that is, the higher it sits in the hierarchy and hence the lower its tag level), the smaller its weight. Conversely, the less often unified tag i is applied and the narrower it is, the larger its weight. Through the tag level, the weight of a unified tag that is broad but rarely applied can be smoothly reduced, which highlights the user's personalized preferences;
When choosing how strongly the tag level should influence the weight, the following situation is often encountered in actual production: some tags have little exposure but cover a broad category, such as the tag [sports]. Calculated from exposure alone, the weight of this unified tag is 5.51513. However, [sports] has tag level 1, being a broad tag that contains two levels of subcategories, and after multiplication by the tag-level factor ln(tag level of unified tag i + 1) its weight falls to 3.822537. The factor ln(tag level of unified tag i + 1) thus expresses the influence of the tag level on the weight of a unified tag. Some of the results obtained with the tag weight formula are listed in Table 3.
TABLE 3
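The following Python sketch illustrates steps S5 and S6 under stated assumptions. The only part of the weight formula legible in this text is the tag-level factor ln(tag level + 1); the exposure term used below, ln(total programs / programs containing the tag), is an IDF-style stand-in chosen for illustration and is not confirmed by the source. The programs, tag levels and function names are likewise hypothetical.

```python
import math
from collections import Counter


def count_exposure(program_tags):
    """Step S5 sketch: programs containing each unified tag, and total programs."""
    exposure = Counter()
    for tags in program_tags.values():
        exposure.update(set(tags))  # count a tag at most once per program
    return exposure, len(program_tags)


def level_factor(tag_level):
    """Tag-level factor stated in the text: ln(tag level + 1)."""
    return math.log(tag_level + 1)


def tag_weight(exposure_count, total_programs, tag_level):
    # ASSUMPTION: IDF-style exposure term; the source formula is not legible here.
    exposure_term = math.log(total_programs / exposure_count)
    return exposure_term * level_factor(tag_level)


# Hypothetical programs and tag levels.
programs = {
    "J1": ["sports", "basketball"],
    "J2": ["sports", "football"],
    "J3": ["education"],
    "J4": ["sports"],
}
levels = {"sports": 1, "basketball": 2, "football": 3, "education": 2}

exposure, total = count_exposure(programs)
weights = {tag: tag_weight(exposure[tag], total, levels[tag]) for tag in exposure}

# Applying only the stated level factor to the [sports] figure from the text.
sports_after_level = 5.51513 * level_factor(1)
print(weights, round(sports_after_level, 3))
```

With these assumptions a broad, frequently applied tag such as [sports] ends up with the smallest weight. Multiplying 5.51513 by ln(1 + 1) gives roughly 3.823, in line with the reduced weight quoted in the text.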
S7, after weights have been calculated in turn for all unified tags in the tag library, generating a unified tag weight vector W in a fixed order, namely:

$$W = (W_1, W_2, W_3, \ldots, W_n)$$

where W is the unified tag weight vector and $W_1, W_2, W_3, \ldots, W_n$ are the weights of the 1st to n-th unified tags in the tag library;
S8, calculating a new item vector: when the item vector is computed, the value of each tag is multiplied in turn by the corresponding tag weight in W, and the result is the new item vector.
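Steps S7 and S8 can be sketched as an element-wise product of a program's one-hot vector with the weight vector W, followed by the same cosine similarity used in the Background. The tag order, the weight values and the three programs below are hypothetical and chosen only to show the effect; they are not values from the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_vector(one_hot, weights):
    """Step S8 sketch: multiply each one-hot bit by its tag weight."""
    return [bit * weight for bit, weight in zip(one_hot, weights)]


# Hypothetical fixed tag order and weight vector W (step S7).
TAG_ORDER = ["China", "comedy", "drama", "action", "romance", "war"]
W = [0.2, 1.5, 1.2, 1.4, 1.3, 1.6]  # the broad tag "China" gets a small weight

j1 = [1, 1, 1, 0, 0, 0]
j2 = [1, 0, 0, 0, 0, 0]  # shares only the broad tag with J1
j3 = [1, 1, 0, 1, 1, 1]  # shares "China" and "comedy" with J1

plain_12 = cosine_similarity(j1, j2)
plain_13 = cosine_similarity(j1, j3)
weighted_12 = cosine_similarity(weighted_vector(j1, W), weighted_vector(j2, W))
weighted_13 = cosine_similarity(weighted_vector(j1, W), weighted_vector(j3, W))
print(plain_12 > plain_13, weighted_13 > weighted_12)
```

In this toy data the unweighted vectors rank J2 above J3, while the weighted vectors reverse the order, because the overlap on the broad tag contributes very little once it is down-weighted.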
The present invention is not limited to the above embodiments. Any structural change made under the teaching of the present invention that yields a technical solution identical or similar to that of the present invention falls within its scope of protection.
Techniques, shapes and configurations not described in detail in the present invention are all known art.
Claims (5)
1. A method for improving a recommendation system to calculate related recommendations, characterized by comprising the following steps:
S1, collecting the tags of media asset content on major websites by using crawler technology, taking them as original tags, and forming a media asset library;
S2, merging all original tags in the media asset library according to an existing synonym library to form a tag library of unified tags;
S3, manually classifying and grading each unified tag in the tag library according to its category to obtain a tag level, wherein grading is done in reverse order, the unified tag classification is set as a multi-level classification, and each additional level of sub-classification under a unified tag lowers its tag level by 1 until the tag level reaches 1;
S4, converting each original tag of each program into a unified tag in the tag library;
S5, counting the number of exposures of each unified tag in the tag library and the total number of programs;
S6, calculating the weight of each unified tag in the tag library;
S7, calculating weights for all unified tags in the tag library in turn, and generating a unified tag weight vector W in a fixed order;
S8, calculating a new item vector.
2. The method of claim 1, wherein: the unified tag classification is set to three levels; if a unified tag has no sub-classification, its tag level is 3; if a unified tag contains a first-level sub-classification, its tag level is 2; and if a first-level sub-classification under the unified tag further contains a second-level sub-classification, the tag level of the unified tag is 1.
3. The method of claim 1, wherein: in step S4, after an original tag is converted into a unified tag in the tag library, the converted unified tag is stored in the media asset library in real time and becomes the new original tag in the media asset library.
5. The method of claim 1, wherein: in step S8, when the new item vector is calculated, the value of each tag is multiplied in turn by the corresponding tag weight in W, and the result is taken as the new item vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310076252.1A CN115795173A (en) | 2023-02-08 | 2023-02-08 | Method for improving recommendation system to calculate related recommendations |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310076252.1A CN115795173A (en) | 2023-02-08 | 2023-02-08 | Method for improving recommendation system to calculate related recommendations |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115795173A (en) | 2023-03-14 |
Family
ID=85430276
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310076252.1A Pending CN115795173A (en) | 2023-02-08 | 2023-02-08 | Method for improving recommendation system to calculate related recommendations |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115795173A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116204737A (en) * | 2023-05-04 | 2023-06-02 | 海看网络科技(山东)股份有限公司 | Recommendation method, system, equipment and medium based on user behavior codes |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103389988A (en) * | 2012-05-10 | 2013-11-13 | 腾讯科技(深圳)有限公司 | Method and device for guiding user to carry out information search |
CN105045907A (en) * | 2015-08-10 | 2015-11-11 | 北京工业大学 | Method for constructing visual attention-label-user interest tree for personalized social image recommendation |
CN112232524A (en) * | 2020-12-14 | 2021-01-15 | 北京沃东天骏信息技术有限公司 | Multi-label information identification method and device, electronic equipment and readable storage medium |
CN113873333A (en) * | 2021-09-30 | 2021-12-31 | 海看网络科技(山东)股份有限公司 | Method for calculating program portrait on IPTV |
-
2023
- 2023-02-08 CN CN202310076252.1A patent/CN115795173A/en active Pending
Non-Patent Citations (1)
Title |
---|
熊回香等: "基于用户兴趣主题模型的个性化推荐研究" * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Reddy et al. | Content-based movie recommendation system using genre correlation | |
CN107220365B (en) | Accurate recommendation system and method based on collaborative filtering and association rule parallel processing | |
CN106028071A (en) | Video recommendation method and system | |
CN104008139B (en) | The creation method and device of video index table, the recommendation method and apparatus of video | |
CN108965938B (en) | Method and system for predicting potential pay users in smart television | |
CN104063481A (en) | Film individuation recommendation method based on user real-time interest vectors | |
CN112364204B (en) | Video searching method, device, computer equipment and storage medium | |
CN104063383A (en) | Information recommendation method and device | |
CN103092958A (en) | Display method and device for search result | |
CN115168744A (en) | Radio and television technology knowledge recommendation method based on user portrait and knowledge graph | |
CN115795173A (en) | Method for improving recommendation system to calculate related recommendations | |
CN104933135A (en) | Method and device for clustering multimedia data | |
CN107493467A (en) | A kind of video quality evaluation method and device | |
CN104854588B (en) | System and method for searching for the predominantly non-textual project of label | |
CN112749330A (en) | Information pushing method and device, computer equipment and storage medium | |
CN108109058A (en) | A kind of single classification collaborative filtering method for merging personal traits and article tag | |
CN108540860B (en) | Video recall method and device | |
Lee et al. | Dynamic item recommendation by topic modeling for social networks | |
Kowald et al. | Popularity bias in collaborative filtering-based multimedia recommender systems | |
CN113873333A (en) | Method for calculating program portrait on IPTV | |
Chen et al. | Research and implementation of movie recommendation system based on deep learning | |
CN114912031A (en) | Mixed recommendation method and system based on clustering and collaborative filtering | |
Tewari et al. | Efficient tag based personalised collaborative movie reccommendation system | |
Ye et al. | A collaborative neural model for rating prediction by leveraging user reviews and product images | |
CN108805628B (en) | Electronic commerce recommendation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230314 |