WO2021068610A1 - Resource recommendation method and apparatus, electronic device and storage medium - Google Patents

Resource recommendation method and apparatus, electronic device and storage medium

Info

Publication number
WO2021068610A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
information
resource information
resource
vector
Prior art date
Application number
PCT/CN2020/105925
Other languages
French (fr)
Chinese (zh)
Inventor
陆园丽
余玉霞
Original Assignee
平安国际智慧城市科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 平安国际智慧城市科技股份有限公司
Publication of WO2021068610A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • This application relates to the field of data analysis technology, and in particular to a method, device, electronic device, and storage medium for resource recommendation.
  • The inventor realizes that two approaches are currently used for resource selection and prediction. The first uses explicit features (for example, similar features such as content and/or attributes) to select resources.
  • This approach generally requires a large amount of feature engineering to find a suitable feature combination.
  • The quality of the feature combination largely determines the quality of the final screening and prediction, and the accuracy needs to be improved. The second approach uses machine learning algorithms to compute implicit features (for example, features whose content and/or attributes are not similar but which are nevertheless related) to filter resources.
  • This approach can alleviate data sparsity to a certain extent, but resources are updated slowly and the results have low interpretability, so its accuracy also needs to be improved.
  • a method for recommending resources includes:
  • the algorithm decomposes and calculates the triple relationship matrix to obtain the user implicit vector and resource implicit vector of the user;
  • a weighted summation is performed on the first similarity degree and the second similarity degree, and resource information is selected based on the result of the weighted summation and recommended to the user.
  • a device for recommending resources includes:
  • a first generation module configured to obtain first user information of a user, obtain first resource information explicitly associated with the user, and generate a user explicit vector based on the first user information and first resource information;
  • a second generation module configured to obtain second resource information, obtain second user information of users who are explicitly associated with the second resource information, and generate, based on the second resource information and the second user information, a resource explicit vector with the same dimensions as the user explicit vector;
  • a decomposition module configured to obtain the implicit behavior features of users, obtain third user information and third resource information associated with the implicit behavior features, construct a triple relationship matrix based on the third user information and the third resource information, and use a predetermined algorithm to decompose the triple relationship matrix to obtain the user implicit vector and the resource implicit vector of the user;
  • a calculation module configured to calculate the first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculate the second similarity between the user implicit vector and the corresponding resource implicit vector;
  • the recommendation module is configured to perform a weighted summation on the first similarity and the second similarity, select resource information based on the result of the weighted summation, and recommend to the user.
  • An electronic device including a memory and a processor connected to the memory, the memory stores a computer program that can run on the processor, and the computer program is executed by the processor to implement the following steps:
  • the algorithm decomposes and calculates the triple relationship matrix to obtain the user implicit vector and resource implicit vector of the user;
  • a weighted summation is performed on the first similarity degree and the second similarity degree, and resource information is selected based on the result of the weighted summation and recommended to the user.
  • a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, the following steps are implemented:
  • the algorithm decomposes and calculates the triple relationship matrix to obtain the user implicit vector and resource implicit vector of the user;
  • a weighted summation is performed on the first similarity degree and the second similarity degree, and resource information is selected based on the result of the weighted summation and recommended to the user.
  • FIG. 1 is a schematic diagram of the hardware architecture of an embodiment of an electronic device of this application.
  • FIG. 2 is a program module diagram of an embodiment of a resource recommendation device of this application.
  • FIG. 3 is a schematic flowchart of an embodiment of a resource recommendation method of this application.
  • the electronic device 1 is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • the electronic device 1 may be a computer, a single web server, a server group composed of multiple web servers, or a cloud composed of a large number of hosts or web servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can be communicably connected to each other through a system bus.
  • the memory 11 stores a computer program that can run on the processor 12. It should be pointed out that FIG. 1 only shows the electronic device 1 with the components 11-13, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the memory 11 includes a memory and at least one type of readable storage medium, and the readable storage medium may be non-volatile or volatile.
  • the memory provides a cache for the operation of the electronic device 1;
  • the readable storage medium can be, for example, flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and other storage media.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
  • the non-volatile storage medium may also be an external storage unit of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store the operating system and various application software installed in the electronic device 1, for example, to store the code of the computer program 14 in an embodiment of the present application.
  • the memory 11 can also be used to temporarily store various types of data that have been output or will be output.
  • in some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is used to run program code stored in the memory 11 or to process data, for example to run the computer program 14.
  • the network interface 13 may include a standard wireless network interface and a wired network interface.
  • the network interface 13 is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the computer program 14 is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction can be executed by the processor 12 to implement the method of each embodiment of the present application, and it can be divided into different logic modules according to the different functions implemented by each part.
  • the algorithm decomposes and calculates the triple relationship matrix to obtain the user implicit vector and resource implicit vector of the user;
  • a weighted summation is performed on the first similarity degree and the second similarity degree, and resource information is selected based on the result of the weighted summation and recommended to the user.
  • the first user information includes basic information and behavior information of the user
  • the first resource information includes resource information with the same or different service attributes
  • the step of generating a user explicit vector based on the first user information and the first resource information includes:
  • the multi-dimensional vector including basic information and business attributes
  • the behavior information includes explicit behavior features and implicit behavior features, and the step of assigning values to the business attributes in the multi-dimensional vector based on each resource information in the first resource information, the basic information, and the behavior information specifically includes:
  • the users are grouped based on the basic information, and the user's preference degree for each resource information in the first resource information is predicted through the association rules in the group, and the preference degree is used as the value of the corresponding business attribute.
  • the step of performing a weighted summation on the first similarity degree and the second similarity degree, and selecting resource information based on the result of the weighted summation and recommending to the user specifically includes:
  • the shelf time and popularity of each resource information are acquired, and multiple resource information is selected based on the total similarity, the shelf time and popularity of each resource information, and recommended to the user.
  • the resource recommendation device 10 is divided into multiple modules, and the multiple modules are stored in the memory 11 and executed by the processor 12 to complete this application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • the resource recommendation device 10 can be divided into: a first generation module 101, a second generation module 102, a decomposition module 103, a calculation module 104, and a recommendation module 105.
  • the first generating module 101 is configured to obtain first user information of a user, obtain first resource information explicitly associated with the user, and generate a user explicit vector based on the first user information and the first resource information;
  • the second generation module 102 is configured to obtain second resource information, obtain second user information of users that are explicitly associated with the second resource information, and generate, based on the second resource information and the second user information, a resource explicit vector with the same dimensions as the user explicit vector;
  • the decomposition module 103 is configured to obtain the implicit behavior features of the user, obtain third user information and third resource information associated with the implicit behavior features, construct a triple relationship matrix based on the third user information and the third resource information, and use a predetermined algorithm to decompose the triple relationship matrix to obtain the user implicit vector and the resource implicit vector of the user;
  • the calculation module 104 is configured to calculate the first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculate the second similarity between the user implicit vector and the corresponding resource implicit vector;
  • the recommendation module 105 is configured to perform a weighted summation on the first similarity degree and the second similarity degree, select resource information based on the result of the weighted summation, and make recommendations to the user.
  • FIG. 3 is a schematic flowchart of an embodiment of a resource recommendation method of this application.
  • when the processor 12 of the electronic device 1 executes the computer program 14 stored in the memory 11, the following steps of the method are implemented:
  • Step S1 acquiring first user information of a user, acquiring first resource information explicitly associated with the user, and generating a user explicit vector based on the first user information and first resource information;
  • the first user information includes the user's basic information and behavior information.
  • the basic information includes gender, age, consumption ability, work information, etc.
  • the behavior information is the operation information generated when the user browses or operates on resources; it can be obtained from the log and includes explicit behavior features and implicit behavior features.
  • Explicit behavior features, such as collecting, liking, and sharing, directly reflect the user's preference for resources. Implicit behavior features, such as resource page browsing time, search keywords, comments, clicks, and mouse sliding, do not directly reflect the user's preference for resources.
  • the first resource information is resource information on the network, including various resource information with the same or different business attributes, which are distinguished according to business attributes.
  • the resource information can be product information, sales information, training information, artificial intelligence information, etc.
  • if the user's behavior information when browsing or operating on resource information is an explicit behavior feature, the user is explicitly associated with that resource information.
  • the step of generating a user explicit vector based on the first user information and the first resource information specifically includes:
  • the multi-dimensional vector is $(a_1, a_2, \ldots, a_j, b_1, b_2, \ldots, b_k)$, where the multi-dimensional vector supports scalable and configurable operations in the form of configuration files.
  • $a_1, a_2, \ldots, a_j$ are the user's basic information (including gender, age, consumption ability, work information, etc.), and each takes the value 0 or 1.
  • the basic information in the multidimensional vector is assigned.
  • a discrete variable directly yields the corresponding value, while the value of a continuous variable is discretized with the minimum entropy binning method to obtain the corresponding value.
  • for example, for gender, male corresponds to 0 and female to 1; for age, 20 years old and above corresponds to 0 and under 20 years old to 1; for work information, being a writer corresponds to 0 and not being a writer to 1.
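  • The example encoding above can be sketched as follows. This is a minimal illustration that hard-codes the three example rules from the text; the function name `encode_basic_info` and its argument names are hypothetical, and the minimum entropy binning step for continuous variables is not implemented here (the age threshold of 20 is simply taken from the example).

```python
from typing import List


def encode_basic_info(gender: str, age: int, occupation: str) -> List[int]:
    """Map basic information to 0/1 components (a_1, a_2, a_3).

    The rules mirror the example in the text; a real system would derive
    the bins for continuous variables such as age from the data.
    """
    a_gender = 0 if gender == "male" else 1      # male -> 0, female -> 1
    a_age = 0 if age >= 20 else 1                # 20 and above -> 0, under 20 -> 1
    a_work = 0 if occupation == "writer" else 1  # writer -> 0, not a writer -> 1
    return [a_gender, a_age, a_work]


basic_vector = encode_basic_info("female", 19, "engineer")
print(basic_vector)
```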
  • $b_1, b_2, \ldots, b_k$ are the business attributes of each resource information of the first resource information that is explicitly associated with the first user information.
  • the business attributes of each resource information of the first resource information can be determined as follows: a business attribute label structure that meets the business development goals is built in advance, and the text information of each resource information of the first resource information is then extracted. The subsequent processing of the text information can adopt existing techniques, namely word segmentation, data cleaning, LDA topic extraction, vectorization, and vector-based business attribute similarity calculation.
  • when the similarity between the text information and the corresponding business attribute label exceeds the threshold, the business attribute of that resource information is the business attribute pointed to by the business attribute label.
  • the value of each business attribute can be obtained in any of the following predetermined ways:
  • the first way is to obtain the explicit behavior features and time information generated when the user operates on each resource information in the first resource information, calculate the user's preference degree for the corresponding resource information based on the explicit behavior features and the time information, and take the preference degree as the value of the corresponding business attribute. That is, the user's preference degree for resource information of each business attribute is calculated from the user's explicit behavior features and a time factor, and is used as the value of each business attribute:
  • for the resource information of a business attribute, the user's execution of the corresponding behavior is closely related to time.
  • t is the number of days since the user performed the explicit behavior on the resource information;
  • ⁇, ⁇, c, and t⁇ are constant parameters, with ⁇ > 0, ⁇ > 0, and c > 0; the default values of ⁇, ⁇, c, and t⁇ are 1, 0.42, 0.025, and 0.0025 respectively.
  • the corresponding values of ⁇, ⁇, c, and t⁇ can also be generated from the data changes of the business attribute. Since a user may browse or operate on the same resource information at different times, the results can be aggregated by user and business attribute:
  • the maximum value over the period can be taken,
  • and that preference degree is taken as the value of b.
  • the preference degree for each business attribute thus gives the value of the corresponding $b_1, b_2, \ldots, b_k$.
  • this embodiment incorporates a time factor into the calculation of the user's preference degree; taking this time dependence into account helps improve the accuracy of resource recommendation.
  • the second way is to group users based on the basic information and to predict, through association rules within each group, the user's preference for resource information of each business attribute, which is used as the value of each business attribute. How groups are defined depends on the scale of the user data. For example, if the scale is small, all users belong to a single group; if the scale is large, users can be grouped by basic information, such as region or industry, so that each user has a corresponding group. On the Spark platform, the association rule (FP-Growth) algorithm is used to predict the preferred resource information of users in each group.
  • the set of groups is $G = \{g_1, g_2, g_3, \ldots, g_n\}$,
  • where n is the number of groups.
  • each group's preferred resource list and recommendation scores are obtained from the frequent resource items; then, based on the relationship R and each group's preferred resource list and recommendation scores, the pairs (user's recommendation score, business attribute corresponding to the preferred resource) are obtained,
  • and the recommendation score is taken as the value of the corresponding business attribute $b_1, b_2, \ldots, b_k$.
  • the association rules within the group proposed in this embodiment help to solve the problem of excessive consumption of calculation resources of the association rule algorithm on the one hand, and on the other hand, it helps to enhance the group effect of users and improve the accuracy of resource recommendation.
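  • The within-group idea can be illustrated with a deliberately simplified sketch. It is not FP-Growth: it only counts the support of single resources inside each group (the frequent 1-itemsets) and uses that support as a stand-in recommendation score. The function name, the `min_support` threshold, and the sample data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def group_preference_scores(user_group, user_resources, min_support=0.5):
    """Return {group: {resource: support}} for resources that are frequent in a group.

    Support is the share of group members who interacted with the resource.
    Mining is done per group, so each run only touches that group's users.
    """
    members = defaultdict(list)
    for user, group in user_group.items():
        members[group].append(user)

    scores = {}
    for group, users in members.items():
        counts = Counter()
        for user in users:
            # Use a set so repeated interactions by one user count once.
            counts.update(set(user_resources.get(user, [])))
        scores[group] = {
            res: n / len(users)
            for res, n in counts.items()
            if n / len(users) >= min_support
        }
    return scores


# Hypothetical data: users grouped by region, with resources they interacted with.
user_group = {"u1": "east", "u2": "east", "u3": "west"}
user_resources = {"u1": ["r1", "r2"], "u2": ["r1"], "u3": ["r3"]}
scores = group_preference_scores(user_group, user_resources)
print(scores)
```

  • Restricting the counting to one group at a time is what keeps the computation small, which is the resource-consumption benefit the paragraph above describes.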
  • the third way is to merge the first way and the second way: the business attribute values $b_1, b_2, \ldots, b_k$ from the first way and the corresponding business attribute values $b_1, b_2, \ldots, b_k$ from the second way are combined in a weighted sum, where each weight can be determined in advance.
  • for example, the weights for the business attribute values from the first way are all the same, such as 0.55,
  • and the weights for the business attribute values from the second way are all the same, such as 0.45.
  • a user explicit vector is generated.
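  • The merging of the two sets of business attribute values, followed by assembling the user explicit vector, can be sketched as below. The weights 0.55 and 0.45 are the example values from the text; the function name and the sample values are hypothetical.

```python
def merge_attribute_values(first, second, w_first=0.55, w_second=0.45):
    """Element-wise weighted sum of business attribute values from the two ways."""
    if len(first) != len(second):
        raise ValueError("both value lists must cover the same business attributes")
    return [w_first * x + w_second * y for x, y in zip(first, second)]


# Hypothetical values for two business attributes (b_1, b_2).
time_based = [1.0, 0.0]    # from the first way (explicit behavior + time factor)
group_based = [0.0, 1.0]   # from the second way (within-group association rules)
merged = merge_attribute_values(time_based, group_based)

# The user explicit vector is the basic information followed by the attribute values.
basic_info = [1, 0, 1]     # hypothetical (a_1, a_2, a_3)
user_explicit_vector = basic_info + merged
print(user_explicit_vector)
```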
  • Step S2 acquiring second resource information, acquiring second user information of users explicitly associated with the second resource information, and generating, based on the second resource information and the second user information, a resource explicit vector with the same dimensions as the user explicit vector;
  • the second resource information is also resource information on the network, including various resource information with the same or different service attributes.
  • the second user information also includes the user's basic information and behavior information.
  • if the behavior information of a user when browsing or operating on resource information is an explicit behavior feature, the resource information is explicitly associated with that user, and the user information of that user is obtained. The user information of all explicitly associated users constitutes the second user information.
  • generating a resource explicit vector with the same dimensions as the user explicit vector specifically includes:
  • the multi-dimensional vector is $(A_1, A_2, \ldots, A_j, B_1, B_2, \ldots, B_k)$, where the multi-dimensional vector supports scalable and configurable operations in the form of configuration files, and its dimensions are the same as those of the aforementioned user explicit vector.
  • $B_1, B_2, \ldots, B_k$ are the business attributes of each resource information in the second resource information, determined as follows: a business attribute label structure that meets the business development goals is built in advance, and the text information of each resource information in the second resource information is then extracted. The subsequent processing of the text information can use existing techniques, namely word segmentation, data cleaning, LDA topic extraction, vectorization, and vector-based business attribute similarity calculation. When the similarity between the text information and the corresponding business attribute label exceeds the threshold, the business attribute of that resource information is the business attribute pointed to by the label, and its value is the similarity with that label.
  • $A_1, A_2, \ldots, A_j$ are the basic information of users that are explicitly associated with the second resource information.
  • the users who are explicitly associated with the second resource information may have multiple attribute tags (for example, male, high-consumption group, R&D engineer, etc.), which makes the user's basic information relatively scattered.
  • the resource information of one type of business attribute may be browsed and operated on by different users, but it may not actually be applicable to all of them. Therefore, this embodiment selects the users of the corresponding attribute tags by clustering, and obtains the values of $A_1, A_2, \ldots, A_j$ based on the basic information of the users with these attribute tags.
  • the clustering in this embodiment can use the k-means algorithm: users are grouped (for example by region or industry, based on their basic information), and a preset number of center points, denoted k, is obtained from the analysis of the user groups. Clustering the basic information of users that have historical behavior records yields the cluster centers of the basic information and the relationship [basic information list, cluster center] between users' basic information and the cluster centers. After clustering, the relationship [resource information, basic information list] can be obtained from the users' historical behavior information, and merging the two relationships yields the relationship [resource information, cluster center].
  • the values of N cluster centers are selected according to the total number of users of the attribute category; by default, the values of the first 3 cluster centers are used. A weighted sum is then computed with the proportion of users as the weight, and its result gives the final values of $A_1, A_2, \ldots, A_j$.
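  • The final weighted sum over cluster centers can be sketched as follows. The sketch assumes the clustering has already been run and that "the first 3 cluster centers" means the three centers with the most users, which is an interpretation rather than something the text states; the function name and sample data are hypothetical.

```python
def resource_basic_values(centers, user_counts, top_n=3):
    """Combine the top_n most populated cluster centers into (A_1, ..., A_j).

    centers: list of cluster center vectors over the basic-information dimensions.
    user_counts: number of the resource's users assigned to each center.
    Each selected center is weighted by its share of users among the selected centers.
    """
    ranked = sorted(zip(user_counts, centers), key=lambda pair: pair[0], reverse=True)
    ranked = ranked[:top_n]
    total = sum(count for count, _ in ranked)
    dim = len(ranked[0][1])
    return [
        sum(count / total * center[d] for count, center in ranked)
        for d in range(dim)
    ]


# Hypothetical cluster centers for one resource, with users per center.
centers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
user_counts = [6, 3, 1, 0]
a_values = resource_basic_values(centers, user_counts)
print(a_values)
```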
  • This embodiment vectorizes user explicit features and resource explicit features to obtain user explicit vectors and resource explicit vectors, which avoids the process of feature combination and related preprocessing of a large number of features, and can reduce the complexity of calculation.
  • Step S3 Obtain the implicit behavior features of the user, obtain third user information and third resource information associated with the implicit behavior features, construct a triple relationship matrix based on the third user information and the third resource information, and use a predetermined algorithm to decompose the triple relationship matrix to obtain the user implicit vector and the resource implicit vector of the user;
  • the user's implicit behavior features can be obtained from the log, and the third user information and third resource information associated with them can then be obtained, where the third user information also includes the user's basic information, and the third resource information is likewise resource information on the network, including resource information with the same or different business attributes.
  • a triple relationship matrix R[user, product, rating] is constructed based on the third user information and the third resource information,
  • where user represents users, product represents resources,
  • and rating represents the rating (that is, the degree of preference).
  • the rating corresponding to an implicit behavior feature is uniformly defined as 1.
  • the triple relation matrix R is a sparse matrix with many missing items.
  • This embodiment is based on the Spark platform and uses a predetermined algorithm (alternating least squares, ALS) to factorize the triple relationship matrix, thereby obtaining the user implicit vector and the resource implicit vector:
  • $R_{m \times n} \approx u_{m \times k} \cdot p_{k \times n}$;
  • $u_{m \times k}$ represents the user's preference for the implicit behavior features, and
  • $p_{k \times n}$ represents the degree to which the resource contains the implicit behavior features.
  • $u_{m \times k}$ and $p_{k \times n}$ can be calculated from the above formula; the calculated $u_{m \times k}$ is used as the user implicit vector, and $p_{k \times n}$ is used as the resource implicit vector.
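  • The alternating structure of ALS can be shown with a small pure-Python sketch. It is not the Spark implementation: it fits only the observed triples with an L2 penalty, and the regularisation strength `lam`, the rank `k`, the random initialisation, and all function names are assumptions. The resource factors are stored row-wise (n rows of length k), i.e. as the transpose of $p_{k \times n}$.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def update(fixed, observations, k, lam):
    """Least-squares update of one side's factors while the other side is fixed."""
    out = []
    for obs in observations:  # obs: list of (index into fixed, rating)
        a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for idx, rating in obs:
            f = fixed[idx]
            for i in range(k):
                b[i] += rating * f[i]
                for j in range(k):
                    a[i][j] += f[i] * f[j]
        out.append(solve(a, b))
    return out


def loss(triples, u, p, lam):
    """Squared error on observed triples plus the L2 penalty on both factor sets."""
    err = sum((r - sum(x * y for x, y in zip(u[i], p[j]))) ** 2 for i, j, r in triples)
    reg = sum(x * x for row in u for x in row) + sum(x * x for row in p for x in row)
    return err + lam * reg


def als(triples, m, n, k=2, lam=0.1, iters=10, seed=0):
    """Alternate between solving for user factors and resource factors."""
    rng = random.Random(seed)
    u = [[rng.random() for _ in range(k)] for _ in range(m)]
    p = [[rng.random() for _ in range(k)] for _ in range(n)]
    by_user = [[(j, r) for i, j, r in triples if i == row] for row in range(m)]
    by_item = [[(i, r) for i, j, r in triples if j == col] for col in range(n)]
    for _ in range(iters):
        u = update(p, by_user, k, lam)
        p = update(u, by_item, k, lam)
    return u, p


# Hypothetical (user, product, rating) triples; implicit ratings are all 1.
triples = [(0, 0, 1.0), (0, 1, 1.0), (1, 1, 1.0), (2, 2, 1.0)]
user_vecs, resource_vecs = als(triples, m=3, n=3)
print(user_vecs[0], resource_vecs[0])
```

  • Each half-step is an exact regularised least-squares solve, so the penalised loss cannot increase from one iteration to the next.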
  • Step S4 Calculate the first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculate the second similarity between the user implicit vector and the corresponding resource implicit vector;
  • the Locality-Sensitive Hashing (LSH) algorithm is used to calculate the similarity.
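  • One common LSH family for vector similarity is random hyperplane hashing, sketched below. The text does not say which LSH family is used, so this choice, the number of bits, and all names are assumptions. Each vector is reduced to a bit signature, and the share of matching bits approximates how closely two vectors point in the same direction.

```python
import random


def make_hyperplanes(dim, n_bits=64, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return [1 if sum(v * h for v, h in zip(vec, plane)) >= 0 else 0 for plane in planes]


def lsh_similarity(sig_a, sig_b):
    """Fraction of matching signature bits, between 0 and 1."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


planes = make_hyperplanes(dim=3)
user_vec = [1.0, 0.0, 1.0]       # hypothetical user vector
resource_vec = [0.9, 0.1, 1.0]   # hypothetical resource vector of the same dimension
sim = lsh_similarity(signature(user_vec, planes), signature(resource_vec, planes))
print(sim)
```

  • The same hyperplanes must be used for user vectors and resource vectors, which is why both kinds of vector need the same dimension.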
  • Step S5 Perform a weighted summation on the first similarity and the second similarity, select resources based on the result of the weighted summation, and recommend to the user.
  • The first similarity and the second similarity are normalized to the (0,1) interval to obtain $sim_{explicit}$ and $sim_{implicit}$, and a weighted summation is performed on the two sets of similarity values to obtain the total similarity:
  • the weights ⁇ and ⁇ are mainly set in two ways.
  • One way is expert scoring, which sets fixed values of ⁇ and ⁇ .
  • the other way is to determine them through linear regression: users are randomly sampled, the sampled users act as experience officers who score the similarity of the provided resources, and the scoring results are used as training data to generate the values of ⁇ and ⁇.
  • the top-N resource information items by total similarity are selected, and a priority is calculated for each item to sort them. In the priority calculation, view represents the popularity (by default, that of the previous day), age represents the number of days since the resource information was put on the shelf, and the constant parameters i and j are both set to 1 by default. Finally, the sorted top-N resource information can be pushed to the user.
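  • The weighted summation and top-N selection can be sketched as below. The equal weights of 0.5 are placeholders (the text sets the weights by expert scoring or linear regression), the similarities are assumed to be normalized already, and the later re-ranking by popularity and shelf time is left out. All names and sample values are hypothetical.

```python
def total_similarity(sim_explicit, sim_implicit, w_explicit=0.5, w_implicit=0.5):
    """Weighted sum of the normalized explicit and implicit similarities."""
    return w_explicit * sim_explicit + w_implicit * sim_implicit


def top_n(candidates, n):
    """Return the ids of the n resources with the highest total similarity.

    candidates: dict mapping resource id -> (sim_explicit, sim_implicit).
    """
    scored = {rid: total_similarity(*sims) for rid, sims in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)[:n]


# Hypothetical normalized similarities for three resources.
candidates = {"r1": (0.9, 0.2), "r2": (0.4, 0.8), "r3": (0.1, 0.1)}
selected = top_n(candidates, 2)
print(selected)
```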
  • this embodiment combines the implicit features of users and resources with the explicit features of users and resources, so that existing recommendation algorithms can be improved, increasing the accuracy of resource recommendation while enhancing the interpretability of the system.
  • the embodiment of the present application also proposes a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium may be a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, etc., or any combination of several of these.
  • the computer-readable storage medium includes a computer program. For the functions that the computer program implements when executed by the processor, please refer to the above introduction with respect to FIG. 3, which will not be repeated here.
  • the technical solution of this application, in essence or in the part that contributes over the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that enable a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the method described in each embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A resource recommendation method and apparatus, an electronic device and a storage medium, relating to a data analysis technique, the method comprising: acquiring first user information of a user, acquiring first resource information explicitly associated with the user and, on the basis of the first user information and the first resource information, generating a user explicit vector (S1); acquiring second resource information, acquiring second user information of the user explicitly associated with the second resource information and, on the basis of the second resource information and the second user information, generating a resource explicit vector of the same dimension as the user explicit vector (S2); acquiring an implicit behaviour feature of the user, acquiring third user information and third resource information associated with the implicit behaviour feature and, on the basis of the third user information and the third resource information, constructing a triplet relationship matrix and using a predetermined algorithm to decompose the triplet relationship matrix to obtain a user implicit vector and a resource implicit vector of the user (S3); calculating a first similarity of the user explicit vector of the user and the corresponding resource explicit vector and calculating a second similarity of the user implicit vector and the corresponding resource implicit vector (S4); implementing a weighted sum calculation of the first similarity and the second similarity and, on the basis of the result of the weighted sum, selecting resource information and recommending same to the user (S5). The present method can increase the accuracy of resource recommendation.

Description

Resource recommendation method, apparatus, electronic device and storage medium
This application claims priority to Chinese patent application No. 201910970985.3, entitled "Resource Recommendation Method, Apparatus, and Storage Medium", filed with the Chinese Patent Office on October 12, 2019, the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of data analysis technology, and in particular to a resource recommendation method, apparatus, electronic device, and storage medium.
Background
User portraits and resource portraits are important ways to improve the accuracy of a recommendation system. Comprehensive and accurate tags can fully reflect user characteristics and resource characteristics. The features formed from the portraits can be used to generate a personalized resource pool for each user, so that every user sees results tailored to them, which improves recommendation accuracy and, at the same time, user satisfaction.
At present, when portraits are applied in recommendation systems, the inventor has realized that two methods are used to screen resources and form predictions. One uses explicit features (for example, features with similar content and/or attributes) to screen resources. This method generally requires a large amount of feature engineering to find a suitable feature combination; the quality of the feature combination determines, to a certain extent, the quality of the final screening and prediction, and the accuracy needs to be improved. The other uses machine learning algorithms to compute implicit features (for example, features whose content and/or attributes are dissimilar but which are related in some way) to screen resources. With a large amount of data, this method can alleviate data sparsity to a certain extent, but resources are updated slowly and the results have low interpretability, and the accuracy also needs to be improved.
Summary of the invention
A resource recommendation method, the method comprising:
Acquiring first user information of a user, acquiring first resource information explicitly associated with the user, and generating a user explicit vector based on the first user information and the first resource information;
Acquiring second resource information, acquiring second user information of users explicitly associated with the second resource information, and generating, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector;
Acquiring implicit behavior features of the user, acquiring third user information and third resource information associated with the implicit behavior features, constructing a triplet relationship matrix based on the third user information and the third resource information, and decomposing the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector and a resource implicit vector of the user;
Calculating a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculating a second similarity between the user implicit vector and the corresponding resource implicit vector;
Performing a weighted summation of the first similarity and the second similarity, selecting resource information based on the result of the weighted summation, and recommending it to the user.
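The decomposition in the third step can be illustrated with a small sketch. The text only refers to "a predetermined algorithm", so the stochastic-gradient matrix factorization used here is an assumption, as are the (user, resource, score) triplet layout, the function name `factorize`, and every hyperparameter.

```python
import random

def factorize(triplets, n_users, n_items, dim=2, lr=0.05, reg=0.01,
              epochs=2000, seed=0):
    """Factorize sparse (user, resource, score) triplets into implicit vectors.

    Hypothetical helper: plain SGD matrix factorization, chosen only for
    illustration; the source does not name its decomposition algorithm.
    """
    rng = random.Random(seed)
    # One small random latent vector per user and per resource.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in triplets:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for k in range(dim):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

# Assumed toy data: scores derived from implicit behavior (e.g. dwell time).
triplets = [(0, 0, 1.0), (0, 1, 0.2), (1, 0, 0.3), (1, 1, 0.9), (2, 1, 0.8)]
P, Q = factorize(triplets, n_users=3, n_items=2)
```

Each row of `P` then plays the role of a user implicit vector and each row of `Q` the role of a resource implicit vector.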
A resource recommendation apparatus, the apparatus comprising:
A first generation module, configured to acquire first user information of a user, acquire first resource information explicitly associated with the user, and generate a user explicit vector based on the first user information and the first resource information;
A second generation module, configured to acquire second resource information, acquire second user information of users explicitly associated with the second resource information, and generate, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector;
A decomposition module, configured to acquire implicit behavior features of the user, acquire third user information and third resource information associated with the implicit behavior features, construct a triplet relationship matrix based on the third user information and the third resource information, and decompose the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector and a resource implicit vector of the user;
A calculation module, configured to calculate a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculate a second similarity between the user implicit vector and the corresponding resource implicit vector;
A recommendation module, configured to perform a weighted summation of the first similarity and the second similarity, select resource information based on the result of the weighted summation, and recommend it to the user.
An electronic device, comprising a memory and a processor connected to the memory, the memory storing a computer program that can run on the processor, wherein the computer program, when executed by the processor, implements the following steps:
Acquiring first user information of a user, acquiring first resource information explicitly associated with the user, and generating a user explicit vector based on the first user information and the first resource information;
Acquiring second resource information, acquiring second user information of users explicitly associated with the second resource information, and generating, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector;
Acquiring implicit behavior features of the user, acquiring third user information and third resource information associated with the implicit behavior features, constructing a triplet relationship matrix based on the third user information and the third resource information, and decomposing the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector and a resource implicit vector of the user;
Calculating a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculating a second similarity between the user implicit vector and the corresponding resource implicit vector;
Performing a weighted summation of the first similarity and the second similarity, selecting resource information based on the result of the weighted summation, and recommending it to the user.
A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
Acquiring first user information of a user, acquiring first resource information explicitly associated with the user, and generating a user explicit vector based on the first user information and the first resource information;
Acquiring second resource information, acquiring second user information of users explicitly associated with the second resource information, and generating, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector;
Acquiring implicit behavior features of the user, acquiring third user information and third resource information associated with the implicit behavior features, constructing a triplet relationship matrix based on the third user information and the third resource information, and decomposing the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector and a resource implicit vector of the user;
Calculating a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculating a second similarity between the user implicit vector and the corresponding resource implicit vector;
Performing a weighted summation of the first similarity and the second similarity, selecting resource information based on the result of the weighted summation, and recommending it to the user.
Description of the drawings
FIG. 1 is a schematic diagram of the hardware architecture of an embodiment of the electronic device of this application;
FIG. 2 is a program module diagram of an embodiment of the resource recommendation apparatus;
FIG. 3 is a schematic flowchart of an embodiment of the resource recommendation method of this application.
Detailed description
In order to make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only on the basis that a person of ordinary skill in the art can implement the combination. When a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and does not fall within the protection scope claimed by this application.
Refer to FIG. 1, which is a schematic diagram of the hardware architecture of an embodiment of the electronic device of this application. The electronic device 1 is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the electronic device 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can be communicatively connected to each other through a system bus. The memory 11 stores a computer program that can run on the processor 12. It should be pointed out that FIG. 1 only shows the electronic device 1 with the components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes internal memory and at least one type of readable storage medium, and the readable storage medium may be non-volatile or volatile. The internal memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a storage medium such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the readable storage medium may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and various application software installed on the electronic device 1, for example the code of the computer program 14 in an embodiment of this application. In addition, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is used to run the program code stored in the memory 11 or to process data, for example to run the computer program 14.
The network interface 13 may include a standard wireless network interface and a wired network interface, and is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
The computer program 14 is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11. The at least one computer-readable instruction can be executed by the processor 12 to implement the methods of the embodiments of this application, and can be divided into different logical modules according to the functions implemented by its parts.
In an embodiment, the computer program 14, when executed by the processor 12, implements the following steps:
Acquiring first user information of a user, acquiring first resource information explicitly associated with the user, and generating a user explicit vector based on the first user information and the first resource information;
Acquiring second resource information, acquiring second user information of users explicitly associated with the second resource information, and generating, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector;
Acquiring implicit behavior features of the user, acquiring third user information and third resource information associated with the implicit behavior features, constructing a triplet relationship matrix based on the third user information and the third resource information, and decomposing the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector and a resource implicit vector of the user;
Calculating a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculating a second similarity between the user implicit vector and the corresponding resource implicit vector;
Performing a weighted summation of the first similarity and the second similarity, selecting resource information based on the result of the weighted summation, and recommending it to the user.
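The similarity step compares a user vector with a resource vector of the same dimension. This section does not name the similarity measure, so the cosine similarity in the following sketch is an assumption, and the function name `cosine_similarity` is hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # An all-zero vector carries no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)

# The same helper could serve both the explicit pair and the implicit pair.
first_similarity = cosine_similarity([1, 0, 1, 0.8], [1, 0, 0, 0.6])
second_similarity = cosine_similarity([0.4, -0.2], [0.5, 0.1])
```

The dimension check mirrors the requirement that the resource explicit vector has the same dimension as the user explicit vector.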
Preferably, the first user information includes basic information and behavior information of the user, the first resource information includes resource information with the same or different business attributes, and the step of generating a user explicit vector based on the first user information and the first resource information specifically includes:
Acquiring a predefined multi-dimensional vector, the multi-dimensional vector including basic information and business attributes;
Assigning values to the basic information in the multi-dimensional vector based on the basic information, and assigning values to the business attributes in the multi-dimensional vector based on each piece of resource information in the first resource information, the basic information, and the behavior information, with the assigned multi-dimensional vector serving as the user explicit vector.
Preferably, the behavior information includes explicit behavior features and implicit behavior features, and the step of assigning values to the business attributes in the multi-dimensional vector based on each piece of resource information in the first resource information, the basic information, and the behavior information specifically includes:
Acquiring the explicit behavior features and time information generated when the user operates on each piece of resource information in the first resource information, calculating the user's degree of preference for the corresponding resource information based on the explicit behavior features and time information, and using this degree of preference as the value of the corresponding business attribute; or
Grouping users based on the basic information, predicting, through association rules within each group, the degree of preference of the users in that group for each piece of resource information in the first resource information, and using this degree of preference as the value of the corresponding business attribute.
Preferably, the step of performing a weighted summation of the first similarity and the second similarity, selecting resource information based on the result of the weighted summation, and recommending it to the user specifically includes:
Normalizing the first similarity and the second similarity separately, acquiring predetermined weights, and performing a weighted summation based on the normalized first similarity, the normalized second similarity, and the weights to obtain a total similarity;
Acquiring the shelf time and popularity of each piece of resource information, selecting multiple pieces of resource information based on the total similarity and the shelf time and popularity of each piece of resource information, and recommending them to the user.
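The normalization and weighted summation can be sketched as follows. The normalization method and the weight values are not given here, so min-max scaling and a single weight with its complement are assumptions, and the names `min_max` and `total_similarity` are hypothetical. The later selection by shelf time and popularity is left out because its formula is not stated in this passage.

```python
def min_max(values):
    """Scale a list of scores to [0, 1]; assumed normalization method."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores equal: no resource stands out on this measure.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def total_similarity(first, second, weight=0.5):
    """Weighted sum of normalized explicit and implicit similarities.

    `first` and `second` hold one score per candidate resource, in the
    same order. `weight` applies to the first similarity and
    (1 - weight) to the second, which is an assumed weighting scheme.
    """
    n1, n2 = min_max(first), min_max(second)
    return [weight * a + (1 - weight) * b for a, b in zip(n1, n2)]

# Three candidate resources scored on both measures.
scores = total_similarity([1, 2, 3], [30, 20, 10], weight=0.75)
# Indices of the candidates, best total similarity first.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Normalizing each similarity separately keeps one measure from dominating the sum merely because it has a larger numeric range.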
Referring to FIG. 2, a program module diagram of the resource recommendation apparatus 10 is shown. The resource recommendation apparatus 10 is divided into multiple modules, which are stored in the memory 11 and executed by the processor 12 to carry out this application. A module in this application refers to a series of computer program instruction segments that can complete a specific function.
The resource recommendation apparatus 10 can be divided into a first generation module 101, a second generation module 102, a decomposition module 103, a calculation module 104, and a recommendation module 105.
The first generation module 101 is configured to acquire first user information of a user, acquire first resource information explicitly associated with the user, and generate a user explicit vector based on the first user information and the first resource information;
The second generation module 102 is configured to acquire second resource information, acquire second user information of users explicitly associated with the second resource information, and generate, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector;
The decomposition module 103 is configured to acquire implicit behavior features of the user, acquire third user information and third resource information associated with the implicit behavior features, construct a triplet relationship matrix based on the third user information and the third resource information, and decompose the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector and a resource implicit vector of the user;
The calculation module 104 is configured to calculate a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculate a second similarity between the user implicit vector and the corresponding resource implicit vector;
The recommendation module 105 is configured to perform a weighted summation of the first similarity and the second similarity, select resource information based on the result of the weighted summation, and recommend it to the user.
For the specific principles, refer to the description of the flowchart of the method in FIG. 3 below.
FIG. 3 is a schematic flowchart of an embodiment of the resource recommendation method of this application. When the processor 12 of the electronic device 1 executes the computer program 14 stored in the memory 11, the following steps of the method are implemented:
Step S1: acquire first user information of a user, acquire first resource information explicitly associated with the user, and generate a user explicit vector based on the first user information and the first resource information;
The first user information includes the user's basic information and behavior information. The basic information includes gender, age, spending power, work information, and so on. The behavior information is the information about the user's actions when browsing or operating on resources; it can be obtained from logs and includes explicit behavior features and implicit behavior features. Explicit behavior features, such as adding to favorites, liking, and sharing, directly reflect the user's preference for a resource. Implicit behavior features, such as resource page browsing time, search keywords, comments, clicks, and mouse scrolling, do not directly reflect the user's preference for a resource.
The first resource information is resource information on the network and includes resource information with the same or different business attributes. Distinguished by business attribute, the resource information can be, for example, product information, sales information, training information, or artificial intelligence information.
From the user's point of view, if the user's behavior information when browsing or operating on a piece of resource information is an explicit behavior feature, the user is explicitly associated with that resource information.
The step of generating a user explicit vector based on the first user information and the first resource information specifically includes:
A multi-dimensional vector $(a_1, a_2, \dots, a_j, b_1, b_2, \dots, b_k)$ is predefined; it is defined in a configuration file so that it can be extended and configured. $a_1, a_2, \dots, a_j$ are the items of the user's basic information (including gender, age, spending power, work information, and so on), each taking the value 0 or 1. The basic information in the multi-dimensional vector is assigned values based on the user's basic information: for a discrete variable the corresponding value can be obtained directly, and for a continuous variable the value is discretized using the minimum-entropy binning method. For example, for gender, male corresponds to 0 and female to 1; for age, 20 years old and above corresponds to 0 and under 20 to 1; for work information, being a writer corresponds to 0 and not being a writer to 1.
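The assignment of the basic-information components can be sketched directly from the example values above. The dictionary keys (`gender`, `age`, `occupation`) and the function name `encode_basic_info` are hypothetical, and the age threshold of 20 is hard-coded from the example rather than derived by minimum-entropy binning.

```python
def encode_basic_info(user):
    """Map a user's basic information to 0/1 components a_1..a_j.

    Follows the example encoding in the text: male -> 0, female -> 1;
    age 20 or above -> 0, under 20 -> 1; writer -> 0, otherwise -> 1.
    """
    gender_bit = 0 if user["gender"] == "male" else 1
    age_bit = 0 if user["age"] >= 20 else 1
    writer_bit = 0 if user["occupation"] == "writer" else 1
    return [gender_bit, age_bit, writer_bit]

# Hypothetical user record; field names are assumptions for illustration.
user = {"gender": "female", "age": 19, "occupation": "engineer"}
basic_part = encode_basic_info(user)
```

In a fuller version the bin boundaries for continuous variables such as age would come from the binning step rather than from a constant.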
$b_1, b_2, \ldots, b_k$ are the business attributes of the items of the first resource information that are explicitly associated with the first user information. The business attribute of each item can be determined as follows: a business attribute label structure that matches the business development goals is built in advance; the text information of each item of the first resource information is then extracted and processed with existing techniques, namely word segmentation, data cleaning, LDA topic extraction, vectorization, and vector-based similarity calculation against the business attributes. When the similarity between the text information and a business attribute label exceeds a threshold, the business attribute of that item is the one indicated by that label.
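The final thresholding step can be sketched as below, assuming the text and the labels have already been vectorized. Cosine similarity, the threshold of 0.8, and the names `cosine` and `assign_business_attributes` are assumptions made for illustration; the source only says that a vector-based similarity is compared with a threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_business_attributes(text_vec, label_vecs, threshold=0.8):
    """Return {label: similarity} for every label whose similarity exceeds the threshold."""
    matched = {}
    for label, vec in label_vecs.items():
        score = cosine(text_vec, vec)
        if score > threshold:
            matched[label] = score
    return matched
```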
The value of each business attribute can be obtained in any one of the following predetermined ways:
In the first way, the explicit behavior features and time information generated when the user operates on each item of the first resource information are obtained, the user's degree of preference for the corresponding resource information is calculated from them, and this degree of preference is used as the value of the corresponding business attribute. In other words, the user's degree of preference for the resource information of each business attribute is calculated from the user's explicit behavior features and a time factor, because the behavior a user performs on resource information of a given business attribute is closely tied to time. The user's explicit behavior features (for example, likes and favorites) are obtained from the behavior information, and the user's degree of preference b is calculated with the following formula:
Figure PCTCN2020105925-appb-000001
Here, t is the number of days between the user's explicit behavior on the resource information and the present; α, β, c, and $t_\gamma$ are constant parameters with α > 0, β > 0, and c > 0, and their default values are 1, 0.42, 0.025, and 0.0025, respectively. Values of α, β, c, and $t_\gamma$ can also be generated from changes in the data of the business attribute. Because a user may browse or operate on the same resource information at different times, degrees of preference can be aggregated by user and business attribute; for example, if a user interacts with the same resource information several times within a period, the maximum degree of preference in that period can be taken as the value of b. Finally, the degrees of preference corresponding to the business attributes are used as the values of $b_1, b_2, \ldots, b_k$.
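The aggregation step (keeping the maximum degree of preference per user and business attribute) can be sketched as follows. The preference values are assumed to have been computed already with the formula above and to be non-negative; the function name and the tuple layout are hypothetical.

```python
from collections import defaultdict

def aggregate_preferences(events):
    """Keep the maximum preference per (user, business attribute).

    events: iterable of (user, business_attribute, preference) tuples,
    with non-negative preference values (an assumption of this sketch).
    """
    best = defaultdict(float)
    for user, attribute, preference in events:
        key = (user, attribute)
        best[key] = max(best[key], preference)
    return dict(best)
```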
This embodiment incorporates a time factor into the calculation of the user's degree of preference; taking time into account helps improve the accuracy of resource recommendation.
In the second way, users are grouped based on the basic information, and each user's degree of preference for the resource information of each business attribute is predicted through association rules within the group and used as the value of each business attribute. How the groups are defined can depend on the scale of the user data: if it is small, all users form a single group; if it is large, users can be grouped by their basic information, for example by region or industry, so that each user has a corresponding group. On the Spark platform, an association-rule algorithm (FP-Growth) is applied within each group to predict the resource information that users prefer. Specifically, the groups are denoted $G = \{g_1, g_2, g_3, \ldots, g_n\}$, where n is the number of groups. The users' explicit behavior features are obtained, the explicit behavior feature $v_i$ of user $u_i$ being recorded as $\{u_i, v_i\}$; each user $u_i$ has a corresponding group $g_i$, which yields the relation $R = \{r_1, r_2, \ldots, r_m\}$ with $r_i = \{g_i, v_i\}$, where m is the number of users, and from this the frequent resource items of each group are constructed. A list of preferred resources with recommendation scores is generated for each group from the frequent resource items; then, from the relation R and each group's list and scores, pairs of (user, recommendation score of the business attribute corresponding to the preferred resource) are obtained, and the recommendation scores are used as the values of the corresponding business attributes $b_1, b_2, \ldots, b_k$.
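The grouping idea can be illustrated with a deliberately simplified stand-in. The sketch below does not implement FP-Growth; it only counts, within each group, the share of members who interacted with each business attribute (single-item support) and uses that share as the group's recommendation score. All names and the data layout are hypothetical.

```python
from collections import Counter, defaultdict

def group_item_support(user_groups, user_items):
    """Per-group support of each item: fraction of group members who interacted with it.

    user_groups: {user: group}; user_items: {user: iterable of items}.
    A simplified stand-in for per-group frequent-item mining, not FP-Growth itself.
    """
    members = defaultdict(list)
    for user, group in user_groups.items():
        members[group].append(user)
    support = {}
    for group, users in members.items():
        counts = Counter(item for u in users for item in set(user_items.get(u, ())))
        support[group] = {item: n / len(users) for item, n in counts.items()}
    return support

def user_scores(user, user_groups, support):
    """Look up the scores of the group the user belongs to."""
    return support[user_groups[user]]
```

Restricting the counting to one group at a time is what keeps the cost of the mining step bounded, which is the benefit the text attributes to within-group rules.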
The within-group association rules proposed in this embodiment help, on the one hand, to reduce the excessive computing-resource consumption of association-rule algorithms and, on the other hand, to strengthen the group effect among users, which helps improve the accuracy of resource recommendation.
The third way combines the first and second ways: the business attribute values $b_1, b_2, \ldots, b_k$ from the first way are matched with the business attribute values $b_1, b_2, \ldots, b_k$ from the second way and combined by weighted summation, with weights that can be determined in advance. The values from the first way all share one weight, for example 0.55, and the values from the second way all share another, for example 0.45; the weighted sum gives the final values of the business attributes $b_1, b_2, \ldots, b_k$.
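As a small worked illustration of this fusion, using the example weights 0.55 and 0.45 from the text (the function name `fuse` is hypothetical):

```python
def fuse(first, second, w_first=0.55, w_second=0.45):
    """Element-wise weighted sum of two equally long lists of business attribute values."""
    return [w_first * x + w_second * y for x, y in zip(first, second)]
```

For instance, an attribute valued 0.8 by the first way and 0.2 by the second way would receive 0.55 × 0.8 + 0.45 × 0.2 = 0.53.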
Once every value of the multi-dimensional vector $(a_1, a_2, \ldots, a_j, b_1, b_2, \ldots, b_k)$ has been determined, the user explicit vector has been generated.
Step S2: obtain second resource information, obtain second user information of the users explicitly associated with the second resource information, and generate, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector;
The second resource information is likewise resource information on the network and comprises items of resource information with the same or different business attributes. The second user information likewise includes the users' basic information and behavior information.
From the resource's perspective, if the behavior information generated when a user browses or operates on resource information is an explicit behavior feature, the resource information is explicitly associated with that user, and that user's user information is obtained; the user information of all explicitly associated users constitutes the second user information.
The step of generating, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector specifically includes:
A multi-dimensional vector $(A_1, A_2, \ldots, A_j, B_1, B_2, \ldots, B_k)$ is predefined; it is kept in a configuration file so that it can be extended and configured, and its dimension is the same as that of the user explicit vector described above. $B_1, B_2, \ldots, B_k$ are the business attributes of the items of the second resource information and are determined as follows: a business attribute label structure that matches the business development goals is built in advance; the text information of each item of the second resource information is then extracted and processed with existing techniques, namely word segmentation, data cleaning, LDA topic extraction, vectorization, and vector-based similarity calculation against the business attributes. When the similarity between the text information and a business attribute label exceeds a threshold, the business attribute of that item is the one indicated by that label, and its value is the similarity to that label.
$A_1, A_2, \ldots, A_j$ are the items of basic information of the users explicitly associated with the second resource information. These users may carry many different attribute tags (for example, male, high-spending group, R&D engineer), so their basic information is rather scattered. Resource information of one business attribute may be browsed and operated on by different users without actually being suitable for all of them. This embodiment therefore selects the users with the relevant attribute tags by clustering and derives the values of $A_1, A_2, \ldots, A_j$ from the basic information of those users.
The clustering in this embodiment can use the k-means algorithm. Users are grouped (for example by region or industry, based on their basic information), and analysis of the user groups gives a preset number of center points, denoted k. The basic information of users that have historical behavior records is clustered to obtain the cluster centers, which gives the relation [basic information list, cluster center]. After clustering, the relation [resource information, basic information list] is obtained from the users' historical behavior information, and merging the two relations gives the relation [resource information, cluster center]. Because one item of resource information corresponds to multiple cluster centers, the values of N cluster centers are taken according to the total number of users per attribute category (by default the top 3 cluster centers), and a weighted sum is computed with each center's share of users as its weight; the result of the weighted sum gives the final values of $A_1, A_2, \ldots, A_j$.
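The last step, combining the top N cluster centers by user share, can be sketched as follows. The cluster centers and the per-center user counts are assumed to have been produced already by k-means and by merging the two relations; renormalizing the shares over the selected centers is an assumption of this sketch, as is the function name.

```python
def resource_basic_vector(centers, user_counts, top_n=3):
    """Weighted sum of the top_n cluster centers, weighted by each center's share of users.

    centers: list of equal-length vectors; user_counts: users per center for one resource.
    """
    ranked = sorted(range(len(centers)), key=lambda i: user_counts[i], reverse=True)[:top_n]
    total = sum(user_counts[i] for i in ranked)
    dim = len(centers[0])
    return [sum(user_counts[i] / total * centers[i][d] for i in ranked) for d in range(dim)]
```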
This embodiment vectorizes the users' explicit features and the resources' explicit features to obtain the user explicit vector and the resource explicit vector. This avoids feature combination and the extensive feature preprocessing that goes with it, and reduces computational complexity.
Step S3: obtain the user's implicit behavior features, obtain third user information and third resource information associated with the implicit behavior features, construct a triple relation matrix based on the third user information and the third resource information, and decompose the triple relation matrix with a predetermined algorithm to obtain the user implicit vector and the resource implicit vector of the user;
In this embodiment, the user's implicit behavior features can be obtained from logs, and the third user information and third resource information associated with them are then obtained. The third user information likewise includes the users' basic information, and the third resource information is likewise resource information on the network, comprising items with the same or different business attributes. A triple relation matrix R[user, product, rating] is constructed from the third user information and the third resource information. The matrix covers m users and n products, where user denotes a user, product denotes a resource, and rating denotes a score (that is, a degree of preference). In this embodiment, the rating corresponding to an implicit behavior feature is uniformly defined as 1.
In practice, because both n and m are very large, the triple relation matrix R is very large, and traditional matrix decomposition methods struggle with this volume of data. Moreover, no user can rate every resource (product), so R is a sparse matrix with many missing entries.
This embodiment runs on the Spark platform and uses a predetermined algorithm (alternating least squares, ALS) to compute the user implicit vector and the resource implicit vector. Because the triple relation matrix R is an m×n matrix, it can be regarded as the product of an m×k matrix and a k×n matrix, where k << m, n and typical values of k range from 20 to 200. This gives the following formula:
$R_{m \times n} = u_{m \times k} \times p_{k \times n}$;
In the formula above, $u_{m \times k}$ represents the users' degree of preference for the implicit behavior features, and $p_{k \times n}$ represents the degree to which the resources contain the implicit behavior features. $u_{m \times k}$ and $p_{k \times n}$ can be computed from the formula; the computed $u_{m \times k}$ is used as the user implicit vector and $p_{k \times n}$ as the resource implicit vector.
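To make the alternating updates concrete, here is a small pure-Python sketch of regularized ALS on the observed (user, product, rating) triples. It is only an illustration under stated assumptions: the text relies on Spark's ALS, whereas the functions `solve`, `update`, and `als`, the regularization weight `lam`, and the tiny rank k = 2 are all hypothetical choices made here. The item factors are stored row-wise (n×k), i.e. as the transpose of $p_{k \times n}$.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def update(fixed, observations, k, lam):
    """Least-squares update of one side's factors while the other side stays fixed."""
    out = []
    for obs in observations:  # obs: list of (index into fixed, rating)
        a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for idx, rating in obs:
            f = fixed[idx]
            for i in range(k):
                b[i] += rating * f[i]
                for j in range(k):
                    a[i][j] += f[i] * f[j]
        out.append(solve(a, b))
    return out

def als(triples, m, n, k=2, lam=0.1, iters=20, seed=0):
    """Return (user factors, m x k) and (item factors, n x k) fitted to the observed triples."""
    rng = random.Random(seed)
    p = [[rng.random() for _ in range(k)] for _ in range(n)]
    by_user = [[] for _ in range(m)]
    by_item = [[] for _ in range(n)]
    for user, item, rating in triples:
        by_user[user].append((item, rating))
        by_item[item].append((user, rating))
    u = update(p, by_user, k, lam)
    for _ in range(iters):
        p = update(u, by_item, k, lam)  # fix users, solve for items
        u = update(p, by_user, k, lam)  # fix items, solve for users
    return u, p
```

Only observed entries enter the normal equations, which is why the sparsity of R is not a problem for this kind of update.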
Step S4: calculate a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculate a second similarity between the user implicit vector and the corresponding resource implicit vector;
Directly computing the similarity between the user explicit vector and the corresponding resource explicit vector is expensive and has high computational complexity, so locality-sensitive hashing (LSH) is used to calculate the similarity. First, the user explicit vector and the corresponding resource explicit vector are hash-mapped: the feature column and the unique identifier column are specified, the features are used as the input of the algorithm, and the output column of the algorithm is specified. Then, for the mapped vectors, an approximate similarity join is used to compute the Euclidean distance between vectors as the similarity value:
The Euclidean distance between the user explicit vector and the resource explicit vector is calculated, using the LSH algorithm described above in the implementation, to obtain the first similarity $sim_{explicit\_init}$;
The Euclidean distance between the user implicit vector and the resource implicit vector is calculated, using the LSH algorithm described above in the implementation, to obtain the second similarity $sim_{implicit\_init}$.
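The hash-then-join procedure can be sketched in plain Python with random-projection bucketing, a common LSH family for Euclidean distance. This is an assumption-laden illustration rather than the Spark implementation the text refers to: the function names, the bucket length of 2.0, and the three hash tables are hypothetical.

```python
import math
import random
from collections import defaultdict

def make_hashes(dim, num_tables=3, bucket_length=2.0, seed=0):
    """Create one random projection direction per hash table."""
    rng = random.Random(seed)
    return [([rng.gauss(0, 1) for _ in range(dim)], bucket_length) for _ in range(num_tables)]

def bucket(vec, direction, bucket_length):
    """Bucket index of a vector: floor of its projection divided by the bucket length."""
    return math.floor(sum(x * d for x, d in zip(vec, direction)) / bucket_length)

def approx_similarity_join(users, resources, threshold, hashes):
    """Return {(user_id, resource_id): Euclidean distance} for candidate pairs within threshold.

    Only pairs that share a bucket in at least one table are compared, so some
    close pairs can be missed; that is the approximation LSH trades for speed.
    """
    tables = [defaultdict(list) for _ in hashes]
    for rid, vec in resources.items():
        for table, (direction, w) in zip(tables, hashes):
            table[bucket(vec, direction, w)].append(rid)
    result = {}
    for uid, vec in users.items():
        candidates = set()
        for table, (direction, w) in zip(tables, hashes):
            candidates.update(table.get(bucket(vec, direction, w), ()))
        for rid in candidates:
            distance = math.dist(vec, resources[rid])
            if distance <= threshold:
                result[(uid, rid)] = distance
    return result
```

The same join would be run once on the explicit vectors and once on the implicit vectors to obtain the two initial similarity values.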
Step S5: perform a weighted summation of the first similarity and the second similarity, select resources based on the result of the weighted summation, and recommend them to the user.
The first similarity
Figure PCTCN2020105925-appb-000002
and the second similarity
Figure PCTCN2020105925-appb-000003
are each normalized to the interval (0,1) to obtain $sim_{explicit}$ and $sim_{implicit}$, and the two sets of similarity values are combined by weighted summation to obtain the total similarity:
$Sim = \alpha \cdot sim_{explicit} + \beta \cdot sim_{implicit}$,
The weights α and β are set mainly in one of two ways. One is expert scoring, which fixes the values of α and β. The other is linear regression: users are sampled at random and act as reviewers who score the similarity of the resources offered to them, and the scores are used as training data to produce the values of α and β.
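Both steps can be sketched in a few lines. The normalization formulas themselves are only available as figures, so the mapping 1/(1 + d) used below is a hypothetical placeholder, not the formula of the application; the regression is an ordinary least-squares fit without intercept, solved in closed form, and all function names are assumptions.

```python
def normalize(distance):
    """Hypothetical mapping of a Euclidean distance to (0, 1]; not the patent's formula."""
    return 1.0 / (1.0 + distance)

def total_similarity(sim_explicit, sim_implicit, alpha, beta):
    """Sim = alpha * sim_explicit + beta * sim_implicit."""
    return alpha * sim_explicit + beta * sim_implicit

def fit_weights(samples):
    """Least-squares fit of (alpha, beta) from (sim_explicit, sim_implicit, reviewer_score) rows."""
    see = sum(e * e for e, _, _ in samples)
    sii = sum(i * i for _, i, _ in samples)
    sei = sum(e * i for e, i, _ in samples)
    sey = sum(e * y for e, _, y in samples)
    siy = sum(i * y for _, i, y in samples)
    det = see * sii - sei * sei  # zero if the two similarity columns are collinear
    return (sey * sii - siy * sei) / det, (siy * see - sey * sei) / det
```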
Finally, based on the total similarity Sim obtained by the weighted summation and on the listing time and popularity of the resource information, the top N items of resource information are selected and a priority is calculated for each of them. The priority ranking uses
Figure PCTCN2020105925-appb-000004
where view denotes popularity (by default the popularity of the previous day), age denotes the number of days since the resource information was listed, and the constant parameters i and j both default to 1. The top N items of resource information can then be pushed to the user in sorted order.
As the description above shows, this embodiment combines the implicit features of users and resources with their explicit features and thereby revises existing recommendation algorithms, which can improve the accuracy of resource recommendation while making the system more interpretable.
In addition, an embodiment of the present application further provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium may be any one, or any combination, of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disc read-only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium contains a computer program; for the functions implemented when the computer program is executed by a processor, refer to the description of FIG. 3 above, which is not repeated here.
The serial numbers of the embodiments of the present application above are for description only and do not indicate that one embodiment is better than another.
It should be noted that, in this document, the terms "include" and "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, apparatus, article, or method.
From the description of the implementations above, those skilled in the art will clearly understand that the methods of the embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On that understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored on a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect use in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims (20)

  1. A resource recommendation method, wherein the resource recommendation method comprises:
    obtaining first user information of a user, obtaining first resource information explicitly associated with the user, and generating a user explicit vector based on the first user information and the first resource information;
    obtaining second resource information, obtaining second user information of users explicitly associated with the second resource information, and generating, based on the second resource information and the second user information, a resource explicit vector with the same dimension as the user explicit vector;
    obtaining implicit behavior features of the user, obtaining third user information and third resource information associated with the implicit behavior features, constructing a triple relation matrix based on the third user information and the third resource information, and decomposing the triple relation matrix with a predetermined algorithm to obtain a user implicit vector and a resource implicit vector of the user;
    calculating a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculating a second similarity between the user implicit vector and the corresponding resource implicit vector;
    performing a weighted summation of the first similarity and the second similarity, selecting resource information based on the result of the weighted summation, and recommending it to the user.
  2. The resource recommendation method according to claim 1, wherein the first user information includes basic information and behavior information of the user, the first resource information includes items of resource information with the same or different business attributes, and the step of generating a user explicit vector based on the first user information and the first resource information specifically comprises:
    obtaining a predefined multi-dimensional vector, the multi-dimensional vector including basic information and business attributes;
    assigning values to the basic information in the multi-dimensional vector based on the basic information, assigning values to the business attributes in the multi-dimensional vector based on the items of resource information in the first resource information, the basic information, and the behavior information, and using the multi-dimensional vector with its assigned values as the user explicit vector.
  3. The resource recommendation method according to claim 2, wherein the behavior information includes explicit behavior features and implicit behavior features, and the step of assigning values to the business attributes in the multi-dimensional vector based on the items of resource information in the first resource information, the basic information, and the behavior information specifically comprises:
    obtaining the explicit behavior features and time information generated when the user operates on each item of resource information in the first resource information, calculating the user's degree of preference for the corresponding resource information based on the explicit behavior features and the time information, and using the degree of preference as the value of the corresponding business attribute; or
    grouping users based on the basic information, predicting, through association rules within the group, the degree of preference of the users in each group for each item of resource information in the first resource information, and using the degree of preference as the value of the corresponding business attribute.
  4. The resource recommendation method according to any one of claims 1 to 3, wherein the step of performing a weighted summation of the first similarity and the second similarity, selecting resource information based on the result of the weighted summation, and recommending it to the user specifically comprises:
    normalizing the first similarity and the second similarity separately, obtaining predetermined weights, and performing a weighted summation based on the normalized first similarity, the normalized second similarity, and the weights to obtain a total similarity;
    obtaining the listing time and popularity of each item of resource information, selecting multiple items of resource information based on the total similarity and the listing time and popularity of each item, and recommending them to the user.
  5. The resource recommendation method according to claim 4, wherein the step of selecting multiple items of resource information based on the total similarity and the listing time and popularity of each item of resource information and recommending them to the user specifically comprises:
    calculating a priority for each item of resource information based on the total similarity and the listing time and popularity of each item, selecting multiple items of resource information according to their priorities, and recommending them to the user.
  6. The resource recommendation method according to claim 2, wherein determining the business attributes specifically comprises the following steps:
    building in advance a business attribute label structure that matches the business development goals;
    extracting the text information of each item of resource information in the first resource information and calculating the similarity between the extracted text information and the business attribute labels, wherein, when the similarity between the text information and a business attribute label exceeds a threshold, the business attribute of that item of resource information in the first resource information is the business attribute indicated by that business attribute label.
  7. The resource recommendation method according to claim 4, wherein the first similarity and the second similarity are calculated by a locality-sensitive hashing algorithm, the calculation specifically comprising the following steps:
    applying hash mapping to the user explicit vector and the corresponding resource explicit vector, specifying a feature column and a unique-identifier column, taking the features as the input of the algorithm, and specifying an output column of the algorithm; and computing, by an approximate similarity join over the mapped user explicit vector and resource explicit vector, the Euclidean distance between the two as the first similarity;
    applying hash mapping to the user implicit vector and the corresponding resource implicit vector, specifying a feature column and a unique-identifier column, taking the features as the input of the algorithm, and specifying an output column of the algorithm; and computing, by an approximate similarity join over the mapped user implicit vector and resource implicit vector, the Euclidean distance between the two as the second similarity.
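The steps of this claim can be sketched in plain Python as a bucketed random-projection LSH followed by an approximate similarity join. The wording of the claim (feature column, output column, approximate similarity join) resembles the interface of LSH implementations such as Spark MLlib's `BucketedRandomProjectionLSH`, but this section does not name a library, so the sketch below is a standalone illustration. The number of hash tables, the bucket length, the distance threshold, and all names are assumptions.

```python
import math
import random

def make_hash_functions(dim, num_tables, bucket_length, seed=0):
    """Draw one random unit direction per hash table."""
    rng = random.Random(seed)
    directions = []
    for _ in range(num_tables):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        directions.append([x / norm for x in v])
    return directions, bucket_length

def hash_vector(vec, hash_functions):
    """Project onto each direction and cut the line into buckets of fixed length."""
    directions, bucket_length = hash_functions
    return tuple(
        math.floor(sum(a * b for a, b in zip(vec, d)) / bucket_length)
        for d in directions
    )

def approx_similarity_join(users, resources, hash_functions, threshold):
    """Return (user_id, resource_id, distance) for bucket-sharing pairs within threshold."""
    buckets = {}
    for rid, vec in resources.items():
        for table, bucket in enumerate(hash_vector(vec, hash_functions)):
            buckets.setdefault((table, bucket), set()).add(rid)
    pairs = []
    for uid, vec in users.items():
        candidates = set()
        for table, bucket in enumerate(hash_vector(vec, hash_functions)):
            candidates |= buckets.get((table, bucket), set())
        for rid in sorted(candidates):  # exact distance only for candidates
            dist = math.dist(vec, resources[rid])
            if dist <= threshold:
                pairs.append((uid, rid, dist))
    return pairs

hash_functions = make_hash_functions(dim=3, num_tables=4, bucket_length=2.0)
users = {"u1": [1.0, 0.0, 0.5]}
resources = {"r1": [1.0, 0.0, 0.5], "r2": [9.0, -7.0, 3.0]}
pairs = approx_similarity_join(users, resources, hash_functions, threshold=2.0)
```

Because the join is approximate, a close pair that never shares a bucket is missed; adding hash tables lowers that risk at the cost of more candidate comparisons. Identical vectors always collide, which is why r1 is guaranteed to be returned in this example.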
  8. A resource recommendation apparatus, comprising:
    a first generation module, configured to acquire first user information of a user, acquire first resource information explicitly associated with the user, and generate a user explicit vector based on the first user information and the first resource information;
    a second generation module, configured to acquire second resource information, acquire second user information of users explicitly associated with the second resource information, and generate, based on the second resource information and the second user information, a resource explicit vector having the same dimensions as the user explicit vector;
    a decomposition module, configured to acquire implicit behavior features of the user, acquire third user information and third resource information associated with the implicit behavior features, construct a triplet relationship matrix based on the third user information and the third resource information, and decompose the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector of the user and a resource implicit vector;
    a calculation module, configured to calculate a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and a second similarity between the user implicit vector and the corresponding resource implicit vector; and
    a recommendation module, configured to compute a weighted sum of the first similarity and the second similarity, select resource information based on the result of the weighted sum, and recommend it to the user.
  9. An electronic device, comprising a memory and a processor connected to the memory, the memory storing a computer program executable on the processor, wherein the computer program, when executed by the processor, implements the following steps:
    acquiring first user information of a user, acquiring first resource information explicitly associated with the user, and generating a user explicit vector based on the first user information and the first resource information;
    acquiring second resource information, acquiring second user information of users explicitly associated with the second resource information, and generating, based on the second resource information and the second user information, a resource explicit vector having the same dimensions as the user explicit vector;
    acquiring implicit behavior features of the user, acquiring third user information and third resource information associated with the implicit behavior features, constructing a triplet relationship matrix based on the third user information and the third resource information, and decomposing the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector of the user and a resource implicit vector;
    calculating a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculating a second similarity between the user implicit vector and the corresponding resource implicit vector;
    computing a weighted sum of the first similarity and the second similarity, selecting resource information based on the result of the weighted sum, and recommending it to the user.
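The decomposition step refers only to "a predetermined algorithm". As an illustration, the sketch below assumes the triplet relationship matrix is given as (user, resource, score) triples and factorizes it by stochastic gradient descent; alternating least squares or another factorization would fit the claim equally well. The rank, learning rate, regularization, and sample triples are hypothetical.

```python
import random

def factorize(triples, num_users, num_items, rank=2, lr=0.02, reg=0.02,
              epochs=500, seed=0):
    """Learn user and resource implicit vectors from (user, resource, score) triples."""
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(num_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(num_items)]
    for _ in range(epochs):
        for u, i, score in triples:
            err = score - sum(p * q for p, q in zip(P[u], Q[i]))
            for k in range(rank):
                p_uk, q_ik = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * q_ik - reg * p_uk)
                Q[i][k] += lr * (err * p_uk - reg * q_ik)
    return P, Q

def mean_squared_error(triples, P, Q):
    """Average squared reconstruction error over the observed triples."""
    total = 0.0
    for u, i, score in triples:
        total += (score - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
    return total / len(triples)

triples = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P0, Q0 = factorize(triples, 3, 3, epochs=0)  # untrained factors, same seed
P, Q = factorize(triples, 3, 3)
error_before = mean_squared_error(triples, P0, Q0)
error_after = mean_squared_error(triples, P, Q)
```

Row `P[u]` plays the role of the user implicit vector and row `Q[i]` that of the resource implicit vector; both have the same length, so they can be compared in the later similarity step.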
  10. The electronic device according to claim 9, wherein the first user information comprises basic information and behavior information of the user, the first resource information comprises items of resource information having the same or different business attributes, and the step of generating a user explicit vector based on the first user information and the first resource information specifically comprises:
    acquiring a predefined multi-dimensional vector, the multi-dimensional vector comprising base information and business attributes;
    assigning values to the base information in the multi-dimensional vector based on the basic information, assigning values to the business attributes in the multi-dimensional vector based on each item of resource information in the first resource information, the basic information, and the behavior information, and taking the multi-dimensional vector after assignment as the user explicit vector.
  11. The electronic device according to claim 10, wherein the behavior information comprises explicit behavior features and implicit behavior features, and the step of assigning values to the business attributes in the multi-dimensional vector based on each item of resource information in the first resource information, the basic information, and the behavior information specifically comprises:
    acquiring the explicit behavior features and time information generated when the user operates on each item of resource information in the first resource information, calculating the user's degree of preference for the corresponding item of resource information based on the explicit behavior features and the time information, and taking that degree of preference as the value of the corresponding business attribute; or
    grouping users based on the basic information, predicting, through intra-group association rules, the degree of preference of the users in each group for each item of resource information in the first resource information, and taking that degree of preference as the value of the corresponding business attribute.
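For the first alternative of this claim, the way behaviors and time are combined into a degree of preference is not spelled out here. One plausible reading, shown purely as an assumption, is a sum of per-behavior weights in which each event's contribution halves after a fixed period. The behavior names, their weights, and the 14-day half-life are hypothetical.

```python
from datetime import datetime

# Hypothetical weights: stronger explicit actions count for more.
BEHAVIOR_WEIGHTS = {"view": 1.0, "favorite": 3.0, "purchase": 5.0}

def preference(events, now, half_life_days=14.0):
    """Sum behavior weights, halving each event's contribution every half_life_days."""
    score = 0.0
    for behavior, when in events:
        age_days = (now - when).total_seconds() / 86400.0
        score += BEHAVIOR_WEIGHTS[behavior] * 0.5 ** (age_days / half_life_days)
    return score

now = datetime(2020, 7, 30)
events = [
    ("view", datetime(2020, 7, 30)),      # today: contributes 1.0
    ("purchase", datetime(2020, 7, 16)),  # 14 days ago: contributes 5.0 * 0.5
]
score = preference(events, now)
```

The resulting score would then be written into the business-attribute dimension that corresponds to the resource the user acted on.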
  12. The electronic device according to any one of claims 9 to 11, wherein the step of computing a weighted sum of the first similarity and the second similarity, selecting resource information based on the result of the weighted sum, and recommending it to the user specifically comprises:
    normalizing the first similarity and the second similarity separately, acquiring predetermined weights, and computing a weighted sum of the normalized first similarity and the normalized second similarity using the weights to obtain a total similarity;
    acquiring the listing time and popularity of each item of resource information, selecting a plurality of items of resource information based on the total similarity and on the listing time and popularity of each item, and recommending them to the user.
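The normalization and weighting in this claim can be sketched as follows, assuming min-max normalization and a single weight shared between the two similarities; neither choice is stated in the claim, and the function names are hypothetical. The sketch treats larger input values as more similar. If the inputs are Euclidean distances, as in the locality-sensitive hashing claims, some inversion would be needed first, and this section does not say how that is done.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def total_similarity(first, second, weight_first=0.5):
    """Weighted sum of the two normalized similarity lists, one entry per resource."""
    n1, n2 = min_max(first), min_max(second)
    return [weight_first * a + (1.0 - weight_first) * b for a, b in zip(n1, n2)]

# One entry per candidate resource: explicit-vector and implicit-vector similarities.
totals = total_similarity([2.0, 4.0, 6.0], [10.0, 30.0, 20.0], weight_first=0.5)
```

Normalizing each similarity separately keeps the one with the larger numeric range from dominating the weighted sum.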
  13. The electronic device according to claim 12, wherein the step of selecting a plurality of items of resource information based on the total similarity and on the listing time and popularity of each item of resource information, and recommending them to the user, specifically comprises:
    calculating a priority for each item of resource information based on the total similarity and on the listing time and popularity of that item, selecting a plurality of items of resource information according to their priorities, and recommending them to the user.
  14. The electronic device according to claim 10, wherein determining the business attributes specifically comprises the following steps:
    constructing in advance a business-attribute label structure that matches the business development goals;
    extracting text information from each item of resource information in the first resource information, and computing the similarity between the extracted text information and the business-attribute labels; when the similarity between the text information and a business-attribute label exceeds a threshold, the business attribute of that item of resource information is the business attribute indicated by that label.
  15. The electronic device according to claim 12, wherein the first similarity and the second similarity are calculated by a locality-sensitive hashing algorithm, the calculation specifically comprising the following steps:
    applying hash mapping to the user explicit vector and the corresponding resource explicit vector, specifying a feature column and a unique-identifier column, taking the features as the input of the algorithm, and specifying an output column of the algorithm; and computing, by an approximate similarity join over the mapped user explicit vector and resource explicit vector, the Euclidean distance between the two as the first similarity;
    applying hash mapping to the user implicit vector and the corresponding resource implicit vector, specifying a feature column and a unique-identifier column, taking the features as the input of the algorithm, and specifying an output column of the algorithm; and computing, by an approximate similarity join over the mapped user implicit vector and resource implicit vector, the Euclidean distance between the two as the second similarity.
  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring first user information of a user, acquiring first resource information explicitly associated with the user, and generating a user explicit vector based on the first user information and the first resource information;
    acquiring second resource information, acquiring second user information of users explicitly associated with the second resource information, and generating, based on the second resource information and the second user information, a resource explicit vector having the same dimensions as the user explicit vector;
    acquiring implicit behavior features of the user, acquiring third user information and third resource information associated with the implicit behavior features, constructing a triplet relationship matrix based on the third user information and the third resource information, and decomposing the triplet relationship matrix using a predetermined algorithm to obtain a user implicit vector of the user and a resource implicit vector;
    calculating a first similarity between the user explicit vector of the user and the corresponding resource explicit vector, and calculating a second similarity between the user implicit vector and the corresponding resource implicit vector;
    computing a weighted sum of the first similarity and the second similarity, selecting resource information based on the result of the weighted sum, and recommending it to the user.
  17. The computer-readable storage medium according to claim 16, wherein the first user information comprises basic information and behavior information of the user, the first resource information comprises items of resource information having the same or different business attributes, and the step of generating a user explicit vector based on the first user information and the first resource information specifically comprises:
    acquiring a predefined multi-dimensional vector, the multi-dimensional vector comprising base information and business attributes;
    assigning values to the base information in the multi-dimensional vector based on the basic information, assigning values to the business attributes in the multi-dimensional vector based on each item of resource information in the first resource information, the basic information, and the behavior information, and taking the multi-dimensional vector after assignment as the user explicit vector.
  18. The computer-readable storage medium according to claim 17, wherein the behavior information comprises explicit behavior features and implicit behavior features, and the step of assigning values to the business attributes in the multi-dimensional vector based on each item of resource information in the first resource information, the basic information, and the behavior information specifically comprises:
    acquiring the explicit behavior features and time information generated when the user operates on each item of resource information in the first resource information, calculating the user's degree of preference for the corresponding item of resource information based on the explicit behavior features and the time information, and taking that degree of preference as the value of the corresponding business attribute; or
    grouping users based on the basic information, predicting, through intra-group association rules, the degree of preference of the users in each group for each item of resource information in the first resource information, and taking that degree of preference as the value of the corresponding business attribute.
  19. The computer-readable storage medium according to any one of claims 16 to 18, wherein the step of computing a weighted sum of the first similarity and the second similarity, selecting resource information based on the result of the weighted sum, and recommending it to the user specifically comprises:
    normalizing the first similarity and the second similarity separately, acquiring predetermined weights, and computing a weighted sum of the normalized first similarity and the normalized second similarity using the weights to obtain a total similarity;
    acquiring the listing time and popularity of each item of resource information, selecting a plurality of items of resource information based on the total similarity and on the listing time and popularity of each item, and recommending them to the user.
  20. The computer-readable storage medium according to claim 19, wherein the step of selecting a plurality of items of resource information based on the total similarity and on the listing time and popularity of each item of resource information, and recommending them to the user, specifically comprises:
    calculating a priority for each item of resource information based on the total similarity and on the listing time and popularity of that item, selecting a plurality of items of resource information according to their priorities, and recommending them to the user.
PCT/CN2020/105925 2019-10-12 2020-07-30 Resource recommendation method and apparatus, electronic device and storage medium WO2021068610A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910970985.3A CN110866181B (en) 2019-10-12 2019-10-12 Resource recommendation method, device and storage medium
CN201910970985.3 2019-10-12

Publications (1)

Publication Number Publication Date
WO2021068610A1 true WO2021068610A1 (en) 2021-04-15

Family

ID=69652645

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105925 WO2021068610A1 (en) 2019-10-12 2020-07-30 Resource recommendation method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN110866181B (en)
WO (1) WO2021068610A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110866181B (en) * 2019-10-12 2022-04-22 平安国际智慧城市科技股份有限公司 Resource recommendation method, device and storage medium
CN111488526B (en) * 2020-04-14 2024-04-05 北京声智科技有限公司 Recommendation method and device
CN111581505B (en) * 2020-04-28 2023-07-07 海南太美航空股份有限公司 Flight recommendation method and system based on combined recommendation
CN112559901B (en) * 2020-12-11 2022-02-08 百度在线网络技术(北京)有限公司 Resource recommendation method and device, electronic equipment, storage medium and computer program product
CN113538053B (en) * 2021-07-20 2023-09-01 深圳市爱易讯数据有限公司 OTT resource bit classification method, system and storage medium for brand construction
CN114003826A (en) * 2021-12-31 2022-02-01 思创数码科技股份有限公司 Resource directory recommendation method and device, readable storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354867A (en) * 2016-09-12 2017-01-25 传线网络科技(上海)有限公司 Multimedia resource recommendation method and device
CN106802915A (en) * 2016-12-09 2017-06-06 宁波大学 A kind of academic resources based on user behavior recommend method
CN107492008A (en) * 2017-08-09 2017-12-19 阿里巴巴集团控股有限公司 Information recommendation method, device, server and computer-readable storage medium
CN110309405A (en) * 2018-03-08 2019-10-08 腾讯科技(深圳)有限公司 A kind of item recommendation method, device and storage medium
CN110866181A (en) * 2019-10-12 2020-03-06 平安国际智慧城市科技股份有限公司 Resource recommendation method, device and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462385B (en) * 2014-12-10 2018-07-03 山东科技大学 A kind of film personalization similarity calculating method based on user interest model
CN105589916B (en) * 2016-01-11 2020-05-08 西华大学 Method for extracting explicit and implicit interest knowledge
CN106815297B (en) * 2016-12-09 2020-04-10 宁波大学 Academic resource recommendation service system and method
CN107330461B (en) * 2017-06-27 2020-11-03 安徽师范大学 Emotion and trust based collaborative filtering recommendation method
CN109947983A (en) * 2017-09-19 2019-06-28 Tcl集团股份有限公司 Video recommendation method, system, terminal and computer readable storage medium
US10699321B2 (en) * 2017-10-17 2020-06-30 Adobe Inc. Global vector recommendations based on implicit interaction and profile data
CN108628999B (en) * 2018-05-02 2022-11-11 南京大学 Video recommendation method based on explicit and implicit information
CN109241405B (en) * 2018-08-13 2021-11-23 华中师范大学 Learning resource collaborative filtering recommendation method and system based on knowledge association
CN110188208B (en) * 2019-06-04 2021-01-26 河海大学 Knowledge graph-based information resource query recommendation method and system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095583A (en) * 2021-04-23 2021-07-09 何桂霞 Data analysis method applied to business management and business management server
CN113095583B (en) * 2021-04-23 2024-05-03 浙江皓亮智享信息技术咨询有限公司 Data analysis method applied to service management and service management server
CN113220999A (en) * 2021-05-14 2021-08-06 北京百度网讯科技有限公司 User feature generation method and device, electronic equipment and storage medium
CN113343086A (en) * 2021-06-01 2021-09-03 合肥工业大学 Data-driven green design knowledge pushing method
CN113343086B (en) * 2021-06-01 2022-09-16 合肥工业大学 Data-driven green design knowledge pushing method
CN113590935A (en) * 2021-06-30 2021-11-02 深圳市东信时代信息技术有限公司 Information recommendation method and device, computer equipment and storage medium
CN113538108A (en) * 2021-07-27 2021-10-22 北京沃东天骏信息技术有限公司 Resource information determination method and device, electronic equipment and storage medium
CN114039744A (en) * 2021-09-29 2022-02-11 中孚信息股份有限公司 Abnormal behavior prediction method and system based on user characteristic label
CN114039744B (en) * 2021-09-29 2024-02-27 中孚信息股份有限公司 Abnormal behavior prediction method and system based on user feature labels
CN115422438A (en) * 2022-07-21 2022-12-02 中国铁道科学研究院集团有限公司电子计算技术研究所 Railway material supply resource recommendation method, system and storage medium
CN115422438B (en) * 2022-07-21 2023-07-28 中国铁道科学研究院集团有限公司电子计算技术研究所 Method, system and storage medium for recommending railway material supply resources
CN117952657A (en) * 2024-03-26 2024-04-30 吉林省吉龙芯科技有限公司 Information pushing method based on energy Internet marketing service system

Also Published As

Publication number Publication date
CN110866181A (en) 2020-03-06
CN110866181B (en) 2022-04-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20874506

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 18.08.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20874506

Country of ref document: EP

Kind code of ref document: A1