CN115712761A

CN115712761A - Recommendation method and system, and storage medium

Info

Publication number: CN115712761A
Application number: CN202110960735.9A
Authority: CN
Inventors: 李娜; 冯子威; 程造洋; 储一鹏; 罗红
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Hangzhou Information Technology Co Ltd
Priority date: 2021-08-20
Filing date: 2021-08-20
Publication date: 2023-02-24

Abstract

An embodiment of the application discloses a recommendation method, a recommendation system, and a storage medium. The recommendation system determines a data set according to original data; performs distributed computing processing according to the data set to obtain a similarity parameter; obtains a similarity dictionary according to the similarity parameter; and performs recommendation processing according to the similarity dictionary. This improves the accuracy and computational efficiency of similarity calculation, yields a better recommendation effect, and improves the feasibility of the recommendation model.

Description

Recommendation method and system, and storage medium

Technical Field

The present invention relates to the field of data service technologies, and in particular, to a recommendation method and system, and a storage medium.

Background

The similarity algorithm is the core of a recommendation model. The commonly used similarity algorithms are Euclidean distance similarity, cosine similarity, and Pearson similarity. However, these methods all require the full set of scores of two objects to be treated as vectors. When the score matrix is sparse, that is, when two items are rarely scored by the same user or two users rarely score the same item, the large number of single-user or single-item evaluations biases the similarity, and the precision of the similarity calculation is poor. Moreover, the common similarity algorithms need a large amount of memory to store the scoring vector matrix and cannot be computed in parallel, which greatly reduces computational efficiency and the feasibility of constructing a recommendation model from large-scale data.

Disclosure of Invention

The embodiment of the application provides a recommendation method, a recommendation system and a storage medium, which can improve the accuracy and the calculation efficiency of similarity calculation, further obtain a better recommendation effect and improve the feasibility of a recommendation model.

The technical scheme of the embodiment of the application is realized as follows:

in a first aspect, an embodiment of the present application provides a recommendation method, where the method includes:

determining a data set according to the original data;

performing distributed computing processing according to the data set to obtain a similarity parameter;

obtaining a similarity dictionary according to the similarity parameter;

and performing recommendation processing according to the similarity dictionary.

In a second aspect, an embodiment of the present application provides a recommendation system, which includes a determining unit, an acquisition unit, and a recommendation unit, wherein:

the determining unit is used for determining a data set according to the original data;

the acquisition unit is used for performing distributed computing processing according to the data set to obtain a similarity parameter, and for obtaining a similarity dictionary according to the similarity parameter;

and the recommendation unit is used for performing recommendation processing according to the similarity dictionary.

In a third aspect, an embodiment of the present application provides a recommendation system, which further includes a processor and a memory storing instructions executable by the processor, and when the instructions are executed by the processor, the recommendation method as described above is implemented.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a program is stored, and the program is applied to a recommendation system, and when the program is executed by a processor, the program implements the recommendation method as described above.

The embodiment of the application provides a recommendation method and system and a storage medium, wherein the recommendation system determines a data set according to original data; performing distributed computing processing according to the data set to obtain a similarity parameter; obtaining a similarity dictionary according to the similarity parameters; and performing recommendation processing according to the similarity dictionary. That is to say, in the embodiment of the application, a data set can be determined according to original data, then the data set is subjected to distributed computing processing to obtain similarity, a similarity dictionary is obtained according to the similarity obtained after the distributed computing processing, and finally recommendation processing is performed according to the similarity dictionary.

Drawings

Fig. 1 is a first schematic flow chart of an implementation of the recommendation method provided in an embodiment of the present application;

Fig. 2 is a second schematic flow chart of an implementation of the recommendation method provided in an embodiment of the present application;

Fig. 3 is a third schematic flow chart of an implementation of the recommendation method provided in an embodiment of the present application;

Fig. 4 is a fourth schematic flow chart of an implementation of the recommendation method provided in an embodiment of the present application;

Fig. 5 is a fifth schematic flow chart of an implementation of the recommendation method provided in an embodiment of the present application;

Fig. 6 is a sixth schematic flow chart of an implementation of the recommendation method provided in an embodiment of the present application;

Fig. 7 is a seventh schematic flow chart of an implementation of the recommendation method provided in an embodiment of the present application;

Fig. 8 is an eighth schematic flow chart of an implementation of the recommendation method provided in an embodiment of the present application;

Fig. 9 is a first schematic structural diagram of the composition of a recommendation system provided in an embodiment of the present application;

Fig. 10 is a second schematic structural diagram of the composition of a recommendation system provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are illustrative of the application and do not limit it. It should be noted that, for convenience of description, only the parts relevant to the application are shown in the drawings.

The similarity algorithm is the core of the recommendation model, and the commonly used similarity algorithms are mainly the following three: Euclidean distance similarity, cosine similarity, and Pearson similarity. The Euclidean distance is the straight-line distance in Euclidean space, and the similarity between two object features is usually represented by the two-dimensional Euclidean distance between the points $(x_1, y_1)$ and $(x_2, y_2)$, which can be expressed as the following formula:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{1}$$

The similarity value is between 0 and 1, and the smaller the distance, the higher the similarity. Therefore, the Euclidean distance similarity is defined as the following equation:

$$\mathrm{Simi}_{x,y} = \frac{1}{1 + d} \tag{2}$$

Cosine similarity is the cosine of the angle between two vectors and describes how similar the two vectors are. If the angle is 90 degrees, the similarity is 0; if the two vectors point in the same direction, the similarity is 1. The cosine similarity of two vectors is defined as the following equation:

$$\mathrm{Simi}_{x,y} = \cos\theta = \frac{X \cdot Y}{\|X\|\,\|Y\|} \tag{3}$$

where $\|X\|$ and $\|Y\|$ represent the 2-norms of vector X and vector Y.

The Pearson correlation coefficient is also an important metric for measuring the correlation between two vectors, and is defined as the following formula:

$$\rho_{x,y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X\,\sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X\,\sigma_Y} \tag{4}$$

Because the Pearson correlation coefficient ranges from -1 to 1, it is usually processed as follows so that the similarity is normalized to between 0 and 1:

$$\mathrm{Simi}_{x,y} = 0.5 + 0.5\,\rho_{x,y} \tag{5}$$
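The three conventional measures described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the vectors are assumed to be equal-length lists with no missing values, and the mapping 1 / (1 + distance) used for the Euclidean similarity is one common convention that is assumed here rather than taken from the original text.

```python
import math

def euclidean_similarity(x, y):
    """Map Euclidean distance into (0, 1]; 1 / (1 + d) is an assumed convention."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + dist)

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (both must be non-zero vectors)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson_similarity(x, y):
    """Pearson coefficient rescaled from [-1, 1] to [0, 1] via 0.5 + 0.5 * rho."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 0.5 + 0.5 * cov / (std_x * std_y)
```

All three functions need the complete score vectors of both objects in memory at once, which is the storage and parallelism limitation discussed in the text.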

However, the prior art cannot overcome problems such as data storage and computational efficiency, nor the similarity deviation caused by a sparse matrix. First, the three existing similarity methods require all scores of two objects to be treated as vectors; this not only needs a large amount of memory to store the score vector matrix, but also prevents parallel computation, which greatly reduces computational efficiency and the feasibility of constructing a recommendation model from large-scale data. Second, the existing methods use all scoring data when describing the similarity of two objects, and missing values (that is, no score) are treated as 0. When the scoring matrix is sparse, that is, when two items are rarely scored by the same user or two users rarely score the same item, the large number of single-user or single-item evaluations biases the similarity.

To solve these problems of the prior-art recommendation methods, the embodiment of the application scans all scores of the users/items one by one and records intermediate values of the similarity between users/items, so that the similarity can be computed in parallel. Meanwhile, only the cases in which one user has evaluated both items, or one item has been evaluated by both users, are considered; that is, the probability that two objects are evaluated together is taken into account, which better matches the idea of similarity and makes the recommendation model more accurate. In addition, the similarity matrix, rather than the scoring vector matrix, is stored in the form of a similarity dictionary, which greatly reduces storage space, improves query efficiency, and improves the feasibility of a recommendation model on large-scale data.

Specifically, the application provides a recommendation method and system, and a storage medium, wherein the recommendation system determines a data set according to original data; performing distributed computing processing according to the data set to obtain a similarity parameter; obtaining a similarity dictionary according to the similarity parameters; performing recommendation processing according to the similarity dictionary; the accuracy and the calculation efficiency of similarity calculation can be improved.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

Example one

The embodiment of the application provides a recommendation method. The method by which the recommendation system performs similarity calculation includes the following steps:

Step 101, determining a data set according to the original data.

In the embodiment of the application, when the recommendation system performs similarity calculation, a data set may be determined according to original data.

It should be noted that, in the embodiments of the present application, the raw data refers to the scoring data of the user on the item; wherein, the number of the users and the articles can be multiple; that is, the raw data includes the ratings of different items by different users, respectively.

Further, in the embodiment of the present application, the original data may be processed according to a preset data type. Illustratively, the raw data is data in Resilient Distributed Dataset (RDD) format; raw data in RDD format may first be processed into a data type with (User) score as a primary key, so that the corresponding score value can be quickly obtained through the primary key.

Further, in the embodiment of the application, after the original data is processed according to the preset data type, the original data can be averaged. For example, if the raw data contains multiple scores of the same item by one user, those scores are averaged: if user A scored item P three times, with scores of 7, 7, and 8, user A's score for item P would be 22/3.
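The averaging step can be sketched as follows. The function name and the (user, item, score) tuple layout are assumptions made for this illustration; the original text does not prescribe a record format beyond the primary key.

```python
from collections import defaultdict

def average_duplicate_scores(raw_scores):
    """Collapse repeated (user, item) scores into their mean.

    raw_scores: iterable of (user, item, score) tuples (assumed layout).
    Returns a dict keyed by (user, item).
    """
    grouped = defaultdict(list)
    for user, item, score in raw_scores:
        grouped[(user, item)].append(score)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}

# User A scored item P three times (7, 7, 8), as in the example above.
raw = [("A", "P", 7), ("A", "P", 7), ("A", "P", 8), ("B", "P", 5)]
averaged = average_duplicate_scores(raw)
```

Keying the result by (user, item) also gives the fast lookup by primary key that the preceding paragraph describes.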

It should be noted that, in the embodiment of the present application, the data set is determined according to the original data, and the determination manner of the data set may include two manners, one of which is to integrate the original data based on the dimension of the article in the original data to obtain the first data set; the other method is that the original data is integrated based on the user dimension in the original data to obtain a second data set; also, the data set is determined in one of the two ways described above, that is, the data set may be the first data set or the second data set.

Further, in the embodiment of the present application, when determining a data set according to original data, a basis of the original data needs to be determined first, and then a corresponding data set is obtained according to the basis; wherein, the base can be a user dimension or an item dimension in the raw data.

Further, in the embodiment of the present application, the base is the smaller of the number of users and the number of items in the original data. Illustratively, if the original data contains three thousand items and forty thousand users, the number of items is less than the number of users, so the item dimension is determined as the base, and integration processing is performed to obtain the first data set.

Further, in the embodiment of the present application, the integration processing refers to grouping the bases as new primary keys, and integrating the original data in a list form; illustratively, the item is determined as a base, and the new primary key is (user x, (the user's scoring list for all items)); if the user is determined to be the base, the new primary key is (item x, (a list of all users' ratings for the item)).

It can be understood that, in the embodiments of the present application, the first data set and the second data set are different forms of score lists obtained after the integration processing; that is, the first data set may be a list of the scores of all items given by a particular user, and the second data set may be a list of all users' scores for a particular item. Illustratively, the users in the original data include A, B, and C, and the items include 1, 2, and 3; the first data set may include the score data of A, B, and C for item 1, the score data of A, B, and C for item 2, and the score data of A, B, and C for item 3; and the second data set may include A's score data for items 1, 2, and 3, B's score data for items 1, 2, and 3, and C's score data for items 1, 2, and 3.
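A minimal sketch of choosing the base and performing the integration processing is given below. The function names are hypothetical, the input is assumed to be a dict keyed by (user, item), and breaking a tie in favour of the item dimension is an assumption, since the text only says that the smaller dimension is chosen. Following the key layout described above, an item base groups the scores per user, and a user base groups them per item.

```python
from collections import defaultdict

def choose_base(scores):
    """Return 'item' or 'user', whichever dimension has fewer distinct values."""
    users = {user for user, _ in scores}
    items = {item for _, item in scores}
    return "item" if len(items) <= len(users) else "user"  # tie rule is assumed

def integrate(scores, base):
    """Group scores under a new primary key according to the chosen base."""
    grouped = defaultdict(dict)
    for (user, item), score in scores.items():
        if base == "item":
            grouped[user][item] = score  # key: user, value: that user's item scores
        else:
            grouped[item][user] = score  # key: item, value: all users' scores for it
    return dict(grouped)

scores = {("A", 1): 4.0, ("A", 2): 5.0, ("B", 1): 3.0, ("C", 2): 2.0}
per_user = integrate(scores, choose_base(scores))
```

Each grouped record is self-contained, so the records can later be handed to different workers without sharing a full score matrix.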

Step 102, performing distributed computing processing according to the data set to obtain a similarity parameter.

In the embodiment of the application, after the recommendation system determines the data set according to the original data, the recommendation system may perform distributed computing processing according to the data set to obtain the similarity parameter.

It should be noted that, in the embodiment of the present application, the distributed computing processing refers to performing multi-line parallel computing, that is, after a data set is obtained, the data set is used to perform multi-line parallel computing to obtain a similarity parameter, so that the computing efficiency is effectively improved through the distributed computing processing.

Illustratively, in the embodiment of the present application, based on the above example, the data set determined according to the original data is the second data set, which includes the score data of A for items 1, 2, and 3, the score data of B for items 1, 2, and 3, and the score data of C for items 1, 2, and 3. The score data of A may be placed on host a for calculation of the similarity parameter, the score data of B on host b, and the score data of C on host c; the calculation efficiency is thus improved through multi-line parallel computation.

Further, in the embodiment of the present application, the distributed computation process is performed based on the map function, that is, the similarity parameter is found through the map function and the distributed computation framework.

For example, in the embodiment of the present application, the data set is a second data set, and the similarity parameter may be expressed as the following formula:

$$\rho_{ij}^{u} = \min\left(\frac{r_{ui}}{r_{uj}},\ \frac{r_{uj}}{r_{ui}}\right) \tag{6}$$

where $\rho_{ij}^{u}$ is the similarity parameter between item i and item j based on the scores of user u; $r_{ui}$ is the score of user u for item i, and $r_{uj}$ is the score of user u for item j.

it is understood that, in the embodiment of the present application, based on the above example, when the scores of the two items i and j by the user u are more consistent, the similarity between the items i and j is higher.
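The per-user calculation can be sketched as a map function that scans one user's scores and emits an intermediate value for every pair of items that user scored. This is an illustrative sketch only: the names `pair_similarity` and `map_user` are hypothetical, the min-of-ratios form follows the ratio and minimum-value processing described in this document, item identifiers are assumed to be sortable, and scores are assumed to be strictly positive (a zero score would cause a division by zero).

```python
from itertools import combinations

def pair_similarity(r_a, r_b):
    """Ratio of two positive scores, taking the minimum so the value is in (0, 1]."""
    return min(r_a / r_b, r_b / r_a)

def map_user(ratings):
    """Emit ((i, j), (similarity, 1)) for every item pair scored by one user.

    ratings: dict mapping item -> score for a single user.
    The trailing 1 is the evaluation count contributed by this user.
    """
    return [
        ((i, j), (pair_similarity(ratings[i], ratings[j]), 1))
        for i, j in combinations(sorted(ratings), 2)
    ]

emitted = map_user({1: 4.0, 2: 5.0, 3: 4.0})
```

Because `map_user` reads only one user's record, different users can be processed on different hosts independently, and pairs that no single user scored together are never emitted.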

Step 103, obtaining a similarity dictionary according to the similarity parameter.

In the embodiment of the application, after the recommendation system performs distributed computing processing according to the data set to obtain the similarity parameter, the similarity dictionary may be obtained according to the similarity parameter.

It should be noted that, in the embodiment of the present application, the similarity dictionary is determined according to the similarity, and the similarity is obtained through distributed computation, so that the similarity dictionary may be obtained through cross-partition processing.

Exemplarily, in the embodiment of the present application, based on the above example, the data set is a first data set; summarizing different similarities obtained on the hosts a, b and c, and further calculating according to the similarities to obtain a similarity dictionary; namely, through cross-partition processing, a similarity dictionary is obtained.

Specifically, in the embodiment of the present application, when the similarity dictionary is obtained according to the similarity parameter, the similarity parameters may first be summed to obtain the total similarity; then the evaluation numbers corresponding to the similarity parameters are summed to obtain the total evaluation number; and finally the total similarity is divided by the total evaluation number to obtain the similarity dictionary.

Further, in the embodiment of the present application, based on the above example, the similarity dictionary may be expressed as the following formula:

$$\mathrm{Simi}_{i,j} = \frac{\rho_{ij}}{p_{ij}} = \frac{\sum_{u \in U,\ i,j \in S_u} \rho_{ij}^{u}}{\sum_{u \in U,\ i,j \in S_u} 1} \tag{7}$$

where $\rho_{ij}$ is the similarity obtained from all users who scored item i and item j simultaneously; $p_{ij}$ is the probability that item i and item j are scored simultaneously over all users; $S_u$ is the set of items scored by user u, and U is the set of users; $\sum_{u \in U,\ i,j \in S_u} \rho_{ij}^{u}$ is the sum of the similarities, and $\sum_{u \in U,\ i,j \in S_u} 1$ is the total number of users that scored both item i and item j.

Further, in the embodiment of the present application, the similarity dictionary may be obtained by using reduce operation.
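The reduce step can be sketched as follows, assuming each partition (for example, each host) has already produced a list of ((i, j), (similarity, count)) records. The function name and the record layout are assumptions for this illustration; in a Spark deployment the same aggregation would typically be a keyed reduce, but plain Python is used here so that the sketch stays self-contained.

```python
from collections import defaultdict

def build_similarity_dictionary(mapped_partitions):
    """Sum similarities and counts per pair across partitions, then divide."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [total similarity, total count]
    for partition in mapped_partitions:
        for pair, (similarity, count) in partition:
            totals[pair][0] += similarity
            totals[pair][1] += count
    return {pair: total / n for pair, (total, n) in totals.items()}

partitions = [
    [((1, 2), (0.8, 1))],                      # records from one host
    [((1, 2), (0.5, 1)), ((1, 3), (1.0, 1))],  # records from another host
]
simi = build_similarity_dictionary(partitions)
```

Only pairs that were actually co-scored appear as keys, which is why the dictionary can be much smaller than a dense score matrix.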

Step 104, performing recommendation processing according to the similarity dictionary.

In the embodiment of the application, after the recommendation system obtains the similarity dictionary according to the similarity parameter, recommendation processing can be performed according to the similarity dictionary.

It is understood that, in the embodiments of the present application, the similarity dictionary may reflect similarities between items; therefore, item recommendation can be performed according to the similarity dictionary, i.e., recommendation processing is realized.

Specifically, in the embodiment of the present application, when recommendation processing is performed by using the similarity dictionary, an evaluated set and an unevaluated set may be obtained first; further, according to the similarity dictionary and the evaluated set, the grade of the non-evaluated set is obtained; and finally, carrying out recommendation processing according to the scores of the non-evaluation set.

For example, in the embodiment of the application, when recommending an item to the user a, an evaluated set and an unevaluated set of the user a are obtained first, and since the item in the unevaluated set has no score, the preference degree of the user a for the item in the unevaluated set cannot be judged, so that the score of the unevaluated set can be obtained by using the similarity dictionary and the evaluated set, and then the recommendation processing can be performed on the user a by combining the scores of the unevaluated set.

Further, in an embodiment of the present application, when obtaining the score of the unevaluated set from the similarity dictionary and the evaluated set, the score of the unevaluated set may be obtained by the following formula:

$$\hat{r}_{i} = \frac{\sum_{k} \mathrm{Simi}_{ki}\, r_{k}}{\sum_{k} \mathrm{Simi}_{ki}} \tag{8}$$

where $\hat{r}_{i}$ is the score of unevaluated item i in the unevaluated set, $r_{k}$ is the score of evaluated item k in the evaluated set, and $\mathrm{Simi}_{ki}$ is the similarity between item i and item k.

It can be understood that, in the embodiment of the present application, based on the above equation (8), the principle of obtaining the score of the non-evaluated set is that, when the recommendation object does not score the item i, the similarity between the item i and the item j that has been evaluated by the recommendation object can be obtained by using the scores of the item i by other users who have scored the item i, so that the score of the user relative to the item i is obtained according to the similarity between the item i and the item j and the score of the recommendation object on the item j.

Further, in the embodiment of the application, the recommendation processing is to take the first N items for recommendation after the items are ranked from high to low; wherein N may be any number, and the application is not limited.
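Scoring the unevaluated items and taking the top N can be sketched as below. The function names are hypothetical, the dictionary is assumed to store each item pair once under an (i, j) key, and dividing the weighted sum by the sum of similarities (a weighted average) is an assumption of this sketch; items with no similarity to anything the user has evaluated are simply skipped.

```python
def predict_scores(user_ratings, all_items, simi):
    """Estimate scores for items the user has not evaluated.

    user_ratings: dict item -> score (the evaluated set).
    simi: similarity dictionary keyed by item pairs stored in either order.
    """
    def lookup(i, k):
        return simi.get((i, k), simi.get((k, i), 0.0))

    predictions = {}
    for i in all_items:
        if i in user_ratings:
            continue  # already evaluated
        weight = sum(lookup(i, k) for k in user_ratings)
        if weight > 0:
            weighted = sum(lookup(i, k) * r for k, r in user_ratings.items())
            predictions[i] = weighted / weight  # normalization is assumed
    return predictions

def top_n(predictions, n):
    """Return the n items with the highest predicted scores."""
    return sorted(predictions, key=predictions.get, reverse=True)[:n]

ratings_a = {1: 4.0, 2: 2.0}
simi = {(1, 3): 1.0, (2, 3): 0.5, (1, 4): 0.5, (2, 4): 1.0}
preds = predict_scores(ratings_a, [1, 2, 3, 4], simi)
```

In this example item 3 is more similar to the item the user scored highly, so it ranks ahead of item 4.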

Fig. 2 is a schematic view of a second implementation flow of the recommendation method provided in the embodiment of the present application, and as shown in fig. 2, the recommendation system performs distributed computation processing according to the data set to obtain the similarity parameter, that is, step 102 may include the following steps:

Step 102a, calculating, by using a preset algorithm, the ratio between the scores given to different items by one user in the first data set to obtain a first similarity.

In the embodiment of the application, the recommendation system performs distributed calculation processing according to the data set to obtain the similarity parameter, and specifically, the recommendation system may first perform calculation processing on a ratio between scores of different articles by a user in the first data set by using a preset algorithm to obtain the first similarity.

It is to be understood that, in the embodiments of the present application, the preset algorithm refers to a map function.

Further, in the embodiment of the present application, when the data set is the first data set, the first similarity refers to a ratio between scores of different items of one user in the first data set.

Illustratively, in the embodiment of the present application, the users include A, B, and C, the items include 1, 2, and 3, and the first similarity is calculated by using the above formula (6); the first similarity includes the similarities formed by any two of A's scores for items 1, 2, and 3, the similarities formed by any two of B's scores for items 1, 2, and 3, and the similarities formed by any two of C's scores for items 1, 2, and 3.

Step 102b, performing minimum value processing on the first similarity to obtain the similarity parameter.

In the embodiment of the application, after the recommendation system calculates and processes the ratio between the scores of different items by one user in the first data set by using a preset algorithm to obtain the first similarity, the minimum value processing may be performed on the first similarity to obtain the similarity parameter.

It is to be understood that, in the embodiments of the present application, the minimum value processing means taking the minimum value for the first similarity. The formula for calculating the first similarity is shown in the foregoing formula (6), where $r_{ui}$ and $r_{uj}$ are the scores of any two items; because the ratio of two scores can exceed 1, the minimum of the ratio and its reciprocal is taken to ensure that the similarity value lies between 0 and 1.
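As a worked illustration with made-up scores, suppose user u gave item i a score of 4 and item j a score of 5, and assume all scores are positive:

$$\frac{r_{ui}}{r_{uj}} = \frac{4}{5} = 0.8, \qquad \frac{r_{uj}}{r_{ui}} = \frac{5}{4} = 1.25, \qquad \min(0.8,\ 1.25) = 0.8$$

Whichever score is larger, the minimum always selects the ratio that is at most 1, and it equals 1 only when the two scores are identical.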

Fig. 3 is a schematic flow chart of a third implementation process of the recommendation method provided in the embodiment of the present application, and as shown in fig. 3, the recommendation system performs distributed computation processing according to a data set to obtain a similarity parameter, that is, step 102 may further include the following steps:

Step 102c, calculating, by using a preset algorithm, the ratio between the different scores given to one item by different users in the second data set to obtain a second similarity.

In the embodiment of the application, the recommendation system performs distributed calculation processing according to the data set to obtain the similarity parameter; specifically, the recommendation system may also use a preset algorithm to calculate the ratio between the different scores given to one item by different users in the second data set to obtain the second similarity.

It should be noted that, in the embodiment of the present application, the second similarity refers to a similarity obtained by calculating a ratio between different scores of an item by different users under an item in the second data set; that is, the second similarity can characterize the similarity between different users for the same item.

It is understood that, in the embodiment of the present application, since the second similarity can represent the similarity between different users for the same item; therefore, when the similarity dictionary is obtained according to the similarity parameter, if the similarity dictionary is obtained according to the second similarity, the similarity dictionary can describe the similarity between different users, so that when recommendation processing is performed on a recommendation object by using the similarity dictionary, preference items of other users with high similarity to the recommendation object can be recommended to the recommendation object, and recommendation processing can be realized.

Step 102d, performing minimum value processing on the second similarity to obtain the similarity parameter.

In the embodiment of the application, after the recommendation system uses a preset algorithm to calculate the ratio between the different scores given to one item by different users in the second data set and obtains the second similarity, minimum value processing may be performed on the second similarity to obtain the similarity parameter.

It is understood that in the embodiment of the present application, in order to ensure that the interval of the second similarity value is between 0 and 1, the second similarity value needs to be minimized.
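The user-to-user case can be sketched in the same way, this time scanning the list of scores that each item received. The function name and the input layout (item mapped to a dict of user scores) are assumptions for this illustration, user identifiers are assumed to be sortable, and scores are assumed to be strictly positive.

```python
from collections import defaultdict
from itertools import combinations

def user_similarity_dictionary(per_item_scores):
    """Average the min-ratio similarity of two users over items both have scored.

    per_item_scores: dict item -> {user: score}.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (u, v) -> [total similarity, count]
    for raters in per_item_scores.values():
        for u, v in combinations(sorted(raters), 2):
            r_u, r_v = raters[u], raters[v]
            totals[(u, v)][0] += min(r_u / r_v, r_v / r_u)
            totals[(u, v)][1] += 1
    return {pair: total / n for pair, (total, n) in totals.items()}

per_item = {1: {"A": 4.0, "B": 5.0}, 2: {"A": 2.0, "B": 2.0, "C": 4.0}}
user_simi = user_similarity_dictionary(per_item)
```

Here users A and B agree closely on both items, so their similarity is higher than that of either user with C.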

Fig. 4 is a schematic view of a fourth implementation flow of the recommendation method provided in the embodiment of the present application. As shown in Fig. 4, obtaining the similarity dictionary according to the similarity parameter, that is, step 103, may include the following steps:

Step 103a, performing a summation operation on the similarity parameters to obtain the total similarity.

In the embodiment of the application, the recommendation system obtains the similarity dictionary according to the similarity parameter, and specifically, the recommendation system can perform summation operation on the similarity parameter to obtain the total similarity.

It is to be understood that, in the embodiment of the present application, for example, as shown in the foregoing formula (7), when the similarity is the first similarity, the similarity parameters are summed over all users u in U who scored both item i and item j to obtain the total similarity, which is $\sum_{u \in U,\ i,j \in S_u} \rho_{ij}^{u}$.

Step 103b, performing a summation operation on the evaluation numbers corresponding to the similarity parameters to obtain the total evaluation number.

Further, in the embodiment of the application, the recommendation system obtains the similarity dictionary according to the similarity parameter, and specifically, the evaluation number corresponding to the similarity parameter may be summed to obtain the total evaluation number.

It is to be understood that, in the embodiment of the present application, exemplarily, as shown in the foregoing formula (7), when the similarity is the first similarity, the evaluation numbers corresponding to the similarity parameters are summed to obtain the total evaluation number, which is $\sum_{u \in U,\ i,j \in S_u} 1$, that is, the number of users who scored both item i and item j.

Step 103c, performing a division operation on the total similarity and the total evaluation number to obtain the similarity dictionary.

In the embodiment of the application, the recommendation system performs summation operation on the similarity parameters to obtain the total similarity, performs summation operation on the evaluation numbers corresponding to the similarity parameters to obtain the total evaluation number, and then performs division operation on the total similarity and the total evaluation number to obtain the similarity dictionary.

It is to be understood that, in the embodiment of the present application, exemplarily, as shown in the foregoing formula (7), when the similarity is the first similarity, the total similarity $\sum_{u \in U,\ i,j \in S_u} \rho_{ij}^{u}$ is divided by the total evaluation number $\sum_{u \in U,\ i,j \in S_u} 1$ to obtain the similarity dictionary $\mathrm{Simi}_{i,j}$.
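As a worked illustration with made-up values, suppose three users scored both item i and item j, and their similarity parameters are 0.8, 1.0, and 0.6:

$$\text{total similarity} = 0.8 + 1.0 + 0.6 = 2.4, \qquad \text{total evaluation number} = 3, \qquad \mathrm{Simi}_{i,j} = \frac{2.4}{3} = 0.8$$

Users who scored only one of the two items contribute to neither the numerator nor the denominator, so they cannot pull the similarity toward 0.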

Fig. 5 is a schematic diagram of a fifth implementation flow of the recommendation method provided in the embodiment of the present application. As shown in Fig. 5, after the similarity dictionary is obtained according to the similarity parameter, that is, after step 103, the method by which the recommendation system performs similarity calculation may further include the following steps:

Step 105, acquiring incremental data.

In the embodiment of the application, the recommendation system may further obtain the incremental data after obtaining the similarity dictionary according to the similarity parameter.

It should be noted that, in the embodiment of the present application, the incremental data may also be called online data, that is, online data newly generated in the recommendation system.

It is understood that, in the embodiment of the present application, the incremental data may include the newly added user, the item, and the rating data of the user on the item.

For example, in the embodiment of the present application, the incremental data may be the added user D and the rating of the item by the user D.

Step 106, obtaining an incremental similarity dictionary according to the incremental data.

In an embodiment of the application, after the recommendation system acquires the incremental data, an incremental similarity dictionary may be obtained according to the incremental data.

It can be understood that, in the embodiment of the present application, after obtaining the similarity dictionary, the recommendation system may store the similarity dictionary first, and then after obtaining the incremental data, may obtain the incremental similarity dictionary according to the incremental data.

Specifically, in the embodiment of the present application, the incremental similarity dictionary is calculated from the incremental data in the same manner as the similarity described above, except that the data used in the calculation is the incremental data. For example, the incremental similarity and the incremental number may be calculated according to the incremental data; the incremental similarity may be expressed as:

wherein U₁ is the newly added user data set, that is, the incremental data; the incremental number is counted over the set of user scores corresponding to the incremental data.

Step 107, updating the similarity dictionary according to the incremental similarity dictionary.

In an embodiment of the application, after the recommendation system obtains the incremental similarity dictionary according to the incremental data, the similarity dictionary may be updated according to the incremental similarity dictionary.

It can be understood that, in the embodiment of the present application, the similarity dictionary may be updated according to the incremental similarity dictionary; specifically, the incremental similarity dictionary may be merged with the original similarity dictionary to obtain an updated similarity dictionary.
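One possible way to picture the merge is sketched here. It assumes that the stored dictionary keeps, for each pair, the running total similarity and the running total evaluation number, and that the updated entry is the ratio of the combined totals. This storage layout and the names merge_totals and to_dictionary are illustrative assumptions, not the formula of the application.

```python
def merge_totals(stored, incremental):
    """Merge per-pair (total_similarity, total_evaluation_number) tuples.

    Both arguments map a pair to its running totals; this layout is assumed.
    """
    merged = dict(stored)
    for pair, (similarity, count) in incremental.items():
        old_similarity, old_count = merged.get(pair, (0.0, 0))
        merged[pair] = (old_similarity + similarity, old_count + count)
    return merged


def to_dictionary(totals):
    """Turn running totals into similarity values by division."""
    return {pair: s / c for pair, (s, c) in totals.items() if c > 0}


stored = {("A", "B"): (1.5, 2)}                              # from the stock data
incremental = {("A", "B"): (0.5, 2), ("B", "C"): (1.0, 1)}   # from new user D
updated = to_dictionary(merge_totals(stored, incremental))
print(updated)
```

Under this assumption only the pairs touched by the incremental data are recomputed, which matches the stated aim of avoiding a second calculation of the original dictionary.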

For example, in the embodiment of the present application, the updated similarity dictionary may be expressed as the following formula:

Fig. 6 is a schematic flowchart of an implementation of the recommendation method provided in the embodiment of the present application. As shown in Fig. 6, the method by which the recommendation system performs recommendation processing according to the similarity dictionary, that is, step 104, may include the following steps:

Step 104a, acquiring an evaluated set and an unevaluated set.

In the embodiment of the application, the recommendation system performs recommendation processing according to the similarity dictionary, and specifically, the recommendation system may first obtain an evaluated set and an unevaluated set.

For example, in an embodiment of the present application, the evaluated set may be the set of items that the recommendation object has already rated, and the unevaluated set may be the set of items that the recommendation object has not yet rated; the unevaluated set may include at least one unevaluated item.

Step 104b, obtaining the scores of the unevaluated set according to the similarity dictionary and the evaluated set.

In an embodiment of the application, after the recommendation system obtains the evaluated set and the unevaluated set, the scores of the unevaluated set may be obtained according to the similarity dictionary and the evaluated set.

Illustratively, in the embodiments of the present application, the scores of the unevaluated set may be calculated by the aforementioned formula (8).

Step 104c, performing recommendation processing according to the scores of the unevaluated set.

In the embodiment of the application, after the recommendation system obtains the scores of the unevaluated set according to the similarity dictionary and the evaluated set, recommendation processing may be performed according to the scores of the unevaluated set.

In an embodiment of the present application, after obtaining the scores of the unevaluated set, the recommendation system may sort the scores of all the items from high to low, and recommend the items corresponding to the top N scores.
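The sorting and top-N selection of step 104c can be sketched as follows. The function name recommend_top_n and the sample scores are hypothetical; the predicted scores are assumed to be already available as a mapping from item to score.

```python
import heapq


def recommend_top_n(predicted_scores, n):
    """Return the n items with the highest predicted scores, best first."""
    best = heapq.nlargest(n, predicted_scores.items(), key=lambda kv: kv[1])
    return [item for item, _score in best]


predicted_scores = {"X": 3.2, "Y": 4.5, "Z": 4.1}
top_items = recommend_top_n(predicted_scores, 2)
print(top_items)
```

heapq.nlargest avoids sorting the full score list when N is much smaller than the number of unevaluated items; a plain sorted(..., reverse=True)[:n] gives the same result.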

Fig. 7 is a seventh schematic flowchart of an implementation of the recommendation method provided in the embodiment of the present application. As shown in Fig. 7, the method by which the recommendation system obtains the scores of the unevaluated set according to the similarity dictionary and the evaluated set, that is, step 104b, may include the following steps:

Step 201, performing query processing on the similarity dictionary to obtain the recommendation similarity between the evaluated set and the unevaluated set and the scores of the evaluated set.

In an embodiment of the application, to obtain the scores of the unevaluated set according to the similarity dictionary and the evaluated set, the recommendation system may first query the similarity dictionary to obtain the recommendation similarity between the evaluated set and the unevaluated set, together with the scores of the evaluated set.

Exemplarily, in the embodiment of the present application, as shown in the foregoing formula (8), the recommendation similarity may be Simi_ki, and the score of the evaluated set may be r_k.

Step 202, obtaining the scores of the unevaluated set according to the recommendation similarity and the scores of the evaluated set.

In the embodiment of the application, after the recommendation system queries the similarity dictionary and obtains the recommendation similarity between the evaluated set and the unevaluated set and the scores of the evaluated set, the scores of the unevaluated set can be obtained according to the recommendation similarity and the scores of the evaluated set.

For example, in the embodiment of the present application, the scores of the unevaluated set may be calculated according to the aforementioned formula (8).
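Steps 201 and 202 can be pictured with the following sketch. It uses a stand-in combination rule: the predicted score of an unevaluated item i is taken as the mean of Simi_ki * r_k over the evaluated items k that have an entry in the dictionary. This rule is an assumption chosen for illustration; it is not confirmed to be the form of formula (8), and the name predict_scores is hypothetical.

```python
def predict_scores(simi, evaluated, unevaluated):
    """Predict scores for unevaluated items.

    simi:        dict mapping (k, i) to Simi_ki (queried from the dictionary)
    evaluated:   dict mapping evaluated item k to its score r_k
    unevaluated: iterable of unevaluated items i
    The averaging rule below is an illustrative assumption.
    """
    predictions = {}
    for i in unevaluated:
        # Step 201: query the dictionary for every evaluated item k
        terms = [simi[(k, i)] * r_k for k, r_k in evaluated.items() if (k, i) in simi]
        # Step 202: combine similarity and known scores
        if terms:
            predictions[i] = sum(terms) / len(terms)
    return predictions


simi = {("A", "C"): 0.5, ("B", "C"): 1.0}
evaluated = {"A": 4.0, "B": 3.0}
pred = predict_scores(simi, evaluated, ["C", "D"])
print(pred)
```

Item D receives no prediction here because the dictionary holds no entry linking it to an evaluated item.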

The embodiment of the application provides a recommendation method, wherein a recommendation system determines a data set according to original data; performing distributed computing processing according to the data set to obtain a similarity parameter; obtaining a similarity dictionary according to the similarity parameters; and performing recommendation processing according to the similarity dictionary. That is to say, in the embodiment of the application, a data set may be determined according to original data, and then the data set is subjected to distributed computing to obtain similarity, and a similarity dictionary is obtained according to the similarity obtained after the distributed computing is performed, and finally recommendation processing is performed according to the similarity dictionary.

Example two

In another embodiment of the present application, Fig. 8 is a schematic diagram of an implementation of the recommendation method, based on a collaborative filtering algorithm, provided in the embodiment of the present application. As shown in Fig. 8, a similarity dictionary is created and updated, recommendation processing is performed by using the similarity dictionary, and the recommendation result may be fed back to the online data or the offline data.

For example, in the embodiment of the present application, after the raw data, that is, the stock data, is obtained, a data set may be determined according to the raw data. Specifically, the raw data in Resilient Distributed Dataset (RDD) format is converted into a data type that uses the (User) score as the primary key, and averaging processing is performed on the raw data.

Further, in the embodiment of the present application, after the processing, a base of the raw data needs to be determined, and the data set is then determined according to the base. The base may be either of two dimensions, item or user, and may be the dimension with the smaller count, that is, the lesser of the number of users and the number of items in the raw data.
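The choice of base can be sketched as follows, assuming the raw data is available as (user, item, score) triples. The function name choose_base and the tie-breaking rule (a tie goes to the user dimension) are assumptions made for illustration.

```python
def choose_base(ratings):
    """Pick the dimension with fewer distinct values as the base.

    ratings: iterable of (user, item, score) triples (assumed layout).
    Ties are resolved in favour of the user dimension (assumption).
    """
    ratings = list(ratings)
    users = {user for user, _item, _score in ratings}
    items = {item for _user, item, _score in ratings}
    return "user" if len(users) <= len(items) else "item"


ratings = [
    ("u1", "A", 4.0),
    ("u1", "B", 2.0),
    ("u2", "A", 3.0),
    ("u2", "C", 1.5),
]
base = choose_base(ratings)
print(base)
```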

Illustratively, in the embodiments of the present application, the data set is determined from the raw data with the item dimension as the base, and the resulting first data set is then used to create the similarity dictionary.

Specifically, in the embodiment of the present application, first, distributed calculation processing needs to be performed on the first data set to obtain the first similarity, and a calculation manner of the first similarity is shown in the foregoing formula (6).

Further, in the embodiment of the present application, the distributed computation process is performed based on a map function, that is, the first similarity is found by the map function and the distributed computation framework.

Therefore, in the embodiment of the application, the first similarity can be obtained through a large amount of parallel computing, so that the computing speed of the similarity can be improved, and the creating speed of the similarity dictionary can be further improved.
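The map-based computation can be pictured with a local sketch. A thread pool stands in for the distributed computation framework, and the per-pair similarity is assumed to be the smaller of the score ratio and its reciprocal, which is only one reading of "ratio" followed by "minimum value processing"; it is not confirmed to be formula (6). The names pair_similarities and map_and_reduce are hypothetical, and scores are assumed to be positive.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pair_similarities(user_ratings):
    """Map step for ONE user: emit ((i, j), (similarity, 1)) for each item pair.

    Assumed similarity: min(r_i / r_j, r_j / r_i), with positive scores.
    """
    emitted = []
    for (i, r_i), (j, r_j) in combinations(sorted(user_ratings.items()), 2):
        ratio = r_i / r_j
        emitted.append(((i, j), (min(ratio, 1.0 / ratio), 1)))
    return emitted


def map_and_reduce(dataset):
    """Apply the map step to every user in parallel, then sum per pair."""
    totals = defaultdict(lambda: [0.0, 0])
    with ThreadPoolExecutor() as pool:  # local stand-in for a distributed map
        for emitted in pool.map(pair_similarities, dataset.values()):
            for pair, (similarity, count) in emitted:
                totals[pair][0] += similarity
                totals[pair][1] += count
    return {pair: (s, c) for pair, (s, c) in totals.items()}


dataset = {
    "u1": {"A": 4.0, "B": 2.0},
    "u2": {"A": 3.0, "B": 3.0, "C": 1.5},
}
totals = map_and_reduce(dataset)
print(totals)
```

Because each user's record is processed independently, the map step needs no shared state, which is what makes it suitable for parallel execution.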

Further, in the embodiment of the present application, a similarity dictionary may be obtained according to the first similarity, and the calculation manner of the similarity dictionary is as shown in the foregoing formula (7). That is, the first similarities are summed to obtain the total similarity of the first similarity; the evaluation numbers corresponding to the first similarity are then summed to obtain the total evaluation number of the first similarity; and finally, the total similarity of the first similarity is divided by the total evaluation number of the first similarity to obtain the similarity dictionary.

Further, in the embodiment of the present application, the similarity dictionary may also be updated according to the incremental data. Similarly, after the incremental data is obtained, basic preprocessing may first be performed on the incremental data, including data type adjustment, averaging processing, and the like; an incremental similarity dictionary corresponding to the incremental data is then determined according to the incremental data, and the similarity dictionary is updated according to the incremental similarity dictionary.

It can be understood that, in the embodiment of the application, since the original similarity dictionary is stored, the similarity dictionary can be updated on the basis of the original similarity dictionary, so that secondary calculation of the original similarity dictionary is not needed, and feasibility is provided for combination of offline recommendation and online recommendation.

Further, in the embodiment of the application, after the similarity dictionary is obtained, recommendation processing can be performed according to the similarity dictionary.

Specifically, in the embodiment of the present application, an evaluated set and an unevaluated set may be obtained first; the scores of the unevaluated set are then obtained according to the similarity dictionary and the evaluated set; and finally the recommendation processing is performed according to the scores of the unevaluated set, wherein the scores of the unevaluated set can be calculated through the formula (8).

Specifically, in the embodiment of the present application, after the scores of the unevaluated set are obtained, a recommendation list may be generated from the top N items ranked by score, so as to implement the recommendation processing.

Example three

Based on the foregoing embodiment, in another embodiment of the present application, fig. 9 is a schematic structural diagram of a composition of a recommendation system provided in the embodiment of the present application, and as shown in fig. 9, a recommendation system 10 provided in the embodiment of the present application may include a determining unit 11, an obtaining unit 12, a recommending unit 13, and an updating unit 14.

The determining unit 11 is configured to determine a data set according to the original data.

The obtaining unit 12 is configured to perform distributed computation processing according to the data set to obtain a similarity parameter; and obtaining a similarity dictionary according to the similarity parameter.

The recommending unit 13 is configured to perform recommendation processing according to the similarity dictionary.

Further, the determining unit 11 is specifically configured to perform integration processing on the original data based on a user dimension in the original data to obtain a first data set; or integrating the original data by taking the dimension of the article in the original data as a base to obtain a second data set.

Further, the obtaining unit 12 is specifically configured to calculate, by using a preset algorithm, ratios between the scores given by one user to different items in the first data set to obtain a first similarity; and to perform minimum value processing on the first similarity to obtain the similarity parameter.

Further, the obtaining unit 12 is specifically configured to calculate, by using a preset algorithm, ratios between the scores given to one item by different users in the second data set to obtain a second similarity; and to perform minimum value processing on the second similarity to obtain the similarity parameter.

Further, the obtaining unit 12 is specifically configured to perform summation operation on the similarity parameters to obtain a total similarity; summing the evaluation numbers corresponding to the similarity parameters to obtain the total evaluation number; and performing division operation on the total similarity and the total evaluation number to obtain the similarity dictionary.

Further, the obtaining unit 12 is further configured to obtain incremental data after obtaining the similarity dictionary according to the similarity parameter; and to obtain an incremental similarity dictionary according to the incremental data.

Further, the updating unit 14 is configured to update the similarity dictionary according to the incremental similarity dictionary.

Further, the recommending unit 13 is specifically configured to obtain an evaluated set and an unevaluated set; to obtain the scores of the unevaluated set according to the similarity dictionary and the evaluated set; and to perform the recommendation processing according to the scores of the unevaluated set.

Further, the obtaining unit 12 is specifically configured to perform query processing on the similarity dictionary to obtain the recommendation similarity between the evaluated set and the unevaluated set and the scores of the evaluated set; and to obtain the scores of the unevaluated set according to the recommendation similarity and the scores of the evaluated set.

Fig. 10 is a schematic diagram of a second composition structure of the recommendation system according to the embodiment of the present application, and as shown in fig. 10, the recommendation system 10 according to the embodiment of the present application may further include a processor 15 and a memory 16 storing executable instructions of the processor 15, and further, the recommendation system 10 may further include a communication interface 17, and a bus 18 for connecting the processor 15, the memory 16, and the communication interface 17.

In an embodiment of the present application, the processor 15 may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device for implementing the above processor function may be other electronic devices, and the embodiments of the present application are not limited in particular. The processor 15 may further comprise a memory 16, which memory 16 may be connected to the processor 15, wherein the memory 16 is configured to store executable program code comprising computer operating instructions, and wherein the memory 16 may comprise a high speed RAM memory and may further comprise a non-volatile memory, such as at least two disk memories.

In the embodiment of the present application, the bus 18 is used to connect the communication interface 17, the processor 15, and the memory 16 and to communicate among these devices.

In an embodiment of the present application, the memory 16 is used for storing instructions and data.

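Further, in an embodiment of the present application, the processor 15 is configured to: determine a data set according to raw data; perform distributed computation processing according to the data set to obtain a similarity parameter; obtain a similarity dictionary according to the similarity parameter; and perform recommendation processing according to the similarity dictionary.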

In practical applications, the Memory 16 may be a volatile Memory (volatile Memory), such as a Random-Access Memory (RAM); or a non-volatile Memory (non-volatile Memory), such as a Read-Only Memory (ROM), a flash Memory (flash Memory), a Hard Disk (Hard Disk Drive, HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor 15.

In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.

Based on such understanding, the technical solution of the present embodiment, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The embodiment of the application provides a recommendation system, which determines a data set according to original data; performing distributed computing processing according to the data set to obtain a similarity parameter; obtaining a similarity dictionary according to the similarity parameters; and performing recommendation processing according to the similarity dictionary. That is to say, in the embodiment of the application, a data set may be determined according to original data, and then the data set is subjected to distributed computing to obtain similarity, and a similarity dictionary is obtained according to the similarity obtained after the distributed computing is performed, and finally recommendation processing is performed according to the similarity dictionary.

Specifically, the program instructions corresponding to the recommendation method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive, and when the program instructions corresponding to the recommendation method in the storage medium are read or executed by an electronic device, the following steps are performed:

determining a data set according to the original data;

performing distributed computation processing according to the data set to obtain a similarity parameter;

obtaining a similarity dictionary according to the similarity parameter;

and performing recommendation processing according to the similarity dictionary.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of implementations of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims

1. A recommendation method, characterized in that the method comprises:

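determining a data set according to the original data;

performing distributed computation processing according to the data set to obtain a similarity parameter;

obtaining a similarity dictionary according to the similarity parameter;

and performing recommendation processing according to the similarity dictionary.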

2. The method of claim 1, wherein determining the set of data from the raw data comprises:

integrating the original data by taking the user dimension in the original data as a base to obtain a first data set; or,

and integrating the original data by taking the item dimension in the original data as a base to obtain a second data set.

3. The method of claim 2, wherein the performing distributed computation processing according to the data set to obtain a similarity parameter comprises:

calculating, by using a preset algorithm, ratios between the scores given by one user to different items in the first data set to obtain a first similarity;

and carrying out minimum value processing on the first similarity to obtain the similarity parameter.

4. The method of claim 2, wherein the performing distributed computation processing according to the data set to obtain a similarity parameter comprises:

calculating, by using a preset algorithm, ratios between the scores given to one item by different users in the second data set to obtain a second similarity;

and carrying out minimum value processing on the second similarity to obtain the similarity parameter.

5. The method according to claim 1, wherein the obtaining a similarity dictionary according to the similarity parameter comprises:

carrying out summation operation on the similarity parameters to obtain total similarity;

summing the evaluation numbers corresponding to the similarity parameters to obtain the total evaluation number;

and performing division operation on the total similarity and the total evaluation number to obtain the similarity dictionary.

6. The method according to claim 1, wherein after obtaining the similarity dictionary according to the similarity parameter, the method further comprises:

obtaining incremental data;

obtaining an incremental similarity dictionary according to the incremental data;

and updating the similarity dictionary according to the incremental similarity dictionary.

7. The method of claim 1, wherein the performing recommendation processing according to the similarity dictionary comprises:

acquiring an evaluated set and an unevaluated set;

obtaining the scores of the unevaluated set according to the similarity dictionary and the evaluated set;

and performing the recommendation processing according to the scores of the unevaluated set.

8. The method of claim 7, wherein said obtaining scores for said unevaluated set from said similarity dictionary and said evaluated set comprises:

performing query processing on the similarity dictionary to obtain the recommendation similarity between the evaluated set and the unevaluated set and the scores of the evaluated set;

and obtaining the scores of the unevaluated set according to the recommendation similarity and the scores of the evaluated set.

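9. A recommendation system, characterized in that the recommendation system comprises a determination unit, an acquisition unit and a recommendation unit, wherein

the determination unit is configured to determine a data set according to original data;

the acquisition unit is configured to perform distributed computation processing according to the data set to obtain a similarity parameter, and to obtain a similarity dictionary according to the similarity parameter;

the recommendation unit is configured to perform recommendation processing according to the similarity dictionary.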

10. A recommendation system, characterized in that the recommendation system comprises a processor and a memory storing instructions executable by the processor, the instructions, when executed by the processor, implementing the method of any one of claims 1-8.

11. A computer-readable storage medium, having stored thereon a program for use in a recommendation system, which program, when executed by a processor, carries out the method of any one of claims 1-8.