WO2018205853A1

WO2018205853A1 - Distributed computing system and method and storage medium

Info

Publication number: WO2018205853A1
Application number: PCT/CN2018/084870
Authority: WO
Inventors: 谭蕴琨; 余乐乐
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2017-05-10
Filing date: 2018-04-27
Publication date: 2018-11-15
Also published as: CN108874529B; CN108874529A

Abstract

Disclosed in the present invention are a distributed computing system and method, and a storage medium. The distributed computing system comprises: at least two computing nodes, and at least two parameter service nodes. The computing nodes initializes, according to users comprised in a training data subset, vectors corresponding to users in a user matrix to obtain a user submatrix composed of the initialized vectors; the computing nodes iteratively compute the user submatrix and a project submatrix according to the training data subset and the project submatrix acquired from the parameter service nodes, and transmit a project submatrix obtained after each iterative computation to the corresponding parameter service nodes; the parameter service nodes initialize vectors corresponding to a part of projects to obtain a project submatrix composed of the initialized vectors; the parameter service nodes update, according to the project submatrix transmitted by the computing nodes, the project submatrix stored by the parameter service nodes.

Description

Distributed computing system, method and storage medium

Cross-reference to related applications

The present application is based on a Chinese patent application filed on Jan. 10, 2017, the entire disclosure of which is hereby incorporated by reference.

Technical field

The present invention relates to computer technology, and in particular, to a distributed computing system, method, and storage medium.

Background technique

Artificial intelligence has been rapidly developed and widely used in various industries. Taking the application scenario of the product recommendation as an example, according to the user's behavior data, a machine learning method is used to train a model for predicting the user's rating on different commodities, so that the user's ranking of different products can be calculated, and the high-scoring product can be selected. Recommend to users, can help users quickly locate the products of interest, and achieve accurate and efficient product marketing.

For example, current product recommendations rely on big data processing techniques, and the collected behavioral data needs to be analyzed and processed to train a model with scoring prediction performance, which is the resources of the computing system that undertakes the training task (including memory resources, communication). Resources, etc.) overheads place high demands.

However, the resources of a single node in the computing system provided by the related art are limited, and the upgrade of the computing system often has hysteresis, and the current situation of limited resources of a single node and the calculation of model training require high resource overhead, which becomes difficult to solve. technical problem.

Summary of the invention

Embodiments of the present invention are directed to providing a distributed computing system, method, and storage medium capable of performing computing tasks in a resource intensive manner.

The technical solution of the embodiment of the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides a distributed computing system, including:

At least two computing nodes and at least two parameter service nodes; wherein

The computing node is configured to initialize a vector corresponding to the user in the user matrix according to a user included in the subset of the training data, to obtain a user sub-matrix formed by the initialized vector;

The computing node is configured to iteratively calculate the user sub-matrix and the item sub-matrix according to the subset of the training data, the item sub-matrix obtained from the parameter service node, and obtain the calculation after each iteration The project sub-matrix is transmitted to the corresponding parameter service node;

The parameter service node is configured to initialize a vector corresponding to the partial item, and obtain a project sub-matrix composed of the initialized vector, where the partial item is a part of the items included in the training data;

The parameter service node is configured to update an item sub-matrix stored by the parameter service node according to an item sub-matrix transmitted by the computing node;

The user sub-matrix stored by each of the computing nodes is used to combine to obtain a user matrix, and the item sub-matrix stored by each parameter service node is used to combine to obtain an item matrix;

A vector corresponding to the target user in the user matrix and a vector of the corresponding target item in the item matrix are used to obtain a score of the target user for the target item.

In a second aspect, an embodiment of the present invention provides a distributed computing method, which is applied to a distributed computing system including at least two computing nodes and at least two parameter service nodes;

The computing node initializes a vector corresponding to the user in the user matrix according to a user included in the subset of the training data, and obtains a user sub-matrix composed of the initialized vector;

The computing node iteratively calculates the user sub-matrix and the item sub-matrix according to the subset of the training data, the item sub-matrix obtained from the parameter service node, and the item obtained after each iteration calculation a matrix, transmitted to the corresponding parameter service node;

The parameter service node initializes a vector corresponding to the partial item, and obtains a project sub-matrix composed of the initialized vector, where the partial item is a part of the items included in the training data;

And the parameter service node updates the item sub-matrix stored by the parameter service node according to the item sub-matrix transmitted by the computing node;

In a third aspect, an embodiment of the present invention provides a storage medium storing an executable program, and when the executable program is executed by a processor, the following operations are implemented:

When in the computing node mode, according to the user included in the subset of the training data, the vector corresponding to the user in the user matrix is initialized, and a user sub-matrix composed of the initialized vector is obtained;

When in the computing node mode, iteratively calculates the user sub-matrix and the item sub-matrix according to the subset of the training data, the item sub-matrix obtained from the parameter service node, and obtains after each iteration calculation The project sub-matrix is transmitted to the corresponding parameter service node;

When in the parameter service node mode, the vector corresponding to the partial item is initialized, and a project sub-matrix composed of the initialized vector is obtained, and the partial item is a part of the items included in the training data;

When in the parameter service node mode, the item sub-matrix stored by the parameter service node is updated according to the item sub-matrix transmitted by the computing node.

Embodiments of the present invention have the following beneficial effects:

1) Distributing the project matrix and the user matrix in a sub-matrix manner, which reduces the occupation of the memory space of a single node, and overcomes the limitation that the related technology needs to be able to store a complete user matrix and a project matrix for a single-node memory. Implement large-scale computing in distributed computing systems with limited memory resources;

2) The communication overhead of a single node is effectively reduced, which eliminates the situation that the communication overhead encounters the network bandwidth bottleneck, which is beneficial to the equalization of the network communication load, avoids the situation that the computing node is idle due to waiting data, and improves the calculation efficiency;

3) A plurality of computing nodes perform iterative calculation on the stored user sub-matrix and the item sub-matrix based on the subset of the training data. On the one hand, the computational complexity is reduced, thereby reducing the overhead of computing resources for a single node, and reducing the single The computational complexity of the nodes, on the other hand, the way in which the computational nodes are paralleled effectively improves the computational efficiency.

DRAWINGS

1 is an optional schematic diagram of decomposing a scoring matrix into a user matrix and an item matrix according to a matrix decomposition model according to an embodiment of the present invention;

2 is an optional structural diagram of a big data platform according to an embodiment of the present invention;

3 is a schematic diagram of decomposing a scoring matrix into a user matrix and an item matrix according to a matrix decomposition model according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a distributed computing system 200 according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a distributed computing system 200 according to an embodiment of the present invention;

6 is a schematic diagram of an optional process when the distributed computing system 200 shown in FIG. 5 is used for model training according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of an optional process when the distributed computing system 200 shown in FIG. 5 is used for model training according to an embodiment of the present invention;

8-1 is an optional schematic diagram of parameters of a transmission item matrix between a parameter service node and a computing node according to an embodiment of the present invention;

8-2 is an optional schematic diagram of parameters of a transmission item matrix between a parameter service node and a computing node according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of processing of a calculation node sub-batch and a parameter service node transmission item matrix according to an embodiment of the present invention; FIG.

10 is a schematic flowchart of applying to a distributed computing method according to an embodiment of the present invention;

FIG. 11 is an optional schematic flowchart of a model for training a predictive score according to an embodiment of the present invention; FIG.

FIG. 12 is a schematic diagram of an optional application scenario of the big data platform 200 shown in FIG. 2 according to an embodiment of the present invention.

detailed description

The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It is understood that the embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Before the present invention is further described in detail, the nouns and terms involved in the embodiments of the present invention are explained, and the nouns and terms involved in the embodiments of the present invention are applied to the following explanations.

1) Behavior data, including user (such as identification information in the form of serial number, etc.), items that users generate scoring behavior (such as goods, articles, applications, etc., can be described by serial number, etc.), and user interest in the project (also referred to as scoring in this article), behavior data of multiple users constitute a behavior data set (also referred to as training data in this paper); for online products, for example, the scoring behavior includes: browsing products, collecting items, purchasing goods and comments commodity.

2) The model, the matrix decomposition model, also known as the Latent Factor Model (LFM), is used to initialize the scoring matrix and decompose the scoring matrix used to represent the training data to form the product of the user matrix and the item matrix. model.

3) Matrix Factorization (MF, Matrix Factorization), the training data is represented by the scoring matrix Y, assuming that the scoring data relates to the scores of M users for N different items, and each row vector of the scoring matrix Y corresponds to one user for different items. Scoring, each column vector of the scoring matrix Y corresponds to the scores of different users obtained by one item, and the matrix decomposition model is used to initialize the scoring matrix, that is, the feature of K (preset value) dimensions is introduced in the scoring matrix, so that the score is obtained The matrix Y is initialized according to the matrix decomposition model as a product of a user-characteristic matrix (referred to as user matrix) U and a feature-item matrix V (abbreviated as an item matrix).

Since the training data is the user's behavior data, in reality, it is impossible for the user to collect the scores of all the items, and the missing values in the scoring matrix are predicted, that is, the user's score on the ungraded items is predicted, and the matrix decomposition model is The problem of predicting missing values is transformed into the problem of solving the parameters of the user matrix and the parameters of the item matrix, that is, solving the parameter vector of the user matrix in K dimensions and the parameter vector of the item matrix in K dimensions.

For example, referring to FIG. 1, FIG. 1 is an optional schematic diagram of decomposing a scoring matrix into a user matrix and an item matrix according to a matrix decomposition model according to an embodiment of the present invention, for a given training data (including all users, All projects, as well as the scores of each user's scoring behavior), model the behavioral data using a latent semantic model to obtain the model shown in Figure 1 (assuming there are 3 users and 4 projects in the behavioral data) The score is decomposed into a user matrix (representing the interest of three users for the features of the three dimensions) and an item matrix (representing the weight of the features of the four items in the three dimensions).

Taking the score y _{11 of the} user 1 for the item 1 as an example, it can be expressed as: a row vector (u ₁₁ , u ₁₂ , u ₁₃ ) corresponding to the user 1 in the user matrix and a column vector (q ₁₁ of the corresponding item 1 in the item matrix, The product of q ₂₁ , q ₃₁ ).

4) Training, ie model training, iteratively calculates the parameters of the model using the training data, ie iteratively calculates the parameter u _{ik of the} user matrix U and the parameter v _kj in the item matrix V until the iterative suspension condition is met, eg the iterative calculation reaches a predetermined number of times or Parameter convergence.

5) Distributed computing: the training data is decomposed into multiple subsets and distributed to multiple computing nodes in the distributed computing system. The computing nodes calculate the parameters of the model in parallel based on the subset of the assigned training data, since the computing task will be Assigned to multiple computing nodes to complete, so distributed computing can expand the scale of computing and improve the efficiency of training.

6) Parameter service node architecture: A distributed computing system that implements machine learning architecture in distributed computing. It consists of a parameter service node (PS, Parameter Server) and a compute node (Worker). The number of each node is at least two. One.

7) Parameter service node: at least two parameter service nodes are included in the distributed computing system, and each parameter service node may be implemented by one or more servers, and may also be referred to as a parameter service node when implemented by a server. Responsible for storing and updating the parameters of the sub-matrix of the item matrix (hereinafter referred to as the item sub-matrix), and the parameter service node provides the service node with a parameter reading and updating the parameters of the item matrix.

8) Compute node: Each computing node can be implemented by one server or multiple servers, and the parameter service node architecture includes multiple computing nodes. Each compute node is assigned to a subset of the training data, the subset includes behavior data of some users, and the parameters of the project matrix are obtained from the parameter service node (the parameter service node always stores the latest parameters of the project matrix), and the training is used. The data update user parameter corresponding to the parameters of the above part of the user, and the update value of the parameter of the partial item of the item matrix (that is, the item of the partial user generating the scoring behavior), and then the updated value of the parameter of the item matrix is transmitted to the parameter service. The node, the parameter service node, in combination with the updated value of the parameter transmitted by each computing node, updates the item matrix stored locally by the parameter service node.

9) Spark: A distributed computing architecture based on the model training implemented by the Map-Reduce node, which involves mapping nodes and protocol nodes. The mapping nodes are responsible for the filtering and distribution of data, and the protocol nodes are responsible for the calculation and merging of data.

The big data platform is widely used for the processing of user's behavior data collected by various industries, data cleaning and screening when necessary, and then a matrix decomposition model based on behavior data to predict the user's score on different items. The score reflects the user's The degree of interest of the project, in the business scenario recommended by the project, is recommended to the user according to the ranking of the project from high to low, and can support targeted production/marketing activities to achieve high efficiency and cost of production/marketing. Saving.

For the training to obtain the above model, as an example of the training data training model, refer to FIG. 2, which is an optional structural diagram of the big data platform provided by the embodiment of the present invention, and relates to the distributed computing system 200. The data acquisition system 300, the real-time computing system 400, the offline computing system 500, and the resource scheduling 600 are described below.

The data collection system 300 is configured to collect training data of the training model (for example, for project recommendation, the training data may include: all users, all projects, users browsing, purchasing, paying attention, putting in a shopping cart, etc. A list of items of various behaviors), and appropriate processing. It can be understood that for training data, appropriate processing may include: data cleaning and screening to filter out noise data (such as apparently non-real data outside the predetermined interval), beyond the validity period (such as data collected six months ago). ), and to make the training data conform to the desired distribution and the like.

In an alternative embodiment of the invention, a mechanism for user authorization and application authorization is provided to protect privacy in the context of employing various behavioral data of the user.

The distributed computing system 200 is configured to train the model in a manner that iteratively calculates the parameters of the model based on the training data until the iterative abort condition is met.

The real-time computing system 400 is configured to implement the distributed computing system 200 to train the machine learning model in a real-time manner (also referred to as online mode), one or a batch of records in the training data (each record corresponds to one user, including the user to different objects) When submitted to the distributed computing system 200, the distributed computing system 200 loads the received one or a batch of records in real time in memory and performs training based on the training results (eg, between the true and predicted values of the score) The degree of difference) the updated parameters of the real-time calculation model.

The offline computing system 500 is configured to implement the distributed computing system 200 to adopt an offline mode training model. The distributed computing system 200 loads the newly received training data and the received historical training data in the memory to iteratively calculate the updated model. parameter.

The resource scheduling 600 is configured to allocate computing resources such as a central processing unit (CPU) and a graphics processing unit (GPU) to each of the foregoing systems, and allocate bandwidth resources for communication and the like.

In the case of the distributed computing system 200 training model, taking the aforementioned model for scoring as an example, it is necessary to collect the user's scores on different items (such as items), and form user's scoring data for different items. An example of the scoring data is as follows Table 1 shows:

	项目1Item 1	项目1Item 1	项目1Item 1	项目1Item 1	……......
用户1User 1	44	55	11		……......
用户2User 2			33	44	……......
用户3User 3		22			……......
用户4User 4	55		11	11	……......
用户5User 5					……......
用户6User 6	33	44			……......
……......	……......	……......	……......	……......	……......

Table 1 User-item rating data

For the scoring data as shown in Table 1, based on the scoring data, a scoring matrix composed of data of all users, all items, and users scoring different items can be established, and of course, there is inevitably a missing value in the scoring matrix; The matrix decomposition model is initialized, which is expressed as: the product of the user-character matrix and the feature-item matrix (referred to as the item matrix, which represents the scores of users with different features for different items).

As an example of scoring matrix decomposition, referring to FIG. 3, FIG. 3 is a schematic diagram of decomposing a scoring matrix into a user matrix and an item matrix according to a matrix decomposition model according to an embodiment of the present invention, and setting scoring data relates to M users' ratings of N items. When the scoring matrix Y is used to represent the scoring data, the dimension of Y is: M×N; the scoring matrix is initialized by using the matrix decomposition model, that is, the features of K dimensions are introduced into the scoring matrix, thereby decomposing the scoring matrix Y into user-features. The form of the product of the matrix (referred to as the user matrix) U and the feature-item matrix V (referred to as the item matrix), namely:

Y _M×N ≈U _M×K ×V _K×N (1)

The dimension of Y is: M×N, y _ij represents the score of the jth item by the i-th user, and y _{ij is} expressed as:

Where u _ik represents the score of the user k for the feature k, v _kj represents the weight of the item j in the feature k, k takes the value 1 ≤ k ≤ K, and the values of i and j are positive integers, 1 ≤ i ≤ M, 1 ≤ j ≤ N.

According to the matrix decomposition model, the scoring matrix Y is initialized to the product of the user matrix U and the item matrix V. The dimension of the user matrix U is M×K, and the row vector u _i is a K-dimensional vector corresponding to the user i to the K dimensions. The score of the feature; the dimension of the item matrix V is K×N, each column corresponds to a K-dimensional column vector v _j , which represents the weight of the item j in K dimensions; K is the dimension of the feature specified in the matrix decomposition, user i pairs The score y _ij of item j is the product of u _i and v _j .

The actual collection user's scoring data involves a large number of items, and each user tends to only score some items, so the scoring matrix is sparse, that is, the value of some elements in the scoring matrix is missing (represented by 0). Referred to as the missing value, according to the above formula (2), the missing value in the scoring matrix can be predicted, thereby converting the prediction of the missing value into the parameter u _ik for solving the user matrix U and the parameter v _kj in the item matrix V. That is, the problem of solving the parameter vector u _{i of the} user matrix U in K dimensions and the parameter vector v _j of the item matrix V in K dimensions.

For example, the product of the user vector u _i and the item vector v _j is used as the predicted value of the score of the user i on the item j, and is recorded as

The real value of the score of user i on item j is y _ij , and the difference between the predicted value and the true value is recorded as e _ij , that is:

e _ij =y _ij -u _i ·v _j (3)

Then, the problem of solving the model parameters is transformed into the problem of minimizing e _ij . Based on this, the objective function is used to represent the difference between the predicted value and the true value of the model for the score. The objective function is as shown in formula (4):

In order to prevent the model from over-fitting the training data, a regular term is introduced in the objective function. The objective function is shown in formula (5):

Where β/2 is the weight of the regular term, since the user i's score y _ij for the item j is the multiplied _{integral solution} of u _i and v _j into K dimensions, then the objective function of the matrix decomposition algorithm can be expressed as:

The process of iteratively training the model is converted into a process of solving the value (ie, parameter) of u _ik , v _kj when the above objective function converges, for example, using the gradient descent method for the above objective function, that is, the negative gradient of the objective function The direction is solved by solving u _ik and v _kj , and the update formula for u _ik and v _kj is: u _ik ←u _ik +2αe _ij v _kj (7.1)

v _kj ←v _kj +2αe _ij u _ik (7.2)

Where α is the step size, indicating the learning rate. In practical applications, the number of iterations of training is reached a predetermined number of times, or the value of the objective function is lower than a predetermined value (ie, the objective function converges) as the abort condition of the iterative training, and the output training is obtained. The parameters of the model, according to the parameters, combined with the formula (2), can calculate the user's score for different items, and select a certain number of items with the highest score for recommendation.

Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a distributed computing system 200 according to an embodiment of the present invention. The distributed matrix decomposition and training are implemented by using a Map-Reduce distributed architecture, and the model is stored in a driver. The node 210 can be implemented by one server (or multiple servers), each of the Executor nodes can be implemented by one server (or multiple servers), and the driving node 210 transmits the item matrix and the user matrix to the executor node 220. Thereafter, the executor node 220 performs training according to the received user matrix and the item matrix, calculates an updated value of the parameter of the model, and then transmits it to the driving node 210, and the driving node 210 updates the updated value of the parameter transmitted by all the executor nodes 220, and updates The parameters of the locally stored model are then broadcast to all of the actuator nodes 220.

It can be seen that the following problems exist:

1) The matrix decomposition model can easily reach a large scale. Take the training data provided by the netfliex site as an example, involving 17771 projects and 480,000 users. When taking K=1000, the dimension of the model is as high as 5×10 ⁸ . The Spark distributed computing architecture maintains all the parameters of the model on a single drive node. The physical limitations of the memory of the drive node result in the inability to train complex models.

2) In the process of mapping/provisioning for the training model, each actuator node transmits the parameters of the model to the driver node, and the driver nodes are aggregated and broadcast to all actuator nodes, resulting in a relationship between the driver node and the actuator node. A large communication overhead, the communication node and multiple actuator nodes communicate to encounter bandwidth bottlenecks, and the transmission of updated values of model parameters leads to low communication efficiency.

In an alternative embodiment of the present invention, a distributed computing architecture based on a parameter service node is provided. The user is used to decompose the training data to obtain a subset of the training data, and the plurality of computing nodes are parallel based on the subset of the training data. The training model is then combined with the parameters of the model calculated by each computing node by the parameter service node.

For example, referring to FIG. 5, FIG. 5 is an optional structural diagram of a distributed computing system 200 according to an embodiment of the present invention. In FIG. 5, the parameter service node 230, the control node 240, the computing node 250, and the scheduling layer 260 are involved. And storage layer 270.

The control node 240 is configured to control the overall operation of the parameter service node 230 and the computing node 250 to ensure the orderly operation of the operation, including: dividing the training data into subsets by using the user as a dimension, and each subset includes a part of users (ie, training data A portion of all users involved) assigns a subset of the training data to each computing node 250 and controls the orderly execution of the operations of each computing node and parameter service node 230. It will be appreciated that in an alternative embodiment, the set control node 240 may be omitted from the distributed computing system 200 illustrated in FIG. 5, coupling the functionality of the control node 240 into the parameter service node 230.

The number of parameter service nodes 230 and compute nodes 250 are both multiple, and each parameter service node 230 is configured to store sub-matrices of the item matrix V (hereinafter referred to as item sub-matrices); each compute node 250 is configured to be stored A sub-matrix of the user matrix U (hereinafter referred to as a user sub-matrix), based on the item sub-matrix obtained from the parameter service node 230, and iteratively calculates the parameters of the stored user sub-matrix according to the subset of the assigned training data. The updated value, and the updated value of the obtained parameter of the item sub-matrix, after each iteration calculation is completed, the updated value of the parameter of the item sub-matrix is returned (of course, the updated parameter can also be directly returned) corresponding parameter service Node 230.

The scheduling layer 260 is an abstract representation of the scheduling functions of the distributed computing system 200, involving the allocation of computing resources (such as CPU and GPU) of the control node 240, the parameter service node 230, and the computing node 250, as well as the control node 240, the parameter service node. The allocation of communication resources for communication between the computing node 250 and the computing node 250.

The storage layer 270 is an abstract representation of the storage resources of the distributed computing system 200, and relates to memory resources of the above-described nodes and non-volatile storage resources.

It can be understood that the distributed computing system 200 shown in FIG. 5 can be implemented by a cluster of servers. The servers in the server cluster can be separated in physical locations, or can be deployed in the same physical location, through various communications such as optical cables and cables. Way to connect.

For each node shown in FIG. 5, it may have a one-to-one correspondence with servers in the cluster. Of course, multiple nodes may be deployed in one server according to the actual processing capability of the server; in particular, for servers in the cluster. In the hardware and software, in the optional embodiment of the present invention, a virtual machine environment can be set in the cluster, and the node shown in FIG. 5 is deployed in the virtual machine environment, which facilitates rapid deployment and migration of the node.

FIG. 6 is a schematic diagram of a distributed computing system 200 shown in FIG. An optional processing diagram during training (with part of the structure in FIG. 5 omitted), showing a distributed computing architecture based on parameter service nodes, wherein multiple parameter service nodes 230 and multiple computing nodes 250 are involved, Explain separately.

The parameter service node 230 is configured to store the item matrix V, and each parameter service node 230 stores a project sub-matrix composed of vectors of corresponding partial items in the item matrix V, denoted as V-part, and the item stored by the different parameter service node 230 The items corresponding to the matrix are different, and the intersection of the items corresponding to the item sub-matrix stored by all parameter service nodes 230 is all the items involved in the training data.

Since the sub-matrix stored by each parameter service node 230 only corresponds to a part of the project, the technical effect of adaptively adjusting the scale of the items in the model can be realized by adjusting the number of parameter service nodes 230, which is advantageous for adjusting the distributed according to service requirements. The size of the parameter service node 230 in the computing system 200.

For example, when the scale of the project needs to be expanded, the number of parameter service nodes 230 may be increased in the distributed computing system 200, and the newly added parameter service node 230 is responsible for storing the vector of the corresponding new item in the project matrix V; When it is no longer necessary to predict the score of certain items, it can be implemented by revoking the parameter service node 230 storing the corresponding sub-matrix.

The computing node 250 is configured to utilize a subset of the assigned training data, the subset including behavior data of a portion of the users (ie, some of the users involved in the training data), during each iterative calculation, the computing node 250 The parameters of the item sub-matrix V are sequentially acquired from each parameter service node 230, and the parameters of the item sub-matrix acquired from any parameter service node 230 are combined with the assigned subset, and the user sub-item is calculated according to the above-mentioned update formula (7.1). The parameter of the matrix U-part (that is, the matrix of the user matrix U corresponding to the vector of the partial user) is updated locally, and the user sub-matrix U-part is updated locally; then the parameter of the item sub-matrix V-part is calculated according to the formula (7.2) The updated value transmits the updated value of the parameter of the item sub-matrix V-part to the parameter service node 230 storing the corresponding item sub-matrix for updating.

It can be understood that since each computing node 250 processes only the training data of some users, it is possible to achieve the technical effect of adaptively adjusting the user scale by adjusting the number of computing nodes 250. For example, when the scale of the user needs to be expanded, the number of computing nodes 250 may be increased in the distributed computing system 200, and the newly added computing node 250 is responsible for storing and calculating the sub-dimension of the corresponding new user in the user matrix U. Matrix; for the same reason, when it is no longer necessary to predict the score of some users for the project, it can be realized by revoking the computing node 250 storing the sub-matrix of the corresponding user.

The implementation process of the training model will be described below.

The size of the matrix decomposition model = (number of users + number of projects) × K, the scale of the model in actual applications will rise to hundreds of millions, or even one billion or ten billion, in the embodiment of the present invention, the distributed computing architecture using parameter service nodes, The dimension of the model stored and calculated by the computing node is reduced, thereby reducing the network communication overhead caused by the transmission model parameters between the computing node and the parameter service node, improving the network transmission efficiency, and supporting the adjustment of the parameter service node and the computing node. The number, the linear expansion of the support model scale, mainly involves the following aspects.

1) Training data division

The training data is processed into a format of "user ID, item ID: rating, ..., item: rating", that is, all the scores of one user are stored in one record, and the training data is divided into dimensions by user (for example, evenly divided). a plurality of subsets, each subset comprising a plurality of user records, the subset being assigned to a plurality of compute nodes 250; for example, a subset of the training data is evenly distributed to each compute node based on the state of the computational power balance of each compute node 250 Or, according to the case where the calculation power of each calculation node 250 is disparity (the calculation power ratio exceeds the ratio threshold), a subset of the training data of the corresponding ratio is allocated according to the ratio of the calculation power.

2) Model storage

According to the foregoing formulas (7.1) and (7.2), the update of the item sub-matrix and the user sub-matrix is mutually dependent. For each iteration calculation, it is first necessary to calculate the updated value of the parameter of the user sub-matrix using the parameters of the item sub-matrix (it can be understood, Since each iteration calculation is to iterate an update value based on the original value of the parameter, therefore, in this paper, for the update value of the calculation parameter, and the calculation of the updated parameter, you can make no distinction), and then use the user. The updated value of the parameter of the sub-matrix calculates the updated value of the parameter of the item sub-matrix, so before the iteration begins, the computing node needs to obtain the parameters of the item sub-matrix from the parameter service node through the network, and the computing node needs to serve the parameter through the network after the iteration ends. The node transmits the updated value of the parameters of the project submatrix.

Given that in most application scenarios, the number of users involved in the training data far exceeds the number of projects, taking netfliex training data as an example, the number of users involved is 27 times the number of projects. Therefore, in order to reduce the communication overhead caused by the transmission parameters between the computing node 250 and the parameter service node 230, the item sub-matrix is stored by the parameter service node 230, and the computing user sub-matrix is stored by the computing node 250, so that in each iterative calculation When the computing node 250 calculates the updated value of the parameter of the user sub-matrix, it only needs to obtain the parameters of the item sub-matrix from each parameter service node 250, and after the iterative calculation ends, return the updated parameters of the item sub-matrix to the corresponding item. The parameter service node 230 of the sub-matrix updates the item sub-matrix by the parameter service node 230.

It can be seen that only the parameters of the item matrix need to be transmitted between the parameter service node 230 and the computing node 250, and the user matrix U does not need to be transmitted. Since V is less than U orders of magnitude, the parameter service node 230 and the computing node 250 are significantly reduced. Communication overhead between.

3) Model calculation

The update formula of the component vector u _i in the user matrix shown by the formula (7.1) in the dimension u _ik of the dimension k shows that the calculation of the parameter is only related to the user's score, and the vectors corresponding to different users in the user matrix are independent of each other. The user matrix U is divided into a plurality of sub-matrices according to the user dimension, correspondingly stored in the plurality of computing nodes 250, and the training data allocated by each computing node 250 calculates the updated value of the stored parameters of the user sub-matrix, the user sub-matrix The dimension is: the number of users involved in the training data to which the compute node 250 is assigned x K.

Taking the gradient descent method as an example, first, the control node 240 divides the training data, assigns a subset of the training data to each computing node 250, initializes the user matrix U and the item matrix V, and then iterates through multiple trainings, each iteration training. Each computing node 250 performs the following operations in parallel:

Referring to FIG. 7, FIG. 7 is an optional processing diagram of the distributed computing system 200 shown in FIG. 5, which is configured as a model training, and obtains a corresponding parameter service node from each parameter service node 230. 230 stores the parameters of the item sub-matrix. According to the foregoing formula (7.1), the calculation node 250 calculates the updated parameters of the locally stored user sub-matrix U-part; and then calculates the parameter update of the item sub-matrix according to the formula (7.2). The value is transmitted to the parameter service node 230 storing the corresponding item sub-matrix, and the parameter storage node 230 updates the locally stored item sub-matrix.

Since the calculation node 250 calculates the updated value of the vector of the corresponding item in the item sub-matrix, the calculation result is only related to the user's score for the item, and the subset of the training data allocated by the calculation node 250 may only include the part of the item sub-matrix. The score of the item, so only the updated value corresponding to the vector corresponding to the scored item in the item sub-matrix is decreased according to the maximum gradient, and the gradient value calculated for the item without scoring is 0, which is equivalent to no update.

In the optional embodiment of the present invention, when the computing node 250 obtains the item sub-matrix from the parameter service node 230, only the vector corresponding to the scored item in the item sub-matrix stored by the parameter service node 230 may be acquired, and V-sub, according to formula (7.1), combining the subset of the assigned training data and the vector corresponding to the scored item in the item sub-matrix, calculating the updated value of the vector corresponding to some users in the locally stored user sub-matrix, Some of the users are users who generate scoring behavior for the scored items in the project submatrix;

According to formula (7.2), combined with the updated value of the partial user correspondence vector in the user sub-matrix, the updated value of the vector corresponding to the scored item in the item sub-matrix is calculated, and the parameter service node 230 (ie, the parameter service storing the corresponding item sub-matrix) is calculated. The node 230) returns the updated value of the vector of the scored item, since the vector corresponding to the unscore item no longer needs to be transmitted, thus saving the communication overhead caused by transmitting the vector of the unrated item.

For example, referring to FIG. 8-1, FIG. 8-1 is an optional schematic diagram of parameters of a transmission item matrix between a parameter service node 1 and a computing node according to an embodiment of the present invention, where a distributed computing system is provided. 4 computing nodes, computing node 1 to computing node 4 are correspondingly assigned to different subsets of training data, and correspondingly stored user sub-matrices are: U _part1 , U _part2 , U part ₃ and U _{part 4} ; computing node 1 to 4 slave parameters when the serving node 1 acquires parameters submatrix V _part1 project respectively from a centralized service node parameter acquisition sub items corresponding to the item has a sub-matrix of the vector V _part1 score.

For example, the computing node 1 determines, according to the subset of the assigned training data, the scored items in the subset, and obtains the corresponding vector of the scored items in the item sub-matrix V _part1 from the parameter service node, to parameter service node 1 For example, the obtained vector of the scored item in the item sub-matrix V _part1 is denoted as V _part1-sub1 ; according to formula (7.1), a subset of the assigned training data and V _{part1-sub1 are} calculated U The updated value of the parameter of _part1 , for example, when calculating the updated value of the vector corresponding to the partial user in U _part1 , the partial user is the user who generates the scoring behavior for the scored item; according to formula (7.2), the partial user corresponding to U _part1 corresponds updated value vector calculating an updated value V _part1-sub1, denoted by ΔV _part1-sub1, transmission ΔV _part1-sub1 to the parameter serving node 1, the parameter serving node 1 based on the updated values of each computing node returns (including computing node 1 returned ΔV _part1-sub4, computing node 2 ΔV _part1-sub2 returned computing node 3 ΔV _part1-sub3 returned computing node 4 ΔV _part1-sub4 returned) updates the local memory Stored project submatrix.

Only one parameter service node 1 is shown in FIG. 8-1. At least two parameter service nodes are disposed in the distributed computing system, and the parameter service node 2 including the storage item sub-matrix V _part2 is taken as an example. 8-2, the computing nodes 1 to 4 also obtain corresponding vectors of the scored items in the item sub-matrix V _part2 from the parameter service node 2, and record them as V _part2-sub1 , V _part2-sub2 , V _part2-sub3, and V _{part2 -sub4} , and perform iterative calculation. Similarly, the parameter service node 2 returns the updated value of the return vector of each computing node (including ΔV _part2-sub4 returned by the computing node 1, ΔV _part2-sub2 returned by the computing node 2, and the return of the computing node 3) the ΔV _part2-sub3, computing node 4 ΔV _part2-sub4 returned) to update the locally stored program sub-matrix V _part2.

For the distributed computing system 200 shown in FIG. 7, when the number of items involved in the training data assigned by the computing node 250 and the value of K are large and the model exceeds a predetermined scale (for example, the model is scaled to billions of times) ), there is a case where the storage space required for the V-sub matrix still exceeds the memory of a single compute node 250.

For this case, since the vectors of the items in the item matrix are independent of each other, the scheme of updating the V-sub matrix in batches can be adopted, so that the parameters of each batch transmission are smaller than the memory of the computing node 250, and the computing node 250 is guaranteed to have Sufficient memory to calculate the updated value of the parameter.

In an alternative embodiment of the present invention, the compute node 250 retrieves the parameters of the V-sub matrix from the parameter service node 230 in batches, sub-batch from the parameter service node 230 based on the scored items assigned to the subset of training data. Obtaining a vector corresponding to a part of the scored items in the V-sub; calculating the stored user sub-in accordance with the formula (7.1), combining the vector of the scored items acquired by each batch, and the subset of the assigned training data The updated value of the parameter of the matrix; according to formula (7.2), combined with the updated value of the parameter of the user sub-matrix, the updated value of the corresponding vector of the scored item is calculated and transmitted to the corresponding parameter service node 230 for the parameter service node 230 to update the local storage. The vector of the item that has been graded in the item submatrix.

For example, referring to FIG. 9, FIG. 9 is a schematic diagram of processing of a calculation node sub-batch and a parameter service node transmission item matrix according to an embodiment of the present invention. In FIG. 9, the training data relates to M users' ratings of N items. The training data is divided into subsets and equally distributed to 4 computing nodes, and the 4 computing nodes correspond to sub-matrices storing the initialized user matrix, which are recorded as U _part1 , U _part2 , U _part3 and U _part4 ;

Each computing node performs such operations in parallel: dividing the scored items in the assigned subset into two batches, and obtaining a batch from the item sub-matrix stored in the parameter service node in each iterative calculation process The vector corresponding to the scored item is recorded as V-sub; according to formula (7.1), combined with the V-sub and the subset of the assigned training data, some users in the user sub-matrix are calculated (ie, the scoring behavior of the scored item is generated) User) corresponding to the updated value of the vector, and then according to formula (7.2), combined with the updated value of the partial user corresponding vector in the user submatrix, the updated value of the scored item corresponding vector in the item submatrix is calculated and transmitted to the parameter service node, and the parameter is transmitted. The service node updates the item matrix stored locally.

The parameters of the sub-matrix of the batch transmission project between the computing node and the parameter service node avoid the limitation of the memory resources of the computing node caused by all the parameters of the one-time transmission project sub-matrix, effectively avoiding the single calculation when training the large-scale model. The memory resource overhead of the node is large.

The following is a description of the process of implementing the model training in the distributed computing system provided by the foregoing embodiments of the present invention. Referring to FIG. 10, FIG. 10 is a schematic diagram of a distributed computing method, which is applicable to at least two applications. a computing system and a distributed computing system of at least two parameter service nodes;

Step 101: The computing node initializes a vector of a corresponding user in the user matrix according to a user included in the subset of the training data, and obtains a user sub-matrix composed of the initialized vector.

In an optional embodiment of the present invention, the distributed control system may further include a control node that divides the training data by the user, and divides the plurality of score data for the different items included in the training data into a plurality of subsets. The plurality of subsets are assigned to the compute nodes; for example, an average allocation or a proportional allocation according to the computational power of the compute nodes may be employed.

Step 102: The parameter service node initializes a vector corresponding to the partial item, and obtains a project sub-matrix composed of the initialized vector, and the partial item is a part of the items included in the training data.

Step 103: The computing node iteratively calculates the user sub-matrix and the item sub-matrix according to the subset of the training data and the item sub-matrix obtained from the parameter service node, and transmits the item sub-matrix calculated by each iteration to the corresponding parameter service node. .

In an optional embodiment of the present invention, when the parameter service node calculates the item sub-matrix in the iterative calculation, the updated value of the item sub-matrix may be calculated, and the updated value of the item sub-matrix is transmitted to the corresponding parameter service node (ie, before the iterative calculation is stored) The parameter service node of the item sub-matrix, the parameter service node calculates new parameters of the item sub-matrix according to the updated value of the item sub-matrix transmitted by the calculation node, and updates the item sub-matrix stored locally by the parameter service node.

In an optional embodiment of the present invention, the computing node initializes a vector of a corresponding user in the user matrix in the following manner, and the computing node determines, according to the assigned subset, the scored items included in the subset, and the items stored from the parameter service node. In the submatrix, obtain a vector corresponding to the scored item;

Correspondingly, the computing node iteratively calculates the user sub-matrix, and the item sub-matrix adopts the following method: iteratively calculates a vector corresponding to some users in the user sub-matrix, and a vector corresponding to the scored item in the item sub-matrix, and some users are included in the subset. a user who has a rating for a graded item;

After the calculation of each iteration of the calculation node, the vector corresponding to the scored item is obtained by iterative calculation and transmitted to the corresponding parameter service node for the parameter service node to update the stored item sub-matrix.

In order to further reduce the communication overhead of the calculation node sub-matrix of the computing node and the parameter service node, the computing node may obtain the vector corresponding to the scored item from the item sub-matrix stored by the parameter service node, and may store the item from the parameter service node. In the matrix, the vector corresponding to the scored item is obtained in batches; the vector corresponding to the corresponding batch user in the user sub-matrix is calculated and the vector corresponding to the scored item of the corresponding batch is calculated, and the corresponding batch user is a batch for some users. The user who has scored the graded item;

After each iteration calculation is completed, the vector corresponding to the scored item of the corresponding batch obtained after each iteration calculation is transmitted to the corresponding parameter service node for the parameter service node to be based on the locally stored item sub-matrix.

For the way of determining the batch, the computing node is determined according to the memory space of the computing node, wherein the storage space occupied by the vector corresponding to the scored item of each batch is smaller than the memory space of the computing node, and the calculation has sufficient resources. Complete the calculation.

It is not difficult to see that, because the calculation node and the parameter service node do not need to transmit the vector corresponding to the unscoring item in the project sub-matrix, the communication between the computing node and the parameter service node is minimized without affecting the iterative calculation. Consumption, for the computing node, the transmission waiting time is further reduced, thereby improving the efficiency of the iterative calculation.

In an optional embodiment of the present invention, the computing node iteratively calculates the user sub-matrix and the item sub-matrix, and calculates the user sub-matrix and the item sub-matrix by using the loss function as the target of the maximum gradient; for example,

In each iterative calculation process, the computing node compares the score prediction value with the score actual value included in the subset of the training data to obtain a predicted difference value; the product of the predicted difference value and the item sub-matrix, and the locally stored user sub- The matrix is superimposed to obtain the updated user sub-matrix; the product of the predicted difference value and the updated user sub-matrix is superimposed with the item sub-matrix to obtain the updated item sub-matrix; when the iterative suspension condition is satisfied, the control node is responsible for outputting the complete model.

In the control node output model, the user sub-matrix stored by each computing node is combined to obtain a user matrix; the item sub-matrix stored by each parameter service node is combined to obtain a project matrix; when it is required to predict the target user's score for the target item, according to The product of the corresponding target user in the user matrix, and the product of the corresponding target item in the item matrix, obtains the target user's score for the target item.

Referring to FIG. 11, FIG. 11 is an optional schematic flowchart of a model for training a predictive score according to an embodiment of the present invention, which is described in conjunction with the distributed computing system shown in FIG.

First explain the parameters of the model involved:

N: number of items

M: number of users

k: the feature vector of the user, the dimension of the feature vector of the item.

Iterm, the sample data in the training data, the sample data includes the user ID, and the user's rating of the item.

IterNum: number of iterations of training

BatchNum: During each iterative training, the computing node 250 acquires the item matrix from the parameter service node 230 in batches, and performs iterative calculation according to each batch acquisition item sub-matrix.

First, initialization;

In step 201, the control node 240 evenly distributes a subset of the training data for each of the computing nodes 250.

In step 202, each computing node 250 performs the following processing in parallel:

Step 2021: Create and initialize a user sub-matrix according to the subset of the assigned training data, and each computing node stores a sub-matrix of the user matrix.

Each row vector of the user sub-matrix corresponds to one user, the row number corresponds to the user's ID, the row vector represents the user's score for different features, and the user sub-matrix includes a vector corresponding to the partial user, and the above-mentioned partial users are allocated to the computing node 250. The users included in the subset.

In step 2022, the scored item is divided into a plurality of batches.

Collect the set of IDs of the scored items in the subset of the assigned training data, and record them as IDset; divide the IDset into multiple subsets, the number is BatchNum, and each subset is recorded as: IDset[1],...,IDset[BatchNum] .

In step 203, the parameter service node 230 creates and initializes a sub-matrix of the N×k-dimensional item matrix, and each parameter service node stores the item sub-matrix.

N is the number of items. Each column vector of the item matrix corresponds to one item. The column number corresponds to the ID of the item, and the column vector indicates the weight of the item in different features.

It should be noted that there is no limitation of the execution order between step 201, step 202 and step 203.

Second, an iterative calculation process;

Iterative IterNum iterative calculations, each iterative calculation process, for each parameter service node 250, perform the following steps:

Step 204: The calculation node 250 obtains the vector corresponding to the scored item from the item sub-matrix stored by the parameter service node 230 in batches.

In each batch, the vector corresponding to IDset[m] is obtained from the parameter service node 230, and the value of m satisfies: 1≤m≤BatchNum, and the parameter service node 250 according to each calculation node 250 for the IDset[m] in the item matrix. The request for the corresponding vector returns to the computing node 250 a vector corresponding to IDset[m] in the item matrix.

Step 205: Update a vector of users in the user sub-matrix that has scored the scored item, and calculate an updated value of the vector corresponding to the scored item in the item sub-matrix.

The updated vector of the user who has scored in IDset[m] in the user sub-matrix stored by the calculation node 250 is updated: u _ik ←u _ik +2αe _ij v _kj ; the updated value of the vector corresponding to IDset[m] is calculated: Δv _kj = _2αe _ij u _ik; then IDset [m] corresponding to vector △ v _kj updated value of the parameter is transmitted to the service node 230.

Step 206: The parameter service node 230 updates the locally stored item sub-matrix according to the updated value of the vector corresponding to the scored item in the item sub-matrix returned by each computing node.

Upon receiving the update value of the vector corresponding to the IDset[m] transmitted by the calculation node 250, the vector corresponding to the update IDset[m] is updated as follows:

v _j ←v _j +Δv _j /Num, Num is the number of compute nodes 250 in the distributed computing system 200.

Step 207: The control node 240 acquires parameters of the user sub-matrix from each computing node 250 and combines to form a user matrix, and acquires parameters of the item matrix from each parameter service node 230 and combines to form an item matrix.

So far, the matrix decomposition model based on the scores of different users in the training data is obtained. According to formula (2), the scores of different users can be calculated. In the business scenario of product recommendation, the product with the highest score can be selected. User recommendation.

Embodiments of the present invention provide a storage medium, including any type of volatile or non-volatile storage device, or a combination thereof. The non-volatile memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), or an Erasable Programmable Read (EPROM). Only Memory), etc., an executable program is stored in the storage medium, and when the executable program is executed by the processor, the following operations are performed:

When in the computing node mode, the user sub-matrix and the item sub-matrix are iteratively calculated according to a subset of the training data, the item sub-matrix obtained from the parameter service node, and the item sub-matrix obtained after each iteration calculation , transmitted to the corresponding parameter service node;

In an alternative embodiment of the invention, when the executable program is executed by the processor, the following operations are also performed:

When in the control node mode, the scores for the plurality of items included in the training data are divided by the user, and a plurality of subsets of the training data are obtained, and the plurality of subsets are allocated to at least two computing nodes.

When in the control node mode, when the suspension condition of the calculation node iterative calculation is satisfied, the user sub-matrix stored by each computing node is combined to obtain a user matrix; the item sub-matrix stored by each parameter service node is combined to obtain an item matrix;

When in the control node mode, the target user's score for the target item is obtained according to the product of the corresponding target user's vector in the user matrix and the vector of the corresponding target item in the item matrix.

When in the computing node mode, determining the scored items included in the subset according to the assigned subset, and obtaining a vector corresponding to the scored items from the item sub-matrix stored by the parameter service node;

When in the computing node mode, iteratively calculates a vector corresponding to a part of users in the user sub-matrix and a vector corresponding to the scored item in the item sub-matrix, and some users are users of the subset including users who generate a scoring behavior for the scored items;

When in the computing node mode, the vector corresponding to the scored item obtained after each iteration calculation is transmitted to the corresponding parameter service node.

When in the computing node mode, the vector corresponding to the scored item is obtained in batches from the item sub-matrix stored by the parameter service node;

When in the compute node mode, iteratively calculates the vector corresponding to the corresponding batch user in the user sub-matrix and the vector corresponding to the scored item of the corresponding batch, and the corresponding batch user scores the scored items for the batch among the partial users. Behavioral user;

When in the computing node mode, the vector corresponding to the scored item of the corresponding batch obtained after each iteration calculation is transmitted to the corresponding parameter service node.

When in the compute node mode, the number of batches is determined according to the memory space of the compute node, wherein the storage space occupied by the vector corresponding to the scored item of each batch is smaller than the memory space of the compute node.

When in the parameter service node mode, the item sub-matrix stored by the parameter service node is updated according to the vector corresponding to the scored item transmitted by the calculation node.

When in the computing node mode, the score prediction value is compared with the score actual value included in the subset of the training data to obtain a predicted difference value;

When in the computing node mode, the product of the prediction difference and the item sub-matrix is superimposed with the user sub-matrix to obtain an updated user sub-matrix;

When in the compute node mode, the product of the predicted difference and the updated user submatrix is superimposed with the item submatrix to obtain an updated item submatrix.

It can be understood that when the above storage medium is set in the node of the distributed computing system, some nodes are in the computing node mode, and some nodes are in the parameter service grounding mode. As an example, as shown in the data diagram 7, the distributed computing system can be based on the training data. Performing iterative calculation, for the scoring matrix of the training data shown in FIG. 1, the scoring matrix can be decomposed into the product of the user matrix and the item matrix as shown in FIG. 1, and according to the model shown in FIG. 1, the user pair can be calculated. The scores of different items indicate the degree of interest of the user to the project. According to the descending order of the scores, the items of interest to the user can be accurately selected and recommended to the user.

The following is a description of an application scenario. Referring to FIG. 12, FIG. 12 is a schematic diagram of an optional application scenario of the big data platform 200 shown in FIG. 2 according to an embodiment of the present invention. The distributed computing system 200 deployed by the big data platform 100 can employ the architecture of the distributed computing system 200 as shown in FIG.

An online shopping system 700 is shown in FIG. 12. The online shopping system 700 provides a page-based access method to support user access through a browser and a shopping APP. For a user who logs into the online shopping system 700, the online shopping system 700 The behavior data collection function is enabled to collect behavior data in the following forms: user ID, access time, browse product, purchase item, return item, and item rating.

The online shopping system 700 opens the permission of the behavior data to the data collection system 300 of the big data platform 100. The data collection system 300 periodically or irregularly obtains the behavior data of the accessing user of the online shopping system 700, and cleans the behavior data, such as removing The malicious scoring data, and the high score with cheating behavior, construct the training data in the user-oriented dimension of the scoring data, and each record of the training data includes a user ID, a product ID, and a product score.

The training data is submitted to the distributed computing system 200 of the big data platform 100 for iterative calculation. Based on the user's score on the scored products, the user's score on the unrated products is predicted to form a matrix decomposition model as shown in FIG. The user's rating for each item is represented by the product of the vector corresponding to the user in the user matrix and the vector corresponding to the item in the item matrix, and the parameters of the user model and the product model are returned to the online shopping system 700.

The online shopping system 700 can calculate the user's rating of different commodities according to the matrix decomposition model. For example, when the linear shopping system 700 needs to perform online promotion for an item, in order to accurately locate the potential consumer of the product, according to the matrix decomposition model Calculate the predetermined number of users who have the highest rating for the product, and push the promotion information of the product to the user to achieve accurate marketing.

It can be understood that the above-mentioned shopping system 700 can also be replaced with an online APP store to accurately recommend an APP of interest to the user. For example, the APP store can calculate the user's rating (interesting degree) for different APPs according to the matrix decomposition model, according to the calculated calculation. Scoring, the user is pushed to a specific APP; the shopping system 700 described above may also be a social platform system that recommends interested contacts to the user. Next, the social platform system is used to recommend contacts to users as an example.

The social platform system provides a page-based access method to support user access through a browser and a social platform APP. For a user logging in to the social platform system, the social networking system is enabled to open a data collection function, and collect behavior data in the following form: user ID, user Various behavioral data (such as publishing original content, comments, attention information, etc.) reflecting the similarity between users in a social network; or collecting user data in the following forms: gender, age, work, location, and the like.

The social platform system opens data permissions to the data collection system of the big data platform, and the data collection system periodically or irregularly obtains behavior data and/or user data of the accessing user of the social platform system, and cleans the data, such as removing malicious comments, and contacting The person rating data constructs training data in a user dimension, each record of the training data including a first user ID, a second user ID, and a second user rating.

The training data is iteratively calculated by the distributed computing system 200 submitted to the big data platform 100, and based on the score of the scored second user, the user's score on the un-scorsized second user is predicted to form a matrix decomposition model as shown in FIG. In FIG. 1, the first user's score for each second user is represented by a product of a vector corresponding to the first user in the first user matrix and a vector corresponding to the second user in the second user matrix, the first user The parameters of the model and the second user model are returned to the social platform system.

The social platform system can calculate the scores of the first user for different second users according to the matrix decomposition model. For example, the social platform system needs to perform friend recommendation to the first user, in order to accurately locate the second user recommended to the first user, according to The matrix decomposition model calculates a predetermined number of second users that score higher on the second user, and pushes related information of the second user to the first user to implement accurate friend recommendation.

In summary, the embodiments of the present invention have the following beneficial effects:

1) The user matrix is distributed and stored in the user sub-matrix, and the project matrix is distributed and stored in the item sub-matrix, which reduces the occupation of the memory space of the node, and overcomes the related technology for storing the complete user matrix for the single-machine memory. And the limitations of the project matrix, enabling large-scale calculations in distributed computing systems with limited memory;

2) The plurality of computing nodes calculate the stored user sub-matrix and the item sub-matrix obtained from the parameter service node based on the subset of the training data, on the one hand, reduce the computational complexity of the single node, and on the other hand, the computing node The way of parallel computing effectively improves the computational efficiency;

3) The project matrix and the user matrix are distributed and stored in a sub-matrix manner, which effectively reduces the capacity of the transmission item sub-matrix between the computing node and the parameter service node. On the one hand, the communication overhead of a single node is effectively reduced, eliminating communication. When the overhead encounters the network bandwidth bottleneck, it is beneficial to the equalization of the network communication load. On the other hand, the transmission efficiency is high, which avoids the situation that the computing node is idle due to waiting data, and improves the calculation efficiency.

4) Only the vector corresponding to the scored item and the update value are transmitted between the calculation node and the parameter service node. Since it is not necessary to transmit the vector related to the unrated item, the communication overhead and transmission delay between the calculation node and the parameter service node are reduced. It is beneficial to improve the calculation efficiency.

5) By dividing the user matrix into sub-matrices and assigning them to multiple computing nodes, the project matrix is decomposed into multiple project sub-matrix distributions stored in the parameter service nodes, and the project vectors are obtained in batches in each iteration, and the large-scale matrix decomposition model is solved. The computational problem can be extended linearly by increasing the number of parameter service nodes and the number of compute nodes to support very large-scale calculations.

The above is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily think of changes or substitutions within the technical scope of the present invention. It should be covered by the scope of the present invention. Therefore, the scope of the invention should be determined by the scope of the appended claims.

Industrial applicability

The distributed computing system in the embodiment of the present invention includes: at least two computing nodes and at least two parameter service nodes; wherein the computing node is configured to initialize corresponding users in the user matrix according to users included in the subset of training data. Deriving the user's vector to obtain a user sub-matrix composed of the initialized vectors; the computing node is configured to iteratively calculate the user according to the subset of the training data, the item sub-matrix obtained from the parameter service node Sub-matrix, and the item sub-matrix, the item sub-matrix obtained after each iteration calculation is transmitted to a corresponding parameter service node; the parameter service node is configured to initialize a vector corresponding to the partial item, and obtain the vector initialized by a project sub-matrix, the partial item is a part of the items included in the training data; the parameter service node is configured to update the parameter service node to store according to the item sub-matrix transmitted by the computing node Project submatrix. In this way, the project matrix and the user matrix are distributed and stored in a sub-matrix manner, which reduces the occupation of the memory space of a single node, and overcomes the limitation that the related technology needs to be able to store a complete user matrix and a project matrix for a single-node memory. Large-scale computing is realized in a distributed computing system with limited memory resources; the communication overhead of a single node is effectively reduced, eliminating the situation that the communication overhead encounters a network bandwidth bottleneck, which is beneficial to the equalization of the network communication load and avoids waiting The data leads to the idle state of the computing node, which improves the computational efficiency; multiple computing nodes perform iterative calculation on the stored user sub-matrix and the project sub-matrix based on the subset of training data. On the one hand, the computational complexity is reduced and the pair is reduced. The computational resource overhead of a single node reduces the computational complexity of a single node. On the other hand, the parallel computation of computational nodes effectively improves computational efficiency.

Claims

A distributed computing system comprising:

At least two computing nodes and at least two parameter service nodes; wherein

The computing node is configured to initialize a vector corresponding to the user in the user matrix according to a user included in the subset of the training data, to obtain a user sub-matrix formed by the initialized vector;

The computing node is configured to iteratively calculate the user sub-matrix and the item sub-matrix according to the subset of the training data, the item sub-matrix obtained from the parameter service node, and obtain the calculation after each iteration The project sub-matrix is transmitted to the corresponding parameter service node;

The parameter service node is configured to initialize a vector corresponding to the partial item, and obtain a project sub-matrix composed of the initialized vector, where the partial item is a part of the items included in the training data;

The parameter service node is configured to update an item sub-matrix stored by the parameter service node according to an item sub-matrix transmitted by the computing node;

The user sub-matrix stored by each of the computing nodes is used to combine to obtain a user matrix, and the item sub-matrix stored by each parameter service node is used to combine to obtain an item matrix;

A vector corresponding to the target user in the user matrix and a vector of the corresponding target item in the item matrix are used to obtain a score of the target user for the target item.
The distributed computing system of claim 1 further comprising:

a control node configured to divide, according to a user dimension, a score for the plurality of the items included in the training data, obtain a plurality of subsets of the training data, and assign the plurality of subsets to the at least two calculations node.
The distributed computing system of claim 1 further comprising:

And a control node configured to combine the user sub-matrix stored by each of the computing nodes to obtain a user matrix when the suspension condition of the calculation node is calculated by the iterative calculation; and combine the item sub-matrix stored by each parameter service node to obtain an item matrix ;

The control node is further configured to obtain a score of the target user for the target item according to a product of a vector of a corresponding target user in the user matrix and a vector of a corresponding target item in the item matrix.
The distributed computing system of claim 1 wherein

The computing node is configured to determine, according to the subset, the scored items included in the subset, and obtain, from the item sub-matrix stored by the parameter service node, the scored item Vector

The computing node is configured to iteratively calculate a vector corresponding to a part of users in the user sub-matrix and a vector corresponding to the scored item in the item sub-matrix, where the partial users are among the users included in the subset a user who generates a scoring behavior for the scored item;

The computing node is configured to transmit a vector corresponding to the scored item obtained after each iteration calculation to a corresponding parameter service node.
The distributed computing system of claim 4 wherein

The computing node is configured to acquire, from the item sub-matrix stored by the parameter service node, a vector corresponding to the scored item in batches;

The computing node is configured to iteratively calculate a vector corresponding to a corresponding batch user in the user sub-matrix and a vector corresponding to the scored item of the corresponding batch, where the corresponding batch user is the part of the user The user of the batched graded item that generated the scoring behavior;

The computing node is configured to transmit a vector corresponding to the scored item of the corresponding batch obtained after each iteration calculation to a corresponding parameter service node.
The distributed computing system of claim 5 wherein

The computing node is further configured to determine the quantity of the batch according to a memory space of the computing node, where a storage space occupied by a vector corresponding to the scored item of each batch is smaller than the computing node Memory space.
The distributed computing system of claim 4 wherein

The parameter service node is configured to update the item sub-matrix stored by the parameter service node according to a vector corresponding to the scored item transmitted by the computing node.
A distributed computing system according to any one of claims 1 to 7, wherein

The computing node is configured to perform a difference between the score prediction value and the score actual value included in the subset of the training data to obtain a predicted difference value;

The computing node is configured to superimpose the product of the predicted difference value and the item sub-matrix with the user sub-matrix to obtain an updated user sub-matrix;

The computing node is configured to superimpose the product of the predicted difference value and the updated user sub-matrix with the item sub-matrix to obtain an updated item sub-matrix.
A distributed computing method for a distributed computing system comprising at least two computing nodes and at least two parameter service nodes;

The computing node initializes a vector corresponding to the user in the user matrix according to a user included in the subset of the training data, and obtains a user sub-matrix composed of the initialized vector;

The computing node iteratively calculates the user sub-matrix and the item sub-matrix according to the subset of the training data, the item sub-matrix obtained from the parameter service node, and the item obtained after each iteration calculation a matrix, transmitted to the corresponding parameter service node;

The parameter service node initializes a vector corresponding to the partial item, and obtains a project sub-matrix composed of the initialized vector, where the partial item is a part of the items included in the training data;

And the parameter service node updates the item sub-matrix stored by the parameter service node according to the item sub-matrix transmitted by the computing node;

The user sub-matrix stored by each of the computing nodes is used to combine to obtain a user matrix, and the item sub-matrix stored by each parameter service node is used to combine to obtain an item matrix;

A vector corresponding to the target user in the user matrix and a vector of the corresponding target item in the item matrix are used to obtain a score of the target user for the target item.
The distributed computing method according to claim 9, further comprising:

The control node in the distributed computing system divides the scores of the training data for a plurality of the items by using a user dimension, obtains a plurality of subsets of the training data, and allocates the plurality of subsets to the Said at least two computing nodes.
The distributed computing method according to claim 9, further comprising:

When the suspension condition of the iterative calculation of the computing node is satisfied, the control node in the distributed computing system combines the user sub-matrix stored by each computing node to obtain a user matrix; and combines the storage of the parameter service node Project submatrix, get the project matrix;

And obtaining a score of the target user for the target item according to a product of a vector of a corresponding target user in the user matrix and a vector of a corresponding target item in the item matrix.
The distributed computing method according to claim 9, wherein

The computing node initializes a vector corresponding to the user in the user matrix according to a user included in the subset of the training data, including:

The computing node determines, according to the subset, the scored items included in the subset, and obtains a vector corresponding to the scored item from the item sub-matrix stored by the parameter service node;

The computing node iteratively calculates the user sub-matrix and the item sub-matrix according to the item sub-matrix obtained from the parameter service node, including:

The computing node iteratively calculates a vector corresponding to a part of users in the user sub-matrix and a vector corresponding to the scored item in the item sub-matrix, where the partial user is the user included in the subset The user who has scored the item to generate a scoring behavior;

Transmitting the item sub-matrix obtained after each iteration calculation to the corresponding parameter service node, including:

The vector corresponding to the scored item obtained after each iteration calculation is transmitted to the corresponding parameter service node.
The distributed computing method according to claim 12, wherein

Obtaining, from the item sub-matrix stored by the parameter service node, a vector corresponding to the scored item, including:

The computing node obtains, from the item sub-matrix stored by the parameter service node, a vector corresponding to the scored item in batches;

Iteratively calculating a vector corresponding to the corresponding batch user in the user sub-matrix and a vector corresponding to the scored item of the corresponding batch, the corresponding batch user being: the scored item for the batch among the partial users The user who generated the scoring behavior;

The vector corresponding to the scored item of the corresponding batch obtained after each iteration calculation is transmitted to the corresponding parameter service node.
The distributed computing method according to claim 13, further comprising:

The computing node determines the quantity of the batch according to the memory space of the computing node, wherein a storage space occupied by a vector corresponding to the scored item of each batch is smaller than a memory space of the computing node.
A storage medium storing an executable program that, when executed by a processor, performs the following operations:

When in the computing node mode, according to the user included in the subset of the training data, the vector corresponding to the user in the user matrix is initialized, and a user sub-matrix composed of the initialized vector is obtained;

When in the computing node mode, iteratively calculates the user sub-matrix and the item sub-matrix according to the subset of the training data, the item sub-matrix obtained from the parameter service node, and obtains after each iteration calculation The project sub-matrix is transmitted to the corresponding parameter service node;

When in the parameter service node mode, the vector corresponding to the partial item is initialized, and a project sub-matrix composed of the initialized vector is obtained, and the partial item is a part of the items included in the training data;

When in the parameter service node mode, updating the item sub-matrix stored by the parameter service node according to the item sub-matrix transmitted by the computing node;

The user sub-matrix stored by each of the computing nodes is used to combine to obtain a user matrix, and the item sub-matrix stored by each parameter service node is used to combine to obtain an item matrix;

A vector corresponding to the target user in the user matrix and a vector of the corresponding target item in the item matrix are used to obtain a score of the target user for the target item.