Disclosure of Invention
The invention provides an intelligent service recommendation algorithm based on cloud computing, which is small in calculation amount, high in efficiency, high in accuracy and good in stability, and aims to solve the technical problems of large calculation amount, low efficiency, low accuracy and instability of the existing service recommendation algorithm.
The invention comprises the following two steps:
(1) model training
First, a plurality of accurate service characteristics are trained. Using a distributed training method, the training tasks for the decision data of different accurate services are dispersed to different computing nodes to obtain service selection model parameters of different types, which are stored in a model database. The model database is built from historical accurate service selection decision data; for each service, the relevant service characteristic data are extracted, organized, and updated regularly. The model database comprises a service characteristic training library and a service selection decision model library;
(2) service selection decision
The accurate service selection decision is made with a logistic regression algorithm. After the demander's service characteristic data are received, abnormal index information in the data is first filtered out according to a detection standard data model and the range of services that may be required is determined; the service characteristic data are then sent to each computing node, and once the computation is finished the results are sent to the reduce node for comprehensive analysis, which finally determines the required service.
Preferably, the computation process of each computing node in step (2) is the same, and the following lists a model for each computing node to train and make an accurate service selection decision:
(1) fitting function
A fitting function is adopted that confines the computed probability to the interval [0,1] and drives the result as close to 0 or 1 as possible;
the fitting function is:
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}, \qquad y = \begin{cases} 1, & h_\theta(x) \ge f_t \\ 0, & h_\theta(x) < f_t \end{cases}$$
where x is the input service characteristic data, y is the service selection decision result, and $f_t$ is the service selection decision threshold; $h_\theta(x)$ is the fitting function, θ is the fitting parameter, namely the service selection decision model parameter, and T denotes transposition;
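The fitting function and the thresholded decision can be illustrated with a minimal sketch. It assumes the standard logistic (sigmoid) form for $h_\theta(x)$; the function names, the default threshold of 0.5, and the choice to treat a probability equal to the threshold as a positive decision are illustrative assumptions, not values given in this description.

```python
import math

def fitting_function(theta, x):
    """Logistic fitting function h_theta(x) = 1 / (1 + exp(-theta^T x))."""
    z = sum(t * v for t, v in zip(theta, x))
    # Numerically stable evaluation for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def decide(theta, x, f_t=0.5):
    """Decision y: 1 when h_theta(x) reaches the threshold f_t, else 0.

    f_t = 0.5 is an assumed default, not a value from the description.
    """
    return 1 if fitting_function(theta, x) >= f_t else 0
```

For example, `decide([1.0, 2.0], [1.0, 1.0])` gives 1 because the weighted sum is positive and the probability therefore exceeds 0.5.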
(2) loss function
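When y = 1:
$$\mathrm{cost}(h_\theta(x), y) = -\log(h_\theta(x))$$
when y = 0:
$$\mathrm{cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$$
merging the loss functions:
$$\mathrm{cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$$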
the loss function is adopted, so that when the prediction result is close to the actual value, the loss approaches to 0, and when the difference between the prediction result and the actual value is very large, the loss approaches to infinity;
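The behaviour of the merged loss can be checked numerically with a short sketch. The clipping constant `eps` is an assumption added only to keep the logarithm finite when the prediction is exactly 0 or 1; it is not part of the description.

```python
import math

def cost(h, y, eps=1e-12):
    """Merged loss: -y*log(h) - (1 - y)*log(1 - h).

    h is the fitting-function output, y is the actual label (0 or 1).
    eps clips h away from 0 and 1 (an illustrative safeguard).
    """
    h = min(max(h, eps), 1.0 - eps)
    return -y * math.log(h) - (1 - y) * math.log(1.0 - h)
```

With y = 1, a prediction of 0.999 costs about 0.001, while a prediction of 0.001 costs about 6.9, which matches the stated behaviour: the loss approaches 0 for a good prediction and grows without bound for a bad one.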
(3) fitting parameter theta
All parameters θ are updated simultaneously:
$$\theta_g := \theta_g - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_g^{(i)}$$
where α is a set value (the step size of each update); m is the number of service characteristic data x; i is the subscript of the service characteristic data, denoting the i-th service characteristic data; g is the index of the service selection decision model parameter θ, denoting the g-th θ.
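A single simultaneous update can be sketched as follows, assuming the usual gradient-descent rule for logistic regression in which α scales each step. The function names are illustrative; the point of the sketch is that every gradient is computed from the old θ before any component is overwritten.

```python
import math

def h(theta, x):
    """Logistic fitting function, evaluated stably."""
    z = sum(t * v for t, v in zip(theta, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def update_theta(theta, X, y, alpha):
    """One simultaneous gradient step over all m samples."""
    m = len(X)
    # Prediction errors use the current (old) theta for every sample.
    errors = [h(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
    # Gradient for each parameter index g.
    grad = [sum(e * x_i[g] for e, x_i in zip(errors, X)) / m
            for g in range(len(theta))]
    # All components are replaced together.
    return [t - alpha * d for t, d in zip(theta, grad)]
```

Starting from θ = (0, 0) with samples (1, 0) labelled 0 and (1, 1) labelled 1 and α = 1, one step leaves the first parameter at 0 and moves the second to 0.25.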
(4) Feature scaling
Feature scaling unifies the ranges of the independent variables (features) so that different features contribute comparably to the result;
$$x := \frac{x - \mu}{x_{max} - x_{min}}$$
where x is the input service characteristic data, $x_{max}$ is its maximum value and $x_{min}$ is its minimum value; since there are multiple x, μ denotes their average.
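Feature scaling of one feature column can be sketched as below, assuming the mean-centred, range-divided form built from μ, $x_{max}$ and $x_{min}$. Returning zeros for a constant column is an added assumption to avoid division by zero.

```python
def scale_feature(values):
    """Scale one feature column: (x - mean) / (max - min)."""
    mu = sum(values) / len(values)
    span = max(values) - min(values)
    if span == 0:
        # Constant feature: no spread to normalise (assumed handling).
        return [0.0 for _ in values]
    return [(v - mu) / span for v in values]
```

For the column (1, 2, 3) the mean is 2 and the range is 2, so the scaled values are (-0.5, 0, 0.5).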
(5) Description of algorithm implementation
MRLRFDD accurate service selection decision algorithm
Input: the service characteristic data provided by all service demanders, $X = (x_1, x_2, …, x_n)$, and whether each is the desired service, $Z = (z_1, z_2, …, z_n)$
Output: the predicted result $H = (h_1, h_2, …, h_n)$
①:for i=1:n
②:for j=1:m
④:for i=1:Iteration
⑥:for g=1:m
Final step: substitute the service characteristic data provided by the current service demander into $h_\theta(x)$ to obtain the prediction result H;
where $x_1, x_2, …, x_n$ are the service characteristic data provided by the 1st, 2nd, …, n-th service demanders, and $z_1, z_2, …, z_n$ indicate whether the service is needed by the 1st, 2nd, …, n-th service demanders; n in X denotes the n service characteristic data, which correspond one-to-one with n in Z, n in H, and n in the for loop; m denotes the number of components in each submitted service characteristic data $x_i$; i is the subscript of the service characteristic data, denoting the i-th service characteristic data, i = 1, 2, …, n; g is the subscript of the service selection decision model parameter θ, denoting the g-th θ; $x_{ij}$ is the j-th service characteristic component of the i-th submission.
Preferably, on the basis of the cloud-computing-based accurate service selection decision algorithm (MRLRFDD), a collaborative filtering recommendation algorithm of multidimensional vectors associated with the accurate service selection decision (UDMDCFUB) is established through the following steps:
(1) Let C be the set of all service demanders in the system and S be the set of all service schemes that can be recommended to them. In practice, both C and S are usually large. A utility function u computes the degree to which a service scheme s is recommended to a service demander c, namely u: C × S → R, where R is a totally ordered set of non-negative real numbers within a certain range. The recommendation problem is to find, for each demander, the object $s^*$ with the maximum recommendation degree, as shown in formula (1):
$$\forall c \in C,\quad s^*_c = \arg\max_{s \in S} u(c, s) \qquad (1)$$
(2) Each demander has a score for each service scheme (0 denotes not scored). The scores of one demander can be represented as a one-dimensional matrix, $S_i' = (s_{(i,1)}, s_{(i,2)}, …, s_{(i,m)})$, and the scores of all demanders as a multi-dimensional matrix, $S = (S_1', S_2', …, S_i', …, S_n')$, where m is the number of scores one demander gives to the service schemes and n is the number of demanders, so that $s_{(i,j)}$ is indexed by i = 1, …, n and j = 1, …, m. A flag matrix F is set, where F(i, j) indicates whether demander i has scored service scheme j: F(i, j) = 1 when demander i has scored service scheme j, and F(i, j) = 0 otherwise;
(3) Mean normalization limits the processed data to a certain range; it facilitates subsequent data processing and accelerates convergence when the program runs;
$$P_j'' = P_j' - a_j \qquad (2)$$
where j denotes the j-th service scheme, $a_j$ is the average of the scores for the j-th service scheme, $P_j'$ is all scores of all demanders for the j-th service scheme, and $P_j''$ is all scores of all demanders for the j-th service scheme after normalization;
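Mean normalization with the flag matrix F can be sketched as follows. The sketch assumes that the average $a_j$ is taken only over the demanders who actually scored scheme j (F(i, j) = 1) and that unscored entries stay at 0; the description states only that $a_j$ is the average score, so both choices are assumptions.

```python
def mean_normalize(S, F):
    """Subtract each scheme's mean score from its rated entries.

    S[i][j] is demander i's score for scheme j, F[i][j] is 1 if rated.
    Returns (normalized matrix, list of per-scheme means a_j).
    """
    n, m = len(S), len(S[0])
    means = []
    for j in range(m):
        rated = [S[i][j] for i in range(n) if F[i][j] == 1]
        means.append(sum(rated) / len(rated) if rated else 0.0)
    P = [[(S[i][j] - means[j]) if F[i][j] == 1 else 0.0
          for j in range(m)]
         for i in range(n)]
    return P, means
```

The returned means are kept so that they can be added back when a predicted score is produced.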
(4) Learning the parameters X and θ. Let the service characteristic data set of the demander's preferences be $X = (x_1, x_2, …, x_m)$, where m is the number of service characteristic data, and let the data set of service selection decision model parameters be $\theta = (\theta_1, \theta_2, …, \theta_n)$, where n is the number of parameters of the service selection decision model. The two data sets are initialized, and the parameters X and θ are learned by gradient descent;
where s(i, j) is the score of demander i for service scheme j; once the trained parameters X and θ are obtained, X × θ is the demander's degree of preference for the service scheme; β is a coefficient, λ is a given parameter, and k denotes the k-th term;
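Because formulas (3) and (4) are not reproduced in this text, the following is only a sketch of one common way to learn X and θ jointly by gradient descent: a regularized squared-error objective over the rated entries, with β as the step coefficient and λ as the regularization parameter. The objective, the number of latent factors `n_factors`, the random initialization and all default values are assumptions rather than the formulas of the invention.

```python
import random

def train_cf(P, F, n_factors=2, beta=0.05, lam=0.01, iterations=2000, seed=0):
    """Learn demander vectors X and scheme vectors Theta from rated scores.

    P[i][j] is the (normalized) score, F[i][j] is 1 where a score exists.
    """
    rng = random.Random(seed)
    n, m = len(P), len(P[0])
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n)]
    Theta = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(m)]
    for _ in range(iterations):
        # Gradients start from the regularization terms lam * parameter.
        gX = [[lam * v for v in row] for row in X]
        gT = [[lam * v for v in row] for row in Theta]
        for i in range(n):
            for j in range(m):
                if F[i][j] != 1:
                    continue  # unrated entries do not contribute
                err = sum(a * b for a, b in zip(X[i], Theta[j])) - P[i][j]
                for k in range(n_factors):
                    gX[i][k] += err * Theta[j][k]
                    gT[j][k] += err * X[i][k]
        # Both parameter sets are updated together from the old values.
        X = [[v - beta * g for v, g in zip(r, gr)] for r, gr in zip(X, gX)]
        Theta = [[v - beta * g for v, g in zip(r, gr)] for r, gr in zip(Theta, gT)]
    return X, Theta
```

After training, the inner product of a demander vector and a scheme vector approximates the normalized score, which corresponds to the X × θ preference degree described above.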
(5) description of algorithms
Collaborative filtering recommendation algorithm for UDMDCFUB multi-dimensional vector
Input: the scores of all service demanders for the service schemes, $S = (S_1', S_2', …, S_n')$, and the flag matrix F, where n is the number of demanders;
Output: the predicted scores of a demander for all service schemes
1:for i=1:n
2:for j=1:m
3:s(i,j)=s(i,j)-aj
4:for i=1:Iteration
7: prediction score = $\theta^{T} X + \mu$;
where $s_{(i,j)}$ is the score of demander i for service scheme j, and i and j have the same meaning as in formulas (3) and (4).
The invention provides a novel service decision and service recommendation algorithm based on cloud computing from the aspect of network accurate service recommendation so as to enable service demanders to obtain the best service supply, and the algorithm is simple, high in accuracy and good in stability.
Detailed Description
The process is divided into two stages, briefly described below:
1. Map-reduce based accurate service decision algorithm (MRLRFDD). Many kinds of services are in use today, and each must deliver accurately targeted content. A logistic regression algorithm is therefore used to find the characteristic equation of each service: each service is trained to obtain its parameters, which are stored in a service selection model parameter library for decision making. Because there are many service types and a large amount of training data, and because recommendation accuracy requires retraining with newly obtained data, regular retraining is the first characteristic of the accurate service decision algorithm. The second characteristic is that a trained service selection model cannot be chosen directly from the service characteristics: since a service demander may need any of the services, the demander's service requirement characteristic data must be substituted into every model, and the results combined to obtain the required service. To address these two characteristics, a map-reduce based logistic regression algorithm is designed. Each computing node first trains on a plurality of accurate service characteristics to obtain model parameters, which are stored in the model database. When the demander's service characteristic data are received, they are sent to each computing node; once the computation is finished, the results are sent to the reduce node for comprehensive analysis, which finally determines the required service.
2. Collaborative filtering recommendation algorithm (UDMDCFUB) of multidimensional vectors associated with precise service decisions. And filtering the service schemes provided by all service providers according to the result obtained by the accurate service decision algorithm, combining the historical selection result and evaluation of a plurality of similar service demanders to form data to be trained by the collaborative filtering recommendation algorithm based on the multidimensional matrix, and obtaining the personal preference attribute value of the service demander to the service scheme and the attribute values of all aspects of the service scheme after a series of calculations. When the system deduces the accurate demand of the service demander, the system can obtain the probability of the service demander for selecting a certain service scheme according to the personal preference of the service demander and the attribute value of the service scheme, and recommend the service scheme to the demander according to the set threshold value of the service recommendation probability.
In order to realize the aim, the invention adopts the following technical scheme:
1. Establishing a service characteristic training library and a service selection decision model library. Making accurate service selection decisions with a logistic regression algorithm requires a large amount of training data. A sample library is therefore built from historical accurate service decision data and updated regularly, so that the latest training data are available and provide a basis for more accurate service decisions. The sample library is drawn from common and reliable accurate service selection decision data, but this information usually cannot be used directly: because the data items and related indexes examined for each service differ, different data must be extracted and organized for different services. The invention stores the data items to be extracted for each service, the extraction rules, the most recently trained parameters, and other such information in the model database.
2. Accurate service decision algorithm based on map-reduce
The accurate service selection decision is divided into two steps: model training and service selection decision. Model training yields the model parameters; substituting these parameters and the demand data provided by the service demander into the accurate service selection decision equation gives the demander's accurate service demand. The more data are used for training, the more accurate the result, and common service requirements number in the hundreds of thousands, so data training is in effect a large-scale data mining process and training on a single node is inefficient. In addition, the exact range of services cannot be determined in advance when an accurate service selection decision is made, and online decisions over the internet are time-sensitive. The invention therefore adopts a map-reduce based logistic regression accurate service selection decision algorithm. Model training uses a distributed training method: the training tasks for the decision data of different accurate services are dispersed to different computing nodes, and the training results are stored in the model database. Because of the uncertainty of the accurate service selection decision, the decision itself is divided into two phases. First, abnormal index information in the detection data is filtered out according to a detection standard data model, and the range of services that may be needed is determined. The data are then sent to different computing nodes for the accurate service selection decision. Because the computation process is the same for every node, the model that each node trains and uses for accurate service decisions is listed below.
(1) Fitting function
The fitting function is adopted to confine the computed probability to the interval [0,1] and to drive the result as close to 0 or 1 as possible.
The fitting function is:
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}, \qquad y = \begin{cases} 1, & h_\theta(x) \ge f_t \\ 0, & h_\theta(x) < f_t \end{cases}$$
where x is the input service characteristic data, y is the service selection decision result, and $f_t$ is the service selection decision threshold.
(2) Loss function
Such a loss function is employed in order that the loss approaches 0 when the prediction result is close to the actual value and approaches infinity when the prediction result is very different from the actual value.
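When y = 1:
$$\mathrm{cost}(h_\theta(x), y) = -\log(h_\theta(x))$$
when y = 0:
$$\mathrm{cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$$
merging the loss functions:
$$\mathrm{cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$$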
(3) fitting parameter theta
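All parameters θ are updated simultaneously:
$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$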
(4) Feature scaling
Feature scaling unifies the ranges of the independent variables (features) so that different features contribute comparably to the result.
$$x := \frac{x - \mu}{x_{max} - x_{min}}$$
(μ is the average of the multiple x)
(5) Description of algorithm implementation
MRLRFDD accurate service selection decision algorithm
Input: the service requirement items provided by all service demanders, $X = (x_1, x_2, …, x_n)$, and whether each is the desired service, $Z = (z_1, z_2, …, z_n)$
Output: the predicted result $H = (h_1, h_2, …, h_n)$
①:for i=1:n
②:for j=1:m
④:for i=1:Iteration
⑥:for j=1:m
Final step: substitute the service characteristic data provided by the current service demander into $h_\theta(x)$ to obtain the predicted result H;
where $x_1, x_2, …, x_n$ are the service characteristic data provided by the 1st, 2nd, …, n-th service demanders, and $z_1, z_2, …, z_n$ indicate whether the service is needed by the 1st, 2nd, …, n-th service demanders; n in X denotes the n service characteristic data, which correspond one-to-one with n in Z, n in H, and n in the for loop; m denotes the number of components in each submitted service characteristic data $x_i$; i is the subscript of the service characteristic data, denoting the i-th service characteristic data, i = 1, 2, …, n; j is the index of the service selection decision model parameter θ, denoting the j-th θ.
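The loop structure of the MRLRFDD algorithm on a single node (scale the features, iterate the simultaneous θ update, then substitute a demander's data into $h_\theta(x)$) can be sketched end to end as follows. The constant intercept term, the default step size, the iteration count, the threshold of 0.5 and all function names are assumptions added for illustration; steps whose formulas are not reproduced in the listing are filled with the standard logistic-regression forms.

```python
import math

def sigmoid(z):
    """Stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_scaler(X):
    """Per-feature mean and range over the n submissions."""
    cols = list(zip(*X))
    mus = [sum(c) / len(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return mus, spans

def apply_scaler(x, mus, spans):
    """Scale one submission and prepend an intercept term (assumed)."""
    return [1.0] + [(v - mu) / s for v, mu, s in zip(x, mus, spans)]

def train(X, Z, alpha=0.5, iterations=2000):
    """Feature scaling followed by iterated simultaneous updates of theta."""
    mus, spans = fit_scaler(X)
    rows = [apply_scaler(x, mus, spans) for x in X]
    theta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(iterations):
        errs = [sigmoid(sum(t * v for t, v in zip(theta, r))) - z
                for r, z in zip(rows, Z)]
        theta = [t - alpha * sum(e * r[g] for e, r in zip(errs, rows)) / n
                 for g, t in enumerate(theta)]
    return theta, mus, spans

def predict(theta, mus, spans, x, f_t=0.5):
    """Probability and thresholded decision for one demander."""
    p = sigmoid(sum(t * v for t, v in zip(theta, apply_scaler(x, mus, spans))))
    return p, (1 if p >= f_t else 0)
```

The scaler fitted during training is reused at prediction time so that a new demander's data are placed on the same scale as the training samples.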
3. Collaborative filtering recommendation algorithm (UDMDCFUB) of multidimensional vectors associated with precise service decisions.
(1) Let C be the set of all service demanders in the system and S be the set of all service schemes that can be recommended to them. In practice, both C and S are usually large. A utility function u computes the degree to which a service scheme s is recommended to a service demander c, namely u: C × S → R, where R is a totally ordered set of non-negative real numbers within a certain range. The recommendation problem is to find, for each demander, the object $s^*$ with the maximum recommendation degree, as shown in formula (1):
$$\forall c \in C,\quad s^*_c = \arg\max_{s \in S} u(c, s) \qquad (1)$$
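The selection in formula (1) amounts to taking, for each demander, the scheme that maximizes the utility function. A minimal sketch follows; the utility function is passed in as a callable, and the function names are illustrative.

```python
def best_scheme(demander, schemes, utility):
    """Return the scheme s in S that maximizes u(c, s) for one demander c."""
    return max(schemes, key=lambda s: utility(demander, s))

def recommend_all(demanders, schemes, utility):
    """Apply the argmax to every demander in C."""
    return {c: best_scheme(c, schemes, utility) for c in demanders}
```

When several schemes tie for the maximum, Python's `max` returns the first one encountered, so the order of `schemes` acts as the tie-breaker in this sketch.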
(2) Each demander has a score for each service scheme (0 indicates no score). The scores of one demander can be represented as a one-dimensional matrix, $S_i' = (s_{(i,1)}, s_{(i,2)}, …, s_{(i,m)})$, and the scores of all demanders as a multi-dimensional matrix, $S = (S_1', S_2', …, S_i', …, S_n')$, where m is the number of service scheme scores of one demander and n is the number of demanders, so that $s_{(i,j)}$ is indexed by i = 1, …, n and j = 1, …, m. A flag matrix F is set, where F(i, j) indicates whether demander i has scored service scheme j: F(i, j) = 1 when demander i has scored service scheme j, and F(i, j) = 0 otherwise.
(3) Mean normalization limits the processed data to a certain range; it facilitates subsequent data processing and accelerates convergence when the program runs.
$$P_j'' = P_j' - a_j \qquad (2)$$
where j denotes the j-th service scheme, $a_j$ is the average of the scores for the j-th service scheme, $P_j'$ is all scores of all demanders for the j-th service scheme, and $P_j''$ is all scores of all demanders for the j-th service scheme after normalization;
(4) the parameters X, θ are learned.
Let the service characteristic data set of the demander's preferences be $X = (x_1, x_2, …, x_m)$, where m is the number of service characteristic data, and let the data set of service scheme attribute parameters be $\theta = (\theta_1, \theta_2, …, \theta_n)$, where n is the number of parameters of the service selection decision model. The two data sets are initialized, and the parameters X and θ are learned by gradient descent.
Note: s (i, j) is the score of the service plan j for the demander i. Obtaining the trained parameters X and theta, wherein X multiplied by theta is the preference degree of the demander to the service scheme; beta is a coefficient, lambda is a given parameter, and k is the kth term;
(5) description of algorithms
UDMDCFUB algorithm
Input: the scores of all service demanders for the service schemes, $S = (S_1, S_2, …, S_n)$, and the flag matrix F, where n is the number of service demanders;
Output: the predicted scores of a demander for all service schemes
1:for i=1:n
2:for j=1:m
3:s(i,j)=s(i,j)-aj
4:for i=1:Iteration
7: prediction score = $\theta^{T} X + \mu$.
Wherein s (i, j) represents the score of the demander i on the service scheme j, and the meaning of i and j is the same as that of the formulas (3) and (4).
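The last step of the UDMDCFUB listing, prediction score = $\theta^{T} X + \mu$, can be sketched as below. The sketch assumes that μ is the mean score of the scheme removed during mean normalization, and it clips the result to the range 0 to 5 on the assumption that 5 is the full score; both points are interpretations rather than statements of the listing.

```python
def predict_score(theta_j, x_i, mu_j, lo=0.0, hi=5.0):
    """Predicted score of demander i for scheme j.

    theta_j: learned attribute vector of scheme j
    x_i:     learned preference vector of demander i
    mu_j:    mean score of scheme j, added back after normalization (assumed)
    lo, hi:  assumed score range used for clipping
    """
    raw = sum(t * x for t, x in zip(theta_j, x_i)) + mu_j
    return min(max(raw, lo), hi)
```

For example, with θ = (1.0, 0.5), x = (2.0, 2.0) and a mean of 1.5, the inner product is 3.0 and the predicted score is 4.5.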
The technical solution of the present invention will be described in detail by specific embodiments.
The core of the invention is to derive service characteristics and build a service characteristic model by studying a large number of service cases, so that when a service demander raises a service demand with the corresponding characteristics, the corresponding services can be recommended. To ensure that the designed algorithm is usable in practice, an accurate source of information must be found and the resulting service taken as a case. The specific implementation is therefore explained here using elderly-care service as an example. By mining the historical information of diagnosis and treatment records, the characteristic associations between detection indexes (such as symptoms and biochemical and pathological indicators) and the related diseases are obtained, and a mathematical model is established. The detection indexes of the current patient are then input to judge the probability that the patient suffers from one or more diseases. After the diagnosis is confirmed, it enters the medical service recommendation system as the input source, and the system selects, from a plurality of medical service schemes, the scheme suited to the patient's current disease and recommends it to the patient, according to the collaborative filtering recommendation algorithm of multidimensional vectors associated with disease diagnosis and a large number of patient evaluations. The following steps are required.
1. Training sample database.
The training sample database includes two types. One is the sample data prepared for disease diagnosis, which is data extracted, cleaned and organized from the patient physical examination database, including urine routine tests, blood routine tests, and the like. The other is the patient evaluation sample library used for recommending medical service schemes, which comprises medical service unit information (unit name, level, location and the like), the medical service schemes (medical service team information, treatment scheme, price, patient cure rate and the like), the users' evaluation matrix for the medical service schemes, and the like.
2. Establishing a model database.
During model training, the variables and related parameters differ between diseases and may be adjusted at each stage, so they cannot be fixed in the program. The invention stores the variables, the parameters, the number of variables, the number of parameters, and the mapping between the variables and the training samples in a model database. The model database serves two purposes: (1) at each training stage, the variables, the number of variables, and the mapping between the variables and the training data set are extracted for training, and the training results are stored back in the corresponding parameter table of the model database; (2) at each diagnosis, the variables, parameters, number of variables and number of parameters are extracted from the model database to build a dynamic model, and the disease probability is calculated.
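The two roles of the model database (storing trained parameters back, and building a dynamic model at diagnosis time) can be sketched with an in-memory dictionary. The entry layout, the field names `variables`, `mapping` and `theta`, the intercept term and the use of a logistic model are all hypothetical choices for illustration; a real deployment would use database tables as described.

```python
import math

def store_parameters(db, code, theta):
    """Role (1): write freshly trained parameters back to the model entry."""
    entry = db[code]
    # One parameter per variable plus an intercept (assumed layout).
    if len(theta) != len(entry["variables"]) + 1:
        raise ValueError("parameter count must equal variable count plus one")
    entry["theta"] = list(theta)

def disease_probability(db, code, record):
    """Role (2): build the model dynamically and compute the probability.

    record maps sample-library column names to the patient's values;
    entry["mapping"] maps each model variable to its column name.
    The simple exp form below is adequate for moderate values of z.
    """
    entry = db[code]
    x = [1.0] + [record[entry["mapping"][v]] for v in entry["variables"]]
    z = sum(t * v for t, v in zip(entry["theta"], x))
    return 1.0 / (1.0 + math.exp(-z))
```

Because the variable list and the mapping live in the database entry rather than in the program, adjusting a disease's variables at a later stage only requires changing the stored entry.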
3. Different diseases are associated with different symptoms and use different disease sample data, so the number and values of the trained parameters θ differ. The invention therefore adopts a map-reduce based disease diagnosis algorithm in which each computing node uses a logistic regression algorithm and computes in stages to obtain a training model; that is, multiple nodes train on multiple diseases. At each stage, each node takes $X_i(x_{i,j+1}, x_{i,j+2}, …, x_{i,j+k})$ from the model database, maps it to the sample database, extracts the latest samples, trains, and stores the resulting parameters $\theta_i(\theta_{i,j+1}, \theta_{i,j+2}, …, \theta_{i,j+k})$ back into the model database. When a patient is diagnosed, the patient's disease data are input first and then sent to each map computing node; each computing node extracts the data relevant to the diseases it can process, performs the disease judgment, and sends its result to the reduce node for screening, where the patient's disease is determined according to probability.
4. The system sends the disease of the patient obtained in the step 3 and the patient information to a service recommendation node, the service recommendation node extracts the associated evaluation matrix according to the previous diagnosis and treatment information of the patient and the evaluation of history relevant patients on the diagnosis and treatment of the disease, then the calculation is carried out according to the collaborative filtering recommendation algorithm of the multidimensional vector, and the most suitable diagnosis and treatment service scheme is recommended to the patient.
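The map and reduce stages of the diagnosis in step 3 can be sketched as follows. Threads from the Python standard library stand in for the cloud computing nodes, each model is represented as a hypothetical tuple (disease code, variable names, θ with an intercept first), and the screening rule in the reduce stage (keep diseases whose probability reaches a threshold, highest first) is an assumption.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def map_node(model, patient):
    """Map stage: one node scores the patient against one disease model."""
    code, variables, theta = model
    # Extract only the data this disease model needs, plus an intercept.
    x = [1.0] + [patient[v] for v in variables]
    z = sum(t * v for t, v in zip(theta, x))
    return code, 1.0 / (1.0 + math.exp(-z))

def reduce_node(results, f_t=0.5):
    """Reduce stage: screen by probability, most probable disease first."""
    hits = [(code, p) for code, p in results if p >= f_t]
    return sorted(hits, key=lambda cp: cp[1], reverse=True)

def diagnose(models, patient, f_t=0.5):
    """Send the patient data to every map node, then reduce the results."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(lambda m: map_node(m, patient), models))
    return reduce_node(results, f_t)
```

The output of `diagnose` is the list of diseases that would be passed, together with the patient information, to the service recommendation node in step 4.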
The algorithms of the invention are described below using an example. Three common diseases are used for testing, namely hypertension (coded I10.x02), coronary heart disease (coded I25.101) and nephropathy (coded N28.901). These diseases are related: for example, long-term hypertension may affect the heart and kidneys, and in serious cases leads to diseases of those organs. To allow accurate diagnosis, training and the corresponding tests are carried out using the blood routine and urine routine sample libraries provided by the relevant medical department. The sample data mapping is shown in Table 1. Three cloud computing nodes are used, and the data extracted from the three sample libraries (physical examination, blood routine and urine routine) according to the mapping in Table 1 are used to train and diagnose the three diseases respectively.
TABLE 1 sample data mapping Table
Once a disease is diagnosed, the patient must be offered the clinical care schemes available from hospitals. Table 2 abstracts six typical hospital-provided medical care schemes, each represented by a code; for example, Jh_s1 denotes the medical care scheme provided by a department of a hospital for hypertension, and the code corresponds to the details of the services that department provides for hypertension. Because there are many kinds of diseases and a large number of medical units, the medical care schemes form a database of huge volume, and the evaluation matrix generated from them is correspondingly large. Therefore, when a service scheme is recommended, the service scheme matrix is first extracted according to the disease, and the multi-dimensional evaluation vector matrix is then extracted.
TABLE 2 medical care plan table
After the preparation work is completed, data are extracted from the sample library to illustrate the two algorithms.
1) MRLRFDD algorithm: 130 data are randomly extracted from a sample library to train hypertension diagnosis, 200 data are randomly extracted to train coronary heart disease diagnosis, and 50 data are randomly extracted to train kidney disease diagnosis. The theta values of hypertension (0.29042339526016764,2.320070076449664,3.1247681396613323,1.9912852617768821, -2.939785693846358 and 2.546702905413339), coronary heart disease (-0.5732159592392719,3.963633095493384,3.3987536729350865,3.8920547540621193 and 3.398593162956388) and kidney disease (0.2561313490409353,1.6548079722273288,4.911460779107366,4.2401342923473715,3.906841801494906 and 2.8548633117398188) are obtained from formula (2) in the MRLRFDD algorithm. Then, disease prediction was performed on 10 suspected patients using formula (2) and formula (1) in the MRLRFDD algorithm, and the results are shown in table 3.
TABLE 3 comparison of disease diagnosis results
2) UDMDCFUB algorithm: after the diagnosis is confirmed, a multi-dimensional evaluation matrix is filtered from the evaluation sample library according to the table 2, and a recommended prediction matrix of the medical care service scheme which needs to be selected for the patients with hypertension, coronary heart disease and nephropathy can be respectively obtained according to the formulas (3) and (4) in the UDMDCFUB algorithm.
Hypertension healthcare service plan prediction matrix:
coronary heart disease medical service scheme prediction matrix:
nephropathy healthcare service plan prediction matrix:
It can be seen from the above prediction matrices that the medical care schemes offered by the military hospitals and the third hospitals are popular, and that some specialized hospitals have their own advantages in treating certain diseases; for example, the people's hospitals are generally favored by patients for treating kidney diseases, with the recommended score reaching the full score (i.e. 5).
However, the above description is only exemplary of the present invention, and the scope of the present invention should not be limited thereby, and the replacement of the equivalent components or the equivalent changes and modifications made according to the protection scope of the present invention should be covered by the claims of the present invention.