CN113076486B

CN113076486B - Drug information pushing method, device, computer equipment and storage medium

Info

Publication number: CN113076486B
Application number: CN202110473086.XA
Authority: CN
Inventors: 徐卓扬; 孙行智; 胡岗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-04-29
Filing date: 2021-04-29
Publication date: 2023-07-25
Anticipated expiration: 2041-04-29
Also published as: CN113076486A; WO2022227176A1

Abstract

The embodiments of the application disclose a drug information pushing method, a device, computer equipment and a storage medium. The method is applicable to the field of digital healthcare and comprises the following steps: acquiring target user attribute information of a target user, and inputting the target user attribute information into a drug reward prediction model; outputting, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug; determining each user reward parameter of the target user under the action of each drug on the basis of the first target reward parameters and/or the second target reward parameters of the target user; and determining the maximum user reward parameter from the user reward parameters, and outputting the drug information of the target drug having the maximum user reward parameter to a user interface so as to display the target drug to the target user. By adopting the embodiments of the application, the expandability of the drug reward prediction model can be enhanced, so that the accuracy of drug information pushing is improved.

Description

Drug information pushing method, device, computer equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and apparatus for pushing drug information, a computer device, and a storage medium.

Background

Currently, more and more practical problems can be solved by using a deep reinforcement learning (DRL) model. When running a DRL model, patient sample data may be input into the model to output a Q value, where the Q value is used to evaluate the expected reward (e.g., the degree of impact of a drug) of different actions (e.g., a physician's prescription). Because a DRL model often has to consider both short-term and long-term outcomes but has only one return factor, a single Q value evaluates the expected rewards of both outcomes, so that the expected rewards of the long-term and short-term outcomes are treated as having the same nature. However, long-term and short-term outcomes differ substantially, mainly in the time span over which actions affect them (e.g., the short-term outcome is mainly affected by the most recent medication, whereas the long-term outcome is affected by medication over a longer period), so the scalability of such a DRL model is poor.

Disclosure of Invention

The embodiment of the application provides a drug information pushing method, device, computer equipment and storage medium, which can enhance the expandability of a drug rewarding prediction model, thereby improving the accuracy of drug information pushing.

In a first aspect, the present application provides a method for pushing drug information, the method including:

acquiring target user attribute information of a target user, and inputting the target user attribute information into a drug rewarding prediction model, wherein the target user attribute information comprises at least one of demographic information, health indexes aiming at target disease medication and historical medication information;

outputting, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, wherein the drug reward prediction model comprises a first network parameter and a second network parameter, the first network parameter is used for determining the first reward parameter of any user having any user attribute information under the action of each drug, the second network parameter is used for determining the second reward parameter of any user under the action of each drug, any user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and the drug action time corresponding to the first reward parameter is longer than the drug action time corresponding to the second reward parameter;

determining each user rewarding parameter of the target user under the action of each drug based on each first target rewarding parameter of the target user and/or each second target rewarding parameter of the target user, wherein the target user corresponds to one user rewarding parameter under the action of one drug;

and determining the maximum user reward parameter from the user reward parameters, and outputting the drug information of the target drug having the maximum user reward parameter to a user interface so as to display the target drug to the target user.
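The four steps of the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: the linear per-drug weights standing in for the first and second network parameters, the function names, the drug labels, the feature values, and the choice to add the two reward parameters are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def predict_rewards(
    features: Vector,
    first_params: Dict[str, Vector],   # hypothetical stand-in for the first network parameters
    second_params: Dict[str, Vector],  # hypothetical stand-in for the second network parameters
) -> Dict[str, Tuple[float, float]]:
    """Return {drug: (first reward parameter, second reward parameter)}."""
    return {
        drug: (dot(features, first_params[drug]), dot(features, second_params[drug]))
        for drug in first_params
    }


def push_target_drug(
    features: Vector,
    first_params: Dict[str, Vector],
    second_params: Dict[str, Vector],
) -> str:
    """Pick the drug with the maximum user reward parameter and format it for display."""
    rewards = predict_rewards(features, first_params, second_params)
    # Assumption: the user reward parameter is the plain sum of both outputs.
    user_reward = {drug: first + second for drug, (first, second) in rewards.items()}
    target_drug = max(user_reward, key=user_reward.get)
    return f"Recommended drug: {target_drug}"


features = [1.0, 0.5]  # made-up encoding of the target user attribute information
first_params = {"drug_a": [0.2, 0.4], "drug_b": [0.6, 0.2]}
second_params = {"drug_a": [0.5, 0.0], "drug_b": [0.1, 0.0]}
message = push_target_drug(features, first_params, second_params)
print(message)
```

In a real system the two parameter sets would belong to a trained network rather than fixed weight tables; the sketch only shows how one model call yields two values per drug and how a single drug is then selected.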

With reference to the first aspect, in one possible implementation manner, the method further includes:

acquiring sample data of at least two users, wherein the sample data of one user comprises user attribute information and sample medicine information of the user;

acquiring first sample reward parameters and second sample reward parameters of each user under the action of sample medicines indicated by sample medicine information, and inputting sample data of at least two users, the first sample reward parameters and the second sample reward parameters into a medicine reward prediction model;

training the first network parameters and the second network parameters of the drug rewards prediction model based on the user attribute information of at least two users, the first sample rewards parameters and the second sample rewards parameters to obtain the ability of predicting the first rewards parameters and the second rewards parameters of any user under the action of each drug based on the user attribute information of any user.

With reference to the first aspect, in one possible implementation manner, the first network parameter includes a first model parameter and a first return parameter, and the second network parameter includes a second model parameter and a second return parameter;

training the first network parameter and the second network parameter of the drug reward prediction model based on the user attribute information of the at least two users, each first sample reward parameter, and each second sample reward parameter comprises:

determining first expected reward parameters of each user under the action of the sample medicine based on the first model parameters and the first return parameters, and determining second expected reward parameters of each user under the action of the sample medicine based on the second model parameters and the second return parameters;

determining each loss value corresponding to the sample data of each user based on the first return parameter, the second return parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter and each second expected reward parameter;

and iteratively updating the parameter values of the first model parameter and the parameter values of the second model parameter based on the loss values until the loss values are unchanged, so as to obtain the capability of predicting the first rewarding parameter and the second rewarding parameter of any user under the action of each drug based on the user attribute information of any user.

With reference to the first aspect, in one possible implementation manner, determining, based on each first target reward parameter of the target user and each second target reward parameter of the target user, each user reward parameter of the target user under each drug action includes:

determining a first weighting coefficient for the first target reward parameter and a second weighting coefficient for the second target reward parameter;

determining each first weighted reward parameter corresponding to each first target reward parameter based on the first weighting coefficient and each first target reward parameter of the target user, and determining each second weighted reward parameter corresponding to each second target reward parameter based on the second weighting coefficient and each second target reward parameter of the target user;

determining user rewarding parameters of the target user under the action of each drug based on the first weighted rewarding parameters and the second weighted rewarding parameters, wherein one first weighted rewarding parameter and one second weighted rewarding parameter correspond to one user rewarding parameter.

With reference to the first aspect, in one possible implementation manner, determining, based on each first target reward parameter of the target user, each user reward parameter of the target user under each drug action includes:

determining each first target rewarding parameter of the target user as each user rewarding parameter of the target user under the action of each drug;

the maximum user rewarding parameter is the maximum first target rewarding parameter in the first target rewarding parameters.

With reference to the first aspect, in one possible implementation manner, determining, based on each second target reward parameter of the target user, each user reward parameter of the target user under each drug action includes:

determining each second target reward parameter of the target user as each user reward parameter of the target user under the action of each drug;

the maximum user rewarding parameter is the maximum second target rewarding parameter in the second target rewarding parameters.
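The three implementation manners above (weighted combination, first reward parameters only, second reward parameters only) can be compared in a short sketch. The function names, mode labels, drug labels and numeric values are hypothetical, and combining the two weighted reward parameters by addition is an assumption; the text states only that one first weighted and one second weighted reward parameter correspond to one user reward parameter.

```python
from typing import Dict


def user_reward_parameters(
    first: Dict[str, float],
    second: Dict[str, float],
    mode: str = "weighted",
    w_first: float = 0.5,
    w_second: float = 0.5,
) -> Dict[str, float]:
    """Derive one user reward parameter per drug from the two model outputs."""
    if mode == "weighted":
        # Assumption: the weighted reward parameters are combined by addition.
        return {d: w_first * first[d] + w_second * second[d] for d in first}
    if mode == "first_only":
        return dict(first)
    if mode == "second_only":
        return dict(second)
    raise ValueError(f"unknown mode: {mode}")


def target_drug(rewards: Dict[str, float]) -> str:
    """Return the drug with the maximum user reward parameter."""
    return max(rewards, key=rewards.get)


first = {"drug_a": 0.8, "drug_b": 0.3}   # long-term (first) target reward parameters
second = {"drug_a": 0.1, "drug_b": 0.9}  # short-term (second) target reward parameters

for mode in ("weighted", "first_only", "second_only"):
    print(mode, target_drug(user_reward_parameters(first, second, mode)))
```

With these made-up numbers the selected drug depends on the mode and on the weights, which is the practical effect of keeping the long-term and short-term outputs separate.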

With reference to the first aspect, in one possible implementation manner, the user interface includes an attribute information input area;

acquiring target user attribute information of a target user, including:

upon detecting an input instruction on the attribute information input area, target user attribute information of a target user is acquired based on the input instruction.

In a second aspect, the present application provides a medication information pushing device, the device comprising:

the information input module is used for acquiring target user attribute information of a target user, inputting the target user attribute information into the drug rewarding prediction model, wherein the target user attribute information comprises at least one of demographic information, health indexes aiming at target disease drug administration and historical drug administration information;

the parameter output module is used for outputting, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, wherein the drug reward prediction model comprises a first network parameter and a second network parameter, the first network parameter is used for determining the first reward parameter of any user having any user attribute information under the action of each drug, the second network parameter is used for determining the second reward parameter of any user under the action of each drug, any user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and the drug action time corresponding to the first reward parameter is longer than the drug action time corresponding to the second reward parameter;

the parameter determining module is used for determining each user reward parameter of the target user under the action of each drug based on each first target reward parameter of the target user and/or each second target reward parameter of the target user, wherein the target user corresponds to one user reward parameter under the action of one drug;

and the information display module is used for determining the maximum user rewarding parameter from the user rewarding parameters and outputting the drug information of the target drug with the maximum user rewarding parameter to the user interface so as to display the target drug to the target user.

With reference to the second aspect, in a possible implementation manner, the apparatus further includes:

the data acquisition module is used for acquiring sample data of at least two users, wherein the sample data of one user comprises user attribute information and sample medicine information of the user;

the sample input module is used for acquiring each first sample rewarding parameter and each second sample rewarding parameter of each user under the action of the sample medicament indicated by the sample medicament information, and inputting sample data of at least two users, each first sample rewarding parameter and each second sample rewarding parameter into the medicament rewarding prediction model;

and the parameter training module is used for training the first network parameter and the second network parameter of the drug rewarding prediction model based on the user attribute information of at least two users, the first sample rewarding parameters and the second sample rewarding parameters so as to acquire the capability of predicting the first rewarding parameters and the second rewarding parameters of any user under the action of each drug based on the user attribute information of any user.

With reference to the second aspect, in one possible implementation manner, the first network parameter includes a first model parameter and a first return parameter, and the second network parameter includes a second model parameter and a second return parameter;

the parameter training module comprises:

the expected parameter determining unit is used for determining the first expected reward parameter of each user under the action of the sample drug based on the first model parameter and the first return parameter, and determining the second expected reward parameter of each user under the action of the sample drug based on the second model parameter and the second return parameter;

the loss value determining unit is used for determining each loss value corresponding to the sample data of each user based on the first return parameter, the second return parameter, each first sample rewarding parameter, each second sample rewarding parameter, each first expected rewarding parameter and each second expected rewarding parameter;

and the parameter updating unit is used for iteratively updating the parameter values of the first model parameter and the parameter values of the second model parameter based on the loss values until the loss values are unchanged so as to acquire the capability of predicting the first rewarding parameter and the second rewarding parameter of any user under the action of each drug based on the user attribute information of any user.

With reference to the second aspect, in one possible implementation manner, the parameter determining module includes:

a weighting coefficient determining unit configured to determine a first weighting coefficient of the first target reward parameter and a second weighting coefficient of the second target reward parameter;

the first reward parameter determining unit is used for determining each first weighted reward parameter corresponding to each first target reward parameter based on the first weighting coefficient and each first target reward parameter of the target user, and determining each second weighted reward parameter corresponding to each second target reward parameter based on the second weighting coefficient and each second target reward parameter of the target user;

the second reward parameter determining unit is used for determining each user reward parameter of the target user under the action of each drug based on each first weighted reward parameter and each second weighted reward parameter, wherein one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter;

a third reward parameter determining unit, configured to determine each first target reward parameter of the target user as each user reward parameter of the target user under the action of each drug;

and a fourth reward parameter determining unit, configured to determine each second target reward parameter of the target user as each user reward parameter of the target user under the action of each drug.

With reference to the second aspect, in a possible implementation manner, the user interface includes an attribute information input area;

the information input module includes:

and an information acquisition unit configured to acquire target user attribute information of a target user based on the input instruction when the input instruction on the attribute information input area is detected.

In a third aspect, the present application provides a computer device comprising: a processor, a memory, a network interface;

the processor is connected to the memory and the network interface, wherein the network interface is used for providing a data communication function, the memory is used for storing a computer program, and the processor is used for calling the computer program to execute the drug information pushing method in the first aspect in the embodiment of the application.

In a fourth aspect, the present application provides a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, perform the method of pushing medication information of the first aspect of the present application.

In the embodiments of the application, the computer device can input the target user attribute information into the drug reward prediction model and output, through the model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug. Because the model outputs the first and second target reward parameters simultaneously, the reward of the long-term outcome is evaluated through the first target reward parameter and the reward of the short-term outcome through the second target reward parameter, which enhances the expandability of the drug reward prediction model and improves its interpretability, safety, selectivity and traceability. Further, the computer device may determine each user reward parameter of the target user under the action of each drug based on the first target reward parameters and/or the second target reward parameters of the target user. The computer device can then determine the maximum user reward parameter from the user reward parameters and output the drug information of the target drug having the maximum user reward parameter to the user interface so as to display the target drug to the target user, thereby improving the accuracy of drug information pushing with strong applicability.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic structural diagram of a network architecture provided in the present application;

Fig. 2 is a schematic flow chart of a drug information pushing method provided in the present application;

Fig. 3 is a schematic structural diagram of a drug reward prediction model provided in the present application;

Fig. 4 is a schematic structural diagram of a drug information pushing device provided in the present application;

Fig. 5 is a schematic structural diagram of a computer device provided in the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the present disclosure without creative effort fall within the scope of protection of the present disclosure.

Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a network architecture provided in the present application. As shown in Fig. 1, the network architecture may include a server 10 and a user terminal cluster. The user terminal cluster may include a plurality of user terminals, specifically a user terminal 100a, a user terminal 100b, a user terminal 100c, …, and a user terminal 100n.

The server 10 may be an independent physical server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (content delivery network, CDN), big data, and artificial intelligence platforms. Each user terminal in the user terminal cluster may include, but is not limited to: intelligent terminals such as smart phones, tablet computers, notebook computers, desktop computers, intelligent sound boxes, intelligent watches, and the like.

It may be understood that the computer device in the present application may be an entity terminal with a drug information pushing function, and the entity terminal may be the server 10 shown in fig. 1 or a user terminal, which is not limited herein.

As shown in Fig. 1, the user terminals 100a, 100b, 100c, …, 100n may each establish a network connection with the server 10, so that each user terminal can exchange data with the server 10 through that connection. For example, the server 10 may output the drug information of the target drug to the user interface of the target user's user terminal, so that the target user views the target drug on the user interface, where the user terminal of the target user may be any user terminal (such as the user terminal 100a) in the user terminal cluster. In the present application, a drug that is determined based on the drug reward prediction model and is to be pushed to the target user is referred to as a target drug, and a model capable of predicting the first reward parameter and the second reward parameter of any user under the action of each drug is referred to as a drug reward prediction model.

The drug information pushing method provided by the application is applicable to drug information pushing scenarios for any disease, such as diabetes, hypertension, or other diseases. If the target user is a doctor, the doctor can input a patient's basic information into the drug reward prediction model, and the drug information of the target drug to be pushed is output to the user interface based on that information. The doctor can then view the target drug on the user interface (the target drug can serve as a preliminary diagnosis result) and determine the drug suitable for the patient (such as the target drug) in combination with the doctor's further diagnosis of the patient. If the target user is a patient, the patient can input his or her own basic information into a self-service terminal (kiosk for short) provided by a medical institution such as a hospital, a health station or a community health center. The kiosk contains the drug reward prediction model, and the drug information of the recommended target drug is output to the kiosk's user interface based on the patient's basic information. The patient can view the target drug on the kiosk's user interface and can subsequently either purchase the target drug directly or have a doctor further determine the drug (e.g., the target drug) suitable for the patient.

For convenience of description, the following takes a diabetes drug information pushing scenario as an example, and this is not repeated below. The drug information pushing method, the drug information pushing device and the computer device of the present application are described below with reference to Fig. 2 to Fig. 5.

Referring to fig. 2, fig. 2 is a flowchart of a method for pushing drug information according to an embodiment of the present application. As shown in fig. 2, the method may include the following steps S101 to S104:

step S101, obtaining target user attribute information of a target user, and inputting the target user attribute information into a drug rewards prediction model.

It will be appreciated that, before performing step S101, the computer device may first train the model parameters of the drug reward prediction model using the sample data of at least two users and the actual reward parameters of each user, thereby obtaining a drug reward prediction model that outputs the first reward parameter and the second reward parameter of any user under the action of each drug. The drug reward prediction model may be a deep Q-network (DQN) model. The reinforcement learning method used by a DQN model is an artificial intelligence method in which a policy selects actions (such as drugs) for states (such as user attribute information), and the policy is then optimized using the expected rewards obtained. The parameter values corresponding to the expected rewards are the expected reward parameters (such as the first expected reward parameter and the second expected reward parameter described below); in other words, the value of an expected reward parameter represents the expected reward. Here, a policy refers to the rule for choosing which action to take in a particular state so as to maximize the expected reward.

In some possible embodiments, the computer device may obtain sample data of at least two users, which may be used to train the drug reward prediction model. One user corresponds to one piece of sample data, and one piece of sample data may include the user attribute information and the sample drug information of that user. The user attribute information may include at least one of demographic information, health indicators relating to medication for the target disease, and historical medication information (i.e., medication history); the drug indicated by the sample drug information is the sample drug. Demographic information may include gender, age, health status, occupation, marital status, education level, income and other information, and the health indicators may be understood as the examination indicators corresponding to the target disease. The sample drugs used by different users for the target disease may be the same or different.
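One possible way to represent a piece of sample data is sketched below. The class names, field names and example values (including the `hba1c` indicator for the diabetes scenario) are hypothetical; the text only requires that each sample carry user attribute information from at least one of the three categories plus the sample drug information, and that at least two users be present.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserAttributes:
    demographics: Dict[str, str] = field(default_factory=dict)
    health_indicators: Dict[str, float] = field(default_factory=dict)
    medication_history: List[str] = field(default_factory=list)

    def is_valid(self) -> bool:
        # "At least one of" the three categories must be present.
        return bool(self.demographics or self.health_indicators or self.medication_history)


@dataclass
class SampleData:
    attributes: UserAttributes
    sample_drug: str  # the drug indicated by the sample drug information


def check_training_set(samples: List[SampleData]) -> bool:
    """Training needs sample data from at least two users, each with usable attributes."""
    return len(samples) >= 2 and all(s.attributes.is_valid() for s in samples)


samples = [
    SampleData(UserAttributes(demographics={"age": "54"}, health_indicators={"hba1c": 7.9}), "drug_a"),
    SampleData(UserAttributes(medication_history=["drug_b"]), "drug_b"),
]
print(check_training_set(samples))
```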

Further, the computer device may obtain each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug, and input the sample data of the at least two users, each first sample reward parameter, and each second sample reward parameter into the drug reward prediction model. The actual long-term reward parameter of a user under the action of the sample drug is referred to as the first sample reward parameter, and the actual short-term reward parameter of a user under the action of the sample drug is referred to as the second sample reward parameter. The drug action time corresponding to the first sample reward parameter is longer than that corresponding to the second sample reward parameter. A reward can be understood as the degree to which the user's health indicators are affected after the user has taken the sample drug for a period of time, and the value of the reward parameter indicates that degree. For example, reward parameter 1 indicates degree of influence 1, and reward parameter 2 indicates degree of influence 2. If reward parameter 1 is greater than reward parameter 2, degree of influence 1 is greater than degree of influence 2.

Further, the computer device may train the first network parameters and the second network parameters of the drug reward prediction model based on the user attribute information of the at least two users, each first sample reward parameter, and each second sample reward parameter, so as to obtain the ability to predict the first reward parameter and the second reward parameter of any user (e.g., the target user) under the action of each drug based on that user's attribute information (e.g., the target user attribute information). The first network parameters may be used to determine the first reward parameter (also referred to as the long-term reward parameter) of any user having any user attribute information under the action of each drug, and the second network parameters may be used to determine the second reward parameter (also referred to as the short-term reward parameter) of any user under the action of each drug, where the drug action time corresponding to the first reward parameter is longer than that corresponding to the second reward parameter. The first network parameters may include a first model parameter and a first return parameter, and the second network parameters may include a second model parameter and a second return parameter. The parameters of the drug reward prediction model that are iteratively updated based on the loss value are collectively referred to as model parameters (e.g., the first model parameter and the second model parameter). The return parameter corresponding to the first reward parameter in the first network parameters is referred to as the first return parameter (also referred to as the first return coefficient), and the return parameter corresponding to the second reward parameter in the second network parameters is referred to as the second return parameter (also referred to as the second return coefficient). The return parameters are understood here to be parameters that remain constant during training of the drug reward prediction model. Because the drug action time corresponding to the first reward parameter is longer than that corresponding to the second reward parameter, the first return parameter is greater than the second return parameter; for example, the first return parameter may be 0.9 or another value, and the second return parameter may be 0.2 or another value.
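Why a larger return parameter corresponds to a longer drug action time can be illustrated numerically, assuming the return parameters act as discount factors in the usual reinforcement learning sense (an interpretation consistent with how γ appears in formula (1), but not stated outright in the text). Under that assumption a reward k steps ahead is weighted by γ^k, and the weights sum to 1/(1 − γ). The function names below are hypothetical.

```python
from typing import List


def discount_weights(gamma: float, steps: int) -> List[float]:
    """Weights applied to rewards 0, 1, ..., steps-1 time steps ahead."""
    return [gamma ** k for k in range(steps)]


def effective_horizon(gamma: float) -> float:
    """Sum of the infinite geometric series 1 + gamma + gamma^2 + ..."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


for name, gamma in [("first (long-term)", 0.9), ("second (short-term)", 0.2)]:
    weights = [round(w, 3) for w in discount_weights(gamma, 4)]
    print(name, weights, round(effective_horizon(gamma), 2))
```

With the example values from the text, 0.9 spreads weight over roughly ten future steps, while 0.2 concentrates almost all weight on the immediate step.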

In some casesIn a possible embodiment, the computer device may determine first expected reward parameters of the users under the action of the sample medicament based on the first model parameters and the first return parameters, and determine second expected reward parameters of the users under the action of the sample medicament based on the second model parameters and the second return parameters. Wherein one user corresponds to a first expected reward parameter with a sample drug and one user corresponds to a second expected reward parameter with a sample drug. At this time, the computer device may determine each loss value corresponding to the sample data of each user by using the loss function through the first backhaul parameter, the second backhaul parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter, and each second expected reward parameter. Wherein a first sample prize parameter, a second sample prize parameter, a first expected prize parameter, and a second expected prize parameter correspond to a loss value corresponding to sample data of a user. Wherein the computer device can determine the loss value l corresponding to the sample data of the user according to the following formula (1) _loss ：

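$$l_{\text{loss}} = \Big( Q_{\text{short}}(s_t, a_t) + Q_{\text{long}}(s_t, a_t) - \big( r_{\text{short}} + r_{\text{long}} + \max_{a} \big( \gamma_{\text{short}} \, Q_{\text{short}}(s_{t+1}, a) + \gamma_{\text{long}} \, Q_{\text{long}}(s_{t+1}, a) \big) \big) \Big)^{2} \qquad \text{Formula (1)}$$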

Here, $a_t$ may represent the sample drug input into the drug reward prediction model at the current time $t$ (i.e., the sample drug in the sample data); $s_t$ may represent the user attribute information input into the model at the current time $t$ (i.e., the user attribute information of the user in the sample data); $s_{t+1}$ may represent the user attribute information input into the model at the next time $t+1$; $Q_{\text{long}}(s_t, a_t)$ may represent the first expected reward parameter of the user at the current time $t$; $Q_{\text{short}}(s_t, a_t)$ may represent the second expected reward parameter of the user at the current time $t$; $r_{\text{long}}$ may represent the first sample reward parameter of the user at the current time $t$; $r_{\text{short}}$ may represent the second sample reward parameter of the user at the current time $t$; $\gamma_{\text{long}}$ may represent the first return coefficient (i.e., the first backhaul parameter); $\gamma_{\text{short}}$ may represent the second return coefficient (i.e., the second backhaul parameter); $Q_{\text{long}}(s_{t+1}, a)$ may represent the first expected reward parameter of the user at the next time $t+1$; and $Q_{\text{short}}(s_{t+1}, a)$ may represent the second expected reward parameter of the user at the next time $t+1$.
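As an illustration only, Formula (1) can be sketched in Python for a single follow-up record. The function name `td_loss` and its argument names are hypothetical and do not appear in the application; the default values 0.2 and 0.9 are the example backhaul parameters mentioned above. The two lists hold the model outputs at time t+1, one entry per candidate drug.

```python
from typing import Sequence


def td_loss(
    q_short_sa: float,
    q_long_sa: float,
    r_short: float,
    r_long: float,
    q_short_next: Sequence[float],
    q_long_next: Sequence[float],
    gamma_short: float = 0.2,  # example second backhaul parameter from the text
    gamma_long: float = 0.9,   # example first backhaul parameter from the text
) -> float:
    """Squared error of Formula (1) for one sample (hypothetical helper)."""
    if not q_short_next or len(q_short_next) != len(q_long_next):
        raise ValueError("next-state outputs must be non-empty and of equal length")
    # max over candidate drugs a of the discounted sum of both outputs at t+1
    next_value = max(
        gamma_short * qs + gamma_long * ql
        for qs, ql in zip(q_short_next, q_long_next)
    )
    target = r_short + r_long + next_value
    prediction = q_short_sa + q_long_sa
    return (prediction - target) ** 2


# Example: two candidate drugs at time t+1
loss = td_loss(0.5, 0.5, 1.0, 1.0, [0.0, 1.0], [1.0, 0.0])
```

Note that the maximum is taken over the combined discounted sum, so a single drug index is chosen for both outputs, as the formula is written.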

After obtaining each loss value based on the above Formula (1), the computer device may iteratively update the parameter value of the first model parameter and the parameter value of the second model parameter based on each loss value until the loss value no longer changes. At that point, training of the drug reward prediction model is stopped, the iteratively updated first model parameter is used as the final first model parameter of the model, and the iteratively updated second model parameter is used as the final second model parameter of the model. The drug reward prediction model then has the ability to predict the first reward parameter and the second reward parameter of any user under the action of each drug based on the user attribute information of that user.
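In practice, "the loss value is unchanged" is usually approximated by checking that recent loss values differ by less than a small tolerance. The following sketch shows one possible reading of that stopping rule; the function name `has_converged` and the `tol` and `patience` settings are hypothetical and are not specified in the application.

```python
from typing import Sequence


def has_converged(loss_history: Sequence[float], tol: float = 1e-4, patience: int = 3) -> bool:
    """Return True when the last `patience` loss changes are all within `tol`.

    Hypothetical stopping rule: the application only states that training
    stops once the loss value no longer changes.
    """
    if len(loss_history) < patience + 1:
        return False
    recent = loss_history[-(patience + 1):]
    return all(abs(recent[i + 1] - recent[i]) <= tol for i in range(patience))
```

A training loop would append the loss of each iteration to `loss_history` and stop updating the first and second model parameters once this check returns true.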

Referring to fig. 3, fig. 3 is a schematic structural diagram of the drug reward prediction model of the present application. As shown in fig. 3, the drug reward prediction model may include multiple convolution layers (such as convolution layers 10a to 10c) and multiple fully connected layers (such as fully connected layers 20a and 20b); the input of the model is the user attribute information of the user, and the output of the model is a first reward parameter (such as Q_long) and a second reward parameter (such as Q_short). When the feature vector corresponding to the user attribute information is a one-dimensional vector (e.g., the user attribute information is the follow-up information of a patient), the model may include only the fully connected layers 20a and 20b, without the convolution layers 10a to 10c. The drug reward prediction model includes the first network parameters and the second network parameters, and the fully connected layer 20b (i.e., the second fully connected layer) may be configured with both. As shown in fig. 3, the fully connected layer 20b may include two fully connected layers (such as a fully connected layer 200b and a fully connected layer 201b). The fully connected layer 200b is configured with the first network parameters and is used to process the user attribute information based on the first network parameters to output the first reward parameter Q_long of any user under the action of each drug; the fully connected layer 201b is configured with the second network parameters and is used to process the user attribute information based on the second network parameters to output the second reward parameter Q_short of any user under the action of each drug.
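The two-output structure described for fig. 3 can be sketched as a shared fully connected layer followed by two separate output layers, one per reward parameter. This is a minimal illustration for the one-dimensional input case (no convolution layers); the class name `TwoHeadQNetwork`, the layer sizes, the ReLU activation, the random initialization, and the choice to feed both output layers from the shared layer are assumptions, not details given in the application.

```python
import random
from typing import List, Sequence, Tuple


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


class TwoHeadQNetwork:
    """Shared layer (cf. 20a) plus two output layers (cf. 200b and 201b)."""

    def __init__(self, n_features: int, n_hidden: int, n_drugs: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_shared, self.b_shared = init(n_hidden, n_features), [0.0] * n_hidden
        self.w_long, self.b_long = init(n_drugs, n_hidden), [0.0] * n_drugs    # first model parameters
        self.w_short, self.b_short = init(n_drugs, n_hidden), [0.0] * n_drugs  # second model parameters

    def forward(self, features: Sequence[float]) -> Tuple[List[float], List[float]]:
        """Return (Q_long, Q_short), each with one value per candidate drug."""
        hidden = relu(dense(features, self.w_shared, self.b_shared))
        q_long = dense(hidden, self.w_long, self.b_long)
        q_short = dense(hidden, self.w_short, self.b_short)
        return q_long, q_short


net = TwoHeadQNetwork(n_features=4, n_hidden=8, n_drugs=3)
q_long, q_short = net.forward([0.5, 1.0, 0.0, 0.3])
```

Keeping the two output layers separate lets each one be trained against its own reward signal while the earlier layers are shared.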

For convenience of description, take a diabetes drug information pushing scenario as an example (this may also be referred to as a grouping scenario for diabetes patients, where the grouping refers to a doctor's prescription scheme). The computer device may obtain sample data of at least two users, where the sample data of the at least two users may be long-term follow-up data of a large number of diabetes patients, and one piece of sample data may include the data of one follow-up visit of one patient. The sample data may include user attribute information, which may include, but is not limited to, age, gender, medical history, sample drugs (i.e., drugs in a doctor's prescription scheme or drugs actually taken by the patient, such as biguanides or sulfonylureas), glycosylated hemoglobin values, creatinine values, and other health indicators for diabetes. The computer device may then acquire each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug, and input the user attribute information of each user, each first sample reward parameter, and each second sample reward parameter into the drug reward prediction model. For example, the first sample reward parameter may indicate whether a diabetic complication occurred by the last follow-up after the diabetes patient took the drug: the first sample reward parameter is 0 when a diabetic complication occurred and 1 when no diabetic complication occurred. Likewise, the second sample reward parameter may indicate whether the glycosylated hemoglobin value meets the standard at the next follow-up after the diabetes patient took the drug: the second sample reward parameter is 1 when the glycosylated hemoglobin value of the patient meets the standard and 0 when it does not.
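The labeling rule in this example can be written down directly. The sketch below derives the two sample reward parameters from one follow-up record; the function names and the glycosylated hemoglobin threshold of 7.0 (percent) are hypothetical, since the application only says that the value "meets the standard" without giving a number.

```python
def long_term_reward(complication_at_last_followup: bool) -> int:
    """First sample reward parameter: 0 if a diabetic complication occurred, else 1."""
    return 0 if complication_at_last_followup else 1


def short_term_reward(hba1c_next_followup: float, hba1c_target: float = 7.0) -> int:
    """Second sample reward parameter: 1 if HbA1c meets the standard, else 0.

    `hba1c_target` is an assumed threshold used only for illustration.
    """
    return 1 if hba1c_next_followup < hba1c_target else 0


# One follow-up record: no complication, HbA1c of 6.4 at the next visit
r_long = long_term_reward(False)
r_short = short_term_reward(6.4)
```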

Further, the computer device may output each first expected reward parameter of each user under the action of the sample drug based on the fully connected layer 200b, and each second expected reward parameter of each user under the action of the sample drug based on the fully connected layer 201b. The computer device may then apply the loss function to the first backhaul parameter, the second backhaul parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter, and each second expected reward parameter to obtain the loss value corresponding to the sample data of each user. The computer device may iteratively update the parameter values of the first model parameter and of the second model parameter according to the loss values corresponding to all the sample data until the loss values are substantially unchanged (e.g., the loss values are minimal), which indicates that training of the drug reward prediction model is complete (i.e., the model has converged). The first network parameters configured in the fully connected layer 200b then include the first backhaul parameter and the iteratively updated first model parameter, and the second network parameters configured in the fully connected layer 201b include the second backhaul parameter and the iteratively updated second model parameter. The first backhaul parameter and the iteratively updated first model parameter in the fully connected layer 200b may be used to predict the first reward parameter of any user under the action of each drug, and the second backhaul parameter and the iteratively updated second model parameter in the fully connected layer 201b may be used to predict the second reward parameter of any user under the action of each drug. It follows that the drug reward prediction model now has the ability to predict the first reward parameter and the second reward parameter of any user under the action of each drug based on the user attribute information of that user.

After the drug reward prediction model has been trained, the computer device may, upon detecting an input instruction on an attribute information input area in the user interface, obtain the target user attribute information of the target user based on the input instruction and input the target user attribute information into the drug reward prediction model. For example, the target user may enter the target user attribute information in the attribute information input area and click a confirmation button in the user interface once the input is complete, at which time the computer device detects the input instruction on the attribute information input area and thereby obtains the target user attribute information of the target user. The target user attribute information may include at least one of demographic information, health indicators for target disease medication, and historical medication information.

Step S102, outputting each first target rewarding parameter and each second target rewarding parameter of the target user under the action of each drug through the drug rewarding prediction model.

In some possible embodiments, the computer device may determine each first target reward parameter of the target user under each drug action based on the first network parameter (i.e., the first backhaul parameter and the iteratively updated first model parameter), for example, the first network parameter may be the first network parameter in the fully-connected layer 200b after convergence of the drug reward prediction model. Wherein the target user corresponds to a first target reward parameter under the action of a medicament. Further, the computer device may determine each second target reward parameter of the target user under each drug action based on the second network parameter (i.e. the second backhaul parameter and the iteratively updated second model parameter), for example, the second network parameter may be the second network parameter in the fully connected layer 201b after convergence of the drug reward prediction model. Wherein the target user corresponds to a second target reward parameter under the action of a medication.

Step S103, determining the rewarding parameters of the target users under the action of the medicaments based on the first target rewarding parameters of the target users and/or the second target rewarding parameters of the target users.

In some possible implementations, the computer device may determine a first weighting factor for the first target reward parameter and a second weighting factor for the second target reward parameter when the target user has both long-term and short-term drug action requirements. The first weighting factor (e.g., 1 or other value) and the second weighting factor (e.g., 1 or other value) herein may be user-set weighting factors or default configured weighting factors for the medication rewards prediction model. At this time, the computer device may determine first weighted reward parameters corresponding to the first target reward parameters based on the first weighting coefficients and the first target reward parameters of the target user, and determine second weighted reward parameters corresponding to the second target reward parameters based on the second weighting coefficients and the second target reward parameters of the target user. Further, the computer device may sum the first weighted reward parameters and the second weighted reward parameters to obtain user reward parameters for the target user with each medication, one first weighted reward parameter and one second weighted reward parameter corresponding to one user reward parameter. Optionally, the computer device may also directly sum each first target reward parameter and each second target reward parameter to obtain each user reward parameter of the target user under the action of each drug, where one first target reward parameter and one second target reward parameter correspond to one user reward parameter.

Alternatively, in some possible embodiments, when the target user has a long-term drug action requirement, the computer device may determine each first target reward parameter of the target user as each user reward parameter of the target user under the action of each drug. Optionally, when the target user has a short-term drug action requirement, the computer device may determine each second target reward parameter of the target user as each user reward parameter of the target user under the action of each drug. The specific choice may be determined according to the actual application scenario and is not limited herein.
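The three cases of step S103 (both requirements, long-term only, short-term only) can be sketched as a single function. The function name `user_rewards`, the `need` argument, and its string values are hypothetical; the default weighting coefficients of 1 follow the example values given above, in which case the weighted sum reduces to the direct sum.

```python
from typing import List, Sequence


def user_rewards(
    q_long: Sequence[float],
    q_short: Sequence[float],
    need: str = "both",
    w_long: float = 1.0,
    w_short: float = 1.0,
) -> List[float]:
    """Return one user reward parameter per drug (hypothetical helper)."""
    if len(q_long) != len(q_short):
        raise ValueError("expected one long-term and one short-term value per drug")
    if need == "long":
        return list(q_long)
    if need == "short":
        return list(q_short)
    if need == "both":
        # weighted sum of the first and second target reward parameters
        return [w_long * ql + w_short * qs for ql, qs in zip(q_long, q_short)]
    raise ValueError(f"unknown need: {need!r}")


combined = user_rewards([0.5, 0.25], [0.25, 1.0])  # direct sum with default weights
```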

Step S104, determining the maximum user rewarding parameter from the user rewarding parameters, and outputting the drug information of the target drug with the maximum user rewarding parameter to a user interface so as to display the target drug to the target user.

In some possible implementations, the computer device may sort the user reward parameters (e.g., in descending or ascending order) to obtain a sequence of user reward parameters, and take the first user reward parameter (for descending order) or the last user reward parameter (for ascending order) in the sequence as the maximum user reward parameter. Further, the computer device may output the drug information of the target drug with the maximum user reward parameter to the user interface to present the target drug to the target user. Taking the diabetes drug information pushing scenario as an example, when the drug action requirement of the target user is that no diabetic complication occurs in the long term, the maximum user reward parameter may be the largest of the first target reward parameters, and the computer device may output the drug information of the target drug with the largest first target reward parameter to the user interface. When the drug action requirement of the target user is that the glycosylated hemoglobin value meets the standard in the short term, the maximum user reward parameter may be the largest of the second target reward parameters, and the computer device may output the drug information of the target drug with the largest second target reward parameter to the user interface. When the drug action requirement of the target user is both that no diabetic complication occurs in the long term and that the glycosylated hemoglobin value meets the standard in the short term, each user reward parameter may be determined from each first weighted reward parameter and each second weighted reward parameter, and the computer device may then output the drug information of the target drug with the maximum user reward parameter to the user interface.
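Selecting the target drug in step S104 amounts to taking the index of the largest user reward parameter; a full sort is not required. The sketch below assumes the drugs are identified by name strings in the same order as the reward values; the function name `pick_target_drug` is hypothetical.

```python
from typing import Sequence, Tuple


def pick_target_drug(drug_names: Sequence[str], rewards: Sequence[float]) -> Tuple[str, float]:
    """Return the drug with the maximum user reward parameter and that value."""
    if not rewards or len(drug_names) != len(rewards):
        raise ValueError("expected one reward per drug and at least one drug")
    best_index = max(range(len(rewards)), key=lambda i: rewards[i])
    return drug_names[best_index], rewards[best_index]


target, value = pick_target_drug(["biguanide", "sulfonylurea"], [0.4, 0.7])
```

If two drugs share the maximum value, this version returns the one that appears first; the application does not say how ties should be handled.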

In some possible implementations, the target user may then view the target medication on the user interface and send feedback information for the target medication to the computer device. For example, the feedback information may include that the target medication is different from the historical medication that the target user previously took, or that the target user took the target medication less effectively than the historical medication. Further, after receiving the feedback information (for example, the target drug is different from the historical drug or the effect of the target drug is not as good as that of the historical drug), the computer device can adjust the first network parameter and the second network parameter of the drug rewards prediction model to better predict the first rewards parameter and the second rewards parameter of any user (for example, the target user) under the action of each drug, so as to push the proper drug information to the target user.

Further, referring to fig. 4, fig. 4 is a schematic structural diagram of a drug information pushing device according to an embodiment of the present application. The medication information pushing device may be a computer program (including program code) running in a computer apparatus, for example, the medication information pushing device is an application software; the drug information pushing device can be used for executing corresponding steps in the method provided by the embodiment of the application. As shown in fig. 4, the drug information pushing device 1 may be operated on a computer device, which may be the server 10 in the embodiment corresponding to fig. 1. The medication information pushing apparatus 1 may include: a data acquisition module 10, a sample input module 20, a parameter training module 30, an information input module 40, a parameter output module 50, a parameter determination module 60, and an information presentation module 70.

The information input module 40 is configured to obtain target user attribute information of a target user, and input the target user attribute information into the drug rewards prediction model, where the target user attribute information includes at least one of demographic information, health index for target disease medication, and historical medication information.

In some possible implementations, the user interface includes an attribute information input area;

The information input module 40 includes: an information acquisition unit 401.

An information acquisition unit 401 for, when an input instruction on the attribute information input area is detected, acquiring target user attribute information of a target user based on the input instruction.

The specific implementation of the information obtaining unit 401 may be referred to the description of step S101 in the embodiment corresponding to fig. 2, and the detailed description will not be repeated here.

The parameter output module 50 is configured to output, through a medication reward prediction model, each first target reward parameter and each second target reward parameter of a target user under each medication action, where the medication reward prediction model includes a first network parameter and a second network parameter, the first network parameter is used to determine a first reward parameter of any user under each medication action with any user attribute information, the second network parameter is used to determine a second reward parameter of any user under each medication action, and any user corresponds to one first reward parameter and one second reward parameter under one medication action, and a time period of the medication action corresponding to the first reward parameter is longer than a time period of the medication action corresponding to the second reward parameter.

The parameter determining module 60 is configured to determine each user reward parameter of the target user under each drug action based on each first target reward parameter of the target user and/or each second target reward parameter of the target user, where the target user corresponds to one user reward parameter under one drug action.

In some possible implementations, the parameter determination module 60 includes: a weighting coefficient determining unit 601, a first reward parameter determining unit 602, and a second reward parameter determining unit 603.

A weighting coefficient determining unit 601, configured to determine a first weighting coefficient of a first target reward parameter and a second weighting coefficient of a second target reward parameter;

a first reward parameter determining unit 602, configured to determine, based on the first weighting coefficient and each first target reward parameter of the target user, each first weighted reward parameter corresponding to each first target reward parameter, and determine, based on the second weighting coefficient and each second target reward parameter of the target user, each second weighted reward parameter corresponding to each second target reward parameter;

a second reward parameter determining unit 603, configured to determine, based on each first weighted reward parameter and each second weighted reward parameter, each user reward parameter of the target user under the action of each drug, where one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter.

For the specific implementation of the weighting coefficient determining unit 601, the first reward parameter determining unit 602, and the second reward parameter determining unit 603, refer to the description of step S103 in the embodiment corresponding to fig. 2; details are not repeated here.

In some possible embodiments, the parameter determination module 60 further includes: a third reward parameter determining unit 604.

A third reward parameter determining unit 604, configured to determine each first target reward parameter of the target user as each user reward parameter of the target user under the action of each drug;

For the specific implementation of the third reward parameter determining unit 604, refer to the description of step S103 in the embodiment corresponding to fig. 2; details are not repeated here.

In some possible embodiments, the parameter determination module 60 further includes: a fourth reward parameter determining unit 605.

A fourth reward parameter determining unit 605, configured to determine each second target reward parameter of the target user as each user reward parameter of the target user under the action of each drug;

For the specific implementation of the fourth reward parameter determining unit 605, refer to the description of step S103 in the embodiment corresponding to fig. 2; details are not repeated here.

The information display module 70 is configured to determine a maximum user rewarding parameter from the user rewarding parameters, and output drug information of the target drug with the maximum user rewarding parameter to the user interface to display the target drug to the target user.

In some possible embodiments, the above-mentioned drug information pushing device 1 further includes:

a data acquisition module 10, configured to acquire sample data of at least two users, where the sample data of one user includes user attribute information and sample drug information of the user;

the sample input module 20 is configured to obtain each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug indicated by the sample drug information, and input sample data of at least two users, each first sample reward parameter and each second sample reward parameter into the drug reward prediction model;

The parameter training module 30 is configured to train the first network parameter and the second network parameter of the drug rewards prediction model based on the user attribute information of at least two users, the first sample rewards parameters and the second sample rewards parameters, so as to obtain the ability of predicting the first rewards parameters and the second rewards parameters of any user under the action of each drug based on the user attribute information of any user.

In some possible implementations, the first network parameters include a first model parameter and a first backhaul parameter, and the second network parameters include a second model parameter and a second backhaul parameter;

the parameter training module 30 includes: an expected parameter determination unit 301, a loss value determination unit 302, and a parameter update unit 303.

An expected parameter determining unit 301, configured to determine, based on the first model parameter and the first backhaul parameter, each first expected reward parameter of each user under the action of the sample drug, and determine, based on the second model parameter and the second backhaul parameter, each second expected reward parameter of each user under the action of the sample drug;

a loss value determining unit 302, configured to determine each loss value corresponding to the sample data of each user based on the first backhaul parameter, the second backhaul parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter, and each second expected reward parameter;

And a parameter updating unit 303, configured to iteratively update the parameter value of the first model parameter and the parameter value of the second model parameter based on each loss value until the loss value is unchanged, so as to obtain the ability of predicting the first rewarding parameter and the second rewarding parameter of any user under the action of each drug based on the user attribute information of any user.

The specific implementation manners of the expected parameter determining unit 301, the loss value determining unit 302, and the parameter updating unit 303 may refer to the description of the model training of the drug rewards prediction model in step S101 in the embodiment corresponding to fig. 2, and will not be described in detail here.

The specific implementation manners of the data acquisition module 10, the sample input module 20, the parameter training module 30, the information input module 40, the parameter output module 50, the parameter determination module 60, and the information display module 70 can be referred to the description of step S101 to step S104 in the embodiment corresponding to fig. 2, and the detailed description will not be repeated here. In addition, the description of the beneficial effects of the same method is omitted.

Further, referring to fig. 5, fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 5, the computer device 1000 may be the server 10 in the embodiment corresponding to fig. 1, and the computer device 1000 may include: at least one processor 1001 (such as a CPU), at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and a keyboard, and the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001. As shown in fig. 5, the memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.

In the computer device 1000 shown in fig. 5, the network interface 1004 is mainly used for network communication with a user terminal; while user interface 1003 is primarily used as an interface for providing input to a user; and the processor 1001 may be used to invoke a device control application stored in the memory 1005 to implement:

It should be understood that the computer device 1000 described in the embodiment of the present application may perform the description of the method for pushing drug information in the embodiment corresponding to fig. 2, and may also perform the description of the device 1 for pushing drug information in the embodiment corresponding to fig. 4, which is not repeated herein. In addition, the description of the beneficial effects of the same method is omitted.

Furthermore, it should be noted here that: the embodiment of the present application further provides a computer readable storage medium, in which a computer program executed by the aforementioned drug information pushing device 1 is stored, and the computer program includes program instructions, when executed by the processor, can execute the description of the drug information pushing method in the embodiment corresponding to fig. 2, and therefore, a description thereof will not be repeated here. In addition, the description of the beneficial effects of the same method is omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium according to the present application, please refer to the description of the method embodiments of the present application. As an example, program instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or, alternatively, across multiple computing devices distributed across multiple sites and interconnected by a communication network, where the multiple computing devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.

In one aspect of the present application, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the drug information pushing method provided in the embodiments of the present application.

Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program, which may be stored on a computer-readable storage medium and which, when executed, may include the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

The computer-readable storage medium may be the drug information pushing device provided in any of the foregoing embodiments or an internal storage unit of the foregoing device, for example, a hard disk or a memory of an electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. The computer-readable storage medium may also include a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.

The terms first, second and the like in the claims and in the description and drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
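As one purely illustrative software realization of the selection step, the sketch below combines the first (longer-acting) and second (shorter-acting) target rewarding parameters of each drug with weighting coefficients and picks the drug with the maximum user rewarding parameter. The function name `select_target_drug`, the dictionary layout, and the default weights are assumptions made for illustration and are not specified by the present application; setting one weight to zero reduces the sketch to selection on a single rewarding parameter.

```python
from typing import Dict, Tuple


def select_target_drug(
    first_rewards: Dict[str, float],
    second_rewards: Dict[str, float],
    w_first: float = 0.5,   # hypothetical first weighting coefficient
    w_second: float = 0.5,  # hypothetical second weighting coefficient
) -> Tuple[str, float]:
    """Return the drug with the maximum weighted user rewarding parameter.

    first_rewards / second_rewards map a drug name to the first / second
    target rewarding parameter predicted for the target user.
    """
    if not first_rewards:
        raise ValueError("at least one candidate drug is required")
    if first_rewards.keys() != second_rewards.keys():
        raise ValueError("both reward sets must cover the same drugs")

    # One first and one second weighted rewarding parameter per drug
    # yield one user rewarding parameter for that drug.
    user_rewards = {
        drug: w_first * first_rewards[drug] + w_second * second_rewards[drug]
        for drug in first_rewards
    }

    # The target drug is the one with the maximum user rewarding parameter.
    target = max(user_rewards, key=user_rewards.get)
    return target, user_rewards[target]
```

The returned drug name would then be used to look up the drug information that is output to the user interface; how that lookup and display are performed is outside the scope of this sketch.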

The foregoing disclosure is merely illustrative of preferred embodiments of the present application and is not intended to limit the scope of the claims; equivalent variations made in accordance with the claims of the present application therefore still fall within the scope covered by the present application.

Claims

1. A drug information pushing method, comprising:

acquiring target user attribute information of a target user, and inputting the target user attribute information into a drug rewarding prediction model, wherein the target user attribute information comprises at least one of demographic information, health indicators for medication of a target disease, and historical medication information;

outputting each first target rewarding parameter and each second target rewarding parameter of the target user under the action of each drug through the drug rewarding prediction model, wherein the drug rewarding prediction model comprises a first network parameter and a second network parameter, the first network parameter is used for determining the first rewarding parameter of any user with any user attribute information under the action of each drug, the second network parameter is used for determining the second rewarding parameter of that user under the action of each drug, one first rewarding parameter and one second rewarding parameter of any user correspond to the action of one drug, and the drug action time corresponding to the first rewarding parameter is longer than the drug action time corresponding to the second rewarding parameter;

determining a maximum user rewarding parameter from the user rewarding parameters, and outputting drug information of a target drug with the maximum user rewarding parameter to a user interface so as to display the target drug to the target user;

wherein the first network parameters include a first model parameter and a first backhaul parameter, and the second network parameters include a second model parameter and a second backhaul parameter;

before said acquiring target user attribute information of the target user, the method further comprises:

acquiring first sample reward parameters and second sample reward parameters of each user under the action of the sample drug indicated by the sample drug information, and inputting sample data of at least two users, the first sample reward parameters and the second sample reward parameters into the drug rewarding prediction model;

determining first expected reward parameters of the users under the action of the sample drug based on the first model parameter and the first backhaul parameter, and determining second expected reward parameters of the users under the action of the sample drug based on the second model parameter and the second backhaul parameter;

determining loss values corresponding to the sample data of the users based on the first backhaul parameter, the second backhaul parameter, the first sample reward parameters, the second sample reward parameters, the first expected reward parameters and the second expected reward parameters;

and iteratively updating the parameter values of the first model parameter and the parameter values of the second model parameter based on the loss values until the loss values no longer change, so as to obtain the capability of predicting the first rewarding parameter and the second rewarding parameter of any user under the action of each drug based on the user attribute information of that user.

2. The method of claim 1, wherein said determining each user rewarding parameter of said target user under the action of each drug based on each first target rewarding parameter of said target user and each second target rewarding parameter of said target user comprises:

determining first weighted reward parameters corresponding to the first target reward parameters based on the first weighting coefficients and the first target reward parameters of the target user, and determining second weighted reward parameters corresponding to the second target reward parameters based on the second weighting coefficients and the second target reward parameters of the target user;

determining each user rewarding parameter of the target user under the action of each drug based on each first weighted reward parameter and each second weighted reward parameter, wherein one first weighted reward parameter and one second weighted reward parameter correspond to one user rewarding parameter.

3. The method of claim 1, wherein said determining each user rewarding parameter of the target user under the action of each drug based on each first target rewarding parameter of the target user comprises:

wherein the maximum user rewarding parameter is the maximum first target rewarding parameter among the first target rewarding parameters.

4. The method of claim 1, wherein said determining each user rewarding parameter of said target user under the action of each drug based on each second target rewarding parameter of said target user comprises:

wherein the maximum user rewarding parameter is the maximum second target rewarding parameter among the second target rewarding parameters.

5. The method of any one of claims 1-4, wherein the user interface includes an attribute information input area;

said acquiring target user attribute information of the target user comprises:

when an input instruction on the attribute information input area is detected, acquiring the target user attribute information of the target user based on the input instruction.

6. A drug information pushing apparatus, comprising:

a parameter output module, configured to output each first target rewarding parameter and each second target rewarding parameter of the target user under the action of each drug through the drug rewarding prediction model, wherein the drug rewarding prediction model comprises a first network parameter and a second network parameter, the first network parameter is used for determining the first rewarding parameter of any user with any user attribute information under the action of each drug, the second network parameter is used for determining the second rewarding parameter of that user under the action of each drug, one first rewarding parameter and one second rewarding parameter of any user correspond to the action of one drug, and the drug action time corresponding to the first rewarding parameter is longer than the drug action time corresponding to the second rewarding parameter;

the information display module is used for determining the maximum user rewarding parameter from the user rewarding parameters, and outputting the drug information of the target drug with the maximum user rewarding parameter to a user interface so as to display the target drug to the target user;

wherein the first network parameters include a first model parameter and a first backhaul parameter, and the second network parameters include a second model parameter and a second backhaul parameter; the apparatus further comprises:

a sample input module, configured to acquire first sample reward parameters and second sample reward parameters of each user under the action of the sample drug indicated by the sample drug information, and to input sample data of at least two users, the first sample reward parameters and the second sample reward parameters into the drug rewarding prediction model;

an expected parameter determining unit, configured to determine, based on the first model parameter and the first backhaul parameter, first expected reward parameters of the users under the action of the sample drug, and determine, based on the second model parameter and the second backhaul parameter, second expected reward parameters of the users under the action of the sample drug;

a loss value determining unit, configured to determine each loss value corresponding to the sample data of each user based on the first backhaul parameter, the second backhaul parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter, and each second expected reward parameter;

and a parameter updating unit, configured to iteratively update the parameter values of the first model parameter and the parameter values of the second model parameter based on the loss values until the loss values no longer change, so as to obtain the capability of predicting the first rewarding parameter and the second rewarding parameter of any user under the action of each drug based on the user attribute information of that user.

7. The apparatus of claim 6, wherein the parameter determination module comprises:

a weighting coefficient determining unit, configured to determine a first weighting coefficient of the first target rewarding parameter and a second weighting coefficient of the second target rewarding parameter;

a first reward parameter determining unit, configured to determine, based on the first weighting coefficient and each first target reward parameter of the target user, each first weighted reward parameter corresponding to each first target reward parameter, and determine, based on the second weighting coefficient and each second target reward parameter of the target user, each second weighted reward parameter corresponding to each second target reward parameter;

and a second reward parameter determining unit, configured to determine each user rewarding parameter of the target user under the action of each drug based on each first weighted reward parameter and each second weighted reward parameter, wherein one first weighted reward parameter and one second weighted reward parameter correspond to one user rewarding parameter.

8. The apparatus of claim 6, wherein the parameter determination module comprises:

9. A computer device, comprising: a processor, a memory, and a network interface;

the processor is connected to the memory and the network interface, wherein the network interface is used for providing data communication functions, the memory is used for storing program code, and the processor is used for invoking the program code to perform the method of any of claims 1-5.

10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method of any of claims 1-5.