WO2022227176A1 - Drug information pushing method and apparatus, computer device, and storage medium - Google Patents


Info

Publication number
WO2022227176A1
Authority
WO
WIPO (PCT)
Prior art keywords
reward
parameter
user
target
drug
Application number
PCT/CN2021/096712
Other languages
French (fr)
Chinese (zh)
Inventor
徐卓扬
孙行智
胡岗
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022227176A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a drug information pushing method and apparatus, a computer device, and a storage medium.
  • DRL: deep reinforcement learning
  • the inventors realized that long-term outcomes and short-term outcomes differ essentially, mainly in how far back in time the drug effects that drive them lie (for example, short-term outcomes are mainly affected by the most recent drug, whereas long-term outcomes are mainly affected by drugs taken over a longer earlier period), and that this difference results in poor scalability of the DRL model.
  • the embodiments of the present application provide a drug information pushing method and apparatus, a computer device, and a storage medium, which can enhance the scalability of the drug reward prediction model and thereby improve the accuracy of drug information pushing.
  • in a first aspect, the present application provides a drug information pushing method, the method comprising:
  • acquiring target user attribute information of a target user and inputting the target user attribute information into a drug reward prediction model, wherein the target user attribute information includes at least one of demographic information, health indicators of medication for a target disease, and historical medication information;
  • outputting, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, wherein the drug reward prediction model includes first network parameters and second network parameters, the first network parameters are used to determine the first reward parameter of any user having any user attribute information under the action of each drug, the second network parameters are used to determine the second reward parameter of any user under the action of each drug, any user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and the drug action duration corresponding to the first reward parameter is greater than the drug action duration corresponding to the second reward parameter;
  • determining, based on each first target reward parameter and/or each second target reward parameter, each user reward parameter of the target user under the action of each drug, wherein the target user corresponds to one user reward parameter under the action of one drug;
  • determining the maximum user reward parameter among the user reward parameters, and outputting the drug information of the target drug having the maximum user reward parameter to a user interface, so as to display the target drug to the target user.
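The final step above reduces to an argmax over per-drug scores. As a minimal sketch, assuming the user reward parameters have already been computed and are held in a dictionary keyed by drug name, it could look like the Python below; the function names, the message format, and the numeric values are hypothetical illustrations, not taken from the patent.

```python
from typing import Dict, Tuple


def select_target_drug(user_rewards: Dict[str, float]) -> Tuple[str, float]:
    """Return the drug with the maximum user reward parameter, and that value."""
    if not user_rewards:
        raise ValueError("no candidate drugs")
    # max() over the keys, ranked by each drug's user reward parameter
    drug = max(user_rewards, key=user_rewards.get)
    return drug, user_rewards[drug]


def format_push_message(drug: str, reward: float) -> str:
    # Hypothetical text pushed to the user interface.
    return f"Recommended drug: {drug} (user reward parameter {reward:.2f})"


# Usage with made-up scores for two drug classes named later in the text.
drug, reward = select_target_drug({"biguanide": 1.4, "sulfonylurea": 1.1})
print(format_push_message(drug, reward))
```

Ties are resolved in favour of whichever drug appears first in the dictionary; the source does not say how ties should be handled.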
  • the above-mentioned device further includes:
  • a data acquisition module configured to acquire sample data of at least two users, wherein the sample data of one user includes the user attribute information and the sample drug information of that user;
  • a sample input module configured to obtain each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug indicated by the sample drug information, and to input the sample data of the at least two users, each first sample reward parameter and each second sample reward parameter into the drug reward prediction model;
  • a parameter training module configured to train the first network parameters and the second network parameters of the drug reward prediction model based on the user attribute information of the at least two users, each first sample reward parameter and each second sample reward parameter, so that the model acquires the ability to predict, based on the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
  • the present application provides a computer device, including: a processor, a memory, and a network interface;
  • the processor is connected to the memory and the network interface, wherein the network interface is used to provide a data communication function, the memory is used to store a computer program, and the processor is used to call the computer program to execute the drug information pushing method of the first aspect of the present application.
  • the present application provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, execute the drug information pushing method of the first aspect of the present application.
  • the embodiments of the present application enhance the scalability of the drug reward prediction model and improve the interpretability, security, selectivity and traceability of the model, thereby improving the accuracy of drug information pushing, with strong applicability.
  • FIG. 1 is a schematic structural diagram of a network architecture provided by the present application.
  • FIG. 2 is a schematic flowchart of the drug information pushing method provided by the present application.
  • FIG. 3 is a schematic structural diagram of the drug reward prediction model provided by the present application.
  • FIG. 4 is a schematic structural diagram of a drug information pushing apparatus provided by the present application.
  • FIG. 5 is a schematic structural diagram of a computer device provided by the present application.
  • the technical solutions of the present application may relate to the technical field of artificial intelligence and may be applied to smart healthcare scenarios such as medical information pushing, so as to realize digital healthcare and promote the construction of smart cities.
  • the data involved in this application, such as attribute information and/or target drug information, may be stored in a database or in a blockchain (for example, through distributed blockchain storage), which is not limited in this application.
  • FIG. 1 is a schematic structural diagram of a network architecture provided by the present application.
  • the network architecture may include a server 10 and a user terminal cluster, and the user terminal cluster may include multiple user terminals; as shown in FIG. 1, these may include a user terminal 100a, a user terminal 100b, a user terminal 100c, ..., and a user terminal 100n.
  • the server 10 may be an independent physical server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.
  • Each user terminal in the user terminal cluster may include, but is not limited to, smart terminals such as smart phones, tablet computers, notebook computers, desktop computers, smart speakers, and smart watches.
  • the computer device in this application may be a physical terminal with a drug information pushing function, and the physical terminal may be the server 10 shown in FIG. 1 or a user terminal, which is not limited herein.
  • the user terminal 100a, the user terminal 100b, the user terminal 100c, ..., and the user terminal 100n can each be connected to the server 10 through a network, so that each user terminal can exchange data with the server 10 through the network connection.
  • the server 10 may output the drug information of the target drug to the user interface of the target user's user terminal, so that the target user can view the target drug on the user interface, where the target user's user terminal may be any user terminal in the user terminal cluster (e.g., the user terminal 100a).
  • the drugs that are determined based on the drug reward prediction model and pushed to target users may be collectively referred to as target drugs, and the model having this prediction function is referred to as the drug reward prediction model.
  • the drug information push method provided in this application can be applied to a drug information push scenario for any disease, such as a diabetes drug information push scenario, a hypertension drug information push scenario, or a drug information push scenario for other diseases.
  • for example, when the target user is a doctor, the doctor can input the patient's basic information into the drug reward prediction model, which outputs the drug information of the pushed target drug to the user interface based on that basic information.
  • the doctor can view the target drug on the user interface (here the target drug can serve as a preliminary diagnosis result) and then, in combination with the patient's further diagnosis results, determine the appropriate drug for the patient (such as the above-mentioned target drug).
  • as another example, the patient can input their basic information into a self-service terminal (or simply a self-service machine) provided by a medical institution such as a hospital, health station or social health institution.
  • the self-service machine contains the above-mentioned drug reward prediction model, which can output the drug information of the recommended target drug to the user interface of the self-service machine based on the patient's basic information.
  • the patient can view the target drug on the user interface of the self-service machine and may purchase it directly, or a doctor can make a further diagnosis and determine the drug suitable for the patient (such as the above-mentioned target drug).
  • FIG. 2 is a schematic flowchart of a method for pushing drug information provided by an embodiment of the present application. As shown in FIG. 2, the method may include the following steps S101-S104:
  • Step S101: acquire the target user attribute information of the target user, and input the target user attribute information into the drug reward prediction model.
  • the computer device can first train the model parameters of the drug reward prediction model using the sample data of at least two users and the actual reward parameters of each user, so as to obtain a drug reward prediction model capable of outputting the first reward parameter and the second reward parameter of any user under the action of each drug.
  • the drug reward prediction model here can be a deep Q-network (DQN) model, i.e., a deep reinforcement learning model.
  • the reinforcement learning used by the DQN model is an artificial intelligence method that takes actions (such as …) and optimizes its policy according to the expected reward obtained after taking those actions.
  • the parameter value corresponding to the expected reward may be the expected reward parameter (such as the following first expected reward parameter and second expected reward parameter), in other words, the value of the expected reward parameter is used to represent the expected reward.
  • the policy specifies which action should be taken in a specific state so as to maximize the expected reward.
  • the computer device may acquire sample data of at least two users for training the drug reward prediction model, where one user corresponds to one piece of sample data, and one piece of sample data may include the user attribute information and the sample medication information of that user.
  • the user attribute information here may include at least one of demographic information, health indicators of medication for the target disease, and historical medication information (i.e., medication history), and the drug indicated by the sample medication information is the sample drug.
  • the demographic information may include gender, age, health status, occupation, marital status, education level, income and other information, and a health indicator can be understood as a test indicator corresponding to the target disease.
  • the sample drugs used by different users for the target disease can be the same or different.
  • the computer device can obtain each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug, and input the sample data of the at least two users, each first sample reward parameter and each second sample reward parameter into the drug reward prediction model.
  • the actual long-term reward parameter of the user under the action of the sample drug may be referred to as the first sample reward parameter.
  • the actual short-term reward parameter of the user under the action of the sample drug may also be referred to as the second sample reward parameter.
  • the drug action duration corresponding to the first sample reward parameter is greater than the drug action duration corresponding to the second sample reward parameter.
  • the reward here can be understood as the degree to which the user's health indicators are affected after the user takes the sample drug for a period of time, and the value of a reward parameter represents that degree of influence.
  • for example, if reward parameter 1 represents influence degree 1 and reward parameter 2 represents influence degree 2, then reward parameter 1 being greater than reward parameter 2 indicates that influence degree 1 is greater than influence degree 2.
  • the computer device can train the first network parameters and the second network parameters of the drug reward prediction model based on the user attribute information of the at least two users, each first sample reward parameter and each second sample reward parameter, so that the model acquires the ability to predict, based on the user attribute information of any user (e.g., the target user attribute information of the target user), the first reward parameter and the second reward parameter of that user under the action of each drug.
  • the first network parameters can be used to determine the first reward parameter (also called the long-term reward parameter) of any user having any user attribute information under the action of each drug, and the second network parameters can be used to determine the second reward parameter (also called the short-term reward parameter) of any user under the action of each drug.
  • the drug action duration corresponding to the first reward parameter is greater than the drug action duration corresponding to the second reward parameter.
  • the first network parameters here may include first model parameters and a first return parameter, and
  • the second network parameters may include second model parameters and a second return parameter.
  • the parameters that are iteratively updated based on the loss value in the drug reward prediction model may be collectively referred to as model parameters (eg, the first model parameter and the second model parameter).
  • the present application refers to the return parameter corresponding to the first reward parameter in the first network parameters as the first return parameter (also called the first return factor), and to the return parameter corresponding to the second reward parameter in the second network parameters as the second return parameter (also called the second return factor).
  • the return parameters here can be understood as parameters that remain unchanged during the training process of the drug reward prediction model.
  • the first return parameter is greater than the second return parameter; for example, the first return parameter is 0.9 (or another value) and the second return parameter is 0.2 (or another value).
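The values 0.9 and 0.2 are given only as examples. Assuming the return parameters act as discount factors in the usual reinforcement-learning sense (an interpretation suggested by the loss variables below, not stated outright here), the short Python sketch that follows shows why a larger return parameter lets the long-term reward reflect outcomes further in the future. The function names are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**k * rewards[k]: later rewards are scaled down geometrically."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


def effective_horizon(gamma: float) -> float:
    """Rough number of steps over which rewards still matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


# A single reward that only arrives three follow-ups later.
delayed = [0.0, 0.0, 0.0, 1.0]
print(discounted_return(delayed, 0.9))  # about 0.729: still clearly visible
print(discounted_return(delayed, 0.2))  # about 0.008: almost ignored
print(effective_horizon(0.9), effective_horizon(0.2))  # about 10 vs 1.25 steps
```

Under this reading, the first (long-term) reward effectively looks about ten follow-ups ahead, while the second (short-term) reward is dominated by the immediate outcome, matching the longer drug action duration attributed to the first reward parameter.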
  • the computer device may determine each first expected reward parameter of each user under the action of the sample drug based on the first model parameter and the first return parameter, and based on the second model parameter and the second return parameter Each second expected reward parameter of each user under the action of the sample drug is determined.
  • a user corresponds to a first expected reward parameter under the action of a sample drug
  • a user corresponds to a second expected reward parameter under the action of a sample drug.
  • the computer device can use a loss function to determine, from the first return parameter, the second return parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter and each second expected reward parameter, the loss value corresponding to each user's sample data.
  • one first sample reward parameter, one second sample reward parameter, one first expected reward parameter and one second expected reward parameter together yield the loss value corresponding to one user's sample data.
  • the computer device can determine the loss value $l_{loss}$ corresponding to a user's sample data according to the following formula (1):
  • where $a_t$ represents the sample drug input into the drug reward prediction model at the current time $t$ (i.e., the sample drug in the sample data), and $s_t$ represents the user attribute information input into the drug reward prediction model at the current time $t$ (i.e., the user attribute information in the sample data);
  • $s_{t+1}$ represents the user attribute information input into the drug reward prediction model at the next time $t+1$;
  • $Q_{long}(s_t, a_t)$ represents the user's first expected reward parameter at the current time $t$;
  • $Q_{short}(s_t, a_t)$ represents the user's second expected reward parameter at the current time $t$;
  • $r_{long}$ represents the user's first sample reward parameter at the current time $t$, and $r_{short}$ represents the user's second sample reward parameter at the current time $t$;
  • $\gamma_{long}$ represents the first return coefficient, and $\gamma_{short}$ represents the second return coefficient;
  • $Q_{long}(s_{t+1}, a)$ represents the user's first expected reward parameter at the next time $t+1$, and $Q_{short}(s_{t+1}, a)$ represents the user's second expected reward parameter at the next time $t+1$.
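The body of formula (1) is not reproduced in this text. Given the variables listed, one plausible reading is the standard DQN squared temporal-difference error applied separately to each reward type and summed: $l_{loss} = \big(r_{long} + \gamma_{long}\max_a Q_{long}(s_{t+1}, a) - Q_{long}(s_t, a_t)\big)^2 + \big(r_{short} + \gamma_{short}\max_a Q_{short}(s_{t+1}, a) - Q_{short}(s_t, a_t)\big)^2$. This is an assumption, not the confirmed formula. The Python sketch below computes that assumed loss for one user's sample; all names are hypothetical.

```python
from typing import Sequence


def td_error_sq(r: float, gamma: float, q_next: Sequence[float], q_current: float) -> float:
    """Squared one-step TD error: (r + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t))**2."""
    target = r + gamma * max(q_next)
    return (target - q_current) ** 2


def dual_reward_loss(r_long: float, r_short: float,
                     gamma_long: float, gamma_short: float,
                     q_long_t: float, q_short_t: float,
                     q_long_next: Sequence[float], q_short_next: Sequence[float]) -> float:
    """ASSUMED form of loss (1): long-term plus short-term squared TD errors.

    q_long_next / q_short_next hold the expected reward parameters at time t+1,
    one entry per candidate drug a.
    """
    return (td_error_sq(r_long, gamma_long, q_long_next, q_long_t)
            + td_error_sq(r_short, gamma_short, q_short_next, q_short_t))


# Made-up numbers: long-term term = (1 + 0.9*5 - 5)^2 = 0.25,
# short-term term = (0 + 0.2*1 - 0.5)^2 = 0.09, so the loss is about 0.34.
print(dual_reward_loss(1.0, 0.0, 0.9, 0.2, 5.0, 0.5, [4.0, 5.0], [1.0, 0.5]))
```

The source does not say how the last follow-up (which has no $s_{t+1}$) is treated; a common convention is to drop the bootstrap term there.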
  • the computer device may iteratively update the parameter values of the first model parameters and the second model parameters based on each loss value until the loss value remains unchanged, then stop training the drug reward prediction model, and use the iteratively updated first model parameters and second model parameters as the final first model parameters and final second model parameters of the drug reward prediction model, respectively.
  • the drug reward prediction model has the ability to predict the first reward parameter and the second reward parameter of any user under the action of each drug based on the user attribute information of any user.
  • the drug reward prediction model may include multiple convolutional layers (e.g., convolutional layers 10a to 10c) and multiple fully connected layers (e.g., fully connected layers 20a and 20b).
  • the input of the drug reward prediction model is the user attribute information of a user.
  • the output of the drug reward prediction model is the first reward parameter (e.g., $Q_{long}$) and the second reward parameter (e.g., $Q_{short}$) of any user under the action of each drug.
  • alternatively, the drug reward prediction model may include the fully connected layers 20a and 20b but not the convolutional layers 10a to 10c.
  • the drug reward prediction model here includes first network parameters and second network parameters, both of which can be configured in the fully connected layer 20b (i.e., the second fully connected layer), as shown in FIG. 3.
  • the fully connected layer 20b may include two fully connected layers (e.g., fully connected layers 200b and 201b). The fully connected layer 200b is configured with the first network parameters and processes the user attribute information based on them to output the first reward parameter $Q_{long}$ of any user under the action of each drug; the fully connected layer 201b is configured with the second network parameters and processes the user attribute information based on them to output the second reward parameter $Q_{short}$ of any user under the action of each drug.
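The two-head layout described above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions: one shared fully connected layer stands in for layer 20a, the convolutional layers are omitted (as the alternative embodiment allows), ReLU is used as the activation, and the weights are hand-picked placeholders rather than trained values. The class and function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # weights stored as w[input_index][output_index]
Vector = List[float]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Fully connected layer: y[j] = sum_i x[i] * w[i][j] + b[j]."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class TwoHeadRewardModel:
    """Shared layer (cf. 20a) feeding a long-term head (cf. 200b) and a short-term head (cf. 201b)."""

    def __init__(self, w_shared: Matrix, b_shared: Vector,
                 w_long: Matrix, b_long: Vector,
                 w_short: Matrix, b_short: Vector) -> None:
        self.w_shared, self.b_shared = w_shared, b_shared
        self.w_long, self.b_long = w_long, b_long
        self.w_short, self.b_short = w_short, b_short

    def forward(self, features: Vector) -> Tuple[Vector, Vector]:
        h = relu(dense(features, self.w_shared, self.b_shared))
        q_long = dense(h, self.w_long, self.b_long)     # one Q_long per candidate drug
        q_short = dense(h, self.w_short, self.b_short)  # one Q_short per candidate drug
        return q_long, q_short


# Two encoded user attributes, three candidate drugs, placeholder weights.
model = TwoHeadRewardModel(
    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
    [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]], [0.0, 0.0, 0.0],
    [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]], [0.0, 0.0, 0.1],
)
print(model.forward([1.0, 2.0]))
```

Keeping the two heads separate means each can be paired with its own return parameter during training, which is the point of splitting layer 20b into 200b and 201b.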
  • the computer device can obtain sample data of at least two users; for example, the sample data may be long-term follow-up data of a large number of diabetic patients, with one piece of sample data containing one follow-up record of one patient.
  • the sample data here may include user attribute information, which may include, but is not limited to, age, gender, medication history, the sample drug (that is, the drug prescribed by the doctor or actually taken by the patient, such as a biguanide or a sulfonylurea), the HbA1c value, the creatinine value, and other health indicators for diabetes.
  • the computer device can obtain each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug, and input the user attribute information of each user, each first sample reward parameter and each second sample reward parameter into the above drug reward prediction model.
  • the first sample reward parameter may indicate whether diabetic complications occurred at the last follow-up after taking the drug: it is 0 if the diabetic patient develops diabetic complications, and 1 if the diabetic patient does not.
  • the second sample reward parameter may indicate whether the patient's glycated hemoglobin (HbA1c) value reaches the target at the next follow-up after taking the drug. When …, the second sample reward parameter is 0.
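To make the two labels concrete, here is a minimal Python sketch of turning follow-up records into sample reward parameters. The first label follows the text (0 with complications, 1 without). The second is an assumption, because the source sentence giving its values is truncated: the sketch assumes 1 when HbA1c is at or below a target and 0 otherwise, and the 7.0 threshold, the field names, and the `FollowUp` type are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FollowUp:
    """One follow-up record for one patient (field names are hypothetical)."""
    age: int
    gender: str
    sample_drug: str          # e.g. a biguanide or a sulfonylurea
    hba1c: float              # glycated hemoglobin value
    creatinine: float
    has_complication: bool    # diabetic complication observed at this follow-up


def first_sample_reward(last_follow_up: FollowUp) -> float:
    """Long-term label: 0 if complications at the last follow-up, else 1."""
    return 0.0 if last_follow_up.has_complication else 1.0


def second_sample_reward(next_follow_up: FollowUp, hba1c_target: float = 7.0) -> float:
    """Short-term label (ASSUMED mapping and threshold): 1 if HbA1c meets the target, else 0."""
    return 1.0 if next_follow_up.hba1c <= hba1c_target else 0.0


controlled = FollowUp(60, "F", "biguanide", 6.5, 80.0, False)
print(first_sample_reward(controlled), second_sample_reward(controlled))
```

Making the target a parameter keeps the sketch neutral about which clinical threshold an implementation would actually use.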
  • the computer device may output each first expected reward parameter of each user under the action of the sample drug based on the above-mentioned fully connected layer 200b, and output each second expected reward parameter of each user under the action of the sample drug based on the above-mentioned fully connected layer 201b.
  • the computer device can use the above-mentioned loss function to calculate, from the first return parameter, the second return parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter and each second expected reward parameter, the loss value corresponding to each user's sample data.
  • the computer device can iteratively update the parameter values of the first model parameters and the second model parameters according to the loss values of all the sample data until the loss value is essentially unchanged (e.g., the loss value reaches its minimum), which indicates that training of the drug reward prediction model is complete (i.e., the drug reward prediction model has converged).
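The stopping rule "update until the loss value is essentially unchanged" can be illustrated with a small tabular stand-in. This sketch assumes a squared one-step temporal-difference loss for each reward type (the exact form of formula (1) is not reproduced in this text) and replaces the neural network with two lookup tables keyed by (state, drug). The learning rate, tolerance, state labels, and the convention that `s_next is None` marks a final follow-up are hypothetical choices, not details from the source.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Tuple

# (state, drug, r_long, r_short, next_state or None for the last follow-up)
Transition = Tuple[Hashable, str, float, float, Optional[Hashable]]


def train_tabular(transitions: List[Transition], drugs: List[str],
                  gamma_long: float = 0.9, gamma_short: float = 0.2,
                  lr: float = 0.1, tol: float = 1e-6, max_epochs: int = 10_000):
    """Iterate TD updates on two Q-tables until the mean loss stops changing."""
    q_long = defaultdict(float)   # stands in for the first model parameters
    q_short = defaultdict(float)  # stands in for the second model parameters
    prev_loss = None
    for epoch in range(max_epochs):
        total = 0.0
        for s, a, r_long, r_short, s_next in transitions:
            for q, r, gamma in ((q_long, r_long, gamma_long),
                                (q_short, r_short, gamma_short)):
                # No bootstrap term after the final follow-up.
                bootstrap = 0.0 if s_next is None else max(q[(s_next, d)] for d in drugs)
                td = r + gamma * bootstrap - q[(s, a)]
                total += td ** 2
                q[(s, a)] += lr * td  # move the estimate toward its TD target
        loss = total / len(transitions)
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            return q_long, q_short, epoch  # loss essentially unchanged: converged
        prev_loss = loss
    return q_long, q_short, max_epochs


data = [("s0", "biguanide", 1.0, 0.0, "s1"), ("s1", "biguanide", 1.0, 1.0, None)]
q_long, q_short, epochs = train_tabular(data, ["biguanide", "sulfonylurea"])
print(round(q_long[("s0", "biguanide")], 2), round(q_short[("s0", "biguanide")], 2), epochs)
```

In this toy data the long-term table approaches 1 + 0.9 * 1 = 1.9 for the first visit while the short-term table approaches only 0.2 * 1 = 0.2, showing how the larger return parameter carries the later outcome back further.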
  • the first network parameters configured in the fully connected layer 200b include the first return parameter and the iteratively updated first model parameters, and
  • the second network parameters configured in the fully connected layer 201b include the second return parameter and the iteratively updated second model parameters.
  • the first return parameter and the iteratively updated first model parameters in the fully connected layer 200b can be used to predict the first reward parameter of any user under the action of each drug, and
  • the second return parameter and the iteratively updated second model parameters in the fully connected layer 201b can be used to predict the second reward parameter of any user under the action of each drug. The converged drug reward prediction model therefore has the ability to predict, based on the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
  • the computer device can acquire the target user attribute information of the target user based on the input instruction, and input the target user attribute information into the drug reward prediction model.
  • the target user can enter the target user attribute information in the above attribute information input area and click the OK button in the user interface after the input is completed.
  • the computer device can detect the input instruction on the attribute information input area, so as to obtain the target user attribute information of the target user.
  • the target user attribute information may include at least one of demographic information, health indicators for medication for the target disease, and historical medication information.
  • Step S102: output each first target reward parameter and each second target reward parameter of the target user under the action of each drug through the drug reward prediction model.
  • the computer device may determine each first target reward parameter of the target user under the action of each drug based on the first network parameters (i.e., the first return parameter and the iteratively updated first model parameters); for example, the first network parameters may be those in the fully connected layer 200b after the drug reward prediction model converges.
  • the target user corresponds to one first target reward parameter under the action of one drug.
  • the computer device may determine each second target reward parameter of the target user under the action of each drug based on the second network parameters (i.e., the second return parameter and the iteratively updated second model parameters); for example, the second network parameters may be those in the fully connected layer 201b after the drug reward prediction model converges.
  • the target user corresponds to one second target reward parameter under the action of one drug.
  • Step S103: determine each user reward parameter of the target user under the action of each drug based on each first target reward parameter of the target user and/or each second target reward parameter of the target user.
  • the computer device may determine a first weighting coefficient for the first target reward parameters and a second weighting coefficient for the second target reward parameters.
  • the first weighting coefficient (e.g., 1 or another value) and the second weighting coefficient (e.g., 1 or another value) may be set by the user or configured by default in the drug reward prediction model.
  • the computer device may determine each first weighted reward parameter from the first weighting coefficient and the corresponding first target reward parameter of the target user, and determine each second weighted reward parameter from the second weighting coefficient and the corresponding second target reward parameter of the target user.
  • the computer device can sum each first weighted reward parameter and the corresponding second weighted reward parameter to obtain each user reward parameter of the target user under the action of each drug; one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter.
  • the computer device can also directly sum each first target reward parameter and the corresponding second target reward parameter to obtain each user reward parameter of the target user under the action of each drug; one first target reward parameter and one second target reward parameter correspond to one user reward parameter.
  • alternatively, the computer device may use each first target reward parameter of the target user as the user reward parameter of the target user under the corresponding drug.
  • alternatively, the computer device may use each second target reward parameter of the target user as the user reward parameter of the target user under the corresponding drug. Which option is used can be determined according to the actual application scenario and is not limited here.
  • Step S104: determine the maximum user reward parameter from the user reward parameters, and output the drug information of the target drug with the maximum user reward parameter to the user interface to display the target drug to the target user.
  • the computer device may sort the user reward parameters (for example, in descending or ascending order) to obtain a user reward parameter sequence, and take the first (descending order) or last (ascending order) user reward parameter in the sequence as the maximum user reward parameter. The computer device may then output the drug information of the target drug with the maximum user reward parameter to the user interface to present the target drug to the target user. Taking diabetes drug information pushing as an example, when the target user's requirement for the drug effect is that no diabetic complications occur in the long term, the maximum user reward parameter can be the largest of the first target reward parameters, and the computer device may output the drug information of the target drug having the largest first target reward parameter to the user interface.
  • alternatively, the maximum user reward parameter may be the largest of the second target reward parameters, and the computer device may output the drug information of the target drug having the largest second target reward parameter to the user interface.
  • alternatively, the user reward parameters may be determined from the first weighted reward parameters and the second weighted reward parameters, and the computer device may output the drug information of the target drug with the maximum user reward parameter to the user interface.
  • the target user can then view the target drug on the user interface and send feedback information on the target drug to the computer device.
  • the feedback information may indicate, for example, that the target drug differs from the historical drug previously taken by the target user, or that the target drug is less effective for the target user than the historical drug.
  • the computer device can then adjust the first network parameters and the second network parameters of the drug reward prediction model so that it better predicts the first reward parameter and the second reward parameter of any user (such as the target user) under the action of each drug, and thus pushes appropriate drug information to the target user.
  • the computer device may input the target user attribute information into the drug reward prediction model and output, through the model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug.
  • because the drug reward prediction model outputs the first target reward parameters and the second target reward parameters at the same time, with the first target reward parameter evaluating the reward of the long-term outcome and the second target reward parameter evaluating the reward of the short-term outcome, the scalability of the drug reward prediction model is enhanced, and the interpretability, safety, selectivity, and traceability of the model are improved.
  • the computer device may determine each user reward parameter of the target user under the action of each drug based on each first target reward parameter of the target user and/or each second target reward parameter of the target user. The computer device can then determine the maximum user reward parameter from the user reward parameters and output the drug information of the target drug with the maximum user reward parameter to the user interface to display the target drug to the target user, thereby improving the accuracy of drug information pushing with strong applicability.
  • FIG. 4 is a schematic structural diagram of a drug information pushing apparatus provided by an embodiment of the present application.
  • the drug information pushing apparatus may be a computer program (including program code) running in a computer device; for example, the drug information pushing apparatus is application software. The drug information pushing apparatus may be used to execute the corresponding steps in the method provided by the embodiments of the present application.
  • the drug information pushing apparatus 1 may run on a computer device, and the computer device may be the server 10 in the embodiment corresponding to FIG. 1 above.
  • the drug information pushing apparatus 1 may include: a data acquisition module 10, a sample input module 20, a parameter training module 30, an information input module 40, a parameter output module 50, a parameter determination module 60, and an information display module 70.
  • the information input module 40 is configured to obtain the target user attribute information of the target user and input it into the drug reward prediction model, the target user attribute information including at least one of demographic information, health indicators for medication for the target disease, and historical medication information.
  • where the user interface includes an attribute information input area, the above information input module 40 includes an information acquisition unit 401.
  • the information acquisition unit 401 is configured to acquire target user attribute information of the target user based on the input instruction when an input instruction on the attribute information input area is detected.
  • for the specific implementation of the information acquisition unit 401, refer to the description of step S101 in the embodiment corresponding to FIG. 2; details are not repeated here.
  • the parameter output module 50 is configured to output, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug. The drug reward prediction model includes first network parameters and second network parameters: the first network parameters are used to determine the first reward parameter of any user with any user attribute information under the action of various drugs, and the second network parameters are used to determine the second reward parameter of any user under the action of various drugs.
  • any user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and the drug action duration corresponding to the first reward parameter is longer than the drug action duration corresponding to the second reward parameter.
  • the parameter determination module 60 is configured to determine each user reward parameter of the target user under the action of each drug based on each first target reward parameter of the target user and/or each second target reward parameter of the target user, wherein the target user corresponds to one user reward parameter under the action of one drug.
  • the parameter determination module 60 includes: a weighting coefficient determination unit 601, a first reward parameter determination unit 602, and a second reward parameter determination unit 603.
  • the weighting coefficient determination unit 601 is configured to determine the first weighting coefficient of the first target reward parameters and the second weighting coefficient of the second target reward parameters;
  • the first reward parameter determination unit 602 is configured to determine each first weighted reward parameter corresponding to each first target reward parameter based on the first weighting coefficient and each first target reward parameter of the target user, and to determine each second weighted reward parameter corresponding to each second target reward parameter based on the second weighting coefficient and each second target reward parameter of the target user;
  • the second reward parameter determination unit 603 is configured to determine, based on each first weighted reward parameter and each second weighted reward parameter, each user reward parameter of the target user under the action of each drug, where one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter.
  • for the specific implementations of the weighting coefficient determination unit 601, the first reward parameter determination unit 602, and the second reward parameter determination unit 603, refer to the description of step S103 in the embodiment corresponding to FIG. 2; details are not repeated here.
  • the above parameter determination module 60 further includes a third reward parameter determination unit 604.
  • the third reward parameter determination unit 604 is configured to use each first target reward parameter of the target user as the user reward parameter of the target user under the corresponding drug;
  • in this case, the maximum user reward parameter is the largest of the first target reward parameters.
  • for the specific implementation of the third reward parameter determination unit 604, refer to the description of step S103 in the embodiment corresponding to FIG. 2; details are not repeated here.
  • the above parameter determination module 60 further includes a fourth reward parameter determination unit 605.
  • the fourth reward parameter determination unit 605 is configured to use each second target reward parameter of the target user as the user reward parameter of the target user under the corresponding drug;
  • in this case, the maximum user reward parameter is the largest of the second target reward parameters.
  • for the specific implementation of the fourth reward parameter determination unit 605, refer to the description of step S103 in the embodiment corresponding to FIG. 2; details are not repeated here.
  • the information display module 70 is used for determining the maximum user reward parameter from each user reward parameter, and outputting the drug information of the target drug with the maximum user reward parameter to the user interface to display the target drug to the target user.
  • the above-mentioned drug information push device 1 further includes:
  • the data acquisition module 10 is used for acquiring sample data of at least two users, and the sample data of one user includes user attribute information and sample drug information of the user;
  • the sample input module 20 is configured to obtain each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug indicated by the sample drug information, and to input the sample data of the at least two users, each first sample reward parameter, and each second sample reward parameter into the drug reward prediction model;
  • the parameter training module 30 is configured to train the first network parameters and the second network parameters of the drug reward prediction model based on the user attribute information of the at least two users, each first sample reward parameter, and each second sample reward parameter, so that the model acquires the ability to predict, from the user attribute information of any user, that user's first reward parameter and second reward parameter under the action of each drug.
  • the first network parameters include first model parameters and a first return parameter;
  • the second network parameters include second model parameters and a second return parameter;
  • the above parameter training module 30 includes: an expected parameter determination unit 301, a loss value determination unit 302, and a parameter update unit 303.
  • the expected parameter determination unit 301 is configured to determine each first expected reward parameter of each user under the action of the sample drug based on the first model parameters and the first return parameter, and to determine each second expected reward parameter of each user under the action of the sample drug based on the second model parameters and the second return parameter;
  • the loss value determination unit 302 is configured to determine, based on the first return parameter, the second return parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter, and each second expected reward parameter, each loss value corresponding to each user's sample data;
  • the parameter update unit 303 is configured to iteratively update the parameter values of the first model parameters and the second model parameters based on each loss value until the loss value remains unchanged, so as to obtain the ability to predict, from the user attribute information of any user, that user's first reward parameter and second reward parameter under the action of each drug.
  • for the specific implementations of the expected parameter determination unit 301, the loss value determination unit 302, and the parameter update unit 303, refer to the description of model training of the drug reward prediction model in step S101 of the embodiment corresponding to FIG. 2; details are not repeated here.
  • for the specific implementation of the other modules, refer to step S101 to step S104 in the embodiment corresponding to FIG. 2; details are not repeated here.
  • in addition, the description of the beneficial effects of using the same method is not repeated.
  • FIG. 5 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device may include a processor, memory, and a network interface.
  • the computer device may also include a user interface.
  • the computer device 1000 may be the server 10 in the embodiment corresponding to FIG. 1 above, and may include: at least one processor 1001 (such as a CPU), at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display and a keyboard, and the network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
  • the memory 1005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may optionally also be at least one storage device located remotely from the aforementioned processor 1001 .
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 is mainly used for network communication with the user terminal;
  • the user interface 1003 is mainly used to provide an input interface for the user;
  • in the computer device 1000 shown in FIG. 5, the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement the following:
  • obtaining the target user attribute information of the target user and inputting it into the drug reward prediction model, the target user attribute information including at least one of demographic information, health indicators for medication for the target disease, and historical medication information;
  • outputting, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, wherein the drug reward prediction model includes first network parameters and second network parameters, the first network parameters are used to determine the first reward parameter of any user with any user attribute information under the action of various drugs, the second network parameters are used to determine the second reward parameter of any user under the action of various drugs, any user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and the drug action duration corresponding to the first reward parameter is longer than the drug action duration corresponding to the second reward parameter;
  • determining, based on each first target reward parameter of the target user and/or each second target reward parameter of the target user, each user reward parameter of the target user under the action of each drug, wherein the target user corresponds to one user reward parameter under the action of one drug;
  • determining the maximum user reward parameter from the user reward parameters, and outputting the drug information of the target drug with the maximum user reward parameter to the user interface to display the target drug to the target user.
  • the computer device 1000 described in the embodiments of the present application can perform the drug information pushing method described in the embodiment corresponding to FIG. 2 above, and can also implement the drug information pushing apparatus 1 described in the embodiment corresponding to FIG. 4 above; details are not repeated here.
  • in addition, the description of the beneficial effects of using the same method is not repeated.
  • the embodiments of the present application further provide a computer-readable storage medium storing the computer program executed by the aforementioned drug information pushing apparatus 1.
  • the computer program includes program instructions; when a processor executes the program instructions, it can perform the drug information pushing method described in the embodiment corresponding to FIG. 2 above, so details are not repeated here. In addition, the description of the beneficial effects of using the same method is not repeated.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.
  • program instructions may be deployed to execute on one computing device, or on multiple computing devices located at one site, or alternatively, on multiple computing devices distributed across multiple sites and interconnected by a communications network
  • multiple computing devices distributed in multiple locations and interconnected by a communication network can form a blockchain system.
  • a computer program product or computer program is also provided, including computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method for pushing drug information provided in the embodiments of the present application.
  • the above-mentioned storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM) or the like.
  • the above computer-readable storage medium may be an internal storage unit of the drug information pushing apparatus provided in any of the foregoing embodiments or of the above computer device, such as a hard disk or memory of the electronic device.
  • the computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device.
  • the above-mentioned computer-readable storage medium may also include a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM), and the like.
  • the computer-readable storage medium may also include both an internal storage unit of the electronic device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device.
  • the computer-readable storage medium can also be used to temporarily store data that has been or will be output.
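The two-head structure described in the embodiments above, with the first network parameters in the fully connected layer 200b and the second network parameters in the fully connected layer 201b, each producing one reward parameter per drug, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not specify how user attribute information is encoded or which layers precede 200b and 201b, so the feature vector, the weights, and the names `TwoHeadRewardModel` and `linear` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Matrix = List[List[float]]


def linear(weights: Matrix, bias: List[float], x: List[float]) -> List[float]:
    """Fully connected layer without activation: one output per weight row (one row per drug)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


@dataclass
class TwoHeadRewardModel:
    """Hypothetical two-head model; each head scores every candidate drug."""

    first_w: Matrix       # stands in for layer 200b (first, long-term reward parameters)
    first_b: List[float]
    second_w: Matrix      # stands in for layer 201b (second, short-term reward parameters)
    second_b: List[float]

    def predict(self, features: List[float]) -> Tuple[List[float], List[float]]:
        """Return (first_rewards, second_rewards), each with one entry per drug."""
        return (
            linear(self.first_w, self.first_b, features),
            linear(self.second_w, self.second_b, features),
        )


# Usage with made-up numbers: 2 encoded attributes, 3 candidate drugs.
model = TwoHeadRewardModel(
    first_w=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    first_b=[0.0, 0.0, 0.0],
    second_w=[[0.0, 2.0], [1.0, 0.0], [0.0, 0.0]],
    second_b=[0.0, 0.0, 0.25],
)
first_rewards, second_rewards = model.predict([1.0, 0.5])
```

Keeping the heads separate is what allows each one to be paired with its own return parameter during training; a fuller network could place shared hidden layers in front of both heads.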

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Biomedical Technology (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Disclosed in embodiments of the present application are a drug information pushing method and apparatus, a computer device, and a storage medium. The method is applicable to the field of digital medicine, and comprises: obtaining target user attribute information of a target user, and inputting the target user attribute information into a drug reward prediction model; by means of the drug reward prediction model, outputting first target reward parameters and second target reward parameters of the target user under the action of drugs; on the basis of the first target reward parameters of the target user and/or the second target reward parameters of the target user, determining user reward parameters of the target user under the action of the drugs; determining the maximum user reward parameter from among the user reward parameters, and outputting drug information of the target drug having the maximum user reward parameter to a user interface to display the target drug to the target user. By using the embodiments of the present application, the scalability of the drug reward prediction model can be enhanced, thereby improving the accuracy of drug information pushing.

Description

Drug information pushing method and apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 202110473086.X, filed with the Chinese Patent Office on April 29, 2021 and entitled "Drug information pushing method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a drug information pushing method and apparatus, a computer device, and a storage medium.
Background
At present, deep reinforcement learning (DRL) models are used to solve a growing number of practical problems. The inventors found that when a DRL model is run, a patient's sample data can be input into the model to output a Q value, which evaluates the expected reward (for example, the degree of influence of a drug) of different actions (for example, a doctor's prescription plan). Because DRL models usually consider both short-term and long-term outcomes but have only one return factor, the Q value evaluates the expected reward of the short-term outcome and that of the long-term outcome together, so the two expected rewards are treated as essentially the same. However, the inventors realized that long-term and short-term outcomes differ in nature, mainly in their action distance: a short-term outcome is mainly affected by the most recent medication, whereas a long-term outcome is mainly affected by medication taken much earlier. Treating both with a single return factor therefore results in poor scalability of the DRL model.
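To make the single-return-factor argument concrete, the sketch below assumes the return factor plays the role of the discount factor γ in the standard discounted return G = r_0 + γ·r_1 + γ²·r_2 + ...; the text does not state this mapping explicitly. With a hypothetical reward sequence containing an early benefit and a late complication, a large γ lets the late complication dominate, while a small γ effectively ignores it, so one γ cannot serve both horizons. The function names and reward values are illustrative only.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """G = r_0 + gamma * r_1 + gamma**2 * r_2 + ... for one reward trajectory."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def effective_horizon(gamma: float) -> float:
    """Common rule of thumb: rewards beyond about 1 / (1 - gamma) steps contribute little."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


# Hypothetical trajectory: short-term benefit now (+1), complication 20 visits later (-5).
rewards = [1.0] + [0.0] * 19 + [-5.0]

long_view = discounted_return(rewards, gamma=0.99)  # about -3.09: the late complication dominates
short_view = discounted_return(rewards, gamma=0.5)  # about 1.0: the complication is almost invisible
```

Under this reading, separate first and second return parameters would let the long-term head use a γ close to 1 and the short-term head a much smaller one.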
Summary
The embodiments of the present application provide a drug information pushing method and apparatus, a computer device, and a storage medium, which can enhance the scalability of a drug reward prediction model and thereby improve the accuracy of drug information pushing.
第一方面,本申请提供了一种药物信息推送方法,该方法包括:In a first aspect, the application provides a method for pushing drug information, the method comprising:
获取目标用户的目标用户属性信息,将目标用户属性信息输入药物奖励预测模型,目标用户属性信息包括人口统计学信息、针对目标疾病用药的健康指标以及历史用药信息中的至少一种;Obtain target user attribute information of the target user, input the target user attribute information into the drug reward prediction model, and the target user attribute information includes at least one of demographic information, health indicators for drug use for the target disease, and historical drug use information;
通过药物奖励预测模型输出目标用户在各药物作用下的各第一目标奖励参数和各第二目标奖励参数,其中,药物奖励预测模型包括第一网络参数和第二网络参数,第一网络参数用于确定具有任一用户属性信息的任一用户在各种药物作用下的第一奖励参数,第二网络参数用于确定任一用户在各种药物作用下的第二奖励参数,任一用户在一种药物作用下对应一个第一奖励参数和一个第二奖励参数,第一奖励参数对应的药物作用时长大于第二奖励参数对应的药物作用时长;Each first target reward parameter and each second target reward parameter of the target user under the action of each drug are output through the drug reward prediction model, wherein the drug reward prediction model includes the first network parameter and the second network parameter, and the first network parameter uses It is used to determine the first reward parameter of any user with any user attribute information under the action of various drugs, and the second network parameter is used to determine the second reward parameter of any user under the action of various drugs, and any user is under the action of various drugs. A drug corresponds to a first reward parameter and a second reward parameter, and the drug action duration corresponding to the first reward parameter is greater than the drug action duration corresponding to the second reward parameter;
基于目标用户的各第一目标奖励参数和/或目标用户的各第二目标奖励参数,确定目标用户在各药物作用下的各用户奖励参数,其中,目标用户在一种药物作用下对应一个用户奖励参数;Based on each first target reward parameter of the target user and/or each second target reward parameter of the target user, each user reward parameter of the target user under the action of each drug is determined, wherein the target user corresponds to one user under the action of one drug reward parameters;
从各用户奖励参数中确定出最大用户奖励参数,并将具有最大用户奖励参数的目标药物的药物信息输出至用户界面,以向目标用户展示目标药物。The maximum user reward parameter is determined from each user reward parameter, and the drug information of the target drug with the maximum user reward parameter is output to the user interface to display the target drug to the target user.
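The selection step of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The source leaves open how the first and second target reward parameters are combined ("and/or"). The hypothetical weight `w_long` below is therefore an assumption: `w_long=1.0` uses only the long-term reward, `w_long=0.0` uses only the short-term reward, and values in between form a weighted sum.

```python
from typing import Dict, Tuple


def select_target_drug(
    q_long: Dict[str, float],   # first target reward parameter per drug
    q_short: Dict[str, float],  # second target reward parameter per drug
    w_long: float = 0.5,        # hypothetical weight on the long-term reward
) -> Tuple[str, Dict[str, float]]:
    """Combine per-drug rewards into user rewards and return the drug with the maximum one."""
    if q_long.keys() != q_short.keys():
        raise ValueError("both reward maps must cover the same drugs")
    w_short = 1.0 - w_long
    user_reward = {drug: w_long * q_long[drug] + w_short * q_short[drug] for drug in q_long}
    best = max(user_reward, key=user_reward.get)  # drug with the maximum user reward parameter
    return best, user_reward


best, rewards = select_target_drug(
    {"biguanide": 0.8, "sulfonylurea": 0.6},
    {"biguanide": 0.3, "sulfonylurea": 0.7},
)
print(best, rewards)
```

With equal weights the sulfonylurea scores 0.65 against 0.55 for the biguanide and is selected. With `w_long=1.0` the biguanide wins on the long-term reward alone.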
With reference to the second aspect, in a possible implementation, the apparatus further includes:
a data acquisition module, configured to acquire sample data of at least two users, where the sample data of one user includes the user's user attribute information and sample drug information;
a sample input module, configured to obtain a first sample reward parameter and a second sample reward parameter of each user under the action of the sample drug indicated by the sample drug information, and to input the sample data of the at least two users, the first sample reward parameters, and the second sample reward parameters into the drug reward prediction model; and
a parameter training module, configured to train the first network parameter and the second network parameter of the drug reward prediction model based on the user attribute information of the at least two users, the first sample reward parameters, and the second sample reward parameters. Training gives the model the ability to predict, from any user's user attribute information, the first reward parameter and the second reward parameter of that user under the action of each drug.
In a third aspect, the present application provides a computer device, including a processor, a memory, and a network interface.
The processor is connected to the memory and the network interface. The network interface provides a data communication function, the memory stores a computer program, and the processor invokes the computer program to perform the drug information pushing method of the first aspect in the embodiments of the present application. The drug information pushing method includes:
obtaining target user attribute information of a target user, and inputting the target user attribute information into a drug reward prediction model, where the target user attribute information includes at least one of demographic information, health indicators related to medication for a target disease, and historical medication information;
outputting, through the drug reward prediction model, a first target reward parameter and a second target reward parameter of the target user under the action of each drug. The drug reward prediction model includes a first network parameter and a second network parameter. The first network parameter is used to determine a first reward parameter, under the action of each drug, of any user having any user attribute information. The second network parameter is used to determine a second reward parameter of any user under the action of each drug. Under the action of one drug, any user corresponds to one first reward parameter and one second reward parameter, and the drug action duration corresponding to the first reward parameter is longer than the drug action duration corresponding to the second reward parameter;
determining, based on the first target reward parameters and/or the second target reward parameters of the target user, a user reward parameter of the target user under the action of each drug, where the target user corresponds to one user reward parameter under the action of one drug; and
determining the maximum user reward parameter among the user reward parameters, and outputting the drug information of the target drug having the maximum user reward parameter to a user interface, so as to display the target drug to the target user.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, perform the drug information pushing method of the first aspect of the present application. The drug information pushing method includes:
obtaining target user attribute information of a target user, and inputting the target user attribute information into a drug reward prediction model, where the target user attribute information includes at least one of demographic information, health indicators related to medication for a target disease, and historical medication information;
outputting, through the drug reward prediction model, a first target reward parameter and a second target reward parameter of the target user under the action of each drug. The drug reward prediction model includes a first network parameter and a second network parameter. The first network parameter is used to determine a first reward parameter, under the action of each drug, of any user having any user attribute information. The second network parameter is used to determine a second reward parameter of any user under the action of each drug. Under the action of one drug, any user corresponds to one first reward parameter and one second reward parameter, and the drug action duration corresponding to the first reward parameter is longer than the drug action duration corresponding to the second reward parameter;
determining, based on the first target reward parameters and/or the second target reward parameters of the target user, a user reward parameter of the target user under the action of each drug, where the target user corresponds to one user reward parameter under the action of one drug; and
determining the maximum user reward parameter among the user reward parameters, and outputting the drug information of the target drug having the maximum user reward parameter to a user interface, so as to display the target drug to the target user.
The embodiments of the present application enhance the scalability of the drug reward prediction model. They also improve the model's interpretability, safety, selectability, and traceability. Together these improve the accuracy of drug information pushing and give the method strong applicability.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings used in describing the embodiments are briefly introduced below to illustrate the technical solutions of the present application more clearly. These drawings show only some embodiments of the present application. A person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a network architecture provided by the present application;
FIG. 2 is a schematic flowchart of the drug information pushing method provided by the present application;
FIG. 3 is a schematic structural diagram of the drug reward prediction model provided by the present application;
FIG. 4 is a schematic structural diagram of a drug information pushing apparatus provided by the present application;
FIG. 5 is a schematic structural diagram of a computer device provided by the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present application.
The technical solutions of the present application may relate to the field of artificial intelligence. They may be applied to smart healthcare scenarios such as medical information pushing, supporting digital healthcare and the construction of smart cities. Optionally, the data involved in the present application, such as attribute information and/or target drug information, may be stored in a database or in a blockchain (for example, through distributed blockchain storage); this is not limited in the present application.
Refer to FIG. 1, which is a schematic structural diagram of a network architecture provided by the present application. As shown in FIG. 1, the network architecture may include a server 10 and a user terminal cluster. The user terminal cluster may include multiple user terminals, specifically user terminal 100a, user terminal 100b, user terminal 100c, …, user terminal 100n.
The server 10 may be an independent physical server. It may also be a cloud server that provides basic cloud computing services, such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), big data, and artificial intelligence platforms. Each user terminal in the user terminal cluster may include, but is not limited to, smart terminals such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, and a smart watch.
The computer device in the present application may be a physical terminal with a drug information pushing function. This terminal may be the server 10 shown in FIG. 1 or a user terminal; this is not limited herein.
As shown in FIG. 1, user terminals 100a, 100b, 100c, …, 100n may each establish a network connection with the server 10, and each user terminal can exchange data with the server 10 through that connection. For example, the server 10 may output the drug information of a target drug to the user interface of the target user's user terminal, so that the target user can view the target drug there. The target user's user terminal may be any user terminal in the cluster (for example, user terminal 100a). In the present application, a drug that is determined based on the drug reward prediction model and pushed to the target user is called a target drug. A model that can predict the first reward parameter and the second reward parameter of any user under the action of each drug is called a drug reward prediction model.
The drug information pushing method provided by the present application applies to drug information pushing for any disease, for example diabetes, hypertension, or other diseases. Suppose the target user is a doctor. The doctor can input a patient's basic information into the drug reward prediction model, which outputs the drug information of the pushed target drug to the user interface based on that information. The doctor can then view the target drug on the user interface, where it may serve as a preliminary diagnosis result. Combining it with a further diagnosis of the patient, the doctor determines a drug suitable for the patient (such as the target drug). Suppose instead the target user is a patient. The patient can input his or her basic information into a self-service terminal (self-service machine) provided by a medical institution such as a hospital, health station, or community health center. The self-service machine contains the drug reward prediction model and outputs the drug information of the recommended target drug to its user interface based on the patient's basic information. The patient can view the target drug on the user interface of the self-service machine. The patient can then purchase the target drug directly, or have a doctor make a further diagnosis to determine a suitable drug (such as the target drug).
For ease of description, a diabetes drug information pushing scenario is used as the example below; this is not repeated hereafter. The drug information pushing method, drug information pushing apparatus, and computer device of the present application are described below with reference to FIG. 2 to FIG. 5.
Refer to FIG. 2, which is a schematic flowchart of the drug information pushing method provided by an embodiment of the present application. As shown in FIG. 2, the method may include the following steps S101 to S104:
Step S101: obtain the target user attribute information of the target user, and input the target user attribute information into the drug reward prediction model.
Before step S101 is performed, the computer device may first train the model parameters of the drug reward prediction model. Training uses the sample data of at least two users and each user's actual reward parameters, and yields a model that outputs the first reward parameter and the second reward parameter of any user under the action of each drug. The drug reward prediction model here may be a deep Q-network (DQN) model, which is a type of deep reinforcement learning model. In the DQN reinforcement learning method, the model follows a policy to take an action (such as a drug) in a given state (such as user attribute information). It then obtains an expected reward and uses that expected reward to optimize the policy. The parameter value corresponding to an expected reward may be an expected reward parameter (such as the first expected reward parameter and the second expected reward parameter described below). In other words, the value of the expected reward parameter represents the expected reward. The policy specifies which action to take in a given state so as to maximize the expected reward.
In some feasible implementations, the computer device may obtain sample data of at least two users for training the drug reward prediction model. One user corresponds to one piece of sample data, and one piece of sample data may include the user's user attribute information and sample drug information. The user attribute information may include at least one of demographic information, health indicators related to medication for the target disease, and historical medication information (that is, medication history). The drug indicated by the sample drug information is the sample drug. The demographic information may include gender, age, health status, occupation, marital status, education level, income, and other information. The health indicators can be understood as the examination indicators corresponding to the target disease. Different users may use the same or different sample drugs for the target disease.
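The source does not specify how user attribute information is turned into model input. The sketch below shows one hypothetical way to flatten it into the one-dimensional feature vector the model receives. The drug vocabulary, the field names `age`, `gender`, `hba1c`, `creatinine`, and `medication_history`, and the scaling constants are all assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List

DRUGS = ["biguanide", "sulfonylurea"]  # hypothetical drug vocabulary


@dataclass
class UserAttributes:
    age: float
    gender: str                 # "M" or "F"
    hba1c: float                # glycated hemoglobin, %
    creatinine: float           # umol/L
    medication_history: List[str] = field(default_factory=list)


def encode(user: UserAttributes) -> List[float]:
    """Flatten attributes into a 1-D feature vector (hypothetical scaling constants)."""
    gender = [1.0, 0.0] if user.gender == "M" else [0.0, 1.0]               # one-hot gender
    history = [1.0 if d in user.medication_history else 0.0 for d in DRUGS]  # multi-hot history
    return [user.age / 100.0, *gender, user.hba1c / 10.0, user.creatinine / 100.0, *history]


print(encode(UserAttributes(age=60, gender="F", hba1c=7.5, creatinine=80.0, medication_history=["biguanide"])))
```

A flat vector like this corresponds to the one-dimensional case mentioned for FIG. 3, where the convolutional layers may be omitted.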
Further, the computer device may obtain a first sample reward parameter and a second sample reward parameter of each user under the action of the sample drug. It then inputs the sample data of the at least two users, the first sample reward parameters, and the second sample reward parameters into the drug reward prediction model. In the present application, a user's actual long-term reward parameter under the action of the sample drug is called the first sample reward parameter, and the user's actual short-term reward parameter is called the second sample reward parameter. The drug action duration corresponding to the first sample reward parameter is longer than that corresponding to the second sample reward parameter. A reward here can be understood as the degree to which taking the sample drug for a period of time affects the user's own health indicators, and the value of a reward parameter represents that degree. For example, suppose reward parameter 1 represents influence degree 1 and reward parameter 2 represents influence degree 2. If reward parameter 1 is greater than reward parameter 2, then influence degree 1 is greater than influence degree 2.
Further, the computer device may train the first network parameter and the second network parameter of the drug reward prediction model. Training is based on the user attribute information of the at least two users, the first sample reward parameters, and the second sample reward parameters. It gives the model the ability to predict, from the user attribute information (such as the target user attribute information) of any user (such as the target user), that user's first reward parameter and second reward parameter under the action of each drug. The first network parameter can be used to determine the first reward parameter (also called the long-term reward parameter) of any user having any user attribute information under the action of each drug. The second network parameter can be used to determine the second reward parameter (also called the short-term reward parameter) of any user under the action of each drug. The drug action duration corresponding to the first reward parameter is longer than that corresponding to the second reward parameter.
The first network parameter may include a first model parameter and a first return parameter, and the second network parameter may include a second model parameter and a second return parameter. In the present application, the parameters of the drug reward prediction model that are iteratively updated based on the loss value are collectively called model parameters (such as the first model parameter and the second model parameter). The return parameter in the first network parameter that corresponds to the first reward parameter is called the first return parameter (also called the first return factor). The return parameter in the second network parameter that corresponds to the second reward parameter is called the second return parameter (also called the second return factor). A return parameter stays fixed during training of the drug reward prediction model and plays the role of the discount factor in reinforcement learning. The drug action duration corresponding to the first reward parameter is longer than that corresponding to the second reward parameter, so the first return parameter is greater than the second return parameter. For example, the first return parameter may be 0.9 or another value, and the second return parameter may be 0.2 or another value.
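The two return parameters can be made concrete with a short calculation. Under standard reinforcement-learning discounting, a reward arriving k steps after an action is weighted by γ^k, and the total weight over an unbounded horizon is 1/(1 − γ). The sketch uses the example values 0.9 and 0.2 from the text. The function names are illustrative only.

```python
def effective_horizon(gamma: float) -> float:
    """Sum of gamma**k over k >= 0, i.e. 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def reward_weight(gamma: float, steps_later: int) -> float:
    """Discount weight gamma**k applied to a reward arriving k steps after the action."""
    return gamma ** steps_later


for name, gamma in [("long", 0.9), ("short", 0.2)]:
    weights = [round(reward_weight(gamma, k), 3) for k in range(4)]
    print(name, round(effective_horizon(gamma), 2), weights)
```

With γ = 0.9, a reward three follow-ups later is still weighted by about 0.73. With γ = 0.2 the same weight falls to about 0.008. The effective horizons are 10 and 1.25 steps, which matches the long and short action distances described above.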
In some feasible implementations, the computer device may determine a first expected reward parameter of each user under the action of the sample drug based on the first model parameter and the first return parameter. Likewise, it may determine a second expected reward parameter of each user under the action of the sample drug based on the second model parameter and the second return parameter. Under the action of one sample drug, one user corresponds to one first expected reward parameter and one second expected reward parameter. The computer device may then use a loss function to determine the loss value corresponding to each user's sample data. The inputs are the first return parameter, the second return parameter, the first and second sample reward parameters, and the first and second expected reward parameters. The loss value of one user's sample data is computed from one first sample reward parameter, one second sample reward parameter, one first expected reward parameter, and one second expected reward parameter. Specifically, the computer device may determine the loss value $l_{loss}$ corresponding to a user's sample data according to the following formula (1):
$$l_{loss}=\Big(Q_{short}(s_t,a_t)+Q_{long}(s_t,a_t)-\big(r_{short}+r_{long}+\max_{a}\big(\gamma_{short}\,Q_{short}(s_{t+1},a)+\gamma_{long}\,Q_{long}(s_{t+1},a)\big)\big)\Big)^2 \qquad (1)$$
The symbols in formula (1) are defined as follows:
- $a_t$: the sample drug input into the drug reward prediction model at the current time $t$, that is, the sample drug in the sample data.
- $s_t$: the user attribute information input into the model at time $t$, that is, the user's attribute information in the sample data.
- $s_{t+1}$: the user attribute information input into the model at the next time $t+1$.
- $Q_{long}(s_t,a_t)$ and $Q_{short}(s_t,a_t)$: the user's first and second expected reward parameters at time $t$.
- $r_{long}$ and $r_{short}$: the user's first and second sample reward parameters at time $t$.
- $\gamma_{long}$ and $\gamma_{short}$: the first and second return coefficients, that is, the first and second return parameters.
- $Q_{long}(s_{t+1},a)$ and $Q_{short}(s_{t+1},a)$: the user's first and second expected reward parameters at time $t+1$ for drug $a$. The maximum is taken over all candidate drugs $a$.
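Formula (1) for a single sample can be transcribed directly, as sketched below. The list-based layout (one Q value per candidate drug) is an assumption. So is the choice to drop the bootstrap term when a sample has no next follow-up; the source does not say how such terminal samples are handled.

```python
from typing import Optional, Sequence


def two_head_td_loss(
    q_short_t: Sequence[float],                # Q_short(s_t, a) for every candidate drug a
    q_long_t: Sequence[float],                 # Q_long(s_t, a) for every candidate drug a
    a_t: int,                                  # index of the sample drug a_t
    r_short: float,                            # second sample reward parameter
    r_long: float,                             # first sample reward parameter
    q_short_next: Optional[Sequence[float]],   # Q_short(s_{t+1}, a); None if no next follow-up
    q_long_next: Optional[Sequence[float]],    # Q_long(s_{t+1}, a); None if no next follow-up
    gamma_short: float = 0.2,                  # second return parameter
    gamma_long: float = 0.9,                   # first return parameter
) -> float:
    """Squared error of formula (1) for one user's sample."""
    prediction = q_short_t[a_t] + q_long_t[a_t]
    bootstrap = 0.0  # assumption: no bootstrap term for a terminal sample
    if q_short_next is not None and q_long_next is not None:
        # max over drugs a of (gamma_short * Q_short(s_{t+1}, a) + gamma_long * Q_long(s_{t+1}, a))
        bootstrap = max(gamma_short * qs + gamma_long * ql for qs, ql in zip(q_short_next, q_long_next))
    target = r_short + r_long + bootstrap
    return (prediction - target) ** 2


print(two_head_td_loss([0.5, 0.2], [1.0, 0.4], 0, 1.0, 0.0, [0.5, 1.0], [1.0, 0.0]))
```

The maximum is taken over the discounted sum of both heads for each drug, not over each head separately, as the bracketing in formula (1) indicates.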
After obtaining the loss values from formula (1), the computer device may iteratively update the values of the first model parameter and the second model parameter until the loss value no longer changes. At that point, training of the drug reward prediction model stops. The iteratively updated first model parameter becomes the final first model parameter of the model, and the iteratively updated second model parameter becomes the final second model parameter. The trained model can then predict, from any user's user attribute information, that user's first reward parameter and second reward parameter under the action of each drug.
Refer to FIG. 3, which is a schematic structural diagram of the drug reward prediction model of the present application. As shown in FIG. 3, the model may include multiple convolutional layers (such as convolutional layers 10a to 10c) and multiple fully connected layers (such as fully connected layers 20a and 20b). The model's input is a user's user attribute information. Its output is that user's first reward parameter (such as $Q_{long}$) and second reward parameter (such as $Q_{short}$) under the action of each drug. The feature vector corresponding to the user attribute information may be one-dimensional, for example when the user attribute information is a patient's follow-up information. In that case, the model may include fully connected layers 20a and 20b without convolutional layers 10a to 10c. The model includes the first network parameter and the second network parameter, and fully connected layer 20b (the second fully connected layer) may be configured with both.
As shown in FIG. 3, fully connected layer 20b may include two fully connected layers, 200b and 201b. Fully connected layer 200b is configured with the first network parameter. Based on that parameter, it processes the user attribute information and outputs the first reward parameter $Q_{long}$ of any user under the action of each drug. Fully connected layer 201b is configured with the second network parameter. Based on that parameter, it processes the user attribute information and outputs the second reward parameter $Q_{short}$ of any user under the action of each drug.
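The two-head layout of FIG. 3 can be sketched in plain Python. A shared fully connected layer, in the role of 20a, feeds two heads in the roles of 200b and 201b. The layer sizes, the ReLU activation, and the uniform random initialization are assumptions. Convolutional layers 10a to 10c are omitted, as the text permits for one-dimensional inputs. Only the model parameters appear here, because the return parameters enter during training, not in the forward pass.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def dense(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Fully connected layer y = W x + b, with one row of W per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


class TwoHeadQNet:
    """Shared layer (cf. 20a) feeding a long-term head (cf. 200b) and a short-term head (cf. 201b)."""

    def __init__(self, n_features: int, n_hidden: int, n_drugs: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_shared, self.b_shared = init(n_hidden, n_features), [0.0] * n_hidden
        self.w_long, self.b_long = init(n_drugs, n_hidden), [0.0] * n_drugs    # first model parameter (200b)
        self.w_short, self.b_short = init(n_drugs, n_hidden), [0.0] * n_drugs  # second model parameter (201b)

    def forward(self, x: List[float]) -> Tuple[List[float], List[float]]:
        """Return (Q_long, Q_short), one value per candidate drug."""
        hidden = relu(dense(x, self.w_shared, self.b_shared))
        return dense(hidden, self.w_long, self.b_long), dense(hidden, self.w_short, self.b_short)


net = TwoHeadQNet(n_features=7, n_hidden=16, n_drugs=2)
print(net.forward([0.6, 0.0, 1.0, 0.75, 0.8, 1.0, 0.0]))
```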
For convenience of description, take the diabetes drug information push scenario (also called the diabetic patient grouping scenario, where grouping refers to the doctor's prescription plan) as an example. The computer device may obtain sample data of at least two users, where the sample data may be long-term follow-up data of a large number of diabetic patients, and one piece of sample data may contain the data of one follow-up visit of one patient. The sample data may include user attribute information, which may include, but is not limited to, age, gender, medication history, the sample drug (that is, the drug in the doctor's prescription or the drug actually taken by the patient, such as a biguanide or a sulfonylurea), the glycated hemoglobin (HbA1c) value, the creatinine value, and other diabetes-related health indicators. The computer device may then obtain each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug, and input the user attribute information of each user, each first sample reward parameter and each second sample reward parameter into the above drug reward prediction model. For example, the first sample reward parameter may indicate whether diabetic complications occurred at the last follow-up after the patient took the drug: it is 0 when the patient developed diabetic complications and 1 when the patient did not.
For example, the second sample reward parameter may indicate whether the patient's glycated hemoglobin value reached the target at the next follow-up after taking the drug: it is 1 when the glycated hemoglobin value reached the target and 0 when it did not.
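The labeling rule for the two sample reward parameters can be written down directly. In this sketch the record fields and the HbA1c target of 7.0% are assumptions for illustration only; the text above does not specify what value counts as reaching the target.

```python
from dataclasses import dataclass

HBA1C_TARGET = 7.0  # hypothetical threshold (%); not stated in the source


@dataclass
class FollowUp:
    complication_at_last_visit: bool  # diabetic complication seen at the last follow-up
    hba1c_next_visit: float           # HbA1c measured at the next follow-up


def sample_rewards(record: FollowUp, hba1c_target: float = HBA1C_TARGET):
    """Return (first_sample_reward, second_sample_reward) for one record."""
    first = 0 if record.complication_at_last_visit else 1     # long-term outcome
    second = 1 if record.hba1c_next_visit <= hba1c_target else 0  # short-term outcome
    return first, second
```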
Further, the computer device may output, based on the fully-connected layer 200b, each first expected reward parameter of each user under the action of the sample drug, and output, based on the fully-connected layer 201b, each second expected reward parameter of each user under the action of the sample drug. The computer device may then use the above loss function to compute, from the first return parameter, the second return parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter and each second expected reward parameter, the loss value corresponding to the sample data of each user. Next, the computer device may iteratively update the values of the first model parameters and the second model parameters according to the loss values of all the sample data until the loss value remains essentially unchanged (for example, reaches its minimum), indicating that the drug reward prediction model has finished training (that is, the model has converged).
At this point, the first network parameters configured in the fully-connected layer 200b include the first return parameter and the iteratively updated first model parameters, and the second network parameters configured in the fully-connected layer 201b include the second return parameter and the iteratively updated second model parameters. The first return parameter and the updated first model parameters in layer 200b can be used to predict the first reward parameter of any user under the action of each drug, and the second return parameter and the updated second model parameters in layer 201b can be used to predict the second reward parameter of any user under the action of each drug. The trained drug reward prediction model is therefore able to predict, from the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
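The training procedure can be approximated as follows. This is a sketch under stated assumptions, not the patent's loss: the loss function referred to above is defined elsewhere and also takes the first and second return parameters, whereas this stand-in uses a plain squared error between each expected reward parameter and the corresponding sample reward parameter, omits the return parameters, and uses one linear head per reward. Only the output for the sample drug actually taken enters the loss, and iteration stops once the epoch loss is essentially unchanged. All names and hyperparameters are hypothetical.

```python
import random


def train_two_heads(samples, n_features, n_drugs, lr=0.05, tol=1e-8, max_epochs=5000, seed=0):
    """Fit a long-term and a short-term linear head by SGD on squared error.

    samples: non-empty list of (features, drug_index, r_long, r_short) tuples,
    where drug_index identifies the sample drug the user actually took.
    """
    rng = random.Random(seed)
    # One weight row per drug and head; the last entry of each row is a bias term.
    heads = {
        name: [[rng.uniform(-0.1, 0.1) for _ in range(n_features + 1)] for _ in range(n_drugs)]
        for name in ("long", "short")
    }
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_epochs):
        total = 0.0
        for x, drug, r_long, r_short in samples:
            xb = list(x) + [1.0]  # constant input for the bias
            for name, target in (("long", r_long), ("short", r_short)):
                row = heads[name][drug]
                pred = sum(w * xi for w, xi in zip(row, xb))  # expected reward parameter
                err = pred - target
                total += err * err
                for j in range(len(row)):  # gradient step on (pred - target)^2
                    row[j] -= lr * 2 * err * xb[j]
        loss = total / len(samples)
        if abs(prev_loss - loss) < tol:  # loss essentially unchanged: treat as converged
            break
        prev_loss = loss
    return heads, loss
```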
After the drug reward prediction model has been trained, when the computer device detects an input instruction on the attribute information input area of the user interface, it may obtain the target user attribute information of the target user based on the input instruction and input that information into the drug reward prediction model. For example, the target user may enter the target user attribute information in the attribute information input area and, once finished, tap the confirm button in the user interface; the computer device then detects the input instruction on the attribute information input area and thereby obtains the target user attribute information. The target user attribute information may include at least one of demographic information, health indicators relevant to medication for the target disease, and historical medication information.
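One way to turn the entered attribute information into model input is sketched below. The field names, the age scaling and the two-drug medication vocabulary are hypothetical choices; the text only says the attributes may include demographic information, disease-specific health indicators and historical medication information.

```python
from dataclasses import dataclass, field

DRUG_VOCAB = ("biguanide", "sulfonylurea")  # hypothetical medication vocabulary


@dataclass
class UserAttributes:
    age: float
    is_male: bool
    hba1c: float
    creatinine: float
    medication_history: list = field(default_factory=list)


def encode(attrs: UserAttributes):
    """Flatten attribute information into a numeric feature vector (illustrative scheme)."""
    demographics = [attrs.age / 100.0, 1.0 if attrs.is_male else 0.0]
    indicators = [attrs.hba1c, attrs.creatinine]
    # One flag per known drug: 1.0 if it appears in the medication history.
    history = [1.0 if drug in attrs.medication_history else 0.0 for drug in DRUG_VOCAB]
    return demographics + indicators + history
```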
Step S102: output, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug.
In some feasible implementations, the computer device may determine each first target reward parameter of the target user under the action of each drug based on the first network parameters (that is, the first return parameter and the iteratively updated first model parameters); for example, these may be the first network parameters in the fully-connected layer 200b after the drug reward prediction model has converged. The target user corresponds to one first target reward parameter under the action of one drug. Further, the computer device may determine each second target reward parameter of the target user under the action of each drug based on the second network parameters (that is, the second return parameter and the iteratively updated second model parameters); for example, these may be the second network parameters in the fully-connected layer 201b after the model has converged. The target user corresponds to one second target reward parameter under the action of one drug.
Step S103: determine each user reward parameter of the target user under the action of each drug based on each first target reward parameter and/or each second target reward parameter of the target user.
In some feasible implementations, when the target user has both a long-term and a short-term drug effect requirement, the computer device may determine a first weighting coefficient for the first target reward parameters and a second weighting coefficient for the second target reward parameters. The first weighting coefficient (for example, 1 or another value) and the second weighting coefficient (for example, 1 or another value) may be set by the user or configured by default in the drug reward prediction model. The computer device may then determine each first weighted reward parameter from the first weighting coefficient and the corresponding first target reward parameter of the target user, and each second weighted reward parameter from the second weighting coefficient and the corresponding second target reward parameter. Further, the computer device may sum each first weighted reward parameter with the corresponding second weighted reward parameter to obtain each user reward parameter of the target user under the action of each drug, so that one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter.
Optionally, the computer device may instead directly sum each first target reward parameter and the corresponding second target reward parameter to obtain each user reward parameter of the target user under the action of each drug, so that one first target reward parameter and one second target reward parameter correspond to one user reward parameter.
Optionally, in some feasible implementations, when the target user has a long-term drug effect requirement, the computer device may use each first target reward parameter of the target user as the user reward parameter of the target user under the action of the corresponding drug. Optionally, when the target user has a short-term drug effect requirement, the computer device may use each second target reward parameter of the target user as the user reward parameter under the action of the corresponding drug. The choice may be determined by the actual application scenario and is not limited here.
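The three ways of forming the user reward parameters (a weighted sum when both requirements apply, the first target reward parameters alone for a long-term requirement, the second target reward parameters alone for a short-term requirement) can be expressed in a few lines. The `need` labels and the default weights of 1.0 are assumptions; with both weights at 1.0 the weighted sum reduces to the direct sum described above.

```python
def user_rewards(q_long, q_short, need="both", w_long=1.0, w_short=1.0):
    """Per-drug user reward parameters from the two target reward lists."""
    if len(q_long) != len(q_short):
        raise ValueError("need one long-term and one short-term value per drug")
    if need == "long":   # long-term requirement only
        return list(q_long)
    if need == "short":  # short-term requirement only
        return list(q_short)
    if need == "both":   # weighted sum of the two target reward parameters
        return [w_long * a + w_short * b for a, b in zip(q_long, q_short)]
    raise ValueError(f"unknown need: {need!r}")
```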
Step S104: determine the maximum user reward parameter among the user reward parameters, and output the drug information of the target drug having the maximum user reward parameter to the user interface, so as to present the target drug to the target user.
In some feasible implementations, the computer device may sort the user reward parameters (for example, in descending or ascending order) to obtain a user reward parameter sequence, and take the first or the last user reward parameter in the sequence, respectively, as the maximum user reward parameter. Further, the computer device may output the drug information of the target drug having the maximum user reward parameter to the user interface to present the target drug to the target user. Taking the diabetes drug information push scenario as an example, when the target user's drug effect requirement is that no diabetic complications occur in the long term, the maximum user reward parameter may be the largest of the first target reward parameters, and the computer device may output the drug information of the target drug having the largest first target reward parameter to the user interface. When the target user's drug effect requirement is that the glycated hemoglobin value reaches the target in the short term, the maximum user reward parameter may be the largest of the second target reward parameters, and the computer device may output the drug information of the target drug having the largest second target reward parameter to the user interface.
When the target user's drug effect requirement is both that no diabetic complications occur in the long term and that the glycated hemoglobin value reaches the target in the short term, each user reward parameter may be determined from the corresponding first weighted reward parameter and second weighted reward parameter, and the computer device may output the drug information of the target drug having the maximum user reward parameter to the user interface.
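The selection in step S104 can be sketched as below. Following the description, the rewards are sorted in descending order and the first entry is taken; the function name is hypothetical, and the drug names in the test are simply the sample drugs mentioned earlier.

```python
def select_target_drug(drugs, rewards):
    """Return (drug, reward) for the drug with the largest user reward parameter."""
    if not drugs or len(drugs) != len(rewards):
        raise ValueError("need exactly one user reward parameter per drug")
    # Sort in descending order and take the first entry, as described above.
    # Python's sort is stable, so ties keep their original order; max() would
    # pick the same drug in O(n) rather than O(n log n).
    ranked = sorted(zip(drugs, rewards), key=lambda pair: pair[1], reverse=True)
    return ranked[0]
```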
In some feasible implementations, the target user may then view the target drug on the user interface and send feedback information on the target drug to the computer device. For example, the feedback information may indicate that the target drug differs from the historical drug the target user took previously, or that the target drug works less well for the target user than the historical drug. After receiving such feedback, the computer device may adjust the first network parameters and the second network parameters of the drug reward prediction model to better predict the first reward parameter and the second reward parameter of any user (such as the target user) under the action of each drug, and thereby push suitable drug information to the target user.
In the embodiments of the present application, the computer device may input the target user attribute information into the drug reward prediction model and output, through the model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug. The model thus outputs the first and second target reward parameters simultaneously: the first target reward parameter evaluates the reward for the long-term outcome and the second target reward parameter evaluates the reward for the short-term outcome. This enhances the scalability of the drug reward prediction model and improves its interpretability, safety, selectivity and traceability. Further, the computer device may determine each user reward parameter of the target user under the action of each drug based on each first target reward parameter and/or each second target reward parameter of the target user. The computer device may then determine the maximum user reward parameter among the user reward parameters and output the drug information of the target drug having that parameter to the user interface to present the target drug to the target user, which improves the accuracy of drug information push and gives the method broad applicability.
Further, refer to FIG. 4, which is a schematic structural diagram of a drug information pushing apparatus provided by an embodiment of the present application. The drug information pushing apparatus may be a computer program (including program code) running on a computer device; for example, the apparatus is application software. The apparatus may be used to perform the corresponding steps of the method provided by the embodiments of the present application. As shown in FIG. 4, the drug information pushing apparatus 1 may run on a computer device, which may be the server 10 in the embodiment corresponding to FIG. 1. The drug information pushing apparatus 1 may include: a data acquisition module 10, a sample input module 20, a parameter training module 30, an information input module 40, a parameter output module 50, a parameter determination module 60 and an information display module 70.
The information input module 40 is configured to obtain the target user attribute information of the target user and input it into the drug reward prediction model, where the target user attribute information includes at least one of demographic information, health indicators relevant to medication for the target disease, and historical medication information.
In some feasible implementations, the user interface includes an attribute information input area;
The information input module 40 includes an information acquisition unit 401.
The information acquisition unit 401 is configured to, when an input instruction on the attribute information input area is detected, obtain the target user attribute information of the target user based on the input instruction.
For the specific implementation of the information acquisition unit 401, refer to the description of step S101 in the embodiment corresponding to FIG. 2; details are not repeated here.
The parameter output module 50 is configured to output, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug. The drug reward prediction model includes first network parameters and second network parameters: the first network parameters are used to determine the first reward parameter of any user having any user attribute information under the action of each drug, and the second network parameters are used to determine the second reward parameter of that user under the action of each drug. Under the action of one drug, any user corresponds to one first reward parameter and one second reward parameter, and the drug action duration corresponding to the first reward parameter is longer than that corresponding to the second reward parameter.
The parameter determination module 60 is configured to determine each user reward parameter of the target user under the action of each drug based on each first target reward parameter and/or each second target reward parameter of the target user, where the target user corresponds to one user reward parameter under the action of one drug.
In some feasible implementations, the parameter determination module 60 includes a weighting coefficient determination unit 601, a first reward parameter determination unit 602 and a second reward parameter determination unit 603.
The weighting coefficient determination unit 601 is configured to determine the first weighting coefficient of the first target reward parameters and the second weighting coefficient of the second target reward parameters;
The first reward parameter determination unit 602 is configured to determine each first weighted reward parameter corresponding to each first target reward parameter based on the first weighting coefficient and each first target reward parameter of the target user, and to determine each second weighted reward parameter corresponding to each second target reward parameter based on the second weighting coefficient and each second target reward parameter of the target user;
The second reward parameter determination unit 603 is configured to determine each user reward parameter of the target user under the action of each drug based on each first weighted reward parameter and each second weighted reward parameter, where one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter.
For the specific implementation of the weighting coefficient determination unit 601, the first reward parameter determination unit 602 and the second reward parameter determination unit 603, refer to the description of step S103 in the embodiment corresponding to FIG. 2; details are not repeated here.
In some feasible implementations, the parameter determination module 60 further includes a third reward parameter determination unit 604.
The third reward parameter determination unit 604 is configured to use each first target reward parameter of the target user as each user reward parameter of the target user under the action of each drug;
Here, the maximum user reward parameter is the largest of the first target reward parameters.
For the specific implementation of the third reward parameter determination unit 604, refer to the description of step S103 in the embodiment corresponding to FIG. 2; details are not repeated here.
In some feasible implementations, the parameter determination module 60 further includes a fourth reward parameter determination unit 605.
The fourth reward parameter determination unit 605 is configured to use each second target reward parameter of the target user as each user reward parameter of the target user under the action of each drug;
Here, the maximum user reward parameter is the largest of the second target reward parameters.
For the specific implementation of the fourth reward parameter determination unit 605, refer to the description of step S103 in the embodiment corresponding to FIG. 2; details are not repeated here.
The information display module 70 is configured to determine the maximum user reward parameter among the user reward parameters and output the drug information of the target drug having the maximum user reward parameter to the user interface to present the target drug to the target user.
In some feasible implementations, the drug information pushing apparatus 1 further includes:
The data acquisition module 10, configured to obtain sample data of at least two users, where the sample data of one user includes that user's user attribute information and sample drug information;
The sample input module 20, configured to obtain each first sample reward parameter and each second sample reward parameter of each user under the action of the sample drug indicated by the sample drug information, and to input the sample data of the at least two users, each first sample reward parameter and each second sample reward parameter into the drug reward prediction model;
The parameter training module 30, configured to train the first network parameters and the second network parameters of the drug reward prediction model based on the user attribute information of the at least two users, each first sample reward parameter and each second sample reward parameter, so that the model acquires the ability to predict, from the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
In some feasible implementations, the first network parameters include first model parameters and a first return parameter, and the second network parameters include second model parameters and a second return parameter;
The parameter training module 30 includes an expected parameter determination unit 301, a loss value determination unit 302 and a parameter update unit 303.
The expected parameter determination unit 301 is configured to determine each first expected reward parameter of each user under the action of the sample drug based on the first model parameters and the first return parameter, and to determine each second expected reward parameter of each user under the action of the sample drug based on the second model parameters and the second return parameter;
The loss value determination unit 302 is configured to determine each loss value corresponding to the sample data of each user based on the first return parameter, the second return parameter, each first sample reward parameter, each second sample reward parameter, each first expected reward parameter and each second expected reward parameter;
The parameter update unit 303 is configured to iteratively update the values of the first model parameters and the second model parameters based on the loss values until the loss value remains unchanged, so that the model acquires the ability to predict, from the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
For the specific implementation of the expected parameter determination unit 301, the loss value determination unit 302 and the parameter update unit 303, refer to the description of training the drug reward prediction model in step S101 of the embodiment corresponding to FIG. 2; details are not repeated here.
For the specific implementation of the data acquisition module 10, the sample input module 20, the parameter training module 30, the information input module 40, the parameter output module 50, the parameter determination module 60 and the information display module 70, refer to the description of steps S101 to S104 in the embodiment corresponding to FIG. 2; details are not repeated here. The beneficial effects of using the same method are likewise not repeated.
Further, refer to FIG. 5, which is a schematic structural diagram of a computer device provided by an embodiment of the present application. The computer device may include a processor, a memory and a network interface, and may optionally also include a user interface. For example, as shown in FIG. 5, the computer device 1000 may be the server 10 in the embodiment corresponding to FIG. 1, and may include at least one processor 1001 (such as a CPU), at least one network interface 1004, a user interface 1003, a memory 1005 and at least one communication bus 1002. The communication bus 1002 provides connection and communication among these components. The user interface 1003 may include a display and a keyboard, and the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as at least one disk storage device. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in FIG. 5, the memory 1005, as a computer storage medium, may store an operating system, a network communication module, a user interface module and a device control application program.
In the computer device 1000 shown in FIG. 5, the network interface 1004 is mainly used for network communication with user terminals, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 may be used to invoke the device control application program stored in the memory 1005 to:
Obtain the target user attribute information of the target user and input it into the drug reward prediction model, where the target user attribute information includes at least one of demographic information, health indicators relevant to medication for the target disease, and historical medication information;
Output, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, where the drug reward prediction model includes first network parameters and second network parameters; the first network parameters are used to determine the first reward parameter of any user having any user attribute information under the action of each drug, and the second network parameters are used to determine the second reward parameter of that user under the action of each drug; under the action of one drug, any user corresponds to one first reward parameter and one second reward parameter, and the drug action duration corresponding to the first reward parameter is longer than that corresponding to the second reward parameter;
Determine each user reward parameter of the target user under the action of each drug based on each first target reward parameter and/or each second target reward parameter of the target user, where the target user corresponds to one user reward parameter under the action of one drug;
Determine the maximum user reward parameter among the user reward parameters, and output the drug information of the target drug having the maximum user reward parameter to the user interface to present the target drug to the target user.
It should be understood that the computer device 1000 described in the embodiments of the present application can perform the drug information pushing method described in the embodiment corresponding to FIG. 2 above. It can also implement the drug information pushing apparatus 1 described in the embodiment corresponding to FIG. 4 above. These details are not repeated here, and neither are the beneficial effects of using the same method.
In addition, the embodiments of the present application further provide a computer-readable storage medium. This medium stores the computer program executed by the aforementioned drug information pushing apparatus 1. The computer program includes program instructions which, when executed by a processor, can perform the drug information pushing method described in the embodiment corresponding to FIG. 2 above. These details are therefore not repeated here, and neither are the beneficial effects of using the same method.
Optionally, the storage medium involved in the present application, such as a computer-readable storage medium, may be non-volatile or volatile.
For technical details not disclosed in the computer-readable storage medium embodiments of the present application, refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to execute in any of the following ways:
- on one computing device;
- on multiple computing devices located at one site;
- on multiple computing devices distributed across multiple sites and interconnected by a communication network. Such distributed, interconnected computing devices may form a blockchain system.
In one aspect of the present application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them. This causes the computer device to perform the drug information pushing method provided in the embodiments of the present application.
Those of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The computer-readable storage medium may be an internal storage unit of the drug information pushing apparatus or device provided in any of the foregoing embodiments, such as a hard disk or memory of an electronic device. It may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device. The computer-readable storage medium may further include a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. Further, it may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device. It may also be used to temporarily store data that has been output or is to be output.
The terms "first", "second", and the like in the claims, description, and drawings of the present application are used to distinguish different objects rather than to describe a specific order. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units. It optionally further includes unlisted steps or units, or other steps or units inherent to the process, method, product, or device. Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. Appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment. Nor do they refer to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments. As used in this specification and the appended claims, the term "and/or" refers to, and includes, any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will appreciate that the example units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
What is disclosed above is merely a preferred embodiment of the present application and cannot be used to limit the scope of the rights of the present application. Therefore, equivalent changes made according to the claims of the present application still fall within the scope covered by the present application.
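The training steps recited in claims 2 and 3 can be sketched in the same spirit. The section does not specify the network architecture, the loss function, the optimizer, or how the first and second return parameters enter the computation. Everything below is therefore an assumption:

- Each reward head is a per-drug linear model.
- The loss is a squared error.
- Updates use plain gradient descent.
- The return parameters are modeled as fixed weights `lam1` and `lam2` on the two loss terms. They are never updated, consistent with claim 3, which iterates only the model parameters.

Training stops once the loss changes by less than a small tolerance, which stands in for "until the loss value no longer changes".

```python
from typing import List, Sequence, Tuple

# One training sample: (user attribute vector, index of the sample drug,
#                       first sample reward, second sample reward).
Sample = Tuple[Sequence[float], int, float, float]
Matrix = List[List[float]]


def train_two_heads(samples: List[Sample], n_features: int, n_drugs: int,
                    lam1: float = 1.0, lam2: float = 1.0, lr: float = 0.1,
                    tol: float = 1e-9, max_iter: int = 10_000
                    ) -> Tuple[Matrix, Matrix, float]:
    """Fit per-drug linear weights W1 (first reward) and W2 (second reward).

    lam1/lam2 are hypothetical stand-ins for the first/second return
    parameters, used here only as fixed loss weights.
    """
    W1 = [[0.0] * n_features for _ in range(n_drugs)]
    W2 = [[0.0] * n_features for _ in range(n_drugs)]
    n = len(samples)
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_iter):
        loss = 0.0
        g1 = [[0.0] * n_features for _ in range(n_drugs)]
        g2 = [[0.0] * n_features for _ in range(n_drugs)]
        for x, d, r1, r2 in samples:
            # Expected rewards predicted by both heads for the drug actually given.
            e1 = sum(w * xi for w, xi in zip(W1[d], x))
            e2 = sum(w * xi for w, xi in zip(W2[d], x))
            loss += lam1 * (e1 - r1) ** 2 + lam2 * (e2 - r2) ** 2
            for j, xi in enumerate(x):
                g1[d][j] += 2 * lam1 * (e1 - r1) * xi
                g2[d][j] += 2 * lam2 * (e2 - r2) * xi
        loss /= n
        # Gradient step on the model parameters only; lam1/lam2 stay fixed.
        for d in range(n_drugs):
            for j in range(n_features):
                W1[d][j] -= lr * g1[d][j] / n
                W2[d][j] -= lr * g2[d][j] / n
        if abs(prev_loss - loss) < tol:  # loss no longer changes: stop
            break
        prev_loss = loss
    return W1, W2, loss


if __name__ == "__main__":
    # Toy data: a single bias feature and two drugs (illustrative only).
    samples = [([1.0], 0, 1.0, 0.5), ([1.0], 0, 1.0, 0.5), ([1.0], 1, 0.2, 0.8)]
    W1, W2, loss = train_two_heads(samples, n_features=1, n_drugs=2)
    print([round(w[0], 3) for w in W1], [round(w[0], 3) for w in W2])
```

With only a bias feature, each fitted weight converges toward the mean sample reward observed for that drug and reward type.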

Claims (20)

  1. A drug information pushing method, comprising:
    obtaining target user attribute information of a target user, and inputting the target user attribute information into a drug reward prediction model, wherein the target user attribute information comprises at least one of demographic information, health indicators related to medication for a target disease, and historical medication information;
    outputting, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, wherein the drug reward prediction model comprises a first network parameter and a second network parameter, the first network parameter is used to determine a first reward parameter of any user having any user attribute information under the action of each of various drugs, the second network parameter is used to determine a second reward parameter of that user under the action of each of the various drugs, that user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and a drug action duration corresponding to the first reward parameter is longer than a drug action duration corresponding to the second reward parameter;
    determining, based on the first target reward parameters of the target user and/or the second target reward parameters of the target user, a user reward parameter of the target user under the action of each drug, wherein the target user corresponds to one user reward parameter under the action of one drug; and
    determining a maximum user reward parameter among the user reward parameters, and outputting drug information of a target drug having the maximum user reward parameter to a user interface, so as to present the target drug to the target user.
  2. The method according to claim 1, further comprising:
    obtaining sample data of at least two users, wherein the sample data of one user comprises user attribute information and sample drug information of that user;
    obtaining a first sample reward parameter and a second sample reward parameter of each user under the action of the sample drug indicated by the sample drug information, and inputting the sample data of the at least two users, the first sample reward parameters, and the second sample reward parameters into the drug reward prediction model; and
    training the first network parameter and the second network parameter of the drug reward prediction model based on the user attribute information of the at least two users, the first sample reward parameters, and the second sample reward parameters. The trained model can then predict, based on the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
  3. The method according to claim 2, wherein the first network parameter comprises a first model parameter and a first return parameter, and the second network parameter comprises a second model parameter and a second return parameter;
    training the first network parameter and the second network parameter of the drug reward prediction model based on the user attribute information of the at least two users, the first sample reward parameters, and the second sample reward parameters comprises:
    determining, based on the first model parameter and the first return parameter, a first expected reward parameter of each user under the action of the sample drug, and determining, based on the second model parameter and the second return parameter, a second expected reward parameter of each user under the action of the sample drug;
    determining a loss value corresponding to the sample data of each user based on the first return parameter, the second return parameter, the first sample reward parameters, the second sample reward parameters, the first expected reward parameters, and the second expected reward parameters; and
    iteratively updating the value of the first model parameter and the value of the second model parameter based on the loss values until the loss value no longer changes. The trained model can then predict, based on the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
  4. The method according to claim 3, wherein determining the user reward parameter of the target user under the action of each drug based on the first target reward parameters and the second target reward parameters of the target user comprises:
    determining a first weighting coefficient for the first target reward parameters and a second weighting coefficient for the second target reward parameters;
    determining, based on the first weighting coefficient and the first target reward parameters of the target user, a first weighted reward parameter corresponding to each first target reward parameter, and determining, based on the second weighting coefficient and the second target reward parameters of the target user, a second weighted reward parameter corresponding to each second target reward parameter; and
    determining the user reward parameter of the target user under the action of each drug based on the first weighted reward parameters and the second weighted reward parameters, wherein one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter.
  5. The method according to claim 3, wherein determining the user reward parameter of the target user under the action of each drug based on the first target reward parameters of the target user comprises:
    determining the first target reward parameters of the target user as the user reward parameters of the target user under the action of the respective drugs;
    wherein the maximum user reward parameter is the largest of the first target reward parameters.
  6. The method according to claim 3, wherein determining the user reward parameter of the target user under the action of each drug based on the second target reward parameters of the target user comprises:
    determining the second target reward parameters of the target user as the user reward parameters of the target user under the action of the respective drugs;
    wherein the maximum user reward parameter is the largest of the second target reward parameters.
  7. The method according to any one of claims 1 to 6, wherein the user interface comprises an attribute information input area;
    obtaining the target user attribute information of the target user comprises:
    when an input instruction on the attribute information input area is detected, obtaining the target user attribute information of the target user based on the input instruction.
  8. A drug information pushing apparatus, comprising:
    an information input module, configured to obtain target user attribute information of a target user and input the target user attribute information into a drug reward prediction model, wherein the target user attribute information comprises at least one of demographic information, health indicators related to medication for a target disease, and historical medication information;
    a parameter output module, configured to output, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, wherein the drug reward prediction model comprises a first network parameter and a second network parameter, the first network parameter is used to determine a first reward parameter of any user having any user attribute information under the action of each of various drugs, the second network parameter is used to determine a second reward parameter of that user under the action of each of the various drugs, that user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and a drug action duration corresponding to the first reward parameter is longer than a drug action duration corresponding to the second reward parameter;
    a parameter determination module, configured to determine, based on the first target reward parameters of the target user and/or the second target reward parameters of the target user, a user reward parameter of the target user under the action of each drug, wherein the target user corresponds to one user reward parameter under the action of one drug; and
    an information display module, configured to determine a maximum user reward parameter among the user reward parameters and output drug information of a target drug having the maximum user reward parameter to a user interface, so as to present the target drug to the target user.
  9. A computer device, comprising a processor, a memory, and a network interface;
    the processor is connected to the memory and the network interface, wherein the network interface is configured to provide a data communication function, the memory is configured to store program code, and the processor is configured to invoke the program code to perform a drug information pushing method, the drug information pushing method comprising:
    obtaining target user attribute information of a target user, and inputting the target user attribute information into a drug reward prediction model, wherein the target user attribute information comprises at least one of demographic information, health indicators related to medication for a target disease, and historical medication information;
    outputting, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, wherein the drug reward prediction model comprises a first network parameter and a second network parameter, the first network parameter is used to determine a first reward parameter of any user having any user attribute information under the action of each of various drugs, the second network parameter is used to determine a second reward parameter of that user under the action of each of the various drugs, that user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and a drug action duration corresponding to the first reward parameter is longer than a drug action duration corresponding to the second reward parameter;
    determining, based on the first target reward parameters of the target user and/or the second target reward parameters of the target user, a user reward parameter of the target user under the action of each drug, wherein the target user corresponds to one user reward parameter under the action of one drug; and
    determining a maximum user reward parameter among the user reward parameters, and outputting drug information of a target drug having the maximum user reward parameter to a user interface, so as to present the target drug to the target user.
  10. The computer device according to claim 9, wherein the drug information pushing method performed by the processor further comprises:
    obtaining sample data of at least two users, wherein the sample data of one user comprises user attribute information and sample drug information of that user;
    obtaining a first sample reward parameter and a second sample reward parameter of each user under the action of the sample drug indicated by the sample drug information, and inputting the sample data of the at least two users, the first sample reward parameters, and the second sample reward parameters into the drug reward prediction model; and
    training the first network parameter and the second network parameter of the drug reward prediction model based on the user attribute information of the at least two users, the first sample reward parameters, and the second sample reward parameters. The trained model can then predict, based on the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
  11. The computer device according to claim 10, wherein the first network parameter comprises a first model parameter and a first return parameter, and the second network parameter comprises a second model parameter and a second return parameter;
    training the first network parameter and the second network parameter of the drug reward prediction model based on the user attribute information of the at least two users, the first sample reward parameters, and the second sample reward parameters comprises:
    determining, based on the first model parameter and the first return parameter, a first expected reward parameter of each user under the action of the sample drug, and determining, based on the second model parameter and the second return parameter, a second expected reward parameter of each user under the action of the sample drug;
    determining a loss value corresponding to the sample data of each user based on the first return parameter, the second return parameter, the first sample reward parameters, the second sample reward parameters, the first expected reward parameters, and the second expected reward parameters; and
    iteratively updating the value of the first model parameter and the value of the second model parameter based on the loss values until the loss value no longer changes. The trained model can then predict, based on the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
  12. The computer device according to claim 11, wherein determining the user reward parameter of the target user under the action of each drug based on the first target reward parameters and the second target reward parameters of the target user comprises:
    determining a first weighting coefficient for the first target reward parameters and a second weighting coefficient for the second target reward parameters;
    determining, based on the first weighting coefficient and the first target reward parameters of the target user, a first weighted reward parameter corresponding to each first target reward parameter, and determining, based on the second weighting coefficient and the second target reward parameters of the target user, a second weighted reward parameter corresponding to each second target reward parameter; and
    determining the user reward parameter of the target user under the action of each drug based on the first weighted reward parameters and the second weighted reward parameters, wherein one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter.
  13. The computer device according to claim 11, wherein determining the user reward parameter of the target user under the action of each drug based on the first target reward parameters of the target user comprises:
    determining the first target reward parameters of the target user as the user reward parameters of the target user under the action of the respective drugs;
    wherein the maximum user reward parameter is the largest of the first target reward parameters.
  14. The computer device according to claim 11, wherein determining the user reward parameter of the target user under the action of each drug based on the second target reward parameters of the target user comprises:
    determining the second target reward parameters of the target user as the user reward parameters of the target user under the action of the respective drugs;
    wherein the maximum user reward parameter is the largest of the second target reward parameters.
  15. A computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, perform a drug information pushing method, the drug information pushing method comprising:
    obtaining target user attribute information of a target user, and inputting the target user attribute information into a drug reward prediction model, wherein the target user attribute information comprises at least one of demographic information, health indicators related to medication for a target disease, and historical medication information;
    outputting, through the drug reward prediction model, each first target reward parameter and each second target reward parameter of the target user under the action of each drug, wherein the drug reward prediction model comprises a first network parameter and a second network parameter, the first network parameter is used to determine a first reward parameter of any user having any user attribute information under the action of each of various drugs, the second network parameter is used to determine a second reward parameter of that user under the action of each of the various drugs, that user corresponds to one first reward parameter and one second reward parameter under the action of one drug, and a drug action duration corresponding to the first reward parameter is longer than a drug action duration corresponding to the second reward parameter;
    determining, based on the first target reward parameters of the target user and/or the second target reward parameters of the target user, a user reward parameter of the target user under the action of each drug, wherein the target user corresponds to one user reward parameter under the action of one drug; and
    determining a maximum user reward parameter among the user reward parameters, and outputting drug information of a target drug having the maximum user reward parameter to a user interface, so as to present the target drug to the target user.
  16. The computer-readable storage medium according to claim 15, wherein the drug information pushing method performed by the processor further comprises:
    obtaining sample data of at least two users, wherein the sample data of one user comprises user attribute information and sample drug information of that user;
    obtaining a first sample reward parameter and a second sample reward parameter of each user under the action of the sample drug indicated by the sample drug information, and inputting the sample data of the at least two users, the first sample reward parameters, and the second sample reward parameters into the drug reward prediction model; and
    training the first network parameter and the second network parameter of the drug reward prediction model based on the user attribute information of the at least two users, the first sample reward parameters, and the second sample reward parameters. The trained model can then predict, based on the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
  17. The computer-readable storage medium according to claim 16, wherein the first network parameters comprise a first model parameter and a first return parameter, and the second network parameters comprise a second model parameter and a second return parameter;
    Performing the training of the first network parameters and the second network parameters of the drug reward prediction model based on the user attribute information of the at least two users, the first sample reward parameters and the second sample reward parameters comprises:
    Determining each first expected reward parameter of each user under the action of the sample drug based on the first model parameter and the first return parameter, and determining each second expected reward parameter of each user under the action of the sample drug based on the second model parameter and the second return parameter;
    Determining each loss value corresponding to the sample data of each user based on the first return parameter, the second return parameter, the first sample reward parameters, the second sample reward parameters, the first expected reward parameters and the second expected reward parameters;
    Iteratively updating the parameter value of the first model parameter and the parameter value of the second model parameter based on the loss values until the loss value no longer changes, so as to obtain the ability to predict, from the user attribute information of any user, the first reward parameter and the second reward parameter of that user under the action of each drug.
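Claim 17 iterates "until the loss value no longer changes". In floating point, this is usually implemented as a tolerance on the change in loss between iterations, and the sketch assumes that reading. To keep the example self-contained, each head's model parameter is reduced to one scalar. The per-sample loss is assumed to be the sum of the two squared errors. The claim does not say how the first and second return parameters enter the loss, so they are deliberately left out. `train_until_stable`, `make_step`, `tol` and `lr` are hypothetical.

```python
from typing import Callable, List


def train_until_stable(step: Callable[[], float], tol: float = 1e-8,
                       max_iters: int = 10_000) -> List[float]:
    """Call `step` (one parameter update that returns the current loss)
    until the loss changes by less than `tol` between two iterations."""
    history: List[float] = []
    for _ in range(max_iters):
        loss = step()
        stable = bool(history) and abs(history[-1] - loss) < tol
        history.append(loss)
        if stable:
            break
    return history


def make_step(r1: List[float], r2: List[float], params: List[float],
              lr: float = 0.1) -> Callable[[], float]:
    """Gradient step for two scalar 'model parameters' params[0] and params[1]."""
    n = len(r1)

    def step() -> float:
        # Assumed loss: mean over samples of (r1 - p0)^2 + (r2 - p1)^2.
        g1 = 2.0 * sum(params[0] - a for a in r1) / n
        g2 = 2.0 * sum(params[1] - b for b in r2) / n
        params[0] -= lr * g1
        params[1] -= lr * g2
        return sum((a - params[0]) ** 2 + (b - params[1]) ** 2
                   for a, b in zip(r1, r2)) / n

    return step
```

Under this assumed loss, training on first rewards `[1.0, 2.0, 3.0]` and second rewards `[0.5, 0.5, 1.1]` with `tol=1e-10` leaves the two parameters at about 2.0 and 0.7. These are the sample means, which minimize the loss.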
  18. The computer-readable storage medium according to claim 17, wherein determining each user reward parameter of the target user under the action of each drug based on each first target reward parameter and each second target reward parameter of the target user comprises:
    Determining a first weighting coefficient of the first target reward parameter and a second weighting coefficient of the second target reward parameter;
    Determining each first weighted reward parameter corresponding to each first target reward parameter based on the first weighting coefficient and each first target reward parameter of the target user, and determining each second weighted reward parameter corresponding to each second target reward parameter based on the second weighting coefficient and each second target reward parameter of the target user;
    Determining each user reward parameter of the target user under the action of each drug based on the first weighted reward parameters and the second weighted reward parameters, wherein one first weighted reward parameter and one second weighted reward parameter correspond to one user reward parameter.
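The weighting in claim 18 can be sketched as follows. The sketch assumes the two weighting coefficients are fixed numbers supplied by the caller, because the claim does not say how they are determined. It also assumes that the first and second weighted reward parameters of a drug are combined by addition. The claim only states that one first and one second weighted reward parameter correspond to one user reward parameter. `weighted_user_rewards` is a hypothetical name.

```python
from typing import Dict


def weighted_user_rewards(first: Dict[str, float], second: Dict[str, float],
                          w1: float, w2: float) -> Dict[str, float]:
    """Per drug: first weighted reward (w1 * r1) plus second weighted reward (w2 * r2).

    Summation is an assumption; the claim does not fix the combination rule.
    """
    if first.keys() != second.keys():
        raise ValueError("both reward maps must cover the same drugs")
    return {drug: w1 * first[drug] + w2 * second[drug] for drug in first}
```

For example, suppose `w1 = 0.6` and `w2 = 0.4`, with first rewards `{"a": 0.8, "b": 0.5}` and second rewards `{"a": 0.1, "b": 0.9}`. The combined user rewards are about 0.52 for `"a"` and 0.66 for `"b"`, so drug `"b"` would hold the maximum user reward parameter.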
  19. The computer-readable storage medium according to claim 17, wherein determining each user reward parameter of the target user under the action of each drug based on each first target reward parameter of the target user comprises:
    Determining each first target reward parameter of the target user as each user reward parameter of the target user under the action of each drug;
    Wherein the maximum user reward parameter is the largest of the first target reward parameters.
  20. The computer-readable storage medium according to claim 17, wherein determining each user reward parameter of the target user under the action of each drug based on each second target reward parameter of the target user comprises:
    Determining each second target reward parameter of the target user as each user reward parameter of the target user under the action of each drug;
    Wherein the maximum user reward parameter is the largest of the second target reward parameters.
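Claims 19 and 20 cover the "or" branches of the "and/or" in claim 15. Only one kind of target reward is used, so each drug's user reward parameter is simply that reward. The maximum user reward parameter is then the largest first target reward (claim 19) or the largest second target reward (claim 20). A hypothetical helper, assuming rewards are kept in dicts keyed by drug identifier:

```python
from typing import Dict, Tuple


def select_by_single_reward(first: Dict[str, float], second: Dict[str, float],
                            use: str) -> Tuple[str, float]:
    """Use one kind of target reward directly as the user reward.

    use="first"  -> user rewards are the first target rewards (claim 19)
    use="second" -> user rewards are the second target rewards (claim 20)
    """
    if use == "first":
        rewards = first
    elif use == "second":
        rewards = second
    else:
        raise ValueError("use must be 'first' or 'second'")
    drug = max(rewards, key=rewards.get)
    return drug, rewards[drug]
```

With first rewards `{"a": 0.8, "b": 0.5}` and second rewards `{"a": 0.1, "b": 0.9}`, the two modes pick different drugs: `("a", 0.8)` and `("b", 0.9)` respectively.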
PCT/CN2021/096712 2021-04-29 2021-05-28 Drug information pushing method and apparatus, computer device, and storage medium WO2022227176A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110473086.X 2021-04-29
CN202110473086.XA CN113076486B (en) 2021-04-29 2021-04-29 Drug information pushing method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022227176A1 (en) 2022-11-03

Family ID: 76616011

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096712 WO2022227176A1 (en) 2021-04-29 2021-05-28 Drug information pushing method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN113076486B (en)
WO (1) WO2022227176A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116779096B (en) * 2023-06-28 2024-04-16 南栖仙策(南京)高新技术有限公司 Medication policy determination method, device, equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
US20160279329A1 (en) * 2013-11-07 2016-09-29 Impreal Innovations Limited System and method for drug delivery
CN110289068A (en) * 2019-06-20 2019-09-27 北京百度网讯科技有限公司 Drug recommended method and equipment
CN111666494A (en) * 2020-05-13 2020-09-15 平安科技(深圳)有限公司 Clustering decision model generation method, clustering processing method, device, equipment and medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN112561554B (en) * 2019-09-26 2023-07-28 腾讯科技(深圳)有限公司 Method, device, server and storage medium for determining multimedia resources to be displayed
CN111144949A (en) * 2019-12-30 2020-05-12 北京每日优鲜电子商务有限公司 Reward data issuing method and device, computer equipment and storage medium
CN111933302B (en) * 2020-10-09 2021-01-05 平安科技(深圳)有限公司 Medicine recommendation method and device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN113076486B (en) 2023-07-25
CN113076486A (en) 2021-07-06

Similar Documents

Publication Publication Date Title
US20220310267A1 (en) Evaluating Risk of a Patient Based on a Patient Registry and Performing Mitigating Actions Based on Risk
US11809819B2 (en) Automated form generation system
US10685089B2 (en) Modifying patient communications based on simulation of vendor communications
US10395330B2 (en) Evaluating vendor communications for accuracy and quality
US20170293722A1 (en) Insurance Evaluation Engine
US10922360B2 (en) Ancillary speech generation via query answering in knowledge graphs
US20180218127A1 (en) Generating a Knowledge Graph for Determining Patient Symptoms and Medical Recommendations Based on Medical Information
US20170286622A1 (en) Patient Risk Assessment Based on Machine Learning of Health Risks of Patient Population
US20180218126A1 (en) Determining Patient Symptoms and Medical Recommendations Based on Medical Information
CN115485690A (en) Batch technique for handling unbalanced training data of chat robots
US10528702B2 (en) Multi-modal communication with patients based on historical analysis
US20170220758A1 (en) Personalized Sequential Multi-Modal Patient Communication Based on Historical Analysis of Patient Information
WO2017163138A1 (en) Dynamic selection and sequencing of healthcare assessments for patients
US20170213005A1 (en) Variable List Based Caching of Patient Information for Evaluation of Patient Rules
CN111933302B (en) Medicine recommendation method and device, computer equipment and storage medium
CN105278945B (en) Program visualization device and program visualization method
US20240078171A1 (en) Techniques for model artifact validation
EP3996590A1 (en) System and method for online domain adaptation of models for hypoglycemia prediction in type 1 diabetes
WO2022227176A1 (en) Drug information pushing method and apparatus, computer device, and storage medium
US11610137B1 (en) Cognitive computing using a plurality of model structures
CN113535987B (en) Linkage rule matching method and related device
US20080103818A1 (en) Health-related data audit
CN114676176A (en) Time series prediction method, device, equipment and program product
WO2021189949A1 (en) Information recommendation method and apparatus, and electronic device, and medium
CN111967581B (en) Method, device, computer equipment and storage medium for interpreting grouping model

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21938646

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21938646

Country of ref document: EP

Kind code of ref document: A1