CN116779150B

CN116779150B - Personalized medical decision method, device and application based on multi-agent interaction

Info

Publication number: CN116779150B
Application number: CN202310817181.6A
Authority: CN
Inventors: 傅亦婷; 张旷; 周华健; 邱瑛; 许振影; 赵宇飞
Original assignee: Zhejiang Yishan Intelligent Medical Research Co ltd
Current assignee: Zhejiang Yishan Intelligent Medical Research Co ltd
Priority date: 2023-07-03
Filing date: 2023-07-03
Publication date: 2023-12-22
Anticipated expiration: 2043-07-03
Also published as: CN116779150A

Abstract

The application provides a personalized medical decision method, device and application based on multi-agent interaction, comprising the following steps: constructing an entity model, and acquiring patient information of at least one patient in different medical care stages together with the actual medical care scheme of each patient; training the entity model with the patient information of the different medical care stages through a Markov decision process to obtain an agent model corresponding to each medical care stage; and determining the contribution degree of each agent according to the Markov decision process, calculating the weight of each agent model from its prediction result and contribution degree, and taking the weighted average of the agents' prediction results with the corresponding weights to obtain the personalized medical decision. With this scheme, agent models covering multiple perspectives can be built from multi-perspective patient information, and decisions are made through these different agent models so as to give personalized medical decisions.

Description

Personalized medical decision method, device and application based on multi-agent interaction

Technical Field

The application relates to the field of intelligent medical treatment, in particular to a personalized medical decision method, a device and application based on multi-agent interaction.

Background

Today, medical diagnosis relies too heavily on the doctor's subjective judgment and clinical experience. It is easily affected by individual differences and subjective factors, which leads to inconsistent diagnosis and treatment results; a doctor's subjectivity may also cause misdiagnosis or missed diagnosis, thereby affecting the treatment outcome for patients.

Artificial intelligence is a branch of computer science: a new technical science that researches and develops theories, methods, techniques and applications for simulating, extending and expanding human intelligence. An application of artificial intelligence is generally divided into two parts, a training process and an inference process. The whole training process is defined by algorithms, including the model structure and other processing details, so that the model can be applied to various fields such as medicine, law and finance. When artificial intelligence assists clinical judgment, however, only limited data sources are used, such as case records and laboratory examination results, and the rich information in multi-modal data is ignored. This limits the comprehensiveness and accuracy of clinical prediction, and the patient's overall condition and pathological characteristics cannot be captured, which leads to inaccurate diagnosis and treatment.

When artificial intelligence is used for diagnosis, the diagnosis is often made only from the doctor's perspective. In most cases, however, the relief of symptoms and the patient's rehabilitation depend on many factors, such as family members, diet, psychological condition, nursing and rehabilitation. Existing artificial intelligence can only judge a single disease from the doctor's perspective and cannot take these related factors into account to provide a complete treatment scheme, so the doctor can only give advice based on subjective judgment, which limits the treatment effect and the patient's rehabilitation.

In view of the foregoing, there is a need for a method that can generate a personalized treatment regimen for each patient from multiple angles, thereby improving the patient's therapeutic effect and speeding up the patient's recovery.

Disclosure of Invention

The embodiments of the application provide a personalized medical decision method, device and application based on multi-agent interaction, which can build agent models for multiple perspectives from multi-perspective patient information and make decisions through these agent models so as to give personalized medical decisions.

In a first aspect, embodiments of the present application provide a personalized medical decision method based on multi-agent interactions, the method comprising:

constructing an entity model, and acquiring patient information of at least one patient in different medical care stages and an actual medical care scheme of each patient, wherein the actual medical care scheme comprises medical care means of each medical care stage before complete rehabilitation of the patient;

training the entity model by using patient information of different medical care stages through a Markov decision process to obtain an agent model corresponding to each medical care stage, obtaining a patient state through the patient information as a state space of the Markov decision process, using all available medical care means as an action space of the Markov decision process, judging whether predicted medical care means obtained by prediction of the entity model are consistent with actual medical care means of the same medical care stage in the training process, rewarding the Markov decision process if the predicted medical care means are consistent with the actual medical care means of the same medical care stage, punishing the Markov decision process otherwise, iterating the process to obtain the agent model, and predicting the medical care means of the corresponding medical care stage by the agent model;

and judging the contribution degree of each agent according to the Markov decision process, calculating the weight of each agent model according to the prediction result and the contribution degree of each agent model, and carrying out weighted average on the prediction result of each agent by using the corresponding weight to obtain the personalized medical decision.

In a second aspect, embodiments of the present application provide a personalized medical decision-making device based on multi-agent interactions, including:

a construction module: constructing an entity model, and acquiring patient information of at least one patient in different medical care stages and an actual medical care scheme of each patient, wherein the actual medical care scheme comprises medical care means of each medical care stage before complete rehabilitation of the patient;

an iteration module: training the entity model by using patient information of different medical care stages through a Markov decision process to obtain an agent model corresponding to each medical care stage, obtaining a patient state through the patient information as a state space of the Markov decision process, using all available medical care means as an action space of the Markov decision process, judging whether predicted medical care means obtained by prediction of the entity model are consistent with actual medical care means of the same medical care stage in the training process, rewarding the Markov decision process if the predicted medical care means are consistent with the actual medical care means of the same medical care stage, punishing the Markov decision process otherwise, iterating the process to obtain the agent model, and predicting the medical care means of the corresponding medical care stage by the agent model;

decision module: and judging the contribution degree of each agent according to the Markov decision process, calculating the weight of each agent model according to the prediction result and the contribution degree of each agent model, and carrying out weighted average on the prediction result of each agent by using the corresponding weight to obtain the personalized medical decision.

In a third aspect, embodiments of the present application provide an electronic device comprising a memory having a computer program stored therein and a processor configured to run the computer program to perform a personalized medical decision method based on multi-agent interactions.

In a fourth aspect, embodiments of the present application provide a readable storage medium having a computer program stored therein, the computer program comprising program code for controlling a processor to perform a process comprising the personalized medical decision method based on multi-agent interactions.

The main contributions and innovation points of the invention are as follows:

According to the embodiments of the application, a plurality of different agents are built from a patient's multi-modal data to perform interdisciplinary analysis. The entity model is trained through a Markov decision process to obtain the agent models, and the medical care scheme is predicted by the agent models. During training, the Markov decision process is rewarded or punished according to whether the different medical care means prove necessary and effective, so that the agents' predictions become more accurate and a personalized medical decision is given according to the condition of each patient. The scheme makes full use of multiple agents to provide more reliable support and guidance for medical staff, thereby improving the treatment effect and health condition of patients.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a flow chart of a personalized medical decision method based on multi-agent interactions according to an embodiment of the present application;

FIG. 2 is a block diagram of a personalized medical decision-making device based on multi-agent interactions according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.

It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.

Example 1

The embodiment of the application provides a personalized medical decision method based on multi-agent interaction, and specifically referring to fig. 1, the method comprises the following steps:

In some embodiments, the different healthcare stages cover the whole process of a patient from onset of illness to complete recovery. The patient information includes patient personal information (age, sex, medical history, physical signs and eating habits), disease information (disease name and etiology) and symptom information (clinical response and symptoms).

In some embodiments, the healthcare stage is divided into a medical stage, a nursing stage and a rehabilitation stage, each stage corresponding to an agent model to predict the healthcare means of the stage.

In some embodiments, in the step of acquiring a patient state from the patient information as the state space of the Markov decision process, medical entities are defined according to the patient information, the medical entities including a patient entity, a disease entity and a symptom entity; the attributes of the corresponding medical entities are determined according to the patient information, and the association relations between the medical entities with their attributes are established as the patient state.

Illustratively, age(P) represents the age of patient P in the patient entity, name(D) represents the disease name of disease D in the disease entity, and description(S) represents the description of symptom S in the symptom entity.

Specifically, association relationships in the medical field can be very complex. For example, there may be a diagnostic relationship between a patient entity and a disease entity, and an association between a disease entity and a symptom entity. Establishing these association relationships allows the Markov decision process to achieve higher accuracy.

Specifically, the association relationship may be represented by constructing an association table, for example:

diagnosis(P, D) indicates that patient P is diagnosed with disease D;

hasSymptom(D, S) indicates that disease D has symptom S.
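As a minimal sketch of how the entities, attributes and association table described above could be held in memory, the following Python uses plain dataclasses and a set of relation triples. The class names, field names and the sample patient are illustrative assumptions and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Patient:
    pid: str
    age: int            # age(P)


@dataclass(frozen=True)
class Disease:
    name: str           # name(D)


@dataclass(frozen=True)
class Symptom:
    description: str    # description(S)


@dataclass
class PatientState:
    """Association table stored as (relation, subject, object) triples."""
    relations: set = field(default_factory=set)

    def add(self, relation, subject, obj):
        self.relations.add((relation, subject, obj))

    def query(self, relation, subject):
        """Return every object linked to `subject` by `relation`."""
        return {o for (r, s, o) in self.relations if r == relation and s == subject}


# Hypothetical example data
p = Patient(pid="P1", age=54)
d = Disease(name="gastritis")
s = Symptom(description="stomach pain")

state = PatientState()
state.add("diagnosis", p, d)     # diagnosis(P, D)
state.add("hasSymptom", d, s)    # hasSymptom(D, S)

# Follow diagnosis -> hasSymptom to collect the symptoms tied to the patient
symptoms_of_patient = {
    sym
    for dis in state.query("diagnosis", p)
    for sym in state.query("hasSymptom", dis)
}
```

Frozen dataclasses are hashable, which is what allows the entities to be stored directly inside the relation triples.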

In some embodiments, the Markov decision process is a quadruple in which the value of each action within the action space is measured by defining a Q-value function, and the final decision is made based on the value to arrive at a predictive healthcare measure.

Specifically, the quadruple comprises a state space S containing all states that an agent may encounter, an action space A containing all possible actions that the agent may take, a state transition probability P(s'|s, a) representing the probability of transitioning to another state s' after taking action a in state s, and a reward function R(s, a, s') representing the reward obtained after taking action a in state s and transitioning to s'.

Specifically, the prediction result with the highest value is selected as the predicted healthcare measure. That is, the higher the value, the closer the predicted healthcare measure is to the actual healthcare measure.
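The quadruple and the highest-value selection can be sketched in Python as follows. This is only an illustration: the `MDP` container, the `greedy_action` helper, the state and action names and the Q-values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

State = str
Action = str


@dataclass
class MDP:
    states: Sequence[State]                               # state space S
    actions: Sequence[Action]                             # action space A
    transition: Callable[[State, Action, State], float]   # P(s' | s, a)
    reward: Callable[[State, Action, State], float]       # R(s, a, s')


def greedy_action(q: Dict[Tuple[State, Action], float], mdp: MDP, s: State) -> Action:
    """Return the action with the highest Q-value in state s."""
    return max(mdp.actions, key=lambda a: q.get((s, a), 0.0))


# Hypothetical toy instance
mdp = MDP(
    states=["stomach_pain", "recovered"],
    actions=["medication", "gastroscopy", "intravenous_injection"],
    transition=lambda s, a, s2: 1.0 if s2 == "recovered" else 0.0,
    reward=lambda s, a, s2: 1.0 if s2 == "recovered" else 0.0,
)
q_values = {
    ("stomach_pain", "medication"): 0.7,
    ("stomach_pain", "gastroscopy"): 0.9,
    ("stomach_pain", "intravenous_injection"): 0.2,
}
best = greedy_action(q_values, mdp, "stomach_pain")
```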

In particular, the predictive healthcare measure may be a specific behavior or operation, e.g., a patient may receive a certain treatment regimen or a certain medical test, etc.

Illustratively, treatment(P, T) indicates that patient P receives treatment T.

Specifically, the medical examination can be X-ray, CT, color ultrasound, etc., and the treatment method can be administration, intravenous injection, etc.

In some embodiments, in the step of judging whether the predicted medical care means obtained by the entity model during training is consistent with the actual medical care means of the same medical care stage, rewarding the Markov decision process if so and punishing it otherwise, and iterating this process to obtain the agent model, the predicted medical care means comprises various medical methods, nursing methods or rehabilitation methods. If the predicted medical care means of a stage is fully consistent with the actual medical care means of the same stage, a positive reward is given to the Markov decision process; if it is not consistent, a negative reward is given.

Specifically, the actual medical regimen includes a series of medical test items, treatment methods, care methods, rehabilitation methods, and the like.

For example, for a patient with a stomach illness, suppose the predicted care plan obtained through the agent model in the medical stage is medication and gastroscopy. If the actual care plan is also medication and gastroscopy, the Markov decision process of this stage receives a positive reward; if the actual care plan is intravenous injection and enteroscopy, it receives a negative reward. The medical stage is one of the several healthcare stages.

In some embodiments, if the predicted healthcare measure of a stage is inconsistent with the actual healthcare measure of the same stage, and the inconsistent portion proves necessary and effective in the subsequent healthcare stage, a positive reward is given to the Markov decision process; if the inconsistent portion proves unnecessary and ineffective in the subsequent healthcare stage, a negative reward is given.

Specifically, when a medical test item in the predicted healthcare measure does not appear in the actual healthcare scheme, it is necessary to determine whether that test item is effective, and thereby whether including it in the predicted healthcare measure was correct.

Illustratively, if the predicted healthcare measure suggests an X-ray and the X-ray is found to be necessary and effective in the subsequent treatment, the prediction is judged correct and a positive reward is given to the Markov decision process; if the X-ray is found to be unnecessary and ineffective in the subsequent treatment, the prediction is judged incorrect and a negative reward is given.

Illustratively, if acupuncture is recommended in the predicted healthcare measure and is found to be necessary and effective in subsequent treatment, a positive reward is given to the Markov decision process; if acupuncture is found to be unnecessary and ineffective, a negative reward is given.

In some embodiments, the risk and side effects of the predicted healthcare measure are evaluated; if the risk value is above a set threshold or there is a side effect, the Markov decision process is punished, and otherwise it is rewarded.

Specifically, some healthcare measures carry a certain risk and side effects. If the predicted healthcare measure selects a measure with high risk or side effects, a penalty is applied, so that the medical agent selects the treatment that is most beneficial to the patient wherever possible.

Specifically, by rewarding and punishing the Markov decision process in this way, the agent model obtained after multiple iterations can arrive at an optimal predicted healthcare measure, one that has the lowest risk value and the fewest side effects while coinciding with the actual healthcare measure as far as possible.

In some embodiments, the process of assigning rewards or penalties to the Markov decision process may be expressed as:

$$
R(s, a) =
\begin{cases}
+r_{\text{correct}}, & \text{if the predicted healthcare measure is the same as the actual healthcare measure} \\
-r_{\text{incorrect}}, & \text{if the predicted healthcare measure differs from the actual healthcare measure} \\
+r_{\text{necessary}}, & \text{if the non-conforming portion is necessary in the subsequent healthcare stage} \\
-r_{\text{unnecessary}}, & \text{if the non-conforming portion is unnecessary in the subsequent healthcare stage} \\
+r_{\text{effective}}, & \text{if the non-conforming portion is effective in the subsequent healthcare stage} \\
-r_{\text{ineffective}}, & \text{if the non-conforming portion is ineffective in the subsequent healthcare stage} \\
\pm r_{\text{risk}}, & \text{rewarded or punished according to risk and side effects}
\end{cases}
$$

where s represents the state of the agent model, a represents the action taken by the agent model, and R(s, a) represents the reward that the agent model receives for taking action a in state s.
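One possible reading of these reward rules is sketched below in Python. The text does not say how the individual terms are combined or how large they are, so the single magnitude `r`, the `risk_threshold` value, the additive combination and the `later_judgement` argument are all assumptions made for illustration.

```python
def reward(predicted, actual, later_judgement=None, risk=0.0,
           has_side_effect=False, r=1.0, risk_threshold=0.5):
    """Reward for one healthcare stage (illustrative combination rule).

    predicted, actual: collections of healthcare measures for the same stage.
    later_judgement: maps a measure that is in `predicted` but not in `actual`
        to True (necessary and effective later) or False (unnecessary and
        ineffective later).
    """
    predicted, actual = set(predicted), set(actual)
    later_judgement = later_judgement or {}
    total = 0.0
    if predicted == actual:
        total += r                                    # +r_correct
    else:
        extra = predicted - actual                    # non-conforming portion
        judged = [later_judgement[m] for m in extra if m in later_judgement]
        if judged:
            # +r for each extra measure that proved necessary and effective,
            # -r for each one that proved unnecessary and ineffective
            total += sum(r if ok else -r for ok in judged)
        else:
            total -= r                                # -r_incorrect
    # +/- r_risk: punish high risk or any side effect, reward otherwise
    total += -r if (risk > risk_threshold or has_side_effect) else r
    return total


matching = reward({"medication", "gastroscopy"}, {"medication", "gastroscopy"})
mismatch = reward({"medication", "gastroscopy"},
                  {"intravenous_injection", "enteroscopy"})
useful_extra = reward({"medication", "x_ray"}, {"medication"}, {"x_ray": True})
risky_useless = reward({"medication", "x_ray"}, {"medication"},
                       {"x_ray": False}, risk=0.8)
```

With these assumed magnitudes, a matching low-risk prediction scores highest and an unnecessary, high-risk extra measure scores lowest, which is the ordering the paragraphs above describe.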

In some embodiments, the Markov decision process is iterated as follows:

1. Initialize the policy network parameters θ of the medical Markov decision process:

θ = initial_parameters

2. Initialize the value function network parameters w:

w = initial_parameters

3. Initialize the state s:

s = initial_state

4. Iterate the following steps:

4.1 Select an action a according to the current state s using the policy network parameters θ:

a ~ π(a|s; θ)

4.2 Perform action a and observe the reward r and the new state s' returned by the environment.

4.3 Estimate the value function of the next state from the value function network parameters w:

V(s') = φ(s')^T w

4.4 Update the value function network parameters w.

4.5 Update the policy network parameters θ.

4.6 Update the state to the new state:

s = s'

wherein θ is a parameter of the policy network, w is a parameter of the value function network, π(a|s; θ) is the action probability distribution generated by the policy network with parameters θ, α_θ and α_w are learning rates that control the size of each update, γ is a discount factor measuring the importance of future rewards, V(s) is the value function estimate of state s, and Q(s, a; w) is the Q-value estimate of the state-action pair (s, a) based on the value function network parameters w.

The function of step 4.1 is to select an action a from the current state s using the policy network parameters, i.e. a ~ π(a|s; θ): an action is sampled from the action space according to this probability distribution.

The function of step 4.4 is to update the value function network parameters w according to the value function estimate V(s') and the discounted reward, so as to improve the accuracy of the value function.

The function of step 4.5 is to update the policy network parameters θ using the value function estimate Q(s, a; w), so as to optimize the performance of the policy network according to the policy gradient approach.
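The update formulas for steps 4.4 and 4.5 are not reproduced in the text, so the Python sketch below substitutes the standard one-step actor-critic updates as an assumption: a temporal-difference error δ = r + γV(s') − V(s) drives both the value update and the policy-gradient update, standing in for Q(s, a; w) − V(s). The tabular softmax policy, the one-hot features (so that V(s) = φ(s)^T w reduces to w[s]), the toy three-stage environment and all hyperparameters are likewise illustrative.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def train(actual, n_actions, episodes=2000, alpha_theta=0.1, alpha_w=0.1,
          gamma=0.9, seed=0):
    """One-step actor-critic over a chain of healthcare stages."""
    rng = random.Random(seed)
    n_states = len(actual)
    theta = [[0.0] * n_actions for _ in range(n_states)]   # step 1
    w = [0.0] * n_states                                   # step 2
    for _ in range(episodes):
        s = 0                                              # step 3
        while s < n_states:
            pi = softmax(theta[s])
            # 4.1: sample a ~ pi(a|s; theta)
            a = rng.choices(range(n_actions), weights=pi)[0]
            # 4.2: toy environment, +1 if the action matches the actual measure
            r = 1.0 if a == actual[s] else -1.0
            s_next = s + 1
            # 4.3: V(s'), taken as 0 once the last stage is finished
            v_next = w[s_next] if s_next < n_states else 0.0
            delta = r + gamma * v_next - w[s]              # TD error
            # 4.4: value update (gradient of V(s) w.r.t. w is one-hot)
            w[s] += alpha_w * delta
            # 4.5: policy-gradient update for a softmax policy
            for b in range(n_actions):
                grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
                theta[s][b] += alpha_theta * delta * grad_log_pi
            s = s_next                                     # 4.6
    return theta, w


# Hypothetical index of the actual measure in each of three stages
actual_measures = [1, 0, 2]
theta, w = train(actual_measures, n_actions=3)
learned = [max(range(3), key=lambda a: theta[s][a]) for s in range(3)]
```

Here each state stands for one healthcare stage (for example medical, nursing, rehabilitation), and `learned` holds the action each stage's policy prefers after training.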

Specifically, agent models for the different medical care stages can be constructed according to different patient information.

In some embodiments, in the "determine contribution of each agent according to the Markov decision process" step, the contribution of each agent is obtained according to the probability of a state transition of each agent in the Markov decision process.

Specifically, the benefit obtained when each agent is combined with other agents is determined by running simulations or by evaluation on existing data, and the contribution degree of an agent is obtained by averaging its benefit over all combinations.

Specifically, the contribution of each agent can be expressed by the following formula:

where N represents the total number of agents, S represents a subset of agents (excluding agent i), and Q(S) represents the benefit of agent subset S. The expression sums over the combinations of agent i with the other agents, calculates the contribution of agent i in each combination, and averages the results.

Specifically, for a set of cooperating agents performing a diagnostic task, Q(S) may represent the diagnostic accuracy of the cooperating agent subset S. For example, confusion matrices or other metrics may be used to measure the performance of agent cooperation in terms of diagnostic accuracy.

For a set of cooperating agents performing a treatment plan selection task, Q(S) may represent the effect of the treatment plan selected by the cooperating agent subset S. For example, clinical metrics, patient feedback or other relevant metrics may be used to evaluate the performance of agent cooperation in terms of therapeutic effect.

For a set of cooperating agents performing a resource allocation task, Q(S) may represent the cost effectiveness of the cooperating agent subset S. For example, the performance of agent cooperation in terms of cost effectiveness may be evaluated by weighing the resources invested against the benefits obtained.
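The contribution formula itself is not reproduced in the text. Averaging an agent's marginal benefit over all combinations with the other agents is what the Shapley value from cooperative game theory computes, so the Python sketch below uses the standard Shapley weighting as an assumption. The agent names and the benefit table Q(S) are made-up illustration data.

```python
from itertools import combinations
from math import factorial


def shapley_values(agents, q):
    """Exact Shapley value per agent for a benefit function q(frozenset)."""
    n = len(agents)
    values = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(n):  # k = size of the subset S that excludes agent i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # marginal benefit of adding agent i to subset S
                total += weight * (q(s | {i}) - q(s))
        values[i] = total
    return values


# Hypothetical benefit table Q(S), e.g. diagnostic accuracy per agent subset
accuracy = {
    frozenset(): 0.0,
    frozenset({"medical"}): 0.6,
    frozenset({"nursing"}): 0.3,
    frozenset({"rehabilitation"}): 0.3,
    frozenset({"medical", "nursing"}): 0.8,
    frozenset({"medical", "rehabilitation"}): 0.8,
    frozenset({"nursing", "rehabilitation"}): 0.5,
    frozenset({"medical", "nursing", "rehabilitation"}): 0.9,
}
contribution = shapley_values(
    ["medical", "nursing", "rehabilitation"], accuracy.__getitem__
)
```

Exact enumeration visits every subset, so it is only practical for a small number of agents, such as the three stage-specific agents described here.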

In some embodiments, in the step of calculating the weight of each agent model according to the prediction result and contribution degree of each agent model, the contribution degree and the prediction result of each agent model are used to calculate the Shapley value of each agent, and the weight of each agent model is then obtained from the Shapley value of each agent. Specifically, the equation for calculating the Shapley value of each agent model is as follows:

Φ[i] = Φ[i] + ρ(i) * (Q(s, a; w) - V(s))

wherein ρ(i) is the contribution degree of agent model i, Q(s, a; w) is the Q-value estimate of the state-action pair (s, a) based on the value function network parameters w, and V(s) is the value function estimate of state s.

Specifically, the formula for calculating the weight of each agent model using the Shapley values is as follows:

w_i = Φ[i] / Σ(Φ[j]) (where j = 1 to N)

where Σ(Φ[j]) represents the sum of the Shapley values of all agents, Φ[i] represents the Shapley value of the current agent i, and w_i represents the weight of the current agent i.

In some embodiments, the formula for weighting the corresponding decision results using the weights of each agent model to arrive at a personalized medical decision is as follows:

d = Σ(w_i × d_i) (where i = 1 to N)

wherein i indexes the different agent models, d_i is the decision result of agent model i, and d is the personalized medical decision. The personalized medical decision combines the decision results of the plurality of agents and takes both their contributions and their weights into account, so as to provide a more accurate and personalized scheme.
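The three formulas above can be chained as in the Python sketch below. The text does not say how a decision result d_i is encoded numerically, so representing each agent's decision as a score per candidate measure is an assumption, as are the per-agent Q and V estimates, the helper names and all numbers.

```python
def accumulate_shapley(phi, rho, q_sa, v_s):
    """Phi[i] = Phi[i] + rho(i) * (Q(s, a; w) - V(s)) for every agent i."""
    return {i: phi[i] + rho[i] * (q_sa[i] - v_s[i]) for i in phi}


def normalise(phi):
    """w_i = Phi[i] / sum over j of Phi[j]."""
    total = sum(phi.values())
    if total == 0:
        raise ValueError("Shapley values sum to zero; weights are undefined")
    return {i: v / total for i, v in phi.items()}


def combine(weights, decisions):
    """d = sum over i of w_i * d_i, evaluated per candidate measure."""
    measures = set().union(*(d.keys() for d in decisions.values()))
    return {
        m: sum(weights[i] * decisions[i].get(m, 0.0) for i in weights)
        for m in measures
    }


# Hypothetical inputs for three stage-specific agents
rho = {"medical": 0.5, "nursing": 0.2, "rehabilitation": 0.2}
q_sa = {"medical": 1.0, "nursing": 0.8, "rehabilitation": 0.8}
v_s = {"medical": 0.6, "nursing": 0.3, "rehabilitation": 0.3}

phi = accumulate_shapley({i: 0.0 for i in rho}, rho, q_sa, v_s)
weights = normalise(phi)

decisions = {
    "medical": {"medication": 0.9, "gastroscopy": 0.8},
    "nursing": {"medication": 0.6, "diet_plan": 1.0},
    "rehabilitation": {"diet_plan": 0.4},
}
decision = combine(weights, decisions)
recommended = max(decision, key=decision.get)
```

The guard in `normalise` reflects that the weight formula is undefined when the accumulated Shapley values sum to zero.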

Example 2

Based on the same conception, referring to fig. 2, the application further provides a personalized medical decision-making device based on multi-agent interaction, which comprises:

Example 3

This embodiment also provides an electronic device, referring to fig. 3, comprising a memory 404 and a processor 402, the memory 404 having stored therein a computer program, the processor 402 being arranged to run the computer program to perform the steps of any of the method embodiments described above.

In particular, the processor 402 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.

The memory 404 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 404 may comprise a Hard Disk Drive (HDD), floppy disk drive, Solid State Drive (SSD), flash memory, optical disk, magneto-optical disk, tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 404 may include removable or non-removable (or fixed) media, where appropriate. Memory 404 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 404 is a non-volatile memory. In particular embodiments, memory 404 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), an Electrically Alterable ROM (EAROM) or flash memory (FLASH), or a combination of two or more of these. The RAM may be Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM) where appropriate, and the DRAM may be Fast Page Mode Dynamic Random Access Memory (FPMDRAM), Extended Data Output Dynamic Random Access Memory (EDODRAM), Synchronous Dynamic Random Access Memory (SDRAM), or the like.

Memory 404 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions for execution by processor 402.

The processor 402 reads and executes the computer program instructions stored in the memory 404 to implement any of the personalized medical decision methods based on multi-agent interactions of the above embodiments.

Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402 and the input/output device 408 is connected to the processor 402.

The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wired or wireless network provided by a communication provider of the electronic device. In one example, the transmission device includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through the base station to communicate with the internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.

The input-output device 408 is used to input or output information. In this embodiment, the input information may be patient disease information, and the output information may be the predicted medical care means of the different agent models.

Alternatively, in the present embodiment, the above-mentioned processor 402 may be configured to execute the following steps by a computer program:

S101, constructing an entity model, and acquiring patient information of at least one patient in different medical care stages and an actual medical care scheme of each patient, wherein the actual medical care scheme comprises medical care means of each medical care stage of the patient before complete rehabilitation;

S102, training the entity model by using patient information of different medical care stages through a Markov decision process to obtain an agent model corresponding to each medical care stage, acquiring the patient state through the patient information as a state space of the Markov decision process, using all medical care means which can be taken as an action space of the Markov decision process, judging whether predicted medical care means obtained by predicting the entity model accords with actual medical care means of the same medical care stage in the training process, rewarding the Markov decision process if so, otherwise punishing the Markov decision process, iterating the process to obtain the agent model, and predicting the medical care means of the corresponding medical care stage;

S103, judging the contribution degree of each agent according to the Markov decision process, calculating the weight of each agent model according to the prediction result and the contribution degree of each agent model, and carrying out weighted average on the prediction result of each agent by using the corresponding weight to obtain the personalized medical decision.

It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and this embodiment is not repeated herein.

In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

Embodiments of the invention may be implemented by computer software executable by a data processor of a mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets, and/or macros can be stored in any apparatus-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may include one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. In this regard, it should also be noted that any block of the logic flow as in fig. 3 may represent a procedure step, or interconnected logic circuits, blocks and functions, or a combination of procedure steps and logic circuits, blocks and functions. The software may be stored on a physical medium such as a memory chip or memory block implemented within a processor, a magnetic medium such as a hard disk or floppy disk, and an optical medium such as, for example, a DVD and its data variants, a CD, etc. The physical medium is a non-transitory medium.

It should be understood by those skilled in the art that the technical features of the above embodiments may be combined in any manner. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered to fall within the scope of this description.

The foregoing examples represent only several embodiments of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the present application in any way. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within its scope. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A personalized medical decision-making method based on multi-agent interaction is characterized by comprising the following steps:

training the entity model with patient information of different medical care stages through a Markov decision process to obtain an agent model corresponding to each medical care stage, wherein the patient state obtained from the patient information serves as the state space of the Markov decision process; a medical entity is defined according to the patient information, the medical entity comprising a patient entity, a disease entity and a symptom entity; the attributes of the corresponding medical entity are determined from the patient information, and the association relationships of the medical entities with their attributes are established as the patient state; all medical care means that can be taken serve as the action space of the Markov decision process; during training, it is judged whether the predicted medical care means output by the entity model matches the actual medical care means of the same medical care stage, the Markov decision process being rewarded if so and penalized otherwise; this process is iterated to obtain the agent model, the agent model being used to predict the medical care means of the corresponding medical care stage, the predicted medical care means being various medical treatment methods, nursing methods or rehabilitation methods; if the predicted medical care means of a medical care stage is completely consistent with the actual medical care means of the same stage, a positive reward is given to the Markov decision process; if the predicted medical care means is inconsistent with the actual medical care means of the same stage, a negative reward is given to the Markov decision process; if the predicted medical care means is inconsistent with the actual medical care means of the same stage but the inconsistent part is necessary and effective in a subsequent medical care stage, a positive reward is given to the Markov decision process; and if the inconsistent part is unnecessary and ineffective in the subsequent medical care stages, a negative reward is given to the Markov decision process;

judging the contribution degree of each agent according to the Markov decision process, and calculating the weight of each agent model from its prediction result and contribution degree, wherein the contribution degree of each agent is obtained from the state transition probability of that agent in the Markov decision process, the Shapley value of each agent is calculated from the contribution degree and the prediction result of the corresponding agent model, the weight of each agent model is obtained from the Shapley value of the corresponding agent, and the prediction results of the agents are averaged with the corresponding weights to obtain the personalized medical decision.

2. The personalized medical decision method based on multi-agent interaction according to claim 1, wherein the Markov decision process is a quadruple, a Q-function defined on the quadruple measures the value of each action in the action space, and the final decision is made according to that value to obtain the predicted medical care means.

3. The personalized medical decision method based on multi-agent interaction according to claim 1, wherein the risk and side effects of the predicted medical care means are assessed, and the Markov decision process is penalized if the risk value is higher than a set threshold or a side effect exists, and rewarded otherwise.

4. A personalized medical decision-making device based on multi-agent interactions, comprising:

an iteration module, configured for: training the entity model with patient information of different medical care stages through a Markov decision process to obtain an agent model corresponding to each medical care stage, wherein the patient state obtained from the patient information serves as the state space of the Markov decision process; a medical entity is defined according to the patient information, the medical entity comprising a patient entity, a disease entity and a symptom entity; the attributes of the corresponding medical entity are determined from the patient information, and the association relationships of the medical entities with their attributes are established as the patient state; all medical care means that can be taken serve as the action space of the Markov decision process; during training, it is judged whether the predicted medical care means output by the entity model matches the actual medical care means of the same medical care stage, the Markov decision process being rewarded if so and penalized otherwise; this process is iterated to obtain the agent model, the agent model being used to predict the medical care means of the corresponding medical care stage, the predicted medical care means being various medical treatment methods, nursing methods or rehabilitation methods; if the predicted medical care means of a medical care stage is completely consistent with the actual medical care means of the same stage, a positive reward is given to the Markov decision process; if the predicted medical care means is inconsistent with the actual medical care means of the same stage, a negative reward is given to the Markov decision process; if the predicted medical care means is inconsistent with the actual medical care means of the same stage but the inconsistent part is necessary and effective in a subsequent medical care stage, a positive reward is given to the Markov decision process; and if the inconsistent part is unnecessary and ineffective in the subsequent medical care stages, a negative reward is given to the Markov decision process;

a decision module, configured for: judging the contribution degree of each agent according to the Markov decision process, and calculating the weight of each agent model from its prediction result and contribution degree, wherein the contribution degree of each agent is obtained from the state transition probability of that agent in the Markov decision process, the Shapley value of each agent is calculated from the contribution degree and the prediction result of the corresponding agent model, the weight of each agent model is obtained from the Shapley value of the corresponding agent, and the prediction results of the agents are averaged with the corresponding weights to obtain the personalized medical decision.

5. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is arranged to run the computer program to perform the personalized medical decision method based on multi-agent interaction according to any one of claims 1-3.

6. A readable storage medium, characterized in that the readable storage medium stores a computer program comprising program code for controlling a process so as to execute a procedure, the procedure comprising the personalized medical decision method based on multi-agent interaction according to any one of claims 1-3.