CN112349409A - Disease type prediction method, device, equipment and system - Google Patents

Disease type prediction method, device, equipment and system

Info

Publication number
CN112349409A
Authority
CN
China
Prior art keywords
symptom
disease type
data set
reinforcement learning
inquiry
Prior art date
Legal status
Pending
Application number
CN202011135075.2A
Other languages
Chinese (zh)
Inventor
魏忠钰 (Zhongyu Wei)
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority application: CN202011135075.2A
Published as: CN112349409A
Related US applications: US17/508,655 (published as US20220130545A1) and US17/508,675 (published as US11562829B2)
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20: for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H 50/70: for mining of medical data, e.g. analysing previous cases of other patients
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches


Abstract

The application discloses a disease type prediction method, device, equipment, and system. The method comprises: receiving a disease type prediction request; according to the request, conducting symptom inquiry using the symptom data sets in a reinforcement learning disease type determination model and collecting symptom information, wherein the model comprises a plurality of symptom data sets classified according to the department divisions of a medical system, and different symptom data sets contain the symptom databases of different medical departments; and, according to the collected symptom information, determining the disease type prediction result corresponding to the request using the disease type classifier in the model. Because the model contains multiple symptom data sets classified by medical department, it can predict many disease types, giving wider prediction coverage and stronger adaptability.

Description

Disease type prediction method, device, equipment and system
Technical Field
The present application relates to the field of data analysis and prediction, and in particular, to a method, an apparatus, a device, and a system for predicting disease types.
Background
With the development of Electronic Health Record (EHR) systems, researchers have explored different machine learning approaches to automated diagnosis. The concept of computer-assisted medical systems has emerged in recent years to facilitate patient self-diagnosis. Such a system asks the patient to provide some information and then attempts to diagnose the underlying disease based on its interaction with the patient. Although impressive progress has been made in identifying some diseases, these approaches are labor-intensive because they rely heavily on detailed electronic health records.
Nowadays, using computer technology to predict disease types with machine learning models, so as to assist physicians in their work, is becoming more and more important in medicine. However, a supervised model built for one disease is often difficult to transfer to another, so each disease requires training its own model, which is inefficient and costly. How to provide a solution that applies to the prediction of many disease types, improving efficiency and reducing cost, is therefore a technical problem that needs to be solved in the field.
Disclosure of Invention
An object of the embodiments of the present specification is to provide a disease type prediction method, apparatus, device, and system that implement prediction processing for multiple diseases with a single model, improving data processing efficiency and reducing model construction cost.
In one aspect, the present invention provides a method for predicting a disease type, the method comprising:
receiving a disease type prediction request;
according to the disease type prediction request, carrying out symptom inquiry by using a symptom data set in a reinforcement learning disease type determination model, and collecting symptom information, wherein the reinforcement learning disease type determination model comprises a plurality of symptom data sets which are classified based on department classification in a medical system, and different symptom data sets comprise symptom databases of different medical departments;
and according to the acquired symptom information, determining a disease type prediction result corresponding to the disease type prediction request by using a disease type classifier in the reinforcement learning disease type determination model.
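The three steps above can be sketched end to end as follows. This is a hypothetical illustration only: the names (SymptomDataSet, predict_disease_type) and the toy classification rule are the editor's assumptions, not structures from the patent.

```python
# Hypothetical sketch of the claimed method: receive a request, inquire
# about symptoms using department-specific symptom data sets, classify.
from dataclasses import dataclass

@dataclass
class SymptomDataSet:
    department: str   # medical department this data set covers
    symptoms: list    # its symptom database

def collect_symptom_info(data_sets, answer_fn):
    """Inquire about every symptom in every data set; record the answers."""
    info = {}
    for ds in data_sets:
        for symptom in ds.symptoms:
            info[symptom] = answer_fn(symptom)
    return info

def classify(info):
    """Toy stand-in for the disease type classifier."""
    if info.get("cough") and info.get("fever"):
        return "respiratory infection"
    return "unknown"

def predict_disease_type(data_sets, answer_fn):
    info = collect_symptom_info(data_sets, answer_fn)
    return classify(info)
```

For example, with a respiratory and a digestive data set and a user who confirms cough and fever, predict_disease_type returns the respiratory label.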
Further, the using the symptom data set in the reinforcement learning disease type determination model to perform symptom query and collect symptom information includes:
sequentially taking each symptom data set in the reinforcement learning disease type determination model as a target symptom data set, and initiating a symptom inquiry about whether symptom characteristics in a symptom database in the target symptom data set exist or not according to the symptom database in the target symptom data set;
and receiving symptom inquiry results, and acquiring the symptom information according to the symptom inquiry results acquired by each target symptom data set.
Further, initiating a symptom query according to the symptom database in the target symptom data set, including:
and if the number of times of conversation of initiating the symptom inquiry according to the symptom characteristics in the target symptom data set reaches a preset upper limit of the number of times of conversation of the symptom inquiry, or the result of the symptom inquiry meets a preset inquiry stopping condition, stopping the symptom inquiry, and acquiring the next symptom data set in the reinforcement learning disease type determination model as a new target symptom data set.
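The per-data-set inquiry with its two termination criteria (a dialog-turn cap and a preset stop condition) can be sketched as below; the function name, the cap, and the condition are illustrative placeholders for the patent's preset values.

```python
def query_target_data_set(symptoms, answer_fn, max_turns, stop_condition=None):
    """Inquire symptom by symptom, stopping when the dialog-turn cap is
    reached or when a preset stop condition on the answers so far holds."""
    results = {}
    for turn, symptom in enumerate(symptoms):
        if turn >= max_turns:          # upper limit on inquiry dialog turns
            break
        results[symptom] = answer_fn(symptom)
        if stop_condition and stop_condition(results):
            break                      # preset inquiry-stop condition met
    return results
```

When either criterion fires, the caller would move on to the next symptom data set as the new target.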
Further, the training method of the reinforcement learning disease type determination model comprises the following steps:
collecting sample data, wherein the sample data comprises a sample symptom data set and a disease type corresponding to the sample symptom data set;
inputting the sample symptom dataset into the reinforcement learning disease type determination model, and training the reinforcement learning disease type determination model in the following way: sequentially utilizing the symptom data set in the reinforcement learning disease type determination model to inquire the symptom of the sample symptom data set and acquire symptom information; if the symptom inquiry result of the symptom data set is yes, setting the reward value of the symptom data set as a positive preset internal reward value, if the symptom inquiry result of the symptom data set is no, setting the reward value of the symptom data set as a negative preset internal reward value, and if the symptom inquiry result of the symptom data set is empty, setting the reward value of the symptom data set as 0;
inputting the acquired symptom information into a disease type classifier in the reinforcement learning disease type determination model, and predicting a predicted disease type corresponding to the sample symptom data set by using the disease type classifier.
Further, the method further comprises:
and comparing the predicted disease type corresponding to the sample symptom data set predicted by the disease type classifier with the disease type corresponding to the sample symptom data set, setting the reward value of the reinforcement learning disease type as a positive preset external reward value if the predicted disease type is the same as the disease type corresponding to the sample symptom data set, and setting the reward value of the reinforcement learning disease type as a negative preset external reward value if the predicted disease type is different from the disease type corresponding to the sample symptom data set.
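The two reward rules described above (a per-inquiry internal reward and an episode-level external reward) can be sketched as follows; the reward magnitudes and function names are placeholders, not values from the patent.

```python
def internal_reward(answer, pos=1.0, neg=-1.0):
    """Per-inquiry reward: positive when the symptom inquiry result is
    'yes', negative when 'no', and 0 when the answer is empty."""
    if answer is True:
        return pos
    if answer is False:
        return neg
    return 0.0   # empty answer

def external_reward(predicted_type, true_type, pos=10.0, neg=-10.0):
    """Episode-level reward: positive when the classifier's prediction
    matches the labeled disease type, negative otherwise."""
    return pos if predicted_type == true_type else neg
```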
Further, the sample data comprises:
a real case dataset and a synthetic dataset for a hospital system.
Further, the conducting symptom inquiry by using the symptom data set in the reinforcement learning disease type determination model and collecting symptom information comprises:
and according to the disease type prediction request, a target symptom data set is decided by utilizing the reinforcement learning disease type determination model, and symptom inquiry is carried out by utilizing the target symptom data set to acquire symptom information.
In another aspect, the present invention provides a disease type prediction apparatus, including:
a request receiving module for receiving a disease type prediction request;
the symptom acquisition module is used for inquiring symptoms by using a symptom data set in a reinforcement learning disease type determination model according to the disease type prediction request and acquiring symptom information, wherein the reinforcement learning disease type determination model comprises a plurality of symptom data sets which are classified based on department classification in a medical system, and different symptom data sets comprise symptom databases of different medical departments;
and the disease type prediction module is used for determining a disease type prediction result corresponding to the disease type prediction request by using a disease type classifier in the reinforcement learning disease type determination model according to the acquired symptom information.
In another aspect, the present invention provides a disease type prediction processing apparatus including: at least one processor and a memory for storing processor-executable instructions, the instructions when executed by the processor implementing the method of any of the above.
In yet another aspect, the present invention provides a disease type prediction system, comprising a controller, a symptom query module, and a disease type determination module, wherein the symptom query module comprises a plurality of symptom data sets, the plurality of symptom data sets are classified based on department classifications in a medical system, and different symptom data sets comprise symptom databases of different medical departments;
the controller is used for receiving a disease type prediction request sent by a user simulator or a user terminal, and for calling the symptom inquiry module to conduct symptom inquiry with the user simulator or the user terminal based on the symptom databases in the symptom data sets;
the user simulator or the user terminal returns a symptom inquiry result according to the symptom inquiry initiated by the symptom inquiry module;
the symptom inquiry module acquires symptom information according to the symptom inquiry result and returns the acquired symptom information to the controller;
and after receiving the symptom information, the controller calls the disease type determining module, and the disease type determining module determines a disease type prediction result corresponding to the disease type prediction request according to the symptom information.
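The interaction among the three system components can be sketched as below. The class names mirror the components named above, but the internals (a dictionary of department data sets, a toy rule-based disease type module) are the editor's assumptions for illustration.

```python
class SymptomQueryModule:
    """Holds department-classified symptom data sets and runs inquiries."""
    def __init__(self, data_sets):
        self.data_sets = data_sets   # {department: [symptoms]}

    def collect(self, answer_fn):
        info = {}
        for symptoms in self.data_sets.values():
            for s in symptoms:
                info[s] = answer_fn(s)   # answer from simulator/terminal
        return info

class DiseaseTypeModule:
    """Toy stand-in for the disease type determination module."""
    def determine(self, info):
        return "influenza" if info.get("fever") else "unknown"

class Controller:
    """Receives the request, drives inquiry, then calls determination."""
    def __init__(self, query_module, disease_module):
        self.query_module = query_module
        self.disease_module = disease_module

    def handle(self, request, answer_fn):
        info = self.query_module.collect(answer_fn)
        return self.disease_module.determine(info)
```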
The disease type prediction method, device, equipment and system provided by the embodiment of the application have the following technical effects:
the disease type prediction method, the device, the equipment and the system provided by the disclosure can select a specific symptom data set according to a disease type prediction request provided by a user (such as initial symptoms of the user and symptom responses for symptom inquiry actions) so as to recommend the user to take further symptom inquiries, collect symptom information according to the responses of the symptom inquiries and further determine a final disease type prediction result according to the collected symptom information, so that the accuracy of result prediction can be improved.
And, provide the abnormal reward of detection result in the training phase in order to encourage the reinforcement learning agent to choose the symptom inquiry action, therefore reduce the scope of action of disease type prediction, in order to improve the efficiency that disease type prediction recommends the symptom inquiry action and help improving the accuracy of disease prediction.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
To more clearly illustrate the technical solutions and advantages of the embodiments of the present application or of the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a disease type prediction method provided in an embodiment of the present application;
FIG. 2 is a flow chart of a method for building a reinforcement learning disease type determination model according to an embodiment of the present application;
FIG. 3 is a graphical illustration of the symptomatic distribution of a disease as provided by an embodiment of the present application;
FIG. 4 is a frame diagram of a reinforcement learning disease type determination model provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a diagnosis process of a reinforcement learning disease type determination model according to an embodiment of the present application;
FIG. 6 is a diagram of an error analysis of a reinforcement learning disease type determination model for classifying different groups of diseases according to an embodiment of the present application;
FIG. 7 is a flow chart of another reinforcement learning disease type determination model building method provided by the embodiments of the present application;
FIG. 8 is a schematic view of a disease type prediction device provided in an embodiment of the present application;
fig. 9 is a block diagram of a hardware structure of a server of a disease type prediction method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
In the information age, every traditional field is being reshaped by emerging technologies, and machine learning and artificial intelligence have achieved milestone breakthroughs in many of them. In the game of Go, AlphaGo, an artificial intelligence program based on a reinforcement learning model, defeated the world's top human players; a field once considered impossible to disrupt yielded to emerging technology. Intelligent medical inquiry was likewise once considered impossible, but with the arrival of the era of big data and artificial intelligence, more and more large companies and research institutions have begun to enter the fields of internet medical treatment and intelligent inquiry. IBM, with its Watson system, was among the first to enter the field of intelligent inquiry. At present, major companies all have products in the internet medical field, but the state of intelligent inquiry and diagnosis, especially in simulating a doctor's questioning and in natural interaction between an intelligent inquiry system and patients, remains unsatisfactory.
Medical intelligent inquiry differs from disease prediction. It is based not only on the patient's known symptoms but also on the patient's chief complaint: the system analyzes the complaint, simulates a doctor asking about possible symptoms, and then, based on the patient's interactive responses, continues the inquiry until the correct disease is finally given. Today it is not difficult to take a set of typical symptoms and have an intelligent inquiry system predict the corresponding disease. What remains very difficult, yet very important for the whole field of intelligent medicine, is to have the system start from the patient's chief complaint, question the patient step by step in a simulated inquiry process, and finally arrive at the correct disease diagnosis.
An ideal intelligent inquiry method should ensure that the inquiry logic is reasonable and consistent with that of professional doctors, while avoiding excessive inquiry rounds. In addition, the system should give an accurate disease diagnosis based on the inquiry sequence and the patient's interactive responses. These goals are often difficult to achieve at the same time.
Accordingly, the present invention provides a disease type prediction method, device, and system built from a large amount of simulated medical inquiry dialogue data and inquiry dialogue data extracted from medical literature and related medical cases. Since AlphaGo's breakthrough in Go, models and methods based on reinforcement learning have spread to more and more fields, and reinforcement learning has natural advantages in human-computer interaction in particular. For medical intelligent inquiry, the inquiry logic should imitate doctors as closely as possible while accounting for the interaction between patients and the inquiry system; a reinforcement learning model lets the system learn the optimal inquiry logic and give accurate diagnostic results. Unlike traditional methods, the invention considers inquiry logic and diagnostic accuracy at the same time, casting intelligent inquiry as the training and optimization of a reinforcement learning model. The intelligent inquiry method provided by the embodiments of the invention can be applied in many scenarios, including but not limited to: 1) medical intelligent auxiliary inquiry; 2) medical intelligent triage and guidance; 3) medical intelligent inquiry diagnosis and solution schemes.
A disease type prediction method, apparatus, and system according to embodiments of the present invention will be described with reference to the accompanying drawings.
Fig. 1 is a flowchart of a disease type prediction method according to an embodiment of the present application, and referring to fig. 1, the disease type prediction method according to the embodiment includes the following steps:
s102, a disease type prediction request is received.
In a specific implementation, the disease type prediction request expresses the disease type the user wants checked; it may, for example, contain the initial symptoms entered by the user. The request may be transmitted as text, natural language, or code.
The disease type prediction method in the embodiments of the present specification can be applied to a data processing system such as a distributed system: the user submits a disease type prediction request to the system as text or natural language, and the system converts it into data it can analyze. It will be appreciated that this processing may involve, for example, natural-language keyword extraction.
A request may contain one disease type prediction request or a plurality of them; this can be set according to actual needs and is not specifically limited by the embodiments of the present specification.
And S104, according to the disease type prediction request, carrying out symptom inquiry by using a symptom data set in the reinforcement learning disease type determination model, and collecting symptom information, wherein the reinforcement learning disease type determination model comprises a plurality of symptom data sets, the plurality of symptom data sets are classified based on department classification in the medical system, and different symptom data sets comprise symptom databases of different medical departments.
In a specific implementation, since each disease has its own set of corresponding symptoms (a person with a given disease usually presents several of them at once), the overlap between the symptom sets of different diseases is limited. Fig. 3 is a distribution chart of a disease over its symptoms provided by an embodiment of the present application; as shown in fig. 3, the correlation between the disease and its symptoms can be seen, with the horizontal axis representing symptoms and the vertical axis the proportion of each symptom. Diseases are therefore grouped into categories according to hospital department settings, and a hierarchical structure for symptom information collection and disease type determination is designed. The reinforcement learning disease type determination model is built with a hierarchical reinforcement learning strategy and contains two levels: a main level (the controller) and an execution level. The main level collects symptom information and activates the corresponding execution components according to the disease type prediction request. The execution components may be grouped by department classification in the medical system, or by anatomy into different disease groups; accordingly, the different symptom data sets may contain the symptom databases of different medical departments or anatomical classifications.
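The two-level structure can be sketched as a master that activates execution components. The selection heuristic below (pick the department with the most unasked symptoms) is an illustrative stand-in for the learned main-level policy, not the patent's method.

```python
class ExecutionComponent:
    """Execution-level component for one department / disease group."""
    def __init__(self, department, symptoms):
        self.department = department
        self.symptoms = symptoms

    def inquire(self, state, answer_fn):
        for s in self.symptoms:
            if s not in state:            # only ask about unasked symptoms
                state[s] = answer_fn(s)
        return state

class MainLevel:
    """Main level (controller): tracks collected symptom information and
    activates an execution component. The toy policy here activates the
    component with the most not-yet-asked symptoms."""
    def __init__(self, components):
        self.components = components

    def select(self, state):
        return max(self.components,
                   key=lambda c: sum(s not in state for s in c.symptoms))
```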
And S106, determining a disease type prediction result corresponding to the disease type prediction request by using a disease type classifier in the reinforcement learning disease type determination model according to the acquired symptom information.
In a specific implementation, the disease type classifier stores symptom information together with the disease types associated with it. After the controller has collected symptom information, the classifier can look up the disease types associated with that information and give the occurrence probability of each disease type.
It should be emphasized that the technical solution of the present application is not a disease diagnosis process, but is only used for predicting disease types, and the prediction result can be used for assisting a doctor in disease diagnosis and treatment, and cannot be directly used as a final disease diagnosis result.
The disease type prediction method, device, equipment, and system provided by the present disclosure can select a specific symptom data set according to the disease type prediction request provided by a user (for example, the user's initial symptoms and the user's responses to symptom inquiry actions), recommend further symptom inquiries to the user, collect symptom information from the responses to those inquiries, and determine the final disease type prediction result from the collected symptom information, thereby improving the accuracy of result prediction.
In addition, rewards based on the inquiry results are provided during the training phase to encourage the reinforcement learning agent to select informative symptom inquiry actions. This narrows the action space of disease type prediction, improves the efficiency with which symptom inquiry actions are recommended, and helps improve the accuracy of disease prediction.
On the basis of the above embodiments, in an embodiment of the present specification, the performing symptom query by using a symptom data set in a reinforcement learning disease type determination model and collecting symptom information includes:
sequentially taking each symptom data set in the reinforcement learning disease type determination model as a target symptom data set, and initiating a symptom inquiry about whether symptom characteristics in a symptom database in the target symptom data set exist or not according to the symptom database in the target symptom data set;
and receiving symptom inquiry results, and acquiring the symptom information according to the symptom inquiry results acquired by each target symptom data set.
In a specific implementation, the first target symptom data set selected is the one containing the initial symptoms given in the disease type prediction request; the remaining symptom data sets are then taken in turn. Each symptom data set can be understood as a preset inquiry procedure, and the corresponding disease type prediction result is generated from the symptom inquiry results fed back.
Illustratively, the reinforcement learning disease type determination model includes five symptom data sets, a first symptom data set may be sequentially selected as a target symptom data set, after all symptoms in the first symptom data set are queried, a second symptom data set is selected as a target symptom data set according to a preset order, symptom query is performed again until all five symptom data sets complete symptom query, and symptom information is obtained based on all acquired symptom query results.
In the disease type prediction method provided by this embodiment of the specification, the arrangement of multiple symptom data sets keeps the classification of data clear, makes the symptom inquiry better match a doctor's inquiry process, lets the user describe their symptoms quickly, and improves the user experience.
On the basis of the above embodiments, in an embodiment of the present specification, the initiating a symptom query according to the symptom database in the target symptom data set includes:
and if the number of times of conversation of initiating the symptom inquiry according to the symptom characteristics in the target symptom data set reaches a preset upper limit of the number of times of conversation of the symptom inquiry, or the result of the symptom inquiry meets a preset inquiry stopping condition, stopping the symptom inquiry, and acquiring the next symptom data set in the reinforcement learning disease type determination model as a new target symptom data set.
In a specific implementation, it is understood that neither the upper limit on the number of symptom inquiry dialog turns nor the preset stop condition is specifically limited by the embodiments of the present specification; both may be set according to actual needs.
For example, if the first symptom data set contains twenty symptoms in all and the dialog upper limit is ten, the next symptom data set may be selected as the new target symptom data set after the tenth inquiry; alternatively, if the user's answers meet the preset stop condition (for example, no queried symptom is answered with a "yes"), the next symptom data set may likewise be selected as the new target.
The disease type prediction method provided by this embodiment adds ending conditions to the symptom inquiry and information collection, which speeds up the inquiry, avoids collecting useless symptom information, and shortens the time to a disease type prediction result.
On the basis of the foregoing embodiments, in an embodiment of the present specification, fig. 2 is a flowchart of a method for establishing a reinforcement learning disease type determination model according to an embodiment of the present application, and as shown in fig. 2, the method for training the reinforcement learning disease type determination model includes:
s402, collecting sample data, wherein the sample data comprises a sample symptom data set and a disease type corresponding to the sample symptom data set;
s404, inputting the sample symptom data set into the reinforcement learning disease type determination model and training it in the following way: sequentially using the symptom data sets in the reinforcement learning disease type determination model to inquire about the symptoms in the sample symptom data set and acquire symptom information; if the result of a symptom inquiry is "yes", setting the reward value for that inquiry to a positive preset internal reward value; if the result is "no", setting it to a negative preset internal reward value; and if the result is empty, setting it to 0;
s406, inputting the acquired symptom information into a disease type classifier in the reinforcement learning disease type determination model, and predicting a predicted disease type corresponding to the sample symptom data set by using the disease type classifier.
On the basis of the above embodiments, in an embodiment of the present specification, as shown in fig. 7, fig. 7 is a flowchart of another reinforcement learning disease type determination model establishment method provided in the embodiment of the present application, and the method further includes:
s408, comparing the predicted disease type output by the disease type classifier with the disease type corresponding to the sample symptom data set; if the predicted disease type is the same as the disease type corresponding to the sample symptom data set, setting the reward value of the reinforcement learning disease type determination model to a positive preset external reward value, and if they are different, setting it to a negative preset external reward value.
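The internal and external reward assignment of steps s404 and s408 can be sketched as two small functions. This is an illustrative sketch; the concrete reward magnitudes (+1/−1) are placeholders for the preset internal and external reward values.

```python
def intrinsic_reward(answer, r_pos=1.0, r_neg=-1.0):
    """Internal reward for one symptom inquiry (s404): positive if the
    user confirms the symptom, negative if denied, 0 if no answer."""
    if answer == "yes":
        return r_pos
    if answer == "no":
        return r_neg
    return 0.0  # empty / unanswered inquiry

def extrinsic_reward(predicted, true_disease, r_pos=1.0, r_neg=-1.0):
    """External reward for the final diagnosis (s408): positive preset
    value if the prediction matches the labeled disease type."""
    return r_pos if predicted == true_disease else r_neg
```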
On the basis of the above embodiments, in an embodiment of the present specification, the acquiring symptom information by performing symptom query using a symptom data set in a reinforcement learning disease type determination model includes:
deciding a target symptom data set by using the reinforcement learning disease type determination model according to the disease type prediction request, and initiating a symptom inquiry by using the target symptom data set to acquire symptom information.
In a specific implementation, the sample data may be data derived from an electronic health record system; it can be understood that an electronic health record may include the doctor's diagnosis process, the symptoms of the disease type, and the diagnosis result.
The reinforcement learning disease type determination model provided by the embodiments of the present specification can be contrasted with two families of approaches. Deep learning (CNN, LSTM) based approaches can give relatively high disease prediction accuracy when all observed symptoms are given. On the other hand, probabilistic graphical model (PGM) and information-gain based methods can conduct efficient inquiry according to a disease-symptom transition probability matrix; PGM-based methods have lower computational complexity in the model inference part, and the disease-symptom transition matrix of a probabilistic graphical model can be annotated by professional doctors.
When constructing the reinforcement learning disease type determination model, the diseases can be divided into a plurality of symptom data sets based on hierarchical reinforcement learning, a two-level dialogue system is established, and automatic disease diagnosis is carried out by the hierarchical reinforcement learning method. The method adopts a two-level hierarchical structure with two layers of policies in order to reduce the problem of an overly large action space, thereby realizing automatic diagnosis. The constructed framework is shown in fig. 4, which is a framework diagram of the reinforcement learning disease type determination model provided by the embodiment of the present application. The framework comprises five components: a controller, execution components, a disease type classifier, an internal evaluation module, and a user simulator; there may be a plurality of execution components.
The upper-level policy is implemented by the controller, and the lower-level policy is implemented by the execution components and the disease type classifier. The controller is responsible for triggering the execution components of the lower-level policy and the disease type classifier. Each execution component is responsible for querying the symptoms associated with a certain set of diseases, while the disease type classifier is responsible for making the final diagnosis based on the information collected by the execution components.
The reinforcement learning disease type determination model mimics a group of physicians from different departments to diagnose patients together. Where each executive component is like a doctor from a particular department and the controller is like a committee, delegating the doctor to interact with the patient.
When sufficient symptom information is collected from the executive component, the controller will activate a separate disease type classifier to make the diagnosis. The models of the two levels are jointly trained for better disease diagnosis.
For an automatic diagnosis model based on hierarchical reinforcement learning, the action space of the agent may be A = D ∪ S, where D is the set of all disease types and S is the set of all symptoms related to these diseases.
Given a state s_t, the agent follows its policy π to take an action a_t ~ π(a|s_t) and immediately obtains a reward r_t = R(s_t, a_t) from the environment (the patient/user).
If a_t ∈ S, the agent selects a symptom to ask the patient/user. The user then responds with true, false, or unknown, and the corresponding symptom can be represented by a three-dimensional one-hot vector b ∈ R^3.
If a_t ∈ D, the agent informs the user of the corresponding disease as the disease type prediction result, and the dialogue session terminates as a success or a failure according to the correctness of the diagnosis. The dialogue state s_t is the concatenation of the one-hot encodings b of all symptoms; symptoms not yet requested before turn t are encoded as b = [0, 0, 0].
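The one-hot reply encoding and the concatenated dialogue state described above can be sketched as follows; the helper names are illustrative, not from the original specification.

```python
import numpy as np

# Encode a user reply (true / false / unknown) as a 3-dim one-hot
# vector b, and the dialogue state as the concatenation over all
# symptoms; symptoms not yet asked remain [0, 0, 0].
ANSWERS = {"true": 0, "false": 1, "unknown": 2}

def encode_answer(answer):
    b = np.zeros(3)
    b[ANSWERS[answer]] = 1.0
    return b

def encode_state(replies, all_symptoms):
    """replies: dict mapping symptom -> 'true' / 'false' / 'unknown'."""
    parts = [encode_answer(replies[s]) if s in replies else np.zeros(3)
             for s in all_symptoms]
    return np.concatenate(parts)
```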
The goal of the agent is to find an optimal policy that maximizes the expected cumulative discounted reward E[Σ_{t=0}^{T} γ^t r_t], where γ ∈ [0, 1] is the discount factor and T is the maximum number of turns of the current dialogue session.
The Q-value function Q^π(s, a) = E[R_t | s_t = s, a_t = a, π] is the expected return of taking action a in state s and thereafter following policy π. The optimal Q function is the maximum over all possible policies: Q*(s, a) = max_π Q^π(s, a). The Q-value function follows the Bellman equation: Q*(s, a) = E_{s'}[r + γ max_{a'} Q*(s', a') | s, a]. A policy π is optimal if and only if Q^π(s, a) = Q*(s, a) for every state and action; the optimal policy can then be recovered deterministically as π(a|s) = argmax_{a∈A} Q*(s, a).
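The Bellman update above can be illustrated with a minimal tabular Q-learning step; this is a didactic sketch only, since the model described here uses deep Q-networks rather than a table.

```python
from collections import defaultdict

# One tabular Q-learning update toward the Bellman target
# r + gamma * max_a' Q(s', a'). Q is a dict keyed by (state, action).
def q_update(Q, s, a, r, s_next, actions, gamma=0.95, alpha=0.1):
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q
```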
Specifically, all diseases in D are divided into h subsets D_1, D_2, …, D_h such that D_1 ∪ D_2 ∪ … ∪ D_h = D and D_i ∩ D_j = ∅ for any i ≠ j, where i, j = 1, 2, …, h. Each D_i is associated with a symptom set S_i, a proper subset of S containing the symptoms related to the diseases in D_i, and the execution component w_i is responsible for collecting information about the symptoms in S_i from the user.
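The partition requirement on the disease subsets (their union covers D and they are pairwise disjoint) can be checked with a small helper; the disease names in the test are placeholders.

```python
# Check that subsets D_1..D_h form a partition of the disease set D:
# their union equals D, and no disease appears in two subsets.
def is_partition(subsets, D):
    union = set().union(*subsets)
    pairwise_disjoint = sum(len(s) for s in subsets) == len(union)
    return union == set(D) and pairwise_disjoint
```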
At turn t, the controller decides either to collect symptom information from the user (selecting one execution component for multiple rounds of interaction with the user) or to inform the user of the disease type prediction result (selecting the disease type classifier to output the predicted disease type). Fig. 5 is a schematic diagram of the diagnosis process of the reinforcement learning disease type determination model provided by the embodiment of the present application; it shows the interaction between the two levels of the model during diagnosis, where w_i denotes the action of invoking an execution component and d denotes the action of invoking the disease type classifier. The internal evaluation module is responsible for returning intrinsic rewards to the execution components and telling the invoked execution component whether its subtask has been completed. Further, the user simulator interacts with the model and returns an external reward; it is understood that the user simulator is used to simulate the user.
The controller has the action space A^m = {w_i | i = 1, 2, …, h} ∪ {d}, where action w_i activates the corresponding execution component and d activates the disease type classifier. At turn t, the controller takes the dialogue state s_t as input and, according to its policy π^m(a^m | s_t), performs an action a^m_t ∈ A^m; an external reward r^m_t is then returned to the controller from the environment.
The decision process of the controller is not a standard Markov decision process. Once the controller has activated an execution component, that component interacts with the user for n turns until its subtask terminates; only then can the controller take a new action and observe a new dialogue state. The reinforcement learning of the controller therefore adopts a semi-Markov decision process, and the external rewards returned while the selected execution component interacts with the user are accumulated as the immediate reward of the controller. That is, after taking an action, the reward of the controller may be defined as r^m_t = Σ_{k=0}^{n−1} γ^k r^e_{t+k}, where i = 1, …, h, r^e_{t+k} is the external reward fed back by the environment at turn t + k, γ is the controller's discount factor, and n is the number of primitive actions taken by the execution component. From the Bellman equation, the update target of the controller is: Q^m(s, a^m) = E_{s'}[r^m + γ^n max_{a^m'} Q^m(s', a^m') | s, a^m].
where s' is the dialogue state observed after the controller performs action a^m, and a^m' is the next action taken in state s'. The goal of the controller is to maximize the external reward through the semi-Markov decision process, so the loss function of the controller is: L(θ^m) = E_{(s, a^m, r^m, s') ~ B^m}[(y^m − Q^m(s, a^m; θ^m))^2], where y^m = r^m + γ^n max_{a^m'} Q^m(s', a^m'; θ^m−), θ^m are the network parameters of the current iteration, θ^m− are the network parameters of the previous iteration, and B^m is a fixed-length replay buffer of controller samples.
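The controller's loss can be sketched as below; networks are stubbed as plain functions mapping a state to a vector of Q-values, the n-step discount γ^n follows the semi-Markov formulation above, and all names are illustrative.

```python
import numpy as np

# Sketch of the controller's DQN loss: the target y uses the
# previous-iteration (target) network q_target and an n-step
# discount gamma**n. batch entries: (s, a, r_ext, n, s_next, done).
def controller_loss(batch, q_net, q_target, gamma=0.95):
    losses = []
    for s, a, r_ext, n, s_next, done in batch:
        y = r_ext if done else r_ext + gamma**n * np.max(q_target(s_next))
        losses.append((y - q_net(s)[a]) ** 2)
    return float(np.mean(losses))
```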
Each execution component w_i corresponds to a disease subset D_i and a symptom subset S_i, and its action space is A^{w_i} = S_i. At turn t, if the execution component w_i is invoked, the controller passes the current dialogue state s_t to w_i; w_i then extracts its own state s^i_t from s_t, takes s^i_t as input, and generates an action a^{w_i}_t ∈ A^{w_i}. The state extraction is s^i_t = b^(1) ⊕ b^(2) ⊕ …, the concatenation of the vectors b^(j) representing the symptoms s^(j) ∈ S_i.
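The state extraction above (keeping only the 3-dimensional blocks of the symptoms in S_i) can be sketched as follows; names are illustrative.

```python
import numpy as np

# Extract the execution component's sub-state from the global
# dialogue state: keep only the 3-dim one-hot blocks of the
# symptoms belonging to the component's symptom subset S_i.
def extract_worker_state(s_t, all_symptoms, worker_symptoms):
    blocks = [s_t[3 * j: 3 * j + 3]
              for j, symptom in enumerate(all_symptoms)
              if symptom in worker_symptoms]
    return np.concatenate(blocks)
```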
After the action a^{w_i}_t is taken, the dialogue state is updated to s_{t+1}, and the execution component w_i receives an intrinsic reward r^i_t from the internal evaluation module. The goal of the execution component is to maximize the expected cumulative discounted intrinsic reward. Thus, the loss function of w_i can be written as: L(θ^{w_i}) = E_{(s^i, a^{w_i}, r^i, s^{i'}) ~ B^{w_i}}[(y^{w_i} − Q^{w_i}(s^i, a^{w_i}; θ^{w_i}))^2], where y^{w_i} = r^i + γ_w max_{a^{w_i}'} Q^{w_i}(s^{i'}, a^{w_i}'; θ^{w_i}−), γ_w is the discount factor shared by all execution components, θ^{w_i} are the network parameters of the current iteration, θ^{w_i}− are those of the previous iteration, and B^{w_i} is a fixed-length replay buffer of samples for w_i.
After sufficient symptom information is collected, the disease type classifier is activated. It takes the dialogue state s_t collected by the controller as input and outputs a vector p ∈ R^{|D|} representing a probability distribution over all diseases; the classifier feeds back the most likely disease to the agent as the disease type prediction. It is understood that the model uses a two-layer multi-layer perceptron (MLP) for the prediction of the disease type.
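A two-layer MLP with a softmax output over |D| diseases can be sketched as below. The weights here are random placeholders purely for illustration; in the described model they are trained jointly with the policies.

```python
import numpy as np

# Minimal two-layer MLP sketch of the disease type classifier:
# one hidden layer with ReLU, softmax output over n_diseases.
rng = np.random.default_rng(0)

def mlp_classifier(s, n_hidden, n_diseases):
    w1 = rng.normal(size=(len(s), n_hidden)) * 0.1
    w2 = rng.normal(size=(n_hidden, n_diseases)) * 0.1
    h = np.maximum(0.0, s @ w1)        # hidden layer, ReLU
    logits = h @ w2
    p = np.exp(logits - logits.max())  # stable softmax
    return p / p.sum()                 # probability over diseases
```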
The internal evaluation module provided by the embodiments of the present specification is responsible for generating an intrinsic reward r^i_t for the execution component w_i after the action a^{w_i}_t is performed at turn t. If the symptom information returned by the agent matches the content of the symptom inquiry (the user has the symptom), the reward r^i_t equals +1. If the inquired symptom is the same as a previously inquired one (the execution component repeats an action), or the number of symptom inquiries reaches a preset threshold T_sub, the reward equals −1; otherwise the reward equals 0.
The internal evaluation module is also responsible for judging the termination condition of an execution component. An execution component is terminated when it repeats an action or when its number of symptom inquiries reaches the preset threshold T_sub; it terminates as successful when the user gives a positive response to a symptom inquiry. It is understood that the currently invoked execution component completes its subtask by collecting sufficient symptom information.
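The intrinsic-reward and termination rules above can be combined into one sketch of the internal evaluation module; the interface and names are assumptions for illustration.

```python
# Sketch of the internal evaluation module: returns the intrinsic
# reward and the (terminated, success) flags for one symptom inquiry.
# T_sub is the preset inquiry-count threshold for one component.
def internal_critic(symptom, asked_before, n_inquiries,
                    user_true_symptoms, t_sub=5):
    if symptom in asked_before or n_inquiries >= t_sub:
        return -1.0, True, False   # repeated action or budget exhausted
    if symptom in user_true_symptoms:
        return 1.0, True, True     # user confirms: subtask succeeds
    return 0.0, False, False       # keep querying
```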
The user simulator provided by the embodiments of the present specification is used to simulate the user during symptom collection and can interact with the agent. At the start of each dialogue, the user simulator randomly samples a user goal from the sample data, and all explicit symptoms of the sampled user goal are used to initialize the symptom inquiry. During the symptom inquiry process, the user simulator interacts with the agent according to fixed rules based on the symptom data set. If the agent makes a correct diagnosis, the session is terminated as successful; to improve the efficiency of the interaction, the dialogue is also terminated when a repeated action occurs.
The reward scheme provided by the embodiments of the present specification can be set as follows. Since the number of symptoms of one patient is much smaller than the size of the symptom data set, most symptoms in the symptom data set will not apply, and it is difficult for the agent to locate the symptoms the user actually suffers from. To improve the efficiency of symptom inquiry, a reward shaping method is used: auxiliary rewards are added on top of the original external rewards while keeping the optimal reinforcement learning policy unchanged.
When the symptom information transitions from s_t to s_{t+1}, the auxiliary reward function is defined as f(s_t, s_{t+1}) = γ φ(s_{t+1}) − φ(s_t), where φ(·) is a potential function defined as φ(s) = λ n_true(s), n_true(s) counts the number of true symptoms in the symptom information s, λ > 0 is a hyperparameter used to control the magnitude of the reward shaping, and the potential of the final state is 0. The reward function of the controller then becomes r^m_t + f(s_t, s_{t+1}).
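The potential-based shaping term above can be sketched directly from the formula; the defaults for γ and λ here are illustrative.

```python
# Potential-based reward shaping: f = gamma * phi(s') - phi(s),
# with phi proportional to the number of true symptoms found so far
# and phi = 0 at the terminal state.
def phi(n_true, lam=1.0, terminal=False):
    return 0.0 if terminal else lam * n_true

def shaped_reward(r_ext, n_true_t, n_true_t1, gamma=0.95, lam=1.0):
    f = gamma * phi(n_true_t1, lam) - phi(n_true_t, lam)
    return r_ext + f
```

Discovering one more true symptom yields a positive shaping bonus, which steers the inquiry toward symptoms the user actually has.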
Both the controller's policy π^m and each execution component's policy π^{w_i} are trained by deep reinforcement learning.
In deep reinforcement learning, actions are typically selected under an ε-greedy policy. In our hierarchical framework, both the controller and the execution components are trained following their own ε-greedy policies. During training, the controller's transitions (s, a^m, r^m, s') are stored in the buffer B^m, and each execution component's transitions in B^{w_i}. B^m and the B^{w_i} are sampled repeatedly for training while the parameters of the target networks are kept fixed; the target networks are then updated to the current networks and training continues. At each update, the current network is evaluated, and the buffer keeps only the experience of the network version with the highest success rate; earlier experience is removed so that new samples replace old ones, which speeds up the training process.
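The ε-greedy action selection used by both levels can be sketched in a few lines (ε = 0.1 in the implementation described later); the function name is illustrative.

```python
import random

# epsilon-greedy selection over a list of Q-values: with probability
# epsilon pick a random action (explore), otherwise the argmax (exploit).
def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```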
For the disease type classifier, the terminal states and corresponding disease labels used for its training are updated every 10 epochs during the training process.
On the basis of the foregoing embodiments, in an embodiment of the present specification, the sample data includes:
a real case dataset and a synthetic dataset for a hospital system.
In a specific implementation process, two data sets are used as sample data for training the model. The reinforcement learning disease type determination model can be applied to a dialogue system that uses real cases as task guidance for the preset disease types, and the original symptom data set is expanded with the labeling strategy described below. The symptom data set provided by the present specification may comprise a constructed real-world dataset (RD) of 1,490 records belonging to 4 diseases, and the user goals may include upper respiratory infection (URI), children's functional dyspepsia (CFD), infantile diarrhea (ID), and bronchitis (CB). The raw data may be collected from an existing network.
Each record in the original symptom data set may include a self-report provided by the user and the conversation text between the patient and the doctor. The medical context can be used to determine symptom expressions, each labeled with one of three tags ("true", "false", or "null") indicating whether the user has the symptom. After this, experts manually link each symptom expression to clinical medical system terminology. It is understood that both self-reports and conversations may be labeled: symptoms extracted from the self-reports are regarded as explicit symptoms, while those extracted from the conversation are regarded as implicit symptoms.
Statistics of the real case data set can be seen in Table 1. The constructed real case data set contains 1,490 user goals, 80% of which are used for training and 20% for testing.
| Disease type | Number of samples | Avg. explicit symptoms | Avg. implicit symptoms | Number of symptoms |
| --- | --- | --- | --- | --- |
| ID | 450 | 2.22 | 4.68 | 83 |
| CFD | 350 | 1.73 | 5.05 | 81 |
| URI | 240 | 2.79 | 5 | 81 |
| CB | 450 | 3.01 | 5.35 | 84 |
| Total | 1490 | 2.44 | 3.68 | 90 |

Table 1 - Statistics of the real case data set
In addition to the real case data set, a synthetic data set is created based on the SymCat symptom-disease database. There are 801 diseases in the database, divided into 21 departments (groups) according to the International Statistical Classification of Diseases and Related Health Problems. Based on disease incidence in the Centers for Disease Control and Prevention database, 9 representative departments are selected, each containing its top 10 diseases.
In the disease control center database, each disease is associated with a set of symptoms, and each symptom is associated with a probability indicating how likely it is to appear for that disease. Records are generated for each target disease according to this probability distribution. For a disease and its associated symptoms, the generation of a user goal follows two steps.
First, for each relevant symptom, a symptom label (true or false) is sampled. Second, one symptom is randomly selected from all true symptoms as the explicit symptom (analogous to the self-reported symptom of the real case data set), and the remaining true symptoms are treated as implicit symptoms. Example records of the synthetic data set can be seen in Table 2, and a description of the synthetic data set is shown in Table 3. The constructed synthetic data set contains 30,000 user goals, 80% of which are used for training and 20% for testing.
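The two-step user-goal generation above can be sketched as follows; the symptom probabilities in the test are placeholders, not values from the SymCat database.

```python
import random

# Two-step synthetic user-goal generation: (1) sample each related
# symptom's label by its probability; (2) pick one true symptom as
# the explicit symptom and keep the rest as implicit symptoms.
def generate_user_goal(disease, symptom_probs, rng=random):
    labels = {s: rng.random() < p for s, p in symptom_probs.items()}
    true_symptoms = [s for s, v in labels.items() if v]
    if not true_symptoms:  # resample until at least one true symptom
        return generate_user_goal(disease, symptom_probs, rng)
    explicit = rng.choice(true_symptoms)
    implicit = [s for s in true_symptoms if s != explicit]
    return {"disease": disease, "explicit": explicit, "implicit": implicit}
```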
TABLE 2 types of diseases and symptoms in the synthetic data set
For example, cerebral edema, one of the disease types selected in the synthetic data set, contains 1 explicit symptom and 4 implicit symptoms in total and corresponds to the symptom data set of group 6.
TABLE 3-summary of the synthesized data set
In a particular implementation, ε for the controller and all execution components is set to 0.1. For the controller, the preset number threshold is set to 20. If the controller predicts the correct disease type, it receives an external reward of +1; if the number of inquiries reaches the preset threshold or the predicted disease type is incorrect, it receives an external reward of −1; when no disease type prediction result is given, the external reward is 0.
The preset number threshold for each execution component is set to 5. For the controller and all execution components, the deep reinforcement learning neural network is a three-layer network; all parameters are not specifically limited in the embodiments of the present specification and may be set according to actual needs, for example, the size of the hidden layer is 512 and the learning rate of the deep reinforcement learning neural network is set to 0.0005.
It will be appreciated that the parameter settings of the two levels may be the same. In addition, during training, every 10 epochs of execution component training correspond to one training of the controller. For the disease type classifier, the neural network is a two-layer network with a hidden layer of size 512; the learning rate is set to 0.0005, and it is trained every 10 epochs during the training of the controller.
During training, the model takes about 5,000 epochs to converge, which takes about 18 hours on one GPU (graphics processor). The hyperparameter search of this model covers 12 parameter dimensions, including the discount factor γ of the controller, the discount factor γ_w of all execution components, and the reward shaping adjustment factor λ, where γ_w is a value greater than or equal to 0 and less than 1, and λ is a value greater than or equal to 0.
In the best-performing model on the test set, γ is set to 0.95, γ_w is set to 0.9, and λ is set to 1. The hyperparameters are selected by manual tuning based on the accuracy during model training.
After the model is built, the reinforcement learning disease type determination model is compared with some advanced disease diagnosis reinforcement learning models.
Flat-DQN: the model has only one layer strategy and one operating space containing symptoms and disease.
Pre-trained HRL model: this model is set up similarly to the reinforcement learning disease type determination model, but the low-level policy is trained first and the high-level policy afterwards. In addition, the pre-trained HRL model has no dedicated disease type classifier for disease diagnosis; the diagnosis is made by the execution components themselves.
In addition, two baseline models following a supervised learning setting are compared, which treat the prediction of the disease type as a multi-class classification problem.
SVM-explicit and implicit symptoms model: this model takes both explicit and implicit symptoms as input and uses an SVM to predict the target disease. Since it has access to all the user's implicit symptoms, it can be regarded as an upper bound for the RL-based models.
SVM-explicit symptoms model: classification by an SVM with only the explicit symptoms as input; it can be regarded as a baseline for the RL-based models.
For the real case data set and the synthetic data set, 80% of the samples are used for training and 20% for testing. The embodiments herein use three metrics to evaluate the dialogue system: success rate, average reward, and average number of dialogue turns. It is understood that since each group in the real case data set contains only one disease, the pre-trained HRL model cannot implement a meaningful multi-class diagnosis on it, which is why its deep reinforcement learning results are missing for the real case data set.
| Model | Accuracy | Reward | Avg. turns |
| --- | --- | --- | --- |
| SVM-explicit symptoms model | 0.663±.003 | \ | \ |
| Flat-DQN | 0.681±.018 | 0.509±.029 | 2.534±.171 |
| Pre-trained HRL model | \ | \ | \ |
| Model of the present application | 0.695±.018 | 0.521±.022 | 4.187±.187 |
| SVM-explicit and implicit symptoms model | 0.761±.006 | \ | \ |

Table 4 - Metric comparison of different models on the real case data set
| Model | Accuracy | Reward | Avg. turns |
| --- | --- | --- | --- |
| SVM-explicit symptoms model | 0.321±.008 | \ | \ |
| Flat-DQN | 0.343±.006 | 0.327±.003 | 2.455±.065 |
| Pre-trained HRL model | 0.452±.013 | 0.439±.023 | 6.838±.358 |
| Model of the present application | 0.504±.018 | 0.473±.056 | 12.959±.704 |
| SVM-explicit and implicit symptoms model | 0.732±.014 | \ | \ |

Table 5 - Metric comparison of different models on the synthetic data set
Tables 4 and 5 show the overall performance of the different models on the real case data set and the synthetic data set, respectively. As the tables show, the SVM-explicit and implicit symptoms model outperforms the SVM-explicit symptoms model on both data sets, which shows that implicit symptoms can significantly improve diagnosis accuracy. Moreover, the gap between these two SVM models is much larger on the synthetic data set than on the real case data set, because there is less symptom overlap between different diseases in the synthetic data set.
Because they collect additional implicit symptoms, the Flat-DQN model and the pre-trained HRL model achieve higher diagnosis accuracy than the SVM-explicit symptoms model on both data sets, and the model provided by the embodiments of the present specification outperforms the other existing models.
Compared with the baseline models, the reinforcement learning disease type determination model provided by the embodiments of the present specification interacts with the user for more turns and can collect more information about the user's implicit symptoms. Because more implicit symptom information is collected, the model significantly outperforms the other baselines in diagnosis success rate.
To evaluate the performance of the different executive components and the disease type classifier, some additional experiments were performed based on a reinforcement learning disease type determination model.
All wrong disease type prediction results are collected and plotted as a matrix. Fig. 6 is an error analysis chart of the reinforcement learning disease type determination model over the different disease groups provided by the embodiment of the present application; as shown in fig. 6, the prediction results of the 9 disease groups are displayed. The diagonal squares are darker than the others, which means that for most wrongly informed user goals the predicted disease lies in the same group as the true disease, indicating that the predictions of the reinforcement learning disease type determination model are relatively close to correct.
TABLE 6-Performance of different execution Components in a reinforcement learning disease type determination model
We evaluate the performance of the execution components in terms of success rate, average intrinsic reward, and match rate. The match rate refers to the proportion of symptom inquiries that hit implicit symptoms the user actually has. The results in Table 6 show a positive correlation between the average intrinsic reward and the match rate, which means that the more implicit symptoms an execution component elicits from the user, the more accurate the disease type prediction result.
The reinforcement learning disease type determination model provided by the embodiments of the present specification is based on a hierarchical learning algorithm integrating deep reinforcement learning, and can learn policies at different levels simultaneously. The embodiments of the present specification propose a general framework that first learns useful skills (high-level policies) in an environment and then uses the acquired skills to learn downstream tasks more quickly. In addition, a method for automatically generating or discovering goals during the training of the two-level policies is provided, which can improve the practicability and applicability of the disease type prediction method.
The reinforcement learning disease type determination model provided by the embodiments of the present specification assigns symptom information collection and disease type prediction to the low-level execution components and the disease type classifier, while a controller at the high level is responsible for triggering the low-level models. The original model is improved by introducing the disease type classifier and the internal evaluation module, which are trained jointly to reach the optimal state.
On the other hand, embodiments of the present disclosure provide a disease type prediction apparatus, and fig. 8 is a schematic diagram of a disease type prediction apparatus provided in embodiments of the present disclosure, and as shown in fig. 8, the apparatus includes:
a request receiving module for receiving a disease type prediction request;
the symptom acquisition module is used for inquiring symptoms by using a symptom data set in a reinforcement learning disease type determination model according to the disease type prediction request and acquiring symptom information, wherein the reinforcement learning disease type determination model comprises a plurality of symptom data sets which are classified based on department classification in a medical system, and different symptom data sets comprise symptom databases of different medical departments;
and the disease type prediction module is used for determining a disease type prediction result corresponding to the disease type prediction request by using a disease type classifier in the reinforcement learning disease type determination model according to the acquired symptom information.
In another aspect, an embodiment of the present specification provides a disease type prediction processing apparatus, including: at least one processor and a memory for storing processor-executable instructions, the instructions when executed by the processor implementing the method of any of the above.
The memory may be used to store software programs and modules, and the processor may execute various functional applications and data processing by operating the software programs and modules stored in the memory. The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system, application programs needed by functions and the like; the storage data area may store data created according to use of the apparatus, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may also include a memory controller to provide the processor access to the memory.
In yet another aspect, the present embodiment provides a disease type prediction system including a controller, a symptom inquiry module, and a disease type determination module, where the symptom inquiry module includes a plurality of symptom data sets, the plurality of symptom data sets are classified based on department classifications in a medical system, and different symptom data sets include symptom databases of different medical departments.
The controller is configured to receive a disease type prediction request sent by a user simulator or a user terminal, and to invoke the symptom inquiry module to conduct symptom inquiries with the user simulator or the user terminal based on the symptom databases in the symptom data sets;
the user simulator or the user terminal returns a symptom inquiry result in response to the symptom inquiry initiated by the symptom inquiry module;
the symptom inquiry module collects symptom information according to the symptom inquiry result and returns the collected symptom information to the controller; and
after receiving the symptom information, the controller invokes the disease type determination module, and the disease type determination module determines, according to the symptom information, a disease type prediction result corresponding to the disease type prediction request.
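The controller / symptom-inquiry / disease-type flow described above can be sketched in Python. All class names, method names, and the stand-in classifier below are illustrative assumptions for clarity and are not taken from the patent itself; the patent's trained reinforcement learning classifier is replaced here by a toy majority vote over departments.

```python
class SymptomInquiryModule:
    def __init__(self, symptom_datasets):
        # {department: [symptom, ...]} -- one symptom database per department,
        # mirroring the department-classified symptom data sets in the patent.
        self.symptom_datasets = symptom_datasets

    def collect(self, answer_fn):
        """Ask about every symptom; answer_fn stands in for the user simulator
        or user terminal and returns True / False / None per symptom."""
        collected = {}
        for department, symptoms in self.symptom_datasets.items():
            for symptom in symptoms:
                collected[(department, symptom)] = answer_fn(symptom)
        return collected


class DiseaseTypeModule:
    def predict(self, symptom_info):
        # Toy stand-in classifier: report the department with the most
        # confirmed symptoms (a placeholder for the trained classifier).
        counts = {}
        for (department, _symptom), present in symptom_info.items():
            if present:
                counts[department] = counts.get(department, 0) + 1
        return max(counts, key=counts.get) if counts else None


class Controller:
    def __init__(self, inquiry_module, disease_module):
        self.inquiry_module = inquiry_module
        self.disease_module = disease_module

    def handle_request(self, answer_fn):
        # Receive a prediction request, run the symptom inquiry, then classify.
        symptom_info = self.inquiry_module.collect(answer_fn)
        return self.disease_module.predict(symptom_info)
```

For example, with respiratory and digestive symptom databases, a simulated patient who confirms only "cough" and "dyspnea" would be routed to the respiratory disease type by this toy classifier.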
Since the technical effects of the disease type prediction apparatus, the processing device, and the system are the same as those of the disease type prediction method, details are not repeated here.
The method provided by the embodiments of the present application may be executed on a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, Fig. 9 is a block diagram of the hardware structure of a server for the disease type prediction method provided by an embodiment of the present application. As shown in Fig. 9, the server 900 may vary considerably in configuration and performance, and may include one or more central processing units (CPUs) 910 (the processor 910 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 930 for storing data, and one or more storage media 920 (e.g., one or more mass storage devices) for storing applications 923 or data 922. The memory 930 and the storage media 920 may be transient or persistent storage. A program stored in a storage medium 920 may include one or more modules, and each module may include a series of instruction operations for the server. Further, the central processing unit 910 may be configured to communicate with the storage medium 920 and execute, on the server 900, the series of instruction operations in the storage medium 920. The server 900 may also include one or more power supplies 960, one or more wired or wireless network interfaces 950, one or more input/output interfaces 940, and/or one or more operating systems 921, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The input/output interface 940 may be used to receive or transmit data via a network. A specific example of the above network may include a wireless network provided by a communication carrier of the server 900. In one example, the input/output interface 940 includes a network interface controller (NIC) that can connect to other network devices through a base station so as to communicate with the Internet. In another example, the input/output interface 940 may be a radio frequency (RF) module used for communicating with the Internet wirelessly.
Those skilled in the art will understand that the structure shown in Fig. 9 is merely illustrative and does not limit the structure of the electronic device. For example, the server 900 may include more or fewer components than shown in Fig. 9, or have a configuration different from that shown in Fig. 9.
Embodiments of the present application further provide a storage medium, which may be disposed in a server to store at least one instruction or at least one program for implementing the disease type prediction method of the method embodiments, where the at least one instruction or the at least one program is loaded and executed by a processor to implement the disease type prediction method provided by the method embodiments.
Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing program code.
It should be noted that the order of the embodiments of the present application is for description only and does not imply that any embodiment is preferred over another. Specific embodiments have been described above; other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or any sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible and may be advantageous.
The embodiments in the present specification are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus, device, and storage medium embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, reference may be made to the description of the method embodiments.
Those skilled in the art will understand that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of disease type prediction, the method comprising:
receiving a disease type prediction request;
according to the disease type prediction request, carrying out symptom inquiry by using a symptom data set in a reinforcement learning disease type determination model, and collecting symptom information, wherein the reinforcement learning disease type determination model comprises a plurality of symptom data sets which are classified based on department classification in a medical system, and different symptom data sets comprise symptom databases of different medical departments;
and according to the acquired symptom information, determining a disease type prediction result corresponding to the disease type prediction request by using a disease type classifier in the reinforcement learning disease type determination model.
2. The method of claim 1, wherein performing symptom inquiries using a symptom data set in the reinforcement learning disease type determination model and collecting symptom information comprises:
sequentially taking each symptom data set in the reinforcement learning disease type determination model as a target symptom data set, and initiating, according to the symptom database in the target symptom data set, symptom inquiries asking whether the symptom characteristics in that symptom database are present; and
receiving symptom inquiry results, and collecting the symptom information according to the symptom inquiry results obtained for each target symptom data set.
3. The method of claim 2, wherein initiating symptom inquiries according to the symptom database in the target symptom data set comprises:
if the number of dialogue turns of symptom inquiries initiated according to the symptom characteristics in the target symptom data set reaches a preset upper limit on the number of symptom inquiry dialogue turns, or the symptom inquiry result meets a preset inquiry-stop condition, stopping the symptom inquiry and obtaining the next symptom data set in the reinforcement learning disease type determination model as a new target symptom data set.
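The per-dataset loop with the two stop conditions of claim 3 can be sketched as follows. The function and parameter names are illustrative assumptions, and the turn cap of 5 is an assumed value for the "preset upper limit", which the patent does not specify.

```python
MAX_INQUIRY_TURNS = 5  # assumed value for the preset dialogue-turn upper limit


def inquire_over_datasets(symptom_datasets, answer_fn, stop_condition):
    """Take each symptom data set in turn as the target data set and inquire
    about its symptoms, stopping per claim 3: either the dialogue-turn cap is
    reached or the inquiry result satisfies the preset stop condition."""
    collected = {}
    for department, symptoms in symptom_datasets.items():
        turns = 0
        for symptom in symptoms:
            if turns >= MAX_INQUIRY_TURNS:
                break  # turn cap reached: move to the next symptom data set
            result = answer_fn(symptom)
            collected[symptom] = result
            turns += 1
            if stop_condition(result):
                break  # inquiry-stop condition met: move to the next data set
    return collected
```

For instance, with a stop condition of "patient confirmed a symptom", the loop abandons a department's remaining questions as soon as one symptom there is answered yes, and moves on to the next department's data set.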
4. The method of claim 2, wherein training the reinforcement learning disease type determination model comprises:
collecting sample data, wherein the sample data comprises a sample symptom data set and a disease type corresponding to the sample symptom data set;
inputting the sample symptom data set into the reinforcement learning disease type determination model, and training the reinforcement learning disease type determination model as follows: sequentially using the symptom data sets in the reinforcement learning disease type determination model to conduct symptom inquiries on the sample symptom data set and collect symptom information; if the symptom inquiry result for a symptom data set is yes, setting the reward value of the symptom data set to a positive preset internal reward value; if the symptom inquiry result for the symptom data set is no, setting the reward value of the symptom data set to a negative preset internal reward value; and if the symptom inquiry result for the symptom data set is empty, setting the reward value of the symptom data set to 0;
inputting the acquired symptom information into a disease type classifier in the reinforcement learning disease type determination model, and predicting a predicted disease type corresponding to the sample symptom data set by using the disease type classifier.
5. The method of claim 4, further comprising:
comparing the predicted disease type predicted by the disease type classifier for the sample symptom data set with the disease type corresponding to the sample symptom data set; if the two are the same, setting the reward value of the reinforcement learning disease type determination model to a positive preset external reward value; and if the two are different, setting the reward value of the reinforcement learning disease type determination model to a negative preset external reward value.
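The external reward of claim 5 depends only on whether the classifier's prediction matches the sample's label. A sketch under the assumption that `R_EXTERNAL` is a tunable magnitude (the patent specifies only the signs):

```python
R_EXTERNAL = 10.0  # assumed magnitude of the preset external reward value


def external_reward(predicted_type, true_type):
    """Claim 5's external reward: positive if the predicted disease type
    equals the disease type labelled for the sample, negative otherwise."""
    return R_EXTERNAL if predicted_type == true_type else -R_EXTERNAL
```

In hierarchical reinforcement learning terms, this episode-level signal complements the per-question internal rewards of claim 4, crediting the whole inquiry trajectory when it leads to a correct diagnosis.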
6. The method of claim 4, wherein the sample data comprises:
a real case dataset and a synthetic dataset for a hospital system.
7. The method of claim 1, wherein performing symptom inquiries using a symptom data set in the reinforcement learning disease type determination model and collecting symptom information comprises:
deciding, according to the disease type prediction request, a target symptom data set by using the reinforcement learning disease type determination model, and conducting symptom inquiries using the target symptom data set to collect symptom information.
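Unlike claim 2's fixed scan over departments, claim 7 has the model itself decide which symptom data set to target. A minimal stand-in for that decision, with an assumed match-count scoring rule in place of the learned policy:

```python
def decide_target_dataset(symptom_datasets, confirmed_symptoms):
    """Stand-in for the learned policy of claim 7: choose as the target
    symptom data set the department whose symptom database best matches
    the symptoms confirmed so far."""
    def match_score(department):
        return sum(1 for s in symptom_datasets[department]
                   if s in confirmed_symptoms)
    return max(symptom_datasets, key=match_score)
```

A trained hierarchical reinforcement learning model would replace `match_score` with the policy's value estimates, but the interface, that is, mapping the current symptom state to a target data set, is the same.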
8. A disease type prediction apparatus, characterized in that the apparatus comprises:
a request receiving module, configured to receive a disease type prediction request;
a symptom acquisition module, configured to conduct symptom inquiries using a symptom data set in a reinforcement learning disease type determination model according to the disease type prediction request and to collect symptom information, wherein the reinforcement learning disease type determination model comprises a plurality of symptom data sets classified based on department classifications in a medical system, and different symptom data sets comprise symptom databases of different medical departments; and
a disease type prediction module, configured to determine, according to the collected symptom information, a disease type prediction result corresponding to the disease type prediction request by using a disease type classifier in the reinforcement learning disease type determination model.
9. A disease type prediction processing apparatus, characterized by comprising: at least one processor and a memory for storing processor-executable instructions, wherein the processor implements the method of any one of claims 1 to 7 when executing the instructions.
10. A disease type prediction system comprising a controller, a symptom inquiry module, and a disease type determination module, wherein the symptom inquiry module comprises a plurality of symptom data sets, the plurality of symptom data sets are classified based on department classifications in a medical system, and different symptom data sets comprise symptom databases of different medical departments;
the controller is configured to receive a disease type prediction request sent by a user simulator or a user terminal, and to invoke the symptom inquiry module to conduct symptom inquiries with the user simulator or the user terminal based on the symptom databases in the symptom data sets;
the user simulator or the user terminal returns a symptom inquiry result in response to the symptom inquiry initiated by the symptom inquiry module;
the symptom inquiry module collects symptom information according to the symptom inquiry result and returns the collected symptom information to the controller; and
after receiving the symptom information, the controller invokes the disease type determination module, and the disease type determination module determines, according to the symptom information, a disease type prediction result corresponding to the disease type prediction request.
CN202011135075.2A 2020-10-22 2020-10-22 Disease type prediction method, device, equipment and system Pending CN112349409A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202011135075.2A CN112349409A (en) 2020-10-22 2020-10-22 Disease type prediction method, device, equipment and system
US17/508,655 US20220130545A1 (en) 2020-10-22 2021-10-22 Task-oriented Dialogue System for Automatic Disease Diagnosis
US17/508,675 US11562829B2 (en) 2020-10-22 2021-10-22 Task-oriented dialogue system with hierarchical reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011135075.2A CN112349409A (en) 2020-10-22 2020-10-22 Disease type prediction method, device, equipment and system

Publications (1)

Publication Number Publication Date
CN112349409A 2021-02-09

Family

ID=74359591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011135075.2A Pending CN112349409A (en) 2020-10-22 2020-10-22 Disease type prediction method, device, equipment and system

Country Status (1)

Country Link
CN (1) CN112349409A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113057589A (en) * 2021-03-17 2021-07-02 上海电气集团股份有限公司 Method and system for predicting organ failure infection diseases and training prediction model
CN113436754A (en) * 2021-07-06 2021-09-24 吴国军 Medical software and method for intelligent terminal inquiry

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130268203A1 (en) * 2012-04-09 2013-10-10 Vincent Thekkethala Pyloth System and method for disease diagnosis through iterative discovery of symptoms using matrix based correlation engine
CN108182262A (en) * 2018-01-04 2018-06-19 华侨大学 Intelligent Answer System construction method and system based on deep learning and knowledge mapping
CN109192300A (en) * 2018-08-17 2019-01-11 百度在线网络技术(北京)有限公司 Intelligent way of inquisition, system, computer equipment and storage medium
CN109817329A (en) * 2019-01-21 2019-05-28 暗物智能科技(广州)有限公司 A kind of medical treatment interrogation conversational system and the intensified learning method applied to the system
CN110490251A (en) * 2019-03-08 2019-11-22 腾讯科技(深圳)有限公司 Prediction disaggregated model acquisition methods and device, storage medium based on artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KANGENBEI LIAO et al.: "Task-oriented Dialogue System for Automatic Disease Diagnosis via Hierarchical Reinforcement Learning", arXiv *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210209