CN108717862B

CN108717862B - Intelligent prescription review and prescribing method based on machine learning

Info

Publication number: CN108717862B
Application number: CN201810317800.4A
Authority: CN
Inventors: 罗安; 周聪俊; 史鹏翔; 张楠; 许春霞; 乔新宇
Original assignee: Sichuan Junyi Fudun Technology Co ltd
Current assignee: Sichuan Junyi Fudun Technology Co ltd
Priority date: 2018-04-10
Filing date: 2018-04-10
Publication date: 2022-05-03
Anticipated expiration: 2038-04-10
Also published as: CN108717862A

Abstract

The invention discloses an intelligent prescription review and prescribing model based on machine learning, belonging to the technical field of machine learning models. The model is solved with a classification algorithm: historical prescription data and prescription review data are learned with a Bayesian algorithm to obtain the model's probability matrix, and an intelligent prescription review model is then built from the probability matrix of drugs and preliminary diagnoses. A drug relevance model is built by analyzing the relevance of the drugs used under the same preliminary diagnosis; it captures the relevance between the preliminary diagnosis and the drugs, so that, once the preliminary diagnosis has been analyzed and made, it can be determined whether a recommended drug is consistent with that diagnosis. Drug component data are standardized through relevance analysis between the preliminary diagnosis and the components of the drugs. After the user's symptom data and the drug indication data are normalized, the relevance between the user's symptoms and the corresponding drug components is analyzed.

Description

Intelligent prescription review and prescribing method based on machine learning

Technical Field

The invention belongs to the technical field of machine learning models, and in particular relates to an intelligent prescription review and prescribing method based on machine learning.

Background

With the widespread use of internet-based remote consultation, the patient's consultation flow can be roughly simplified as follows: the patient sits at a PC, connects by video, and describes the condition by voice; the doctor learns the symptoms and their severity, determines the cause from medical knowledge, and reaches a preliminary diagnosis, which gives the disease type; the doctor then prescribes according to the symptoms and the disease type, selecting the relevant drugs from a drug database to form a prescription. The system then sends the prescription to a pharmacist for review, and the pharmacist completes the review based on the combination of drugs in the prescription, the description of the patient's information, the preliminary diagnosis, and the drug dosages.

In general, a user consults a physician and describes relevant information such as the condition (i.e., the symptoms) and the allergy history. The physician then makes a preliminary diagnosis and writes a prescription based on that diagnosis and the user's symptoms. The prescription typically contains one or more drugs, each of which has the following attributes: components, applicable symptoms, and contraindications. The components of different drugs require a separate conflict analysis; the applicable symptoms of the drugs can be checked against the user's symptoms, i.e. whether the drugs are prescribed according to the user's symptoms; and the contraindications of the drugs, together with the user's allergy history, hereditary medical history and so on, form the medication safety check.

To meet product development needs, an intelligent prescription review and prescribing system based on machine learning needs to be established to help doctors diagnose and serve patients quickly, accurately and efficiently.

Disclosure of Invention

To solve the above problems in the prior art, the invention aims to provide an intelligent prescription review and prescribing method based on machine learning that helps doctors and pharmacists diagnose and serve patients quickly, accurately and efficiently.

The technical solution adopted by the invention is as follows: an intelligent prescription review and prescribing method based on machine learning, comprising the following:

(I) All drug components are listed as $\{a_1, a_2, a_3, \ldots, a_w\}$, $w$ components in total, and the set of these $w$ components is denoted $A$. For a selected drug $i$, the components it contains are listed as $\{x_1, x_2, \ldots, x_k\}$; this list of the $k$ components of the $i$-th drug is denoted by the set $X$, where $X \subseteq A$. Here $i$ is a variable taking values from 1 to $n$ and denotes the $i$-th of the $n$ drugs listed in the electronic prescription;

(II) Based on a relevance analysis model built on a classification algorithm, the overall relevance $F(y_j)$ is obtained from the following formula:

$$F(y_j) = \sum_{i=1}^{n} w_i \, P1_{ij} \tag{2-1}$$

where $w_i$ is the cost coefficient of drug $i$, with $w_i \in [0, 1]$, and $P1_{ij}$ is the probability that drug $i$ belongs to preliminary diagnosis group $j$, with $P1_{ij} \in [0, 1]$. $P2_{kj}$ denotes the probability that drug component $x_k$ belongs to preliminary diagnosis group $j$, which leads to formula (2-2), in which $P2_{qj}$ denotes the probability that drug component $q$ belongs to preliminary diagnosis group $j$.

Combining (2-1) and (2-2) gives formula (2-3), and the objective function is formula (2-4).

The preliminary diagnosis group $j$ is one of $m$ preliminary diagnosis groups in total, $j \in \{1, 2, 3, \ldots, m\}$. The probability that each drug $i$ belongs to each preliminary diagnosis group $j$ is $P1_{ij}$, so $P1$ is an array of dimension $n \times m$, i.e.:

$$P1 = \begin{pmatrix} P1_{11} & P1_{12} & \cdots & P1_{1m} \\ P1_{21} & P1_{22} & \cdots & P1_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ P1_{n1} & P1_{n2} & \cdots & P1_{nm} \end{pmatrix} \tag{2-5}$$

(III) The prescription review process is simplified as follows: the weighted sum of the relevance between each of the $n$ drugs in the prescription and preliminary diagnosis group $j$ of the disease is the overall relevance $F(y_j)$. A threshold $\varepsilon \in (0, 1]$ is set for this weighted sum. When $F(y_j) < \varepsilon$, the relationship between the $n$ prescribed drugs and preliminary diagnosis group $j$ of the disease is strong, and the prescription passes review; when $F(y_j) > \varepsilon$, the relationship between the $n$ prescribed drugs and preliminary diagnosis group $j$ is weak, and the prescription does not pass review.
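As a minimal sketch of the review rule in step (III), the Python below assumes the overall relevance is the plain weighted sum of the per-drug probabilities for one preliminary diagnosis group, and applies the threshold exactly as the text states it (the prescription passes when F(y_j) is below ε). The function names and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def overall_relevance(weights: Sequence[float], probs: Sequence[float]) -> float:
    """Weighted sum F(y_j) = sum_i w_i * P1_ij for one diagnosis group j."""
    if len(weights) != len(probs):
        raise ValueError("one cost coefficient per drug is required")
    return sum(w * p for w, p in zip(weights, probs))


def passes_review(f_value: float, epsilon: float) -> bool:
    """Rule as stated in the text: pass when F(y_j) < epsilon."""
    if not 0.0 < epsilon <= 1.0:
        raise ValueError("epsilon must lie in (0, 1]")
    return f_value < epsilon


# Hypothetical prescription with three drugs (illustrative values only).
w = [0.5, 0.25, 0.25]      # cost coefficients w_i
p1_j = [0.5, 0.25, 0.75]   # P1_ij for a fixed diagnosis group j
f_j = overall_relevance(w, p1_j)
print(f_j, passes_review(f_j, epsilon=0.6))
```

Keeping the threshold check separate from the weighted sum makes it easy to try different values of ε on the same prescription.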

Further, the probability matrix $P1_{ij}$ is calculated as follows:

(1) Data initialization: split the given set of preliminary diagnoses into individual preliminary diagnoses $C_j$;

(2) Select a prescription $E_j$ under the preliminary diagnosis; prescription $E_j$ contains the drugs $X_1, X_2, \ldots, X_k$;

(3) For each drug $X_i$, $i \le k$, extract the drug component vector $\{x_1, x_2, \ldots, x_n\}$, assuming that the drug components are mutually independent; if the components of a drug cannot be extracted, discard the drug and treat it as invalid;

(4) Record the drug components under preliminary diagnosis $C_j$; if a component has already appeared, increase the occurrence count in that component's record by 1; at the same time, record the number of occurrences of the corresponding drug under preliminary diagnosis $C_j$;

(5) Increase the total number of samples recorded for preliminary diagnosis $C_j$ by 1;

(6) Repeat steps (2) to (5) until all drugs in the prescriptions have been learned, yielding a preliminary diagnosis list;

(7) Traverse the preliminary diagnosis list and calculate the posterior probability of each drug component in the list for preliminary diagnosis $C_j$:

$$P(x_i \mid C_j) = \frac{\operatorname{count}(x_i \wedge C_j \mid D)}{\operatorname{count}(C_j \mid D)}$$

where $x_i$ is a drug component, $x_i \in \{x_1, x_2, \ldots, x_n\}$, and $D$ is the sample space. $P(x_i \mid C_j)$ is the posterior probability of a drug component in the preliminary diagnosis list for preliminary diagnosis $C_j$; $\operatorname{count}(x_i \wedge C_j \mid D)$ is the number of occurrences in the sample space $D$ of drug component $x_i$ under preliminary diagnosis $C_j$; and $\operatorname{count}(C_j \mid D)$ is the number of occurrences of preliminary diagnosis $C_j$ in the sample space $D$;

(8) Repeat step (7) until all drug components under preliminary diagnosis $C_j$ have been calculated;

(9) Traverse all drugs in the preliminary diagnosis list;

(10) Calculate the probability of drug $X_i$ for preliminary diagnosis $C_j$:

The drug $X_i$ is represented by its component vector $\langle x_1, x_2, \ldots, x_n \rangle$. Because the probability $p(\langle x_1, x_2, \ldots, x_n \rangle)$ of the component vector is a constant, it is handled by normalization; under the Bayesian assumption, expanding with Bayes' formula then gives formula (2-7). Formula (2-7) is the product of the posterior probabilities of the components of drug $X_i$ for preliminary diagnosis $C_j$, i.e. the probability of drug $X_i$ for preliminary diagnosis $C_j$;

(11) Repeat steps (9) and (10) until the probabilities of all drugs under preliminary diagnosis $C_j$ have been calculated;

(12) Repeat steps (7) to (11) until the probabilities of all drugs for all preliminary diagnoses have been calculated, which yields the probability matrix $P1_{ij}$.
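The counting procedure in steps (1) to (12) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: each valid drug is counted as one sample for its diagnosis, a component is counted once per drug, the drug probability is the unnormalized product of component posteriors, and all names and data (`learn_component_posteriors`, the example drugs) are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod


def learn_component_posteriors(records):
    """Estimate P(x | C) = count(x and C | D) / count(C | D) from history.

    records: iterable of (diagnosis, drugs), where drugs is a list of
    component lists, one list per drug in the prescription.
    """
    component_counts = defaultdict(Counter)  # count(x and C | D)
    sample_counts = Counter()                # count(C | D)
    for diagnosis, drugs in records:
        for components in drugs:
            if not components:
                # Step (3): a drug with no extractable components is discarded.
                continue
            # Step (4): count each component once per drug.
            component_counts[diagnosis].update(set(components))
            # Step (5): assumption - one sample per valid drug.
            sample_counts[diagnosis] += 1
    return {
        diagnosis: {x: n / sample_counts[diagnosis] for x, n in counts.items()}
        for diagnosis, counts in component_counts.items()
    }


def drug_probability(components, posteriors):
    """Steps (9)-(10): product of component posteriors (unnormalized)."""
    return prod(posteriors.get(x, 0.0) for x in components)


def probability_matrix(drugs, diagnoses, learned):
    """Step (12): rows are drugs, columns are preliminary diagnoses."""
    return [
        [drug_probability(components, learned.get(c, {})) for c in diagnoses]
        for components in drugs
    ]


# Hypothetical history: (preliminary diagnosis, component lists of its drugs).
history = [
    ("cold", [["paracetamol", "caffeine"], ["paracetamol"]]),
    ("cold", [["paracetamol", "caffeine"]]),
    ("gastritis", [["omeprazole"], []]),
]
learned = learn_component_posteriors(history)
matrix = probability_matrix(
    [["paracetamol", "caffeine"], ["omeprazole"]], ["cold", "gastritis"], learned
)
```

A component never seen under a diagnosis gets probability 0 here, which zeroes the whole product; a practical system would probably add smoothing, which the text does not describe.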

Further, the cost coefficient $w_i$ in formula (2-1) is calculated by formula (2-8), in which the cost coefficient $w_i$ is inversely proportional to the preliminary applicability $\omega_i$ of the drug: the stronger the applicability of a drug, the lower the cost of a medication error. $k$ is a correction parameter, and $C'_i$ denotes the number of different preliminary diagnoses in which drug $i$ can appear, which is used to measure the applicability of drug $i$, so that $\omega_i = C'_i$.
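The text states only that the cost coefficient is inversely proportional to the applicability, with a correction parameter k. The sketch below therefore assumes the simplest such form, w_i = k / C'_i, clipped to [0, 1] so that it stays in the range given for w_i; the exact formula (2-8), the clipping, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def applicability(history):
    """C'_i: number of distinct preliminary diagnoses each drug appears in.

    history: iterable of (diagnosis, drug) pairs.
    """
    seen = defaultdict(set)
    for diagnosis, drug in history:
        seen[drug].add(diagnosis)
    return {drug: len(diagnoses) for drug, diagnoses in seen.items()}


def cost_coefficient(num_diagnoses: int, k: float = 1.0) -> float:
    """Assumed form w_i = k / C'_i, clipped to the interval [0, 1]."""
    if num_diagnoses < 1:
        raise ValueError("a drug must appear in at least one diagnosis")
    return min(1.0, max(0.0, k / num_diagnoses))


# Hypothetical history of (diagnosis, drug) pairs.
pairs = [("cold", "A"), ("flu", "A"), ("cold", "B"), ("cold", "A")]
omega = applicability(pairs)
weights = {drug: cost_coefficient(c) for drug, c in omega.items()}
```

Under this assumed form, a drug used across many diagnoses gets a small coefficient, matching the statement that broader applicability means a lower cost of error.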

The beneficial effects of the invention are:

1. The intelligent prescription review and prescribing method based on machine learning simplifies the consultation flow, abstracts it into a mathematical model, and solves that model with a classification algorithm. Historical prescription data and prescription review data are learned with a Bayesian algorithm to obtain the model's probability matrix; an intelligent prescription review model is then built from the probability matrix $P1_{ij}$ of drugs and preliminary diagnoses, and a drug relevance model is built from medication relevance analysis under the same preliminary diagnosis, thereby achieving intelligent prescription review and prescribing.

Detailed Description

The invention is further illustrated by the following specific examples.

The invention provides an intelligent prescription review and prescribing method based on machine learning, which comprises the following steps:

(II) In the relevance analysis model based on a classification algorithm, the relevance $F(y_j)$ is given by the following formula:

$$F(y_j) = \sum_{i=1}^{n} w_i \, P1_{ij} \tag{2-1}$$

where $w_i$ is the cost coefficient of drug $i$, with $w_i \in [0, 1]$; it expresses the cost incurred if the drug is wrong, and the larger the cost coefficient, the higher the cost. The term $w_i P1_{ij}$ indicates the effectiveness of the $i$-th drug for the $j$-th preliminary diagnosis: if it is greater than 0, drug $i$ has a promoting effect on preliminary diagnosis $j$; otherwise it has an inhibiting effect. $P1_{ij}$ is the probability that drug $i$ belongs to preliminary diagnosis group $j$, with $P1_{ij} \in [0, 1]$. $P2_{kj}$ denotes the probability that drug component $x_k$ belongs to preliminary diagnosis group $j$, which leads to formula (2-2).

Combining (2-1) and (2-2) gives formula (2-3), and the objective function is formula (2-4).

(III) For each prescription, the weighted sum of the relevance between each of the $n$ drugs in the prescription and preliminary diagnosis group $j$ of the disease is the overall relevance $F(y_j)$. A threshold $\varepsilon \in (0, 1]$ is set for this weighted sum. When $F(y_j) < \varepsilon$, the relationship between the $n$ prescribed drugs and preliminary diagnosis group $j$ of the disease is strong, and the prescription passes review; when $F(y_j) > \varepsilon$, the relationship between the $n$ prescribed drugs and preliminary diagnosis group $j$ is weak, and the prescription does not pass review.

The geometric meaning of the relevance $F(y_j)$ is shown in FIG. 1. Region $A$ represents a preliminary diagnosis (or a prescription), and regions $B_1, B_2, B_3, \ldots, B_i, \ldots, B_n$ represent the drugs. The size of the area of region $A$ lying inside each of the regions $B_1, B_2, B_3, \ldots, B_i, \ldots, B_n$ represents the probability. If only one drug is recommended, the model degenerates to the overlapping area between region $A$ and that drug's region. The larger the area of drug $B_i$ that falls within preliminary diagnosis $A$, the greater the probability. If all drugs are associated with preliminary diagnosis $A$, the prescription is considered valid; if a drug $B_i$ does not lie within the region of preliminary diagnosis $A$, then drug $B_i$ is unrelated to preliminary diagnosis $A$, i.e. drug $B_i$ is not applicable to preliminary diagnosis $A$.

The cost coefficient $w_i$ in formula (2-1) is determined by the applicability $\omega_i$ of the drug, where $i$ denotes the $i$-th drug, according to formula (2-8). In that formula the cost coefficient $w_i$ is inversely proportional to the preliminary applicability $\omega_i$ of the drug: the stronger the applicability of a drug, the lower the cost of a medication error. $k$ is a correction parameter, and $C'_i$ denotes the number of different preliminary diagnoses in which drug $i$ can appear, which is used to measure the applicability of drug $i$, so that $\omega_i = C'_i$.

Based on the naive Bayes algorithm, the probability matrix $P1_{ij}$ is calculated as follows.

The posterior probability of a drug component in the preliminary diagnosis list for preliminary diagnosis $C_j$ is

$$P(x_i \mid C_j) = \frac{\operatorname{count}(x_i \wedge C_j \mid D)}{\operatorname{count}(C_j \mid D)}$$

where $x_i$ is a drug component, $x_i \in \{x_1, x_2, \ldots, x_n\}$, and $D$ is the sample space; $\operatorname{count}(x_i \wedge C_j \mid D)$ is the number of occurrences in the sample space $D$ of drug component $x_i$ under preliminary diagnosis $C_j$, and $\operatorname{count}(C_j \mid D)$ is the number of occurrences of preliminary diagnosis $C_j$ in the sample space $D$;

(9) Traverse all drugs in the preliminary diagnosis list;

(10) Calculate the probability of drug $X_i$ for preliminary diagnosis $C_j$:

The drug $X_i$ is represented by its component vector $\langle x_1, x_2, \ldots, x_n \rangle$. Because the probability $p(\langle x_1, x_2, \ldots, x_n \rangle)$ of the component vector is a constant, it is handled by normalization; under the Bayesian assumption, expanding with Bayes' formula then gives formula (2-7).

Regarding the Bayesian assumption in step (3): the Bayes classifier introduces the assumption that, given the class $C$, all attributes are mutually independent, namely:

$$P(A_1, A_2, \ldots, A_n \mid C) = \prod_{i=1}^{n} P(A_i \mid C)$$

The components $A_1, A_2, \ldots, A_n$ of a drug are mutually independent, and their relationship with the preliminary diagnosis $C$ is shown in FIG. 2. The constant is represented by the normalization factor $\alpha$. Given drug $X$, the posterior probability of the preliminary diagnosis class $C$ is:

$$P(C \mid X) = \alpha \, P(C) \prod_{i=1}^{n} P(x_i \mid C)$$

The assigned class $c_i$ should satisfy:

If classification $c_i$ is the optimal classification, it must also satisfy:

$$p(c_i \mid \langle a_1, \ldots, a_n \rangle) > p(c_j \mid \langle a_1, \ldots, a_n \rangle), \quad i \neq j$$

where $a_1, \ldots, a_n$ denote the $n$ components of the drug.

The prior probability distribution of the class $C$ can be obtained simply as its maximum likelihood estimate from the training set: the estimate equals the frequency with which each class appears in the data set, and the computational complexity is $O(|D|)$.
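A minimal sketch of this frequency-based estimate, assuming the training set is simply a list of preliminary diagnosis labels (the function name and the labels are hypothetical). A single pass over the data gives the O(|D|) cost mentioned above.

```python
from collections import Counter


def class_priors(labels):
    """Maximum likelihood prior P(C) = relative frequency of each class."""
    counts = Counter(labels)          # one pass over the data set
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the training set is empty")
    return {c: n / total for c, n in counts.items()}


# Hypothetical training labels, one per historical sample.
priors = class_priors(["cold", "cold", "flu", "cold"])
print(priors)
```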

An overview of the Bayesian classification model follows.

Classification includes rule-based classification (queries) and non-rule-based classification (supervised learning).

Bayesian classification is a non-rule-based classification: a classifier is induced from a training set (a set of already classified examples) and is then used to classify unclassified data. (When the predicted variable is discrete the task is called classification; when it is continuous it is called regression.) Typical Bayesian classifiers include the naive Bayes classifier, the Bayesian network classifier, and the tree-augmented naive Bayes classification model (TAN).

Bayesian classification has the following characteristics:

A. Bayesian classification does not assign an instance to a class absolutely; it computes the probability of belonging to each class, and the class with the highest probability is the class to which the instance belongs.

B. In general, all attributes in Bayesian classification play a role, directly or indirectly; that is, all attributes take part in the classification, rather than one or a few attributes determining it.

C. The attributes of instances in Bayesian classification may be discrete, continuous, or mixed.

Bayes' theorem is based on the prior probability of a hypothesis $h$, the probability of observing different data given that hypothesis, and the observed data itself. $P(h)$ denotes the probability of $h$ before any training data are available; it is called the prior probability of $h$ and represents background knowledge about the chance that $h$ is the correct hypothesis. $P(D)$ denotes the prior probability of the training data $D$ to be observed, i.e. the probability of $D$ when no hypothesis is fixed. $P(D \mid h)$ denotes the probability of the data $D$ given that $h$ holds. What is needed is the probability that $h$ holds given the training data $D$, i.e. the posterior probability $P(h \mid D)$ of $h$, which Bayes' formula provides:

$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$

Maximum a posteriori hypothesis: in many learning scenarios, the learner considers a set of candidate hypotheses $H$ and seeks the hypothesis $h \in H$ that is most probable given the data $D$. This most probable hypothesis is called the maximum a posteriori (MAP) hypothesis, denoted $h_{MAP}$:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\, P(h)}{P(D)} = \arg\max_{h \in H} \alpha \, P(D \mid h)\, P(h)$$

Here $P(D)$ is removed and replaced by a constant $\alpha$, because $P(D)$ is a constant that does not depend on $h$.

In the present embodiment, let $A_1, A_2, \ldots, A_n$ denote the feature attributes of a drug (their values are drug components, $n$ components in total), and suppose the preliminary diagnoses fall into $m$ classes, $C = \{C_1, C_2, C_3, \ldots, C_m\}$. Given a particular drug $X$, its component attributes are $\{x_1, x_2, \ldots, x_n\}$, where $x_i$ is the specific value of attribute $A_i$; together they represent all component information of drug $X$. The posterior probability that the drug belongs to class $C_i$ is $P(C_i \mid X)$, and $C(X)$ denotes the class to which the drug is finally assigned, namely the class with the highest probability.

That is, given the components of a drug, the class to which the drug finally belongs is predicted as the class under which its posterior probability is greatest.
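The prediction described here can be sketched as a small naive Bayes classifier. The sketch assumes the class priors and the per-class component probabilities have already been estimated, multiplies them under the independence assumption, and normalizes so the posteriors sum to 1; the function name and the numbers are hypothetical.

```python
def classify(components, priors, likelihoods):
    """Return (most probable class, normalized posteriors) for one drug.

    priors: {class: P(C)}; likelihoods: {class: {component: P(x | C)}}.
    """
    scores = {}
    for c, prior in priors.items():
        score = prior
        for x in components:
            # Naive Bayes: components are treated as independent given C.
            score *= likelihoods[c].get(x, 0.0)
        scores[c] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no class has non-zero probability for this drug")
    posteriors = {c: s / total for c, s in scores.items()}  # normalization
    return max(posteriors, key=posteriors.get), posteriors


# Hypothetical model with two preliminary diagnosis classes.
priors = {"cold": 0.5, "gastritis": 0.5}
likelihoods = {
    "cold": {"paracetamol": 0.75, "caffeine": 0.5},
    "gastritis": {"paracetamol": 0.25, "caffeine": 0.5},
}
best, posteriors = classify(["paracetamol", "caffeine"], priors, likelihoods)
```

For long component lists, summing logarithms instead of multiplying probabilities would avoid numerical underflow.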

A summary of the Bayes algorithm follows.

The naive Bayes algorithm is based on the Bayesian probability formula, so the conditional probability formula is introduced first. The probability that event $B$ occurs given that event $A$ has occurred is called the conditional probability (also called the posterior probability) of event $B$ given event $A$, written $P(B \mid A)$; correspondingly, $P(A)$ is called the unconditional probability (also called the prior probability).

Given that event $A$ has occurred, the probability that event $B$ occurs is

$$P(B \mid A) = \frac{P(A \cdot B)}{P(A)}$$

where:

$$P(A \cdot B) = P(A)\, P(B \mid A) = P(B)\, P(A \mid B)$$

Combining the above gives:

$$P(B \mid A) = \frac{P(B)\, P(A \mid B)}{P(A)}$$

Let $S$ be the sample space and let $A_1, A_2, A_3, \ldots, A_m$ be a partition of $S$, with $P(A_j) > 0$ ($j = 1, 2, 3, \ldots, m$). According to the total probability formula:

$$P(B) = \sum_{j=1}^{m} P(A_j)\, P(B \mid A_j)$$

Combining these gives the Bayes formula:

$$P(A_j \mid B) = \frac{P(A_j)\, P(B \mid A_j)}{\sum_{i=1}^{m} P(A_i)\, P(B \mid A_i)}$$

In the intelligent prescription review model, $P(A_j \mid B)$ is the posterior probability that drug $B$ belongs to preliminary diagnosis $A_j$, and $P(B \mid A_j)$ is the probability that preliminary diagnosis $A_j$ contains drug $B$. According to the central limit theorem, $P(A_j)$ is a constant that can be derived from historical statistics.
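As a worked illustration of the Bayes formula over a partition, the sketch below computes the posterior of every preliminary diagnosis for one drug, using the total probability formula for the denominator. The three diagnoses and all probabilities are hypothetical values, not data from the source.

```python
def posterior_over_partition(priors, likelihoods):
    """P(A_j | B) for every A_j in a partition of the sample space.

    priors: {A_j: P(A_j)}; likelihoods: {A_j: P(B | A_j)}.
    """
    # Total probability formula: P(B) = sum_j P(A_j) * P(B | A_j).
    evidence = sum(priors[a] * likelihoods[a] for a in priors)
    if evidence == 0:
        raise ValueError("P(B) is zero; the posterior is undefined")
    return {a: priors[a] * likelihoods[a] / evidence for a in priors}


# Hypothetical partition into three preliminary diagnoses.
p_a = {"A1": 0.5, "A2": 0.25, "A3": 0.25}        # P(A_j)
p_b_given_a = {"A1": 0.5, "A2": 0.5, "A3": 0.0}  # P(B | A_j) for drug B
post = posterior_over_partition(p_a, p_b_given_a)
print(post)
```

With these numbers P(B) = 0.375, so the drug is twice as likely to belong to A1 as to A2 and cannot belong to A3.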

The invention is not limited to the optional embodiments described above. Anyone may derive products in various other forms in light of the invention; however, regardless of any change in shape or structure, any technical solution that falls within the scope defined by the claims falls within the scope of protection of the invention.

Claims

1. An intelligent prescription review and prescribing method based on machine learning, characterized by comprising the following steps:

(I) all drug components are listed as $\{a_1, a_2, a_3, \ldots, a_w\}$, $w$ components in total, and the set of these $w$ components is denoted $A$; for a selected drug $i$, the components it contains are listed as $\{x_1, x_2, \ldots, x_k\}$, and this list of the $k$ components of the $i$-th drug is denoted by the set $X$, where $X \subseteq A$ and $i$ is a variable taking values from 1 to $n$ that denotes the $i$-th of the $n$ drugs listed in the electronic prescription;

(II) based on a relevance analysis model built on a classification algorithm, the overall relevance $F(y_j)$ is obtained from the following formula:

$$F(y_j) = \sum_{i=1}^{n} w_i \, P1_{ij} \tag{2-1}$$

where $w_i$ is the cost coefficient of drug $i$, with $w_i \in [0, 1]$; $P1_{ij}$ is the probability that drug $i$ belongs to preliminary diagnosis group $j$, with $P1_{ij} \in [0, 1]$; and $P2_{kj}$ denotes the probability that drug component $x_k$ belongs to preliminary diagnosis group $j$, which leads to the following formula:

(2-2)

where $P2_{qj}$ denotes the probability that drug component $q$ belongs to preliminary diagnosis group $j$;

combining (2-1) and (2-2) gives the following formula:

(2-3)

the objective function is:

(2-4)

where $\varepsilon$ is the threshold of the overall relevance, and $\varepsilon$ is a constant in $(0, 1)$;

the preliminary diagnosis group $j$ is one of $m$ preliminary diagnosis groups in total, so $j \in \{1, 2, 3, \ldots, m\}$; the probability that each drug $i$ belongs to each preliminary diagnosis group $j$ is $P1_{ij}$, so the probability matrix $P1$ is an array of dimension $n \times m$, i.e.:

$$P1 = \begin{pmatrix} P1_{11} & P1_{12} & \cdots & P1_{1m} \\ P1_{21} & P1_{22} & \cdots & P1_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ P1_{n1} & P1_{n2} & \cdots & P1_{nm} \end{pmatrix} \tag{2-5}$$

(III) the prescription review process is simplified as follows: the weighted sum of the relevance between each of the $n$ drugs in the prescription and preliminary diagnosis group $j$ of the disease is the overall relevance $F(y_j)$; when $F(y_j)$ is less than $\varepsilon$, the relationship between the $n$ prescribed drugs and preliminary diagnosis group $j$ of the disease is strong, and the prescription passes review; when $F(y_j)$ is not less than $\varepsilon$, the relationship between the $n$ prescribed drugs and preliminary diagnosis group $j$ is weak, and the prescription does not pass review.

2. The intelligent prescription review and prescribing method based on machine learning according to claim 1, characterized in that the cost coefficient $w_i$ in formula (2-1) is calculated as follows:

(2-8)

where the cost coefficient $w_i$ is inversely proportional to the preliminary applicability $\omega_i$ of the drug: the stronger the applicability of a drug, the lower the cost of a medication error; $k$ is a correction parameter; and $C'_i$ denotes the number of different preliminary diagnoses in which drug $i$ can appear, which is used to measure the applicability $\omega_i$ of drug $i$.