CN114220535A - Real world data-based exogenous febrile disease assistant decision-making system - Google Patents

Info

Publication number
CN114220535A
Authority
CN
China
Prior art keywords
case
model
base
index
cases
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111051801.7A
Other languages
Chinese (zh)
Inventor
苏芮
刘清泉
马自腾
王烁
郭玉红
王玉贤
徐霄龙
李博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Traditional Chinese Medicine Hospital
Original Assignee
Beijing Traditional Chinese Medicine Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Traditional Chinese Medicine Hospital
Priority to CN202111051801.7A
Publication of CN114220535A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

A real-world-data-based assistant decision-making system for exogenous febrile disease, comprising: a system module, which comprises an exit-system submodule and a user information maintenance submodule; a database module, which maintains the feature information involved in the case retrieval process according to a grouped-information maintenance principle, each information group in the module supporting the addition, modification, deletion and query of information; a coding library (encoding base), which generates case coding information, the module using an artificial-intelligence (AI) natural language processing model that learns from the case corpus in the case base and converts the cases in the case base into case codes used as feature input to the case retrieval algorithm, and which also calculates the objective weight of each feature in the case base from the generated case codes; a decision reasoning module, which, given an input target case, retrieves from the case base the K source cases most similar to the target case, a process involving subjective weight correction and a similarity algorithm; and a help module, which provides profile information about the system.

Description

Real world data-based exogenous febrile disease assistant decision-making system
Technical Field
The invention relates to the technical field of data processing, and in particular to an assistant decision-making system for exogenous febrile disease based on real world data.
Background
Traditional Chinese medicine (TCM) has rich experience in preventing and treating infectious diseases. Cold Damage (Shanghan) theory and warm disease (Wenbing) theory are important theoretical foundations of TCM diagnosis and treatment, formed over the long struggle of ancient physicians against exogenous febrile disease. Exogenous febrile disease of the lung system has a very high incidence, a rapid onset and a short course. Because pathogens and patient constitutions differ, it often presents complicated clinical manifestations, and if it is not controlled in time it may transform into other patterns. Patients with coexisting chronic diseases in particular are often in pathological states of mixed cold and heat, root deficiency with branch excess, and conflict between vital qi and pathogenic factors, which makes diagnosis and treatment very difficult for clinicians. TCM divides exogenous febrile disease into two major systems, Cold Damage and warm disease, whose treatment methods have both differences and similarities. As to the similarities: Cold Damage and wind strike (zhongfeng) both transform into warmth after entering Yangming, so the resulting syndrome is consistent with that of warm disease, and the famous Cold Damage prescriptions Daqinglong decoction, Xiaoqinglong decoction, Maxingshigan decoction and Chaihu decoction are widely used to treat warm disease. 
In the early stage of exogenous febrile disease, however, wind strike, Cold Damage and warm disease are treated differently. Taiyang Cold Damage is treated with pungent-warm medicinals that dispel cold and release the exterior; if cold or cool medicinals are used by mistake, qi movement is blocked and the heat pathogen cannot be vented outward. Early-stage warm disease is treated with pungent-cool medicinals that release the exterior; if pungent-warm medicinals are used by mistake, body fluids are damaged and consumed, producing endogenous dryness-heat. TCM regards the lung as a clear, empty and delicate organ: when pathogenic qi invades it, it turns cold on meeting cold and hot on being injured by heat, so patterns of mixed cold and heat, root deficiency with branch excess, simultaneous exterior and interior disease, and disordered qi movement arise easily. Misusing cold or cool medicinals in Cold Damage injures the body's yang qi; misusing pungent-warm medicinals in warm disease consumes body fluids; damp pathogen lingers, its clinical manifestations change frequently, and it may transform into either cold or heat; epidemic toxin first attacks the lung and may then pass directly to Yangming or pass in reverse to the pericardium, so delayed or mistaken treatment leads to critical illness. Prescriptions for exogenous febrile disease therefore differ from those for internal-injury miscellaneous diseases: the medication is more precise, the number of medicinals is usually smaller, and the dose of each medicinal is usually larger. Physicians must understand the theories of warm disease and Cold Damage deeply, apply them flexibly, and master the properties and actions of the medicinals. 
For internal-injury miscellaneous diseases, using somewhat more medicinals generally does no harm: to invigorate the spleen, for example, Atractylodis rhizoma, rhizoma Dioscoreae, semen Lablab album, semen Nelumbinis and fructus Jujubae may be used together, and to tonify the kidney, Buxi Yuan Si, Rou cong, Shou Wu, Qian Shi and Du Zhong may be used together, although even here tonics should not be excessive. Exogenous febrile disease allows no such latitude. If the pathogen is in Taiyang and radix puerariae (kudzu root) is used too early, the pathogen is drawn into Yangming. Likewise, radix puerariae and radix angelicae dahuricae both act on Yangming, but radix angelicae dahuricae disperses with warmth whereas radix puerariae disperses with coolness, so when a warm pathogen is in Yangming the two cannot simply be used interchangeably. Medication for exogenous febrile disease thus differs from that for internal-injury miscellaneous diseases: the prescription is more precise, and each medicinal targets the cold or heat nature of the pathogenic qi and the site it has invaded. As the Qing-dynasty physician Liu Songfeng put it: "One must follow the pulse and the pattern to know which channel the pathogen is in, whether it is exterior or interior, and whether diseases are combined; then strike straight in with a single blade, using no more than five or six medicinals." 
In addition, exogenous febrile disease of the lung system is caused by the six exogenous pathogens or by epidemic toxin and mainly attacks the lung. Because pathogens, constitutions, climates and regions differ, the same disease presents different symptoms in different regions, seasons and populations. The uncertain onset and diverse clinical manifestations of exogenous febrile disease therefore place higher demands on clinicians' skill in syndrome differentiation and treatment. When prescribing for lung-system disease, the six-channel, defense-qi-nutrient-blood and triple-energizer systems of syndrome differentiation should be treated as one unified system, and prescribing should be guided flexibly by the theory of exogenous febrile disease. Accurate syndrome differentiation and correct TCM intervention are the keys to shortening the course of lung-system exogenous febrile disease and reducing severe cases; the rigid textbook approach of diagnosing by disease type copes poorly with its complicated and changeable character. 
Clinical TCM physicians at different levels differ greatly in their ability to diagnose and treat lung-system exogenous febrile disease. Junior physicians in particular find it hard to combine Cold Damage theory and warm disease theory when reasoning about its onset, so their syndrome differentiation is not accurate enough and their prescriptions lack theoretical guidance, which to some extent limits both the clinical efficacy of TCM and the support that accumulated experience can provide. Effective inheritance and development of the experience of renowned senior experts in lung-system exogenous febrile disease is therefore urgent.
TCM is a typical medicine of clinical practice, and the core of TCM clinical ability is TCM thinking ability. This ability improves gradually through long-term clinical practice; experts with rich TCM clinical experience are a special intellectual resource of TCM and represent the current level of TCM scholarship. TCM has a very rich theoretical basis for exogenous febrile disease: the Cold Damage, warm disease and epidemic theories emerged as successive generations of physicians deepened their understanding of exogenous febrile disease, and these classical theories still offer important guidance today, when people are frequently threatened by sudden new infectious diseases. However, the traditional way of passing on the experience of renowned senior TCM physicians has real problems: training takes a long time, efficiency is low, few successors are trained, and the results are hard to reproduce and disseminate. 
For a long time, TCM inheritance work has faced two urgent problems. On the one hand, the academic experience of senior TCM physicians is passed on inefficiently and subjectively and is hard to reproduce and disseminate, so it is difficult to raise primary-level TCM service capability enough to meet broad clinical needs. On the other hand, much of the TCM diagnosis and treatment data generated in clinical practice is of low quality: it is noisy, incomplete, inaccurate and even inconsistent. As a result, TCM lacks sufficient usable clinical research data, the diagnosis and treatment information resources of TCM and Western medicine are underused overall, and data resources are wasted. Innovating the mode and method of passing on the experience of renowned senior TCM physicians is therefore an important task for the TCM field, and methodological exploration is needed to describe and effectively disseminate their diagnostic and therapeutic thinking, particularly that of experts in exogenous febrile disease.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a real-world-data-based assistant decision-making system for exogenous febrile disease, which can provide accurate auxiliary data for TCM decision-making, establish a case-based reasoning model for TCM cases of lung-system exogenous febrile disease, and simulate experts' reasoning in diagnosis and syndrome differentiation.
The technical solution of the invention is as follows: the real-world-data-based assistant decision-making system for exogenous febrile disease comprises:
a system module, comprising an exit-system submodule and a user information maintenance submodule, wherein the user information maintenance submodule manages the system's user information, including user registration, modification, deletion and query, and the exit-system submodule, when executed, exits the whole system directly;
a database module, which maintains the feature information involved in the case retrieval process according to a grouped-information maintenance principle, each information group in the module supporting the addition, modification, deletion and query of information;
a coding library, which generates case coding information; the module uses an artificial-intelligence (AI) natural language processing model that learns from the case corpus in the case base and converts the cases in the case base into case codes used as feature input to the case retrieval algorithm, and it also calculates the objective weight of each feature in the case base from the generated case codes;
a decision reasoning module, which, given an input target case, retrieves from the case base the K source cases most similar to the target case, a process involving subjective weight correction and a similarity algorithm;
a help module, which provides profile information about the system.
The invention takes national clinical research data on lung-system exogenous febrile disease as its research object and uses artificial-intelligence case-based reasoning to study assistant decision-making for TCM. It uses an expert-consultation weighting method to classify the clinical manifestations of lung-system exogenous febrile disease and determine symptom similarity weights, builds a cloud database of lung-system exogenous febrile disease cases, and uses a hybrid algorithm suited to mixed-type TCM datasets to establish a case-based reasoning model for TCM cases of lung-system exogenous febrile disease. It can thereby provide accurate auxiliary data for TCM decision-making and simulate experts' reasoning in diagnosis and syndrome differentiation.
Drawings
Fig. 1 is a structural diagram of an exogenous febrile disease assistant decision system based on real world data according to the present invention.
Fig. 2 is a flow chart of the implementation of the real world data-based exogenous febrile disease aid decision system according to the present invention.
Fig. 3 shows the basic structure of a Word2Vec network.
Fig. 4 shows the CBOW structure of a Word2vec network.
Fig. 5 shows a similar case retrieval process.
Detailed Description
As shown in Fig. 1, the real-world-data-based assistant decision-making system for exogenous febrile disease comprises:
a system module, comprising an exit-system submodule and a user information maintenance submodule, wherein the user information maintenance submodule manages the system's user information, including user registration, modification, deletion and query, and the exit-system submodule, when executed, exits the whole system directly;
a database module, which maintains the feature information involved in the case retrieval process according to a grouped-information maintenance principle, each information group in the module supporting the addition, modification, deletion and query of information;
a coding library, which generates case coding information; the module uses an artificial-intelligence (AI) natural language processing model that learns from the case corpus in the case base and converts the cases in the case base into case codes used as feature input to the case retrieval algorithm, and it also calculates the objective weight of each feature in the case base from the generated case codes;
a decision reasoning module, which, given an input target case, retrieves from the case base the K source cases most similar to the target case, a process involving subjective weight correction and a similarity algorithm;
a help module, which provides profile information about the system.
The invention takes national clinical research data on lung-system exogenous febrile disease as its research object and uses artificial-intelligence case-based reasoning to study assistant decision-making for TCM. It uses an expert-consultation weighting method to classify the clinical manifestations of lung-system exogenous febrile disease and determine symptom similarity weights, builds a cloud database of lung-system exogenous febrile disease cases, and uses a hybrid algorithm suited to mixed-type TCM datasets to establish a case-based reasoning model for TCM cases of lung-system exogenous febrile disease. It can thereby provide accurate auxiliary data for TCM decision-making and simulate experts' reasoning in diagnosis and syndrome differentiation.
Preferably, the decision reasoning module performs:
source case preparation: first, the data in an external original Excel file are converted, cleaned and stored in the case base to form the source cases; next, a word vector model is learned from the source cases in the case base and used to code them, building the coding library; at the same time, objective weight analysis is performed on the coding library using the entropy weight method to obtain the feature weights of the source cases, which are used to calculate case similarity during target case retrieval;
target case retrieval: first, the user inputs the target case to be retrieved and subjectively corrects the weights; the target case is then coded with the word vector model learned during source case preparation, and similarity is calculated between the coded target case and the case features in the coding library to obtain the numbers of the K most similar cases; the most similar source cases are then fetched from the case base by those K case numbers and used for case diagnosis; if a retrieved source case matches the target case, case reuse is complete and the case is used directly for diagnosis and treatment; otherwise the case is first corrected and then reused; finally, the reused case is stored in the case base as a new case.
Preferably, the coding library uses a Word2Vec model for case coding;
the Word2Vec model works in two parts: the first part builds the model, and the second part obtains the embedded word vectors from it; the Word2Vec modeling process first constructs a neural network from the training data; once the model is trained, the parameters it has learned from the training data are the hidden-layer weight matrix, and the word vectors are then computed from that weight matrix;
Word2Vec uses a single hidden layer whose neurons are all linear; the input layer has as many neurons as there are words in the training vocabulary, the size of the hidden layer equals the dimension of the generated word vectors, and the output layer has the same size as the input layer;
assuming that the vocabulary used to learn word vectors consists of V words and that N is the dimension of the word vectors, the connections from the input layer to the hidden layer are represented by a matrix WI of size V × N, in which each row represents one vocabulary word; in the same way, the connections from the hidden layer to the output layer are described by a matrix WO of size N × V.
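The role of WI can be illustrated with a minimal sketch. The vocabulary, the dimension N = 3 and the weight values below are made up for illustration and do not come from the patent; a trained model would learn WI. Because each input word is coded as a one-hot vector of length V, multiplying it by WI simply selects the matching row, which is why each row of WI can be read as that word's vector.

```python
# Hypothetical toy vocabulary and weights (not from the patent).
vocab = ["fever", "cough", "chills", "sore throat"]   # V = 4 words
V, N = len(vocab), 3                                   # N = word-vector dimension

# WI: V x N input-to-hidden weight matrix, one row per vocabulary word.
WI = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

def one_hot(index, size):
    """Length-`size` vector with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def hidden_output(x, WI):
    """Compute WI^T · x for a length-V input x; the result has length N."""
    n = len(WI[0])
    return [sum(x[v] * WI[v][k] for v in range(len(WI))) for k in range(n)]

idx = vocab.index("cough")
h = hidden_output(one_hot(idx, V), WI)
# For a one-hot input the product selects row `idx` of WI.
print(h)
```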
Preferably, the continuous bag-of-words (CBOW) modeling method of the Word2Vec model performs the following steps:
(1) calculate the output h of the hidden layer:
$$h = \frac{1}{C}\, W_I^{T} \cdot \sum_{i=1}^{C} x_i$$
where $x_i$ is the code of the i-th background (context) word at the input layer of the Word2Vec model, $W_I^{T}$ is the transpose of the weight matrix between the input layer and the hidden layer, C is the number of context words whose codes are averaged, and h is the output vector of the hidden layer;
(2) calculate the input at each node of the output layer:
$$u = W_O^{T} \cdot h$$
where $W_O$ is the weight matrix between the hidden layer and the output layer of the Word2Vec model, h is the output vector of the hidden layer, and u is the input vector of the output layer;
(3) calculate the output of the output layer, using Softmax as the activation function; Softmax is a nonlinear activation function that here computes the output vector predicted from the background (context) words, which is then compared with the target word vector to drive model learning.
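The three steps above can be sketched as a single forward pass. This is a minimal illustration in plain Python under stated assumptions: the function name cbow_forward and the toy matrices are hypothetical, the context words are given as vocabulary indices (so averaging one-hot inputs times WI^T reduces to averaging rows of WI), and training (comparing the output with the target word and updating the weights) is not shown.

```python
import math

def cbow_forward(context_ids, WI, WO):
    """One CBOW forward pass.

    WI is V x N (input -> hidden), WO is N x V (hidden -> output).
    Returns (h, u, y): hidden output, output-layer input, softmax output.
    """
    V, N = len(WI), len(WI[0])
    C = len(context_ids)
    # Step 1: h = (1/C) * WI^T * sum(x_i); with one-hot x_i this is the
    # average of the rows of WI selected by the context words.
    h = [sum(WI[i][k] for i in context_ids) / C for k in range(N)]
    # Step 2: u = WO^T * h, one score per vocabulary word.
    u = [sum(WO[k][j] * h[k] for k in range(N)) for j in range(V)]
    # Step 3: softmax over u (shifted by max(u) for numerical stability).
    m = max(u)
    exps = [math.exp(v - m) for v in u]
    total = sum(exps)
    y = [e / total for e in exps]
    return h, u, y

# Toy weights with V = 4 and N = 2 (illustrative values only).
WI = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
WO = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
h, u, y = cbow_forward([0, 2], WI, WO)
print(h, y)
```

The softmax output y is a probability distribution over the vocabulary; during training it would be compared with the one-hot code of the target word.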
Preferably, the coding library objectively weights the cases using the entropy weight method.
Assume the index set $X = \{X_1, X_2, \dots, X_m\}$. The entropy weight method calculates the weight of each index $X_j$ as follows:
(I) standardize each value $x_{ij}$ under index $X_j$:
$$y_{ij} = \frac{x_{ij} - \min(X_j)}{\max(X_j) - \min(X_j)}$$
where $x_{ij}$ is the i-th value of index $X_j$, $\min(X_j)$ is the minimum value of index $X_j$, $\max(X_j)$ is the maximum value of index $X_j$, and $y_{ij}$ is the normalized value of $x_{ij}$;
(II) calculate the proportion $p_{ij}$ of the normalized value $y_{ij}$ under index $X_j$:
$$p_{ij} = \frac{y_{ij}}{\sum_{i=1}^{n} y_{ij}}$$
where n is the number of values of index $X_j$, that is, the number of case records;
(III) calculate the entropy value $E(X_j)$ of index $X_j$:
$$E(X_j) = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$
where $\ln(\cdot)$ is the natural logarithm;
(IV) calculate the weight $W(X_j)$ of index $X_j$:
$$W(X_j) = \frac{1 - E(X_j)}{m - \sum_{k=1}^{m} E(X_k)}$$
where $E(X_j)$ is the entropy value of index $X_j$ and m is the number of indexes.
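Steps (I) to (IV) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the patented implementation: the function name entropy_weights and the sample data are hypothetical, and two edge-case conventions are assumptions that the text does not state, namely that 0·ln 0 is treated as 0 and that a constant index (max equals min) is given a uniform proportion, so that its entropy is 1 and its weight is 0.

```python
import math

def entropy_weights(rows):
    """Entropy-weight method for n cases (rows) x m indexes (columns).

    Assumes n >= 2 numeric rows. Returns a list of m weights summing to 1.
    """
    n, m = len(rows), len(rows[0])
    entropies = []
    for j in range(m):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        # (I) min-max normalisation; a constant column maps to zeros (assumption).
        y = [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in col]
        total = sum(y)
        # (II) proportions; uniform if the column carries no information (assumption).
        p = [v / total if total > 0 else 1.0 / n for v in y]
        # (III) entropy, treating 0 * ln(0) as 0 (assumption).
        e = -sum(v * math.log(v) for v in p if v > 0) / math.log(n)
        entropies.append(e)
    # (IV) weight = (1 - E_j) / (m - sum of all E_k).
    denom = sum(1.0 - e for e in entropies)
    if denom <= 1e-12:          # every index is uninformative: fall back to equal weights
        return [1.0 / m] * m
    return [(1.0 - e) / denom for e in entropies]

# Hypothetical coded cases: [temperature, has_cough, constant_flag]
cases = [
    [38.5, 1, 3],
    [39.2, 0, 3],
    [37.8, 1, 3],
    [40.1, 0, 3],
]
weights = entropy_weights(cases)
print(weights)
```

An index whose values barely vary across cases has entropy close to 1 and therefore receives a weight close to 0, which matches the intent of objective weighting: features that do not discriminate between cases contribute little to similarity.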
Preferably, in the decision reasoning module,
similarity matching aims to match, from the source case set U of the case base, the K source cases most similar to the target case T. The matching algorithm measures the similarity between cases using the Euclidean distance Dist(U, T): the smaller the distance, the higher the similarity Sim(U, T). The similarity is calculated as follows:
Sim(U,T)=1-Dist(U,T)
Dist(U, T) is calculated as follows:
[Formula image BDA0003253229770000095: Euclidean distance Dist(U, T) between a source case and the target case]
The source case set U consists of multiple cases, and the target case T has the same feature indexes as the source cases. Each retrieval performs one complete scan of the source case base, which guarantees that the Top-K similar cases obtained are the global Top-K.
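A full-scan Top-K retrieval can be sketched as follows. The distance formula itself is not reproduced in this text, so the sketch assumes a feature-weighted Euclidean distance with features scaled to [0, 1] and weights summing to 1, which keeps Dist, and hence Sim = 1 - Dist, within [0, 1]. The function names, case identifiers and data are hypothetical.

```python
import heapq
import math

def distance(u, t, weights):
    """Weighted Euclidean distance between two coded cases (assumed form)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, t)))

def top_k_similar(case_codes, target, weights, k):
    """Scan every coded source case once and return the k most similar.

    case_codes maps a case number to its feature vector.
    Returns a list of (similarity, case_number), most similar first.
    """
    scored = (
        (1.0 - distance(vec, target, weights), case_id)
        for case_id, vec in case_codes.items()
    )
    return heapq.nlargest(k, scored)

# Hypothetical coding library with three source cases and two features.
case_codes = {"c1": [0.0, 0.0], "c2": [0.5, 0.5], "c3": [1.0, 1.0]}
weights = [0.5, 0.5]              # e.g. objective weights after subjective correction
top2 = top_k_similar(case_codes, [0.4, 0.5], weights, k=2)
print(top2)
```

Because every source case is scored before the best K are selected, the result is the global Top-K, at the cost of one pass over the whole coding library per query.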
Preferably, if the case base is large and each scan takes a long time, the retrieval algorithm is improved by adding a similarity parameter: a retrieval similarity threshold is set, and once the number of cases whose similarity exceeds the threshold reaches K during retrieval, scanning of the case base can stop.
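The threshold variant can be sketched as an early-terminating scan. As before, the weighted Euclidean distance, the function name and the data are illustrative assumptions rather than the patent's own definitions.

```python
import math

def retrieve_with_threshold(case_codes, target, weights, k, min_similarity):
    """Scan source cases in order and stop once k cases exceed the threshold.

    Returns a list of (similarity, case_number), most similar first.
    """
    hits = []
    for case_id, vec in case_codes.items():
        dist = math.sqrt(sum(w * (a - b) ** 2
                             for w, a, b in zip(weights, vec, target)))
        sim = 1.0 - dist
        if sim > min_similarity:
            hits.append((sim, case_id))
            if len(hits) == k:       # enough good-enough cases: stop scanning
                break
    hits.sort(reverse=True)
    return hits

case_codes = {"c1": [0.0, 0.0], "c2": [0.5, 0.5], "c3": [1.0, 1.0]}
hits = retrieve_with_threshold(case_codes, [0.4, 0.5], [0.5, 0.5],
                               k=1, min_similarity=0.5)
print(hits)
```

In this toy run the scan stops at the first case above the threshold (c1), although c2 is closer to the target. The early stop trades the global Top-K guarantee of a full scan for shorter retrieval time, so the threshold should be chosen with that trade-off in mind.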
The present invention is described in more detail below.
The system is a comprehensive AI decision-making system that integrates a natural language coding algorithm, an objective weight analysis algorithm and a case retrieval algorithm. Case feature coding and retrieval are both implemented with AI models, so that source case preparation and target case retrieval are automated and intelligent, as shown in Fig. 2.
The algorithms of the decision system execute in two flows, source case preparation and target case retrieval, which run independently of each other.
Source case preparation: first, the data in an external original Excel file are converted, cleaned and stored in the case base to form the source cases; next, a word vector model is learned from the source cases in the case base and used to code them, building the coding library; at the same time, objective weight analysis is performed on the coding library using the entropy weight method to obtain the feature weights of the source cases, which are used to calculate case similarity during target case retrieval.
Target case retrieval: first, the user inputs the target case to be retrieved and subjectively corrects the weights; the target case is then coded with the word vector model learned during source case preparation, and similarity is calculated between the coded target case and the case features in the coding library to obtain the numbers of the K most similar cases; the most similar source cases are then fetched from the case base by those K case numbers and used for case diagnosis. If a retrieved source case matches the target case, case reuse is complete (the case is used directly for diagnosis and treatment); otherwise the case must first be corrected and then reused. Finally, the reused case is stored in the case base as a new case (case learning).
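The retrieve, reuse, correct and store loop described above can be sketched as follows. Everything here is hypothetical scaffolding for illustration: the case fields ("symptoms", "treatment"), the placeholder treatments and the three hook functions are assumptions, and a simple symptom-overlap count stands in for the coded similarity search.

```python
def run_cbr_cycle(target, case_base, retrieve, is_match, revise):
    """Retrieve -> reuse (or correct, then reuse) -> store as a new case.

    retrieve, is_match and revise are caller-supplied (hypothetical) hooks.
    """
    best = retrieve(target, case_base)           # most similar source case
    if is_match(best, target):
        treatment = best["treatment"]            # case reuse
    else:
        treatment = revise(best, target)         # case correction, then reuse
    new_case = {"symptoms": target["symptoms"], "treatment": treatment}
    case_base.append(new_case)                   # case learning
    return new_case

# Toy hooks: symptom overlap stands in for the real similarity search.
def retrieve(target, case_base):
    return max(case_base, key=lambda c: len(c["symptoms"] & target["symptoms"]))

def is_match(case, target):
    return case["symptoms"] == target["symptoms"]

def revise(case, target):
    return case["treatment"] + " (revised by clinician)"

case_base = [
    {"symptoms": {"fever", "chills"}, "treatment": "formula A"},
    {"symptoms": {"fever", "cough"}, "treatment": "formula B"},
]
result = run_cbr_cycle({"symptoms": {"fever", "cough"}}, case_base,
                       retrieve, is_match, revise)
print(result, len(case_base))
```

Keeping correction as a separate hook reflects the text's point that a retrieved case is reused directly only when it matches the target; in practice that step would involve a clinician rather than an automatic rule.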
In traditional Chinese medicine diagnosis and treatment cases, both the symptoms and the prescriptions are described in natural language (hereinafter referred to as text). Text cannot be used directly for distance calculation in case retrieval; it must first be converted into numerical values.
There are many methods for converting text into numerical values, commonly OneHot Encoding, Order Encoding and the like, but all of them have essential defects. With OneHot Encoding, the encoding length of each word is determined by the number of entries in the corpus dictionary: the larger the dictionary, the longer the code of each word. When the word vector dimension is too large (too sparse), every word is represented by as many components as there are words in the dictionary, which is computationally expensive and cannot take context into account. Order Encoding implies an ordering of, and a distance between, words, whereas most words in fact have no such order or distance, so it introduces unnecessary ambiguity.
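The two defects can be seen on a three-word dictionary. The following is a minimal sketch; the helper names `one_hot` and `order_code` and the dictionary order are illustrative assumptions, not part of the described system.

```python
# Toy dictionary built from the example corpus used later in the text.
# The order of the entries is arbitrary (an assumption for illustration).
vocab = ["nervous system", "neuromuscular", "disease"]

def one_hot(word, vocab):
    """Return a list with a single 1 at the word's dictionary position."""
    return [1 if w == word else 0 for w in vocab]

def order_code(word, vocab):
    """Return the word's 1-based position in the dictionary."""
    return vocab.index(word) + 1

one_hot_codes = {w: one_hot(w, vocab) for w in vocab}
order_codes = {w: order_code(w, vocab) for w in vocab}

# One-hot: the code length equals the dictionary size, and all distinct
# words are equally far apart, so no context or similarity is captured.
# Order coding: "disease" (3) appears twice as far from "nervous system" (1)
# as "neuromuscular" (2) does, although no such distance really exists.
print(one_hot_codes)
print(order_codes)
```

With a realistic dictionary of tens of thousands of entries, each one-hot code would have tens of thousands of components, almost all of them zero.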
Therefore a better representation is needed, one that meets two requirements: first, it carries context information; second, the representation of each word is dense. Practice has shown that text-to-number conversion (word vectorization) by neural network modeling meets both requirements. This is why the word vector model Word2Vec is introduced into the system; both the case coding module and the target case coding module in fig. 2 use the Word2Vec model.
The Word2Vec model is divided into two parts: the first builds the model, and the second obtains the embedded word vectors through the model. The overall modeling process of Word2Vec is similar in spirit to an autoencoder: a neural network is constructed and trained on the training data, but once trained, the model is not used to process new tasks. What is actually needed is the parameters the model has learned from the training data, namely the hidden-layer weight matrix, from which the word vectors are computed.
Word2Vec uses a fully connected neural network with a single hidden layer, as shown in fig. 3, and the neurons in the hidden layer are all linear neurons. The input layer has as many neurons as there are words in the vocabulary used for training. The hidden layer size is set to the dimension of the generated word vectors. The output layer is the same size as the input layer.
Assuming that the vocabulary for learning word vectors consists of V words and N is the dimension of the word vector (each word has N features), the connection from the input layer to the hidden layer can be represented by a matrix WI of size V × N, where each row represents a vocabulary word. In the same way, the connection from the hidden layer to the output layer is described by a matrix WO of size N × V.
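The shapes of WI and WO, and the reason each row of WI can be read as a word vector, can be checked in a few lines. This is a sketch with made-up weight values; `matvec_T` is a hypothetical helper name.

```python
V, N = 3, 2  # vocabulary size and word-vector dimension (illustrative values)

# WI is V x N: row k holds the N features of vocabulary word k.
# The numbers are placeholders, not values from the patent figures.
WI = [[0.1, 0.2],
      [0.3, 0.4],
      [0.5, 0.6]]
# WO is N x V: column j feeds output node j.
WO = [[0.7, 0.8, 0.9],
      [1.0, 1.1, 1.2]]

def matvec_T(M, x):
    """Compute M^T . x for a row-major matrix M and a vector x."""
    rows, cols = len(M), len(M[0])
    return [sum(M[r][c] * x[r] for r in range(rows)) for c in range(cols)]

x = [0, 1, 0]        # one-hot code of the second vocabulary word
h = matvec_T(WI, x)  # multiplying by a one-hot code selects row 1 of WI
print(h)
```

Because the input is one-hot, the hidden-layer output is simply the matching row of WI, which is why the trained WI serves as the word vector table.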
There are generally two modeling methods for the Word2Vec model, namely CBOW and Skip-Gram. Only the CBOW modeling method is briefly described here. CBOW stands for Continuous Bag of Words. Its essence is to predict the Target Word from its Context Words (background words).
For example, in the corpus "nervous system and neuromuscular disease", the Target Word "disease" can be predicted from several Context Words: "nervous system" and "neuromuscular" serve as the Context Words of "disease". For this purpose the neural network structure shown in fig. 3 is modified into that shown in fig. 4: the input layer is replicated C times (C is the window size; here C = 2), and a division by C is added at the neurons of the hidden layer. For one Target Word, the network is thus trained on C Context Word inputs.
The specific algorithm is as follows:
First, the output h of the hidden layer is calculated:
$$h = \frac{1}{C}\, WI^{T} \cdot \sum_{i=1}^{C} x_i$$
Second, the input to each node of the output layer is calculated:
$$u_j = WO_j^{T} \cdot h$$
where $WO_j$ is column j of the matrix WO.
Finally, the output of the output layer is calculated, using Softmax as the activation function.
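The three steps can be sketched as a single forward pass. This is a minimal illustration under stated assumptions: the function name `cbow_forward` is hypothetical, the weights are placeholder values rather than those of the patent figures, and the max-shift inside Softmax is a standard numerical-stability device that does not change the result.

```python
import math

def cbow_forward(context, WI, WO):
    """One CBOW forward pass.

    context: list of C one-hot vectors, each of length V
    WI: V x N input-to-hidden weights; WO: N x V hidden-to-output weights
    Returns (h, u, p): hidden output, output-layer inputs, Softmax output.
    """
    C, V, N = len(context), len(WI), len(WI[0])
    # Sum the context codes, then h = (1/C) * WI^T . sum(x_i)
    x_sum = [sum(x[k] for x in context) for k in range(V)]
    h = [sum(WI[k][n] * x_sum[k] for k in range(V)) / C for n in range(N)]
    # u_j = WO_j^T . h, where WO_j is column j of WO
    u = [sum(WO[n][j] * h[n] for n in range(N)) for j in range(V)]
    # Softmax, shifted by max(u) for numerical stability
    m = max(u)
    exps = [math.exp(v - m) for v in u]
    total = sum(exps)
    p = [e / total for e in exps]
    return h, u, p

# Placeholder weights (assumed for illustration only).
WI_demo = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
WO_demo = [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
context = [[0, 0, 1], [0, 1, 0]]  # two Context Words, so C = 2
h, u, p = cbow_forward(context, WI_demo, WO_demo)
print(h, u, p)
```

With one-hot inputs, h is the average of the WI rows belonging to the Context Words, which is exactly what the division by C in fig. 4 achieves.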
Example demonstration
Assume that the hidden layer of fig. 4 has 2 neurons and that the corpus "nervous system and neuromuscular disease" is segmented into words to construct the dictionary [nervous system, neuromuscular, disease]. Then:
(1) The dictionary codes are:
nervous system → $x_1 = [0,0,1]^{T}$
neuromuscular → $x_2 = [0,1,0]^{T}$
disease → $x_3 = [1,0,0]^{T}$
(2) WI and WO are 3 × 2 and 2 × 3 matrices, respectively. Suppose they are randomly initialized as:
Figure BDA0003253229770000122
Figure BDA0003253229770000123
(3) "nervous system" and "neuromuscular" are used as Context Word for "disease", and C in the formula is 2.
The calculation process is as follows:
first, the hidden output is calculated:
Figure BDA0003253229770000131
Second, calculate the input at each node of the output layer:
$$u = WO^{T} \cdot h = [-0.95765253,\ -0.34506633,\ 1.11377731]$$
Third, calculate the probability values of u with Softmax:
Figure BDA0003253229770000132
Finally, obtain the predicted dictionary vector, i.e. the one-hot vector at the position of the largest probability:
$$x' = \arg\max(P(u)) = [0,0,1]$$
Since the model has completed only its first round of learning, $x' \neq x_3$. Clearly, further rounds of learning, in which the weights WI and WO are updated by gradient descent, are needed until $x' = x_3$.
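The last two steps of the example can be re-computed from the u vector stated above. This is a small sketch; only u and the dictionary code of "disease" come from the text, and the variable names are illustrative.

```python
import math

# Output-layer inputs from the worked example above.
u = [-0.95765253, -0.34506633, 1.11377731]

# Softmax: exponentiate and normalize so the values sum to 1.
exps = [math.exp(v) for v in u]
total = sum(exps)
p = [e / total for e in exps]

# Predicted one-hot vector: 1 at the position of the largest probability.
best = max(range(len(p)), key=lambda j: p[j])
x_pred = [1 if j == best else 0 for j in range(len(p))]

x3 = [1, 0, 0]  # dictionary code of the Target Word "disease"
print(p, x_pred, x_pred == x3)
```

The largest probability falls on the third component, so the prediction is [0, 0, 1], which differs from the code of "disease" and confirms that further training rounds are required.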
The objective weighting algorithm is a means of weight analysis for data indexes: the weights are determined by a mathematical method from the relationships within the original data, so the result does not depend on subjective human judgment and has a strong mathematical basis. The system uses the entropy weight method to weight the cases objectively. The entropy weight method is described in detail below.
The basic idea of the entropy weight method is to determine objective weights according to the variability of each index. Generally, the smaller the information entropy $E_j$ of an index, the greater the variation of its values, the more information it provides, the greater its role in the comprehensive evaluation, and the greater its weight. Conversely, the larger the information entropy of an index, the smaller the variation of its values, the less information it provides, the smaller its role in the comprehensive evaluation, and the smaller its weight.
Assume an index set $X = \{X_1, X_2, \ldots, X_m\}$, as shown in Table 1. The weight of each index $X_j$ is calculated by the entropy weight method as follows:
TABLE 1
ID    X1     X2     …    Xm
u1    x11    x12    …    x1m
u2    x21    x22    …    x2m
…     …      …      …    …
un    xn1    xn2    …    xnm
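(1) Standardize each value $x_{ij}$ under index $X_j$:
$$y_{ij} = \frac{x_{ij} - \min(X_j)}{\max(X_j) - \min(X_j)}$$
(2) Calculate the proportion $p_{ij}$ of the normalized value $y_{ij}$ under index $X_j$:
$$p_{ij} = \frac{y_{ij}}{\sum_{i=1}^{n} y_{ij}}$$
(3) Calculate the entropy of index $X_j$:
$$E(X_j) = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$
(4) Calculate the weight of index $X_j$:
$$W(X_j) = \frac{1 - E(X_j)}{\sum_{k=1}^{m} \left(1 - E(X_k)\right)}$$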
Because the case base of the ADSFD system is large, only 10 cases and 6 feature indexes (past history, allergy history, smoking history, childbearing status, marital status and family genetic history), shown in Table 2, are selected to demonstrate the calculation process of the entropy weight method.
TABLE 2
Patient ID  Past history  Allergy history  Smoking history  Childbearing status  Marital status  Family genetic history
P1          0.227831      0.227831         0.227831         -1.91791             -0.00236164     -0.0429749
P2          0.227831      0.227831         -1.91791         0.227831             -0.0744947      0.227831
P3          1.1134        0.227831         0.227831         -1.91791             -0.00236164     -1.81933
P4          1.1134        0.227831         0.227831         0.227831             -0.00236164     0.227831
P5          0.227831      0.227831         0.227831         -1.91791             -0.0744947      0.227831
P6          0.227831      0.227831         0.227831         -1.91791             -0.0744947      -1.81933
P7          0.227831      0.227831         -1.91791         0.227831             -0.00236164     0.227831
P8          0.227831      0.227831         -1.91791         -1.91791             -0.00236164     0.227831
P9          1.1134        0.227831         0.227831         -1.91791             -0.0744947      0.227831
P10         1.1134        0.227831         0.227831         -1.91791             -0.00236164     -1.81933
(1) Index standardization:
TABLE 3
Patient ID  Past history  Allergy history  Smoking history  Childbearing status  Marital status  Family genetic history
P1          0.002         0.002            0.998            0.002                0.998           0.866
P2          0.002         0.998            0.002            0.998                0.002           0.998
P3          0.998         0.002            0.998            0.002                0.998           0.002
P4          0.998         0.998            0.998            0.998                0.998           0.998
P5          0.002         0.002            0.998            0.002                0.002           0.998
P6          0.002         0.998            0.998            0.002                0.002           0.002
P7          0.002         0.002            0.002            0.998                0.998           0.998
P8          0.002         0.998            0.002            0.002                0.998           0.998
P9          0.998         0.002            0.998            0.002                0.002           0.998
P10         0.998         0.998            0.998            0.002                0.998           0.002
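The standardization step can be sketched for a single column. The values in Table 3 lie between 0.002 and 0.998 rather than between 0 and 1, which suggests that the min-max result is mapped into a slightly narrowed range so that no proportion is exactly zero when logarithms are taken later. That target range is an inference from Table 3, not something the text states; the function name `min_max` and its `lo` and `hi` parameters are likewise assumptions. The sketch uses the "Family genetic history" column of Table 2.

```python
# "Family genetic history" column of Table 2 (raw values for P1..P10).
raw = [-0.0429749, 0.227831, -1.81933, 0.227831, 0.227831,
       -1.81933, 0.227831, 0.227831, 0.227831, -1.81933]

def min_max(values, lo=0.0, hi=1.0):
    """Min-max scale values into [lo, hi].

    lo and hi are assumed parameters; a constant column (zero span)
    is not handled in this sketch.
    """
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]

# Assumption: target range [0.002, 0.998], inferred from Table 3 only.
scaled = [round(y, 3) for y in min_max(raw, lo=0.002, hi=0.998)]
print(scaled)
```

For this column the scaled values agree with the last column of Table 3 to three decimal places.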
(2) Index proportion:
TABLE 4
Patient ID  Past history  Allergy history  Smoking history  Childbearing status  Marital status  Family genetic history
P1          0.00050       0.00040          0.14273          0.00066              0.16644         0.12627
P2          0.00050       0.19960          0.00029          0.33178              0.00033         0.14548
P3          0.24925       0.00040          0.14273          0.00066              0.16644         0.00029
P4          0.24925       0.19960          0.14273          0.33178              0.16644         0.14548
P5          0.00050       0.00040          0.14273          0.00066              0.00033         0.14548
P6          0.00050       0.19960          0.14273          0.00066              0.00033         0.00029
P7          0.00050       0.00040          0.00029          0.33178              0.16644         0.14548
P8          0.00050       0.19960          0.00029          0.00066              0.16644         0.14548
P9          0.24925       0.00040          0.14273          0.00066              0.00033         0.14548
P10         0.24925       0.19960          0.14273          0.00066              0.16644         0.00029
(3) Index entropy value:
TABLE 5
Index                Past history  Allergy history  Smoking history  Childbearing status  Marital status  Family genetic history
Information entropy  0.61145       0.705236         0.847786         0.491705             0.782331        0.847333
(4) Index weight:
TABLE 6
Index                Past history  Allergy history  Smoking history  Childbearing status  Marital status  Family genetic history
Weight               0.226671      0.171959         0.088798         0.296527             0.126983        0.089062
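Steps (2) to (4) can be re-computed from the normalized values of Table 3. The sketch below assumes the standard entropy weight formulas: proportion p = y / column sum, entropy E = -(1/ln n) Σ p ln p, and weight W = (1 - E) / Σ(1 - E). The function name `entropy_weights` is hypothetical.

```python
import math

# Normalized values y_ij from Table 3 (rows P1..P10, columns in table order).
Y = [
    [0.002, 0.002, 0.998, 0.002, 0.998, 0.866],
    [0.002, 0.998, 0.002, 0.998, 0.002, 0.998],
    [0.998, 0.002, 0.998, 0.002, 0.998, 0.002],
    [0.998, 0.998, 0.998, 0.998, 0.998, 0.998],
    [0.002, 0.002, 0.998, 0.002, 0.002, 0.998],
    [0.002, 0.998, 0.998, 0.002, 0.002, 0.002],
    [0.002, 0.002, 0.002, 0.998, 0.998, 0.998],
    [0.002, 0.998, 0.002, 0.002, 0.998, 0.998],
    [0.998, 0.002, 0.998, 0.002, 0.002, 0.998],
    [0.998, 0.998, 0.998, 0.002, 0.998, 0.002],
]

def entropy_weights(Y):
    """Return (entropies, weights) for the columns of Y."""
    n, m = len(Y), len(Y[0])
    entropies = []
    for j in range(m):
        col = [row[j] for row in Y]
        total = sum(col)
        p = [y / total for y in col]                  # step (2): proportions
        e = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)
        entropies.append(e)                           # step (3): entropy
    spread = [1.0 - e for e in entropies]
    spread_sum = sum(spread)
    weights = [s / spread_sum for s in spread]        # step (4): weights
    return entropies, weights

entropies, weights = entropy_weights(Y)
print([round(e, 6) for e in entropies])
print([round(w, 6) for w in weights])
```

Under these formulas the computed entropies and weights agree with Tables 5 and 6 to about three decimal places; the small residual comes from Table 3 being rounded to three decimals.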
Case retrieval selects from the case base the K cases most similar to the target case. Because the cases are encoded as natural-language word vectors, differences of scale (dimension) between case features are masked automatically, so case base retrieval does not need to treat dimensional and dimensionless feature indexes differently.
The similarity matching process matches, from the source case set U of the case base, the K source cases most similar to the target case T (the TopK similar cases). The matching algorithm uses the Euclidean distance Dist(U, T) to measure the similarity between cases: the smaller the distance, the more similar the cases, i.e. the higher the similarity Sim(U, T). The similarity is calculated as follows:
Sim(U,T)=1-Dist(U,T)
here, Dist (U, T) is calculated in the form:
Figure BDA0003253229770000161
The TopK similar case search process is described in FIG. 5. In fig. 5, the source case set U is composed of a plurality of cases, and the target case T and the source cases have the same feature indexes. Each retrieval requires one complete scan of the source case base, which guarantees that the TopK similar cases obtained are the global TopK.
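A full-scan TopK retrieval can be sketched as follows. The exact form of Dist(U, T) is given only in a figure that is not reproduced here, so the weighted Euclidean form below (feature weights multiplying the squared differences) is an assumption, as are the function names `dist` and `top_k` and the toy case vectors.

```python
import math

def dist(u, t, w):
    """Weighted Euclidean distance; the weighted form is an assumption."""
    return math.sqrt(sum(wj * (uj - tj) ** 2 for uj, tj, wj in zip(u, t, w)))

def top_k(source_cases, target, weights, k):
    """Scan every source case and return the k best (index, similarity) pairs."""
    scored = [(i, 1.0 - dist(u, target, weights))
              for i, u in enumerate(source_cases)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]

# Hypothetical encoded cases and feature weights, for illustration only.
U = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [0.9, 0.8]]
T = [0.1, 0.1]
W = [0.5, 0.5]
result = top_k(U, T, W, k=2)
print(result)
```

With features scaled to [0, 1] and weights that sum to 1, this distance stays within [0, 1], so Sim = 1 - Dist also stays within [0, 1].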
If the case base is large and each scan takes a long time, the retrieval algorithm can be improved by adding a similarity parameter: a retrieval similarity threshold is set, and as soon as the number of cases whose similarity exceeds this threshold reaches K during retrieval, the scan of the case base can be terminated.
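The early-termination variant can be sketched in the same style. The function name `threshold_scan`, the parameter name `min_sim`, the weighted Euclidean distance and the toy data are all assumptions made for illustration.

```python
import math

def threshold_scan(source_cases, target, weights, k, min_sim):
    """Scan until k cases with similarity above min_sim have been found."""
    hits = []
    for i, u in enumerate(source_cases):
        d = math.sqrt(sum(w * (a - b) ** 2
                          for a, b, w in zip(u, target, weights)))
        sim = 1.0 - d
        if sim > min_sim:
            hits.append((i, sim))
            if len(hits) == k:
                break  # enough sufficiently similar cases: stop scanning
    return hits

# Hypothetical encoded cases and feature weights, for illustration only.
U = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [0.9, 0.8]]
T = [0.1, 0.1]
W = [0.5, 0.5]
found = threshold_scan(U, T, W, k=2, min_sim=0.85)
print(found)
```

The trade-off is that the scan returns the first K cases above the threshold in storage order, which are not necessarily the global TopK; if fewer than K cases exceed the threshold, the whole case base is still scanned and fewer than K results come back.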
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (7)

1. A real world data-based exogenous febrile disease assistant decision-making system, characterized in that it comprises:
a system module, comprising a quit-system submodule and a user information maintenance submodule, wherein the user information maintenance submodule manages the user information of the system, including registration, modification, deletion and query of user information, and the quit-system submodule, when executed, exits the whole system directly;
a database module, which maintains the feature information involved in the case retrieval process on a grouped-information basis, each group of information in the module supporting the addition, modification, deletion and query of information;
a coding base module, which generates case coding information, wherein the module adopts an artificial intelligence (AI) natural language processing model, learns from the case corpus in the case base, and converts the cases in the case base into case codes used as feature input to the case retrieval algorithm, and also calculates the objective weight of each feature in the case base from the generated case codes;
a decision reasoning module, which retrieves from the case base, on the basis of an input target case, the K source cases most similar to the target case, involving subjective weight correction and a similarity algorithm;
a help module, which provides introductory information about the system.
2. The real world data-based aid decision system for exogenous febrile disease according to claim 1, wherein: the decision reasoning module comprises:
source case preparation: first, the data in an external original Excel file is converted, cleaned and stored in the case base to form the source cases; then a word vector model is learned from the source cases in the case base and used to encode them, constructing the coding base; at the same time, objective weight analysis is performed on the coding base by the entropy weight method to obtain the feature weights of the source cases, for use in case similarity calculation during target case retrieval;
target case retrieval: first, the user inputs the target case to be retrieved and subjectively corrects the weights; then the target case is encoded with the word vector model learned during source case preparation, and similarity is calculated between the encoded target case and the case features in the coding base to obtain the serial numbers of the K most similar cases; the most similar source cases are then obtained from the case base by these K serial numbers and used for case diagnosis; if a retrieved source case matches the target case, case reuse is completed and the case is used directly for diagnosis and treatment, otherwise the case is corrected and then reused; and finally, the reused case is stored in the case base as a new case.
3. The real world data-based aid decision system for exogenous febrile disease according to claim 2, wherein: in the coding library, a Word2Vec model is adopted for case coding;
the Word2Vec model is divided into two parts, wherein the first part establishes the model and the second part obtains the embedded word vectors through the model; in the overall modeling process of Word2Vec, a neural network is first constructed from the training data; once the model is trained, the parameters it has learned from the training data are the hidden-layer weight matrix, and the word vectors are then calculated from this weight matrix;
Word2Vec uses a single hidden layer whose neurons are all linear neurons; the input layer has as many neurons as there are words in the vocabulary used for training, the size of the hidden layer is set to the dimension of the generated word vectors, and the size of the output layer is the same as that of the input layer;
assuming that the vocabulary for learning word vectors consists of V words and N is the dimension of the word vector, the connection from the input layer to the hidden layer is represented by a matrix WI of size V × N, where each row represents a vocabulary word; in the same way, the connection from the hidden layer to the output layer is described by a matrix WO of size N × V.
4. The real world data-based aid decision system for exogenous febrile disease according to claim 3, wherein: the continuous bag-of-words model modeling method of the Word2Vec model comprises the following steps:
(1) calculating the output h of the hidden layer:
$$h = \frac{1}{C}\, WI^{T} \cdot \sum_{i=1}^{C} x_i$$
wherein $x_i$ is the code of each word at the input layer of the Word2Vec model, $WI^{T}$ is the weight matrix between the input layer and the hidden layer of the Word2Vec model, C is the number of nodes of the hidden layer of the Word2Vec model, and h is the output vector of the hidden layer of the Word2Vec model;
(2) calculate the input at each node of the output layer:
$$u = WO^{T} \cdot h$$
wherein, WO is a weight matrix between a Word2Vec model hidden layer and an output layer, h is a Word2Vec model hidden layer output vector, and u is an input vector of the Word2Vec model output layer;
(3) calculating the output of the output layer, wherein Softmax is used as the activation function; Softmax is a nonlinear activation function used to calculate the output vector for the background words in the Word2Vec model, and this output vector is then compared with the target word vector to achieve model learning.
5. The real world data-based aid decision system for exogenous febrile disease according to claim 4, wherein: the coding library objectively weights the cases by adopting an entropy weight method; assuming an index set $X = \{X_1, X_2, \ldots, X_m\}$, the process of calculating the weight of each index $X_j$ by the entropy weight method comprises the following steps:
(I) standardizing each value $x_{ij}$ under index $X_j$:
$$y_{ij} = \frac{x_{ij} - \min(X_j)}{\max(X_j) - \min(X_j)}$$
wherein $x_{ij}$ is a value under index $X_j$, $\min(X_j)$ is the minimum value of index $X_j$, $\max(X_j)$ is the maximum value of index $X_j$, and $y_{ij}$ is the normalized value of $x_{ij}$;
(II) calculating the proportion $p_{ij}$ of the normalized value $y_{ij}$ under index $X_j$:
$$p_{ij} = \frac{y_{ij}}{\sum_{i=1}^{n} y_{ij}}$$
wherein n is the number of values under index $X_j$, i.e. the number of cases;
(III) calculating the entropy value $E(X_j)$ of index $X_j$:
$$E(X_j) = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$
wherein ln(·) is the natural logarithm function;
(IV) calculating the weight $W(X_j)$ of index $X_j$:
$$W(X_j) = \frac{1 - E(X_j)}{\sum_{k=1}^{m} \left(1 - E(X_k)\right)}$$
wherein $E(X_j)$ is the entropy value of index $X_j$, and m is the number of indexes.
6. The real world data-based aid decision system for exogenous febrile disease according to claim 5, wherein: in the decision-making inference module,
in the similarity matching process, the K source cases most similar to the target case T are matched from the source case set U of the case base; the matching algorithm uses the Euclidean distance Dist(U, T) to measure the similarity between cases, wherein the smaller the distance, the more similar the cases, i.e. the higher the similarity Sim(U, T); the similarity is calculated as follows:
Sim(U,T)=1-Dist(U,T)
the Dist (U, T) calculation form is as follows:
Figure FDA0003253229760000044
the source case set U is composed of a plurality of cases, the target case T and the source cases have the same feature indexes, and one complete scan of the source case base is performed in each retrieval, which ensures that the TopK similar cases obtained are the global TopK.
7. The real world data-based aid decision system for exogenous febrile disease according to claim 6, wherein: if the case base is large and each scan takes a long time, the retrieval algorithm is improved by adding a similarity parameter: a retrieval similarity threshold is set, and as soon as the number of cases whose similarity exceeds this threshold reaches K during retrieval, the scan of the case base can be terminated.
CN202111051801.7A 2021-09-08 2021-09-08 Real world data-based exogenous febrile disease assistant decision-making system Pending CN114220535A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111051801.7A CN114220535A (en) 2021-09-08 2021-09-08 Real world data-based exogenous febrile disease assistant decision-making system

Publications (1)

Publication Number Publication Date
CN114220535A true CN114220535A (en) 2022-03-22

Family

ID=80695920

Country Status (1)

Country Link
CN (1) CN114220535A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350288A (en) * 2023-12-01 2024-01-05 浙商银行股份有限公司 Case matching-based network security operation auxiliary decision-making method, system and device
CN117350288B (en) * 2023-12-01 2024-05-03 浙商银行股份有限公司 Case matching-based network security operation auxiliary decision-making method, system and device

Legal Events

Date Code Title Description
PB01 Publication