CN113792879A - Case reasoning attribute weight adjusting method based on introspection learning - Google Patents
Case reasoning attribute weight adjusting method based on introspection learning
- Publication number
- CN113792879A CN113792879A CN202111166411.4A CN202111166411A CN113792879A CN 113792879 A CN113792879 A CN 113792879A CN 202111166411 A CN202111166411 A CN 202111166411A CN 113792879 A CN113792879 A CN 113792879A
- Authority
- CN
- China
- Prior art keywords
- case
- attribute
- cases
- weight
- similar
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/045—Explanation of inference; Explainable artificial intelligence [XAI]; Interpretable artificial intelligence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
Abstract
The invention provides an attribute weight adjustment application program for case reasoning and a cardiovascular disease diagnosis device, wherein the application program comprises a similar case retrieval module, an attribute weight updating module, a normalization module, and an optimal weight calculation module. The invention performs global optimization of the weights using the principle of introspective learning and can re-learn the weights iteratively as the case base is updated, thereby improving the performance of the CBR system.
Description
Technical Field
The invention relates to the field of artificial intelligence and the technical field of medical auxiliary systems, in particular to a case reasoning attribute weight adjusting method based on introspection learning.
Background
The pulse wave contains rich cardiovascular physiological and pathological information; its waveform characteristics, such as shape, intensity, speed, and rhythm, are closely related to the cardiovascular state, and pulse wave detection equipment is commonly used clinically to analyze cardiovascular function. Such equipment yields a large amount of waveform characteristic information, and this waveform data has practical value in providing auxiliary decision support for the diagnosis of cardiovascular diseases.
Case-based reasoning (CBR) is a problem-solving and machine learning method in the field of artificial intelligence. Its basic idea is to reason from past experience cases (source cases) that solved similar problems in order to solve new problems (target cases). When case reasoning is used to diagnose the cardiovascular state represented by pulse wave data, a historical case base can be built from previously detected pulse waves and their corresponding diagnoses. When a new waveform is to be diagnosed, historical records whose waveform parameters are similar to those of the current pulse wave are retrieved from the history base, and a suggested diagnosis for the current waveform is derived from the diagnoses in those similar records. This suggestion can provide auxiliary decision support for daily self-monitoring or for clinical examination by doctors.
In existing applications of CBR to medical auxiliary systems, for example the case-reasoning depression recognition system based on electroencephalogram characteristics disclosed in Chinese patent CN110974260A, the success or failure of case retrieval usually directly determines the performance of the whole system. The goal of CBR retrieval is to quickly and efficiently find the fewest possible cases in the case base that are most similar to the problem description. Retrieval strategies mainly include knowledge-guided strategies, template retrieval strategies, and KNN retrieval strategies. The KNN strategy, which retrieves by similarity, has attracted wide attention, but it is sensitive to noisy or irrelevant data. The usual solution is to assign different weights to the case attributes. Weight determination methods fall into subjective and objective methods. Commonly used subjective weighting methods include expert consultation, survey statistics, indifference trade-off, and correlation analysis, as well as the widely applied analytic hierarchy process. Because these traditional subjective methods depend too heavily on subjective judgment and experience, they affect the accuracy of similar-case retrieval. Objective methods such as genetic algorithms and entropy have therefore been proposed in succession, but once the weights are determined these methods never adjust them, even as the case base is continually updated; they are in fact a passive form of learning. It is therefore necessary to seek a dynamic weight adjustment method from a new viewpoint, one that provides an active learning ability.
Disclosure of Invention
In order to endow a medical auxiliary system applying case reasoning with active learning ability and to solve the technical problem of dynamic weight adjustment, the technical scheme provided by the invention comprises the following three aspects:
in a first aspect, an attribute weight adjustment application for case reasoning is provided, which includes the following modules:
a retrieve similar cases module configured to:
acquiring a training set B of cardiovascular disease cases and traversing all N target cases in B;
for each target case, performing an iteration comprising the following steps:
retrieving K cases which are most similar to the target case from a history set A of the cardiovascular disease cases;
if the most similar cases have correctly classified cases, updating the attribute weight of the correctly classified cases;
an update attribute weight module configured to:
increasing the weight of the attributes of the correctly classified cases matching the target case;
reducing the weight of the attributes on which the correctly classified case does not match the target case;
a normalization module configured to:
calculating the normalized attribute weight according to formula I:

ω″_i(t) = ω′_i(t) / Σ_j ω′_j(t), j = 1, 2, …, m    (formula I)

where ω″_i(t) is the normalized weight of the i-th attribute after the t-th iteration, ω′_i(t) is the weight of the i-th attribute after the t-th iteration, i = 1, 2, …, m; t = 1, 2, …, N; m is the total number of attributes, and N is the number of cases in B, i.e., the number of iterations;
a calculate optimal weight module configured to:
performing case reasoning classification on the training set B based on the m attribute weights after the t-th iteration, and calculating the accuracy of the case reasoning classification;
and recording the m attribute weights corresponding to the maximum classification accuracy as the optimal attribute weights of case reasoning based on introspection learning.
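The normalisation step of formula I can be sketched as follows; this is a minimal illustration, and the function name is ours rather than the patent's:

```python
def normalise(weights):
    """Formula I: divide each iterated weight by the sum of all m
    weights, so the attribute weights always sum to one."""
    total = sum(weights)
    return [w / total for w in weights]
```

Because the weights are renormalised after every iteration, they always form a valid weight vector summing to one.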
In a second aspect, a cardiovascular disease diagnosis apparatus is presented, which is loaded with a case-inference attribute weight adjustment application.
In a third aspect, a computer readable storage medium is presented, which stores the computer programs/instructions and related data of an attribute weight adjustment application for case reasoning.
The invention performs global optimization of the weights using the principle of introspective learning and can re-learn the weights iteratively as the case base is updated, achieving the purpose of improving the performance of the CBR system.
Drawings
FIG. 1 is a block diagram of an attribute weight adjustment application for case-based reasoning;
FIG. 2 is a flow chart of the operation of some cardiovascular disease diagnosis devices.
Detailed Description
Some embodiments relate to an attribute weight adjustment application for case reasoning, comprising the main modules shown in fig. 1:
a retrieve similar cases module configured to:
acquiring a training set B of the cardiovascular disease cases, traversing all N target cases in B, and performing, for each target case, an iteration comprising the following steps:
retrieving K cases which are most similar to the target case from a history set A of the cardiovascular disease cases;
if the most similar cases have correctly classified cases, updating the attribute weight of the correctly classified cases;
an update attribute weight module configured to:
increasing the weight of the attribute matched by the correctly classified case and the target case;
reducing the weight of the attributes of the correctly classified cases that do not match the target cases;
a normalization module configured to:
calculating the normalized attribute weight according to formula I:

ω″_i(t) = ω′_i(t) / Σ_j ω′_j(t), j = 1, 2, …, m    (formula I)

where ω″_i(t) is the normalized weight of the i-th attribute after the t-th iteration, ω′_i(t) is the weight of the i-th attribute after the t-th iteration, i = 1, 2, …, m; t = 1, 2, …, N; m is the total number of attributes, and N is the number of cases in B, i.e., the number of iterations;
a calculate optimal weight module configured to:
performing case reasoning classification on the training set B based on the m attribute weights after the t-th iteration, and calculating the accuracy of the case reasoning classification;
and recording the m attribute weights corresponding to the maximum classification accuracy as the optimal attribute weights of case reasoning based on introspection learning.
It should be noted that all cases in the training set B can be regarded as target cases, so the embodiment includes: the training set B has N cases, and all N cases are taken as target cases. The weights are updated at each iteration: first, one case in the training set B is taken (regarded as the target case), and the K most similar cases are retrieved from the history set A. These similar cases are then compared with the target case; if a similar case has the same category as the target case, the weights of its attributes that match the target case are increased, and the weights of its attributes that do not match the target case are decreased.
The accuracy ACC_tr of case reasoning classification is calculated as follows: after each iteration a set of attribute weights is obtained, and each case in B can be classified by case reasoning under this set of weights. If the category obtained by inference is the same as the real category of the case in B, the classification is correct; otherwise it is incorrect. The accuracy ACC_tr equals the ratio of the number of correctly classified cases in B to the total number of cases N.
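The accuracy calculation described above can be sketched in a few lines; `classification_accuracy` is an illustrative name, not from the patent:

```python
def classification_accuracy(predicted, actual):
    """ACC_tr: the ratio of the number of correctly classified cases
    in B to the total number of cases N."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```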
A cardiovascular disease case comprises the attributes in Table 1; the attributes and the diagnosis result are collected and represented as a two-tuple of the following form:
C_k = (X_k; Y_k), k = 1, 2, …, p    (1)

where p is the total number of cases in the case database, X_k is the problem description of the k-th case, and Y_k is its category. X_k and Y_k may be represented as:

X_k = (x_{1,k}, x_{2,k}, …, x_{i,k}, …, x_{m,k})    (2)

Y_k = y_k    (3)

where x_{i,k} (i = 1, …, m) is the value of the i-th attribute of the k-th case, m is the number of case attributes, and y_k is the category of the k-th case. The problem description of a new case is denoted X = (x_1, x_2, …, x_i, …, x_m), and its category Y is the quantity to be determined.
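The two-tuple case representation C_k = (X_k; Y_k) maps naturally onto a named tuple; the attribute values below are hypothetical pulse-wave features invented for illustration, not data from the patent:

```python
from collections import namedtuple

# A case C_k = (X_k; Y_k): X_k is the m-dimensional problem description
# and Y_k the diagnostic category.
Case = namedtuple("Case", ["X", "Y"])

history_base = [
    Case(X=(0.63, 0.41, 0.78), Y="positive"),   # hypothetical feature values
    Case(X=(0.22, 0.35, 0.51), Y="negative"),
]
new_problem = (0.60, 0.44, 0.80)   # X for a new case; its category Y is unknown
```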
TABLE 1 cardiovascular data set Attribute information
All cases in the case database are divided into two parts, A and B. A serves as the history base during weight adjustment, and the attribute weights of the cases in A are iteratively updated by introspective learning; B serves as the training set, and the cases in B are used to train the attribute weights of the cases in the history base. The following formulas are used for the weight adjustment:
The weight is increased:

ω′_i(t) = ω_i(t) + Δ_i/m, with ω′_i(t) ≥ 0    (4)

The weight is reduced:

ω′_i(t) = ω_i(t) − Δ_i/m, with ω′_i(t) ≥ 0    (5)

where ω_i(t) is the weight of the i-th attribute before the t-th iteration and ω′_i(t) is the intermediate weight of the i-th attribute after the t-th iteration. Δ_i/m determines the amount of weight change, m being the number of case attributes; for classification problems with different numbers of attributes, the amount of change differs even for the same Δ_i. For a given case base, however, m is fixed, so the change is determined mainly by Δ_i. Here Δ_i takes values in the interval [0.01, 0.2], and the weights updated by equations (4) and (5) must remain nonnegative.
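Formulas (4) and (5) amount to a step of Δ_i/m up or down, clipped at zero; a minimal sketch, with `delta` standing in for Δ_i:

```python
def adjust_weight(w, delta, m, increase):
    """Formulas (4)/(5): w'_i(t) = w_i(t) +/- delta/m, kept nonnegative.
    delta plays the role of Δ_i; m is the number of case attributes."""
    step = delta / m
    return max(w + step if increase else w - step, 0.0)
```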
In some embodiments of the application, the retrieval process of the retrieve similar cases module comprises the following steps:

calculating the similarity S(X, C_k) between each case X in A and the target case C_k;

sorting the cases in A by similarity and sequentially selecting the K cases with the greatest similarity as the most similar cases.
Note that the similarity S(X, C_k) can be obtained as a weighted measure in several ways, and different measures have different calculation formulas. For example, in some more specific embodiments, the similarity S(X, C_k) is calculated according to formula II:

where X represents a case in A, C_k represents the target case, S(X, C_k) is the similarity between case X and target case C_k, ω_i is the optimal weight of the i-th attribute, x_i is the value of the i-th attribute of case X, and x_{i,k} is the value of the i-th attribute of target case C_k.
In still more specific embodiments, the similarity calculation process includes the steps of calculating according to formula III:
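Since the bodies of formulas II and III are not reproduced in this text, the retrieval step can only be sketched under an assumed similarity measure; the weighted form 1 − Σ ω_i·|x_i − x_{i,k}| below is our illustrative choice, not necessarily the patent's formula:

```python
def retrieve_k_similar(target_x, history, weights, k):
    """Rank the cases in history set A by weighted similarity to the
    target case and return the k most similar ones (a kNN-style
    retrieval). The similarity measure here is assumed for illustration."""
    def similarity(case_x):
        return 1.0 - sum(w * abs(a - b)
                         for w, a, b in zip(weights, target_x, case_x))
    return sorted(history, key=lambda c: similarity(c[0]), reverse=True)[:k]
```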
in some embodiments, the update property weights module is configured to further include the steps of:
setting a threshold value;
if the absolute value of the difference between the ith attribute of the correctly classified case and the ith attribute of the target case does not exceed the threshold value, the ith attribute is a matching attribute;
and if the absolute value of the difference between the ith attribute of the correctly classified case and the ith attribute of the target case exceeds a threshold value, the ith attribute is a mismatch attribute.
For example, for two cases C_I = (X_I; Y_I) and C_II = (X_II; Y_II) in the case base, where X_I = (x_{1,I}, x_{2,I}, …, x_{i,I}, …, x_{m,I}) and X_II = (x_{1,II}, x_{2,II}, …, x_{i,II}, …, x_{m,II}), if |x_{i,I} − x_{i,II}| ≤ ξ holds, the i-th attribute of case C_I and case C_II is called a matching attribute; otherwise it is called a non-matching attribute. Here ξ is the threshold for judging whether attributes match, and represents the required closeness of the same attribute in the two cases.
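The matching test is a simple threshold comparison; in this sketch the default ξ = 0.01 anticipates the value used in the experiments below:

```python
def is_matching_attribute(x_i, x_j, xi=0.01):
    """The i-th attribute of two cases matches when |x_{i,I} - x_{i,II}| <= ξ."""
    return abs(x_i - x_j) <= xi
```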
In some embodiments, the retrieval process of the retrieve similar cases module further comprises: retrieving the K nearest neighbor cases of the target case from A as the most similar cases, based on a K-Nearest Neighbor (KNN) retrieval strategy.
The application program of some embodiments reduces the dimensionality of the case data by hash coding and, on that basis, retrieves K approximate nearest neighbor cases from A as the most similar cases based on an Approximate Nearest Neighbor (ANN) search strategy.
In some embodiments of the application, the retrieve similar cases module is further configured to adjust the K value gradually by cross validation. It should be noted that increasing K reduces the estimation error of learning but increases the approximation error, because training instances far from the input instance then also influence the prediction and can make it wrong; a larger K also reduces model complexity. K is therefore initially assigned a relatively small value.
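Choosing K by cross validation, starting from small values, can be sketched as follows; `validate` is a placeholder for whatever cross-validated evaluation the system uses:

```python
def choose_k(candidate_ks, validate):
    """Try candidate K values from small to large and keep the one with
    the best validation accuracy. A strict '>' keeps the smaller K on
    ties, matching the advice to start from a relatively small value."""
    best_k, best_acc = None, -1.0
    for k in sorted(candidate_ks):
        acc = validate(k)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```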
Some cardiovascular disease diagnosis devices are loaded with case-inference attribute weight adjustment applications.
A more specific embodiment comprises the following steps:
Step 1: case representation;
The source cases in the history database are represented as two-tuples of the following form:
C_k = (X_k; Y_k), k = 1, 2, …, p    (1)

where p is the total number of source cases in the history database, X_k is the problem description of each source case, and Y_k is its category. They may be respectively represented as:

X_k = (x_{1,k}, x_{2,k}, …, x_{i,k}, …, x_{m,k})    (2)

Y_k = y_k    (3)

where x_{i,k} is the value of the i-th attribute of the k-th case, m is the number of case attributes, and y_k is the category of the k-th case. The problem description of a new case is denoted X = (x_1, x_2, …, x_i, …, x_m), and its category Y is the quantity to be determined.
Step 2: introspection learning of case attribute weights;
the p cases in the history database are first divided into A, B two parts. A is used as a history library in the weight adjustment process, and the attribute weight of the case in A is iteratively updated along with the introspection learning; b is used as a training set, and the cases in B are used for training the attribute weight of the cases in the historical library. The introspection learning training process for case attribute weights is as follows:
2.1: let t = 0 and assign an average weight to each attribute in the history base A, i.e., ω_i(0) = 1/m, the weight of the i-th attribute at iteration 0;
2.3: calculate the classification accuracy of the training set B under the uniform weights according to the nearest neighbor retrieval strategy, and record it as ACC_tr(0);
2.4: take the t-th case of the training set B in sequence (denoted the target case D), and retrieve the K most similar cases (denoted the neighbor cases) from the history base A using the nearest neighbor retrieval strategy;
2.5: apply the following weight adjustment algorithm, a success-driven weight update strategy, to each of the K neighbor cases to learn and update the attribute weights. The success-driven update strategy is: when a case is classified successfully, the weights of its matching attributes are increased and the weights of its non-matching attributes are decreased (abbreviated GMU + GUD). That is, when a retrieved neighbor case has the same category as the target case D (classification success), the weights of the attributes of that case that match case D are increased, while the weights of the attributes that do not match case D are decreased. This increases the similarity between that case and case D, so that similar cases are easier to retrieve when solving case D. The following formulas are used for the weight adjustment:
The weight is increased:

ω′_i(t) = ω_i(t) + Δ_i/m, with ω′_i(t) ≥ 0    (5)

The weight is reduced:

ω′_i(t) = ω_i(t) − Δ_i/m, with ω′_i(t) ≥ 0    (6)

where ω_i(t) is the weight of the i-th attribute before the t-th iteration and ω′_i(t) is the intermediate weight of the i-th attribute after the t-th iteration. Δ_i/m determines the amount of weight change, m being the number of case attributes; for classification problems with different numbers of attributes, the amount of change differs even for the same Δ_i. For a given case base, however, m is fixed, so the change is determined mainly by Δ_i. Here Δ_i takes values in the interval [0.01, 0.2], and the weights updated by equations (5) and (6) must remain nonnegative.
2.6: normalize the updated attribute weights according to equation (7) to obtain and store the final weight of each attribute after the t-th iteration of learning:

ω″_i(t) = ω′_i(t) / Σ_j ω′_j(t), j = 1, 2, …, m    (7)

where ω″_i(t) is the final weight of the i-th attribute after the t-th iteration.
2.7: reclassify all cases in the training set with the new weights of the t-th iteration and calculate the new training-set accuracy ACC_tr(t). If t < N_tr, go to 2.4; otherwise, go to 2.8;
2.8: select the weights corresponding to the maximum training-set accuracy as the optimal weights under the current introspective learning strategy. The introspective learning training process of the weights then ends.
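Steps 2.1 to 2.8 can be tied together in a short sketch. The similarity measure is again an assumed weighted form (the patent's retrieval formula is not reproduced in this text), and all names are ours:

```python
def train_weights(history, training, k=5, delta=0.015, xi=0.01):
    """Introspective learning loop (steps 2.1-2.8): start from uniform
    weights; for each training case retrieve the k nearest neighbours,
    and for every correctly classified neighbour raise the weights of
    matching attributes and lower the others (GMU + GUD); normalise;
    keep the weight vector with the best training accuracy."""
    m = len(training[0][0])
    w = [1.0 / m] * m                                  # step 2.1: uniform weights

    def sim(a, b, weights):                            # assumed similarity measure
        return 1.0 - sum(wi * abs(x - y) for wi, x, y in zip(weights, a, b))

    def classify(case, weights):                       # 1-NN classification
        best = max(history, key=lambda c: sim(case[0], c[0], weights))
        return best[1]

    def accuracy(weights):                             # ACC_tr under given weights
        return sum(classify(c, weights) == c[1] for c in training) / len(training)

    best_w, best_acc = w[:], accuracy(w)               # step 2.3: baseline accuracy
    for target in training:                            # steps 2.4-2.7
        neighbours = sorted(history, key=lambda c: sim(target[0], c[0], w),
                            reverse=True)[:k]
        for nb in neighbours:
            if nb[1] != target[1]:                     # only successes drive updates
                continue
            step = delta / m
            for i in range(m):                         # GMU + GUD update
                if abs(nb[0][i] - target[0][i]) <= xi:
                    w[i] = w[i] + step
                else:
                    w[i] = max(w[i] - step, 0.0)
        total = sum(w)                                 # normalisation, equation (7)
        w = [wi / total for wi in w]
        acc = accuracy(w)
        if acc > best_acc:                             # step 2.8: keep the best
            best_w, best_acc = w[:], acc
    return best_w, best_acc
```

On a toy history base and training set this returns a normalised weight vector together with the best training accuracy found.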
Step 3: case retrieval;
Calculating, based on the attribute weights obtained in step 2 and the KNN retrieval strategy, the similarity between a new case X and each source case C_k:

where ω_i is the weight of the i-th characteristic attribute, obtained in step 2, and satisfies Σ_i ω_i = 1.

The similarities S(X, C_1), S(X, C_2), …, S(X, C_p) between the p cases in the history database and the new case are sorted from largest to smallest; the K largest similarities are selected in turn, giving the K neighbor cases most similar to the new case.
Step 5: reusing the cases;
The category y_q that occurs most often among the K neighbor cases is selected as the category of the new case to be determined, i.e.,

Y = y_q    (10)
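Case reuse per formula (10) is a majority vote over the K neighbour categories; a minimal sketch:

```python
from collections import Counter

def reuse_case(neighbour_categories):
    """Formula (10): the category y_q occurring most often among the K
    neighbour cases becomes the category Y of the new case."""
    return Counter(neighbour_categories).most_common(1)[0][0]
```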
Step 6: case storage;
description of the new problem X ═ X1,x2,…,xm) And its corresponding class Y ═ YqStored in the history database shown in formula (1) for the next case classification. To this end, the total number of case records in the history database is p + 1.
In order to verify the effectiveness of the method, the following comparative example uses the Statlog Heart data set from the UCI (University of California, Irvine) repository; its basic information is shown in Table 2, where positive indicates a patient and negative indicates a normal subject.
TABLE 2 data set basic information
In order to compare the performance influence of different weight adjustment methods on the CBR classifier, four CBR classifiers are compared on the data set: a traditional CBR classifier with uniform weights (abbreviated EW), a CBR classifier with introspective-learning weights (abbreviated GMU + GUD), a CBR classifier with genetic-algorithm-optimized weights (abbreviated GA), and a CBR classifier with information-entropy-optimized weights (abbreviated EN).
Five-fold cross validation is adopted to ensure the objectivity of the results. In each fold, the data set is divided into five parts: one part is the test set and the other four parts (called the case base for convenience) are used for training the weights. For GMU + GUD and GA, three parts of the case base serve as the history base and the remaining part as the training set; the attribute weights of the cases in the history base are iteratively adjusted through introspective learning or the genetic algorithm, the cases in the training set are used to adjust those weights, and the test set is used to test the performance after the attribute weights are adjusted. For the other classifiers, one part serves as the test set and the remaining four parts as the history base; the history base is used to train the weights, and the test set is used to test the performance of the classifier with the trained weights.
When the CBR classifier performs case retrieval, the KNN retrieval strategy is adopted; K denotes the number of selected neighbors, and different values of K give different results. To eliminate the influence of the K value on the results, K is set to 5 in the experiments. The weight adjustment amount in the introspective weight adjustment is 0.015/m, where m is the number of case attributes (m = 14 in this experiment), and the attribute-matching threshold ξ_match is 0.01. In the genetic algorithm, the population size is 20, the crossover probability 0.4, the mutation probability 0.05, and the number of generations 10. Table 3 gives the average accuracy of experiments performed with the different classifiers.
TABLE 3 Average accuracy (%) for each data set using different classifiers
As can be seen from Table 3, the accuracy of the EW classifier is the lowest and that of the other classifiers is mostly improved, indicating that weight adjustment can improve classifier performance. The GMU + GUD classifier has the highest accuracy, showing that the introspective-learning weight adjustment method works well.
To compare whether the different weighting methods significantly improve classification performance, a paired T-test can be performed to determine whether the means of two samples differ significantly. H = 0 indicates no significant difference; H = 1 indicates a significant difference. P > 0.05 indicates no difference or no significant difference; P < 0.05 indicates a difference or a significant difference, and the smaller the P value, the more significant the difference. Table 4 shows the paired T-test results (significance level 0.05) comparing the different weight adjustment methods with uniform weights.
TABLE 4 T-test for the different weight adjustment methods versus the uniform weight method
As can be seen from Table 4, H for EW-(GMU+GUD) is 1 and P < 0.05, indicating that the introspective-learning weight adjustment method significantly improves the accuracy of the CBR classifier. H for EW-EN and EW-GA is 0 and the P values are not less than 0.05, indicating that the genetic-algorithm weights and the entropy weights do not differ significantly from uniform weights. The T-test results further show that, compared with the genetic-algorithm and information-entropy weight optimization methods, the introspective-learning weight adjustment method improves CBR classifier performance more significantly.
This comparison example shows that the weight adjustment method based on success-driven introspective learning can significantly improve the classification accuracy of the CBR classifier.
Further, an improved CBR-based cardiovascular disease auxiliary diagnosis system was established, as shown in Fig. 2. The system mainly comprises a cardiovascular disease database; the main case reasoning processes of feature extraction, case retrieval, case reuse, case correction, and case storage; and a human-machine interaction part that enables users to exchange information with the system.
Implementations and functional operations of the subject matter described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware, including the structures disclosed in this specification and their structural equivalents, or combinations of more than one of the foregoing. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on one or more tangible, non-transitory program carriers, for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of the foregoing.
A computer program (which may also be referred to or described as a program, software application, module, software module, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document; in a single file dedicated to the relevant program; or in multiple coordinated files, such as files that store one or more modules, subprograms, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example: semiconductor memory devices, such as EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components in the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), e.g., the Internet.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features that may embody particular implementations of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in combination and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as: such operations are required to be performed in the particular order shown, or in sequential order, or all illustrated operations may be performed, in order to achieve desirable results. In certain situations, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Particular embodiments of the subject matter have been described. Other implementations are within the scope of the following claims. For example, the activities recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
Claims (10)
1. An attribute weight adjustment application for case reasoning, comprising the following modules:
a retrieve similar cases module configured to:
acquiring a training set B of cardiovascular disease cases, traversing all N target cases in the B, and performing iteration comprising the following steps for each target case:
retrieving K cases which are most similar to the target case from a history set A of the cardiovascular disease cases;
if the most similar cases include correctly classified cases, updating the attribute weights based on the correctly classified cases;
an update attribute weight module configured to:
increasing the weight of the attributes of the correctly classified cases matching the target case;
reducing the weight of the attributes of the correctly classified case that do not match the target case;
a normalization module configured to:
and calculating the normalized attribute weight according to formula I:

ω″_i(t) = ω′_i(t) / Σ_{j=1}^{m} ω′_j(t)    (formula I)

wherein ω″_i(t) represents the normalized weight of the i-th attribute after the t-th iteration, and ω′_i(t) represents the weight of the i-th attribute after the t-th iteration; i = 1, 2, …, m; t = 1, 2, …, N; m represents the total number of attributes, and N represents the number of cases in B, i.e., the number of iterations;
a calculate optimal weight module configured to:
performing case reasoning classification on the training set B based on the m attribute weights after the t-th iteration, and calculating the accuracy of the case reasoning classification;
and recording the m attribute weights corresponding to the maximum classification accuracy as the optimal attribute weights of case reasoning based on introspection learning.
2. The application of claim 1, wherein the retrieval process configured in the retrieve similar cases module comprises the steps of:
calculating the similarity S(X, C_k) between each case X in A and the target case C_k; sorting the cases in A by similarity, and sequentially selecting the K cases with the greatest similarity as the most similar cases.
3. The application of claim 2, wherein the similarity S(X, C_k) is calculated according to formula II:
wherein X represents a case in A, C_k represents the target case, S(X, C_k) represents the similarity between case X and the target case C_k, ω_i represents the optimal attribute weight of the i-th attribute, x_i represents the value of the i-th attribute of case X, and x_{i,k} represents the value of the i-th attribute of the target case C_k.
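The body of formula II is not reproduced in this text; purely as an assumed illustration using the symbols defined above, a common weighted-Euclidean form of case similarity would look like:

```python
import math

def similarity(x, c_k, w):
    """Assumed weighted-Euclidean similarity in the spirit of formula II:
    1 minus the weighted distance, for attribute values scaled to [0, 1]
    and weights ω_i summing to 1 (so the result stays in [0, 1])."""
    return 1.0 - math.sqrt(sum(wi * (xi - xik) ** 2
                               for wi, xi, xik in zip(w, x, c_k)))
```

Identical cases give similarity 1.0, and maximally different values on a fully weighted attribute give 0.0; this is only one plausible instantiation, not the patent's exact formula.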
4. The application of claim 3, wherein the update attribute weight module is further configured to comprise the steps of:
setting a threshold value;
if the absolute value of the difference between the i-th attribute of the correctly classified case and the i-th attribute of the target case does not exceed the threshold value, the i-th attribute is a matching attribute; if the absolute value of the difference exceeds the threshold value, the i-th attribute is a mismatching attribute.
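The matching test in this claim amounts to a one-line predicate; the default ξ = 0.01 here is only the value mentioned in the description above, and the function name is illustrative:

```python
def is_match(attr_case, attr_target, xi=0.01):
    """Attribute matches when |difference| does not exceed the threshold ξ."""
    return abs(attr_case - attr_target) <= xi
```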
5. The application of claim 4, wherein the retrieval process configured in the retrieve similar cases module further comprises: retrieving, from A, the K nearest-neighbor cases of the target case as the most similar cases, based on a K-nearest-neighbor retrieval strategy.
6. The application of claim 4, wherein the retrieval process configured in the retrieve similar cases module further comprises: retrieving, from A, K approximate nearest-neighbor cases as the most similar cases, based on an approximate nearest-neighbor search strategy.
7. The application of claim 5, wherein the retrieve similar cases module is further configured to: gradually adjust the value of K by a cross-validation method.
8. The application of claim 6, wherein the case data are reduced in dimension by a hash coding method.
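The hash-coding dimension reduction of claim 8 can be illustrated with a minimal feature-hashing sketch; the multiplicative hash constant and bucket count are illustrative choices, not taken from the patent:

```python
def feature_hash(values, n_buckets=8):
    """Fold an m-dimensional attribute vector into n_buckets dimensions by
    hashing each attribute index (Knuth-style multiplicative hashing)."""
    out = [0.0] * n_buckets
    for i, v in enumerate(values):
        out[(i * 2654435761) % n_buckets] += v
    return out
```

Because values are summed into buckets, the total mass of the attribute vector is preserved while the dimension drops from m (e.g., 14) to n_buckets.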
9. A cardiovascular disease diagnostic device loaded with the application according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that it stores the computer programs/instructions and related data of the application according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111166411.4A CN113792879A (en) | 2021-09-30 | 2021-09-30 | Case reasoning attribute weight adjusting method based on introspection learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113792879A true CN113792879A (en) | 2021-12-14 |
Family
ID=78877723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111166411.4A Pending CN113792879A (en) | 2021-09-30 | 2021-09-30 | Case reasoning attribute weight adjusting method based on introspection learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113792879A (en) |
Non-Patent Citations (2)
Title |
---|
张春晓等: "Introspective learning adjustment method for attribute weights of case-based reasoning classifiers" (案例推理分类器属性权重的内省学习调整方法), Journal of Computer Applications (计算机应用) *
王蕾; 蒋乔薇; 王新宴; 张莎莎; 王枞: "Research on a case-based reasoning method for auxiliary diagnosis and treatment of hypertension" (基于案例推理的高血压辅助诊疗方法研究), 转化医学杂志 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114611575A (en) * | 2022-01-29 | 2022-06-10 | 国网河北省电力有限公司邯郸供电分公司 | Fault case classification method and system |
CN114611575B (en) * | 2022-01-29 | 2023-09-15 | 国网河北省电力有限公司邯郸供电分公司 | Fault case classification method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20211214 |