CN106991283B - Method for constructing medical record library based on fractal technology - Google Patents
- Publication number
- CN106991283B (application number CN201710206758.4A)
- Authority
- CN
- China
- Prior art keywords
- fractal
- medical record
- attribute
- attributes
- max
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a method for constructing a medical record library based on fractal technology. The method comprises the steps of inputting a data set, scale screening, sample reduction, attribute reduction, and outputting the medical record library. The method uses fractal technology to capture the main characteristics of the medical record library and reduces and reconstructs the historical medical record library in two respects, the number of medical records and the medical record attributes, so that unbounded growth of the medical record library is avoided and the efficiency of searching and analyzing it is improved. The medical record library is mainly used in hospitals for classifying, organizing and analyzing historical medical records, and helps medical staff to recognize, diagnose, treat and prevent diseases on the basis of those records.
Description
Technical Field
The invention relates to a medical record library construction method, in particular to a medical record library construction method based on a fractal technology.
Background
Attribute reduction means obtaining, from the original attribute set of a data set, an attribute subset that fully reflects the main characteristics of the data set and has essentially the same discriminating ability as the original attribute set.
Attributes are also often referred to as features. There are two basic approaches to attribute reduction: feature extraction and feature selection. Feature extraction is divided mainly into linear and nonlinear techniques. In either case, the attributes of the output feature space are artificially constructed and have no obvious correspondence with the features of the original feature space, which makes the result hard to interpret. Feature selection chooses, according to some criterion, the subset of relevant original features that best reflects the statistical characteristics of the pattern classes, thereby reducing the dimension of the feature space. Unlike feature extraction, the resulting feature space has not undergone abstract rotation or transformation, which makes the final result easier to analyze and understand; feature selection is therefore the method commonly used in practice.
Fractal theory is a very active branch of mathematics in modern nonlinear science. Its basic idea is to exploit the similarity between the whole and its parts, treating a complex phenomenon as the result of iterating simple ones, and thereby to reveal the rules and characteristics that the complex phenomenon contains; it is particularly suited to complex problems. For an object with fractal characteristics, the fractal dimension is an important index that quantitatively describes the complexity of the fractal set. Research in recent years has shown that the fractal dimension plays a special role in data mining: applying fractal technology to machine learning can better overcome shortcomings of traditional machine learning techniques and more effectively address data modeling and analysis on high-dimensional data sets with complex structure.
Prior art 1: the feature selection method FDR, proposed in "Fast feature selection using fractal dimension", and the patent "People counting method and system based on video monitoring" by Huang Ying of Beijing Zhongxing Microelectronics Co., Ltd., which was filed with the Chinese intellectual property office on 7 January 2009, approved, and published on 8 January 2009 under publication number CN101477641. The main idea of the FDR algorithm is to delete, at each step, the attribute with the least influence on the overall fractal dimension of the data set, and finally to keep the attribute subset whose fractal dimension differs from the overall fractal dimension of the data set by an amount that meets a given threshold requirement.
The first prior art has the following defects:
The best currently known time complexity of a fractal dimension algorithm is $O(N \log N)$, where $N$ is the number of data points. To delete, at each step, the attribute with the smallest influence on the fractal dimension of the current attribute set, the FDR algorithm must scan the data set and compute the fractal dimension of the current attribute subset $(E-D)(E+D+1)/2$ times, where $D$ is the number of attributes to be retained and $E$ is the number of attributes of the data space. The total cost of the FDR algorithm is therefore that number of scans multiplied by $O(N \log N)$. In essence, the FDR algorithm is still a feature selection algorithm based on evaluating the merit of feature subsets, and it introduces a large amount of fractal dimension computation, so it cannot be applied to feature selection on high-dimensional data.
Prior art 2: in 2008, Yan Guanghui and Li Zhanhuai published the paper "Two-stage unsupervised sequential forward fractal attribute reduction algorithm" in the Journal of Computer Research and Development, which studies a fractal-based attribute reduction method. The method first uses fractals to group similar attributes and exclude redundant attributes from the attribute set, and then generates a maximal subset of unrelated attributes. Compared with the FDR algorithm, the method is more efficient.
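As a quick check of the scan count $(E-D)(E+D+1)/2$ quoted for the FDR algorithm, the following minimal sketch compares the closed form with a direct count. It assumes (this is an interpretation, not stated in the text) that with $m$ attributes remaining the algorithm evaluates one fractal dimension per candidate removal, for $m$ running from $E$ down to $D+1$. The function names are hypothetical.

```python
def fdr_scan_count(total_attrs: int, kept_attrs: int) -> int:
    """Count fractal-dimension evaluations for backward elimination.

    Assumption: with m attributes left, each of the m candidate removals
    is evaluated once, for m = total_attrs down to kept_attrs + 1.
    """
    return sum(range(kept_attrs + 1, total_attrs + 1))


def fdr_scan_count_closed_form(total_attrs: int, kept_attrs: int) -> int:
    """Closed form (E - D)(E + D + 1) / 2 quoted in the text."""
    e, d = total_attrs, kept_attrs
    # (E - D) and (E + D + 1) have opposite parity, so the product is even.
    return (e - d) * (e + d + 1) // 2


# Hypothetical problem sizes (E attributes in total, D attributes kept).
examples = [(10, 3), (100, 10), (1000, 20)]
counts = [fdr_scan_count(e, d) for e, d in examples]
```

Each of these evaluations costs on the order of $N \log N$, which is why the count grows uncomfortably fast as $E$ increases.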
The second prior art has the following defects:
1. The method requires, on average, many fractal dimension computations:
1) the fractal dimension of each attribute is computed when grouping similar attributes;
2) the fractal dimension between every pair of attributes within each similar-attribute group is computed when excluding redundant attributes;
3) the fractal dimension must also be computed repeatedly when the forward algorithm adds attributes to the candidate maximal set of unrelated attributes.
2. The algorithm cannot exclude dependencies among more than two attributes.
3. The algorithm performs poorly when the correlation or redundancy between the attributes of the data set is very small or very large.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for constructing a medical record library based on fractal technology. The method uses fractal technology to capture the main characteristics of the medical record library and reduces and reconstructs the historical medical record library in two respects, the number of medical records and the medical record attributes; this avoids unbounded growth of the medical record library and improves the efficiency of searching and analyzing it.
The invention is realized as follows: a method for constructing a medical record library based on fractal technology, characterized by comprising the following steps:
Step 1: input a data set.
Input the medical record data and extract the key attributes:
$S = \{A, E\}$, where $A$ denotes the attribute set with $m$ attributes $\{A_1, A_2, \ldots, A_m\}$ and $E$ denotes a set of objects comprising $n$ tuples.
Step 2: scale screening.
Step 2.1: calculate the fractal dimensions $D_{-5}$, $D_2$, $D_5$ of $D(A)$ for $q = -5, 2, 5$ and the corresponding fractal scale regions.
Step 2.2: intersect the fractal scale intervals obtained for $q = -5, 2, 5$ to obtain the common fractal scale region.
Step 2.3: take the middle scale range $[r_{\min}, r_{\max}]$ of the common fractal scale region as the screening result.
Step 2.4: select the maximum fractal scale $r_{\max}$ as the output scale.
Step 3: sample reduction.
Step 3.1: pruning of fractal samples.
Search $P_i(r_{\min})$, $i = 1, \ldots, N$, in sequence; if $P_i(r_{\min})$ satisfies the removal condition with respect to the threshold $\tau$, remove sample point $i$.
Search $P_i(r_{\max})$, $i = 1, \ldots, N$, in sequence; if $P_i(r_{\max})$ satisfies the removal condition with respect to the threshold $\tau$, remove sample point $i$.
Step 3.2: retain the samples at scale $r_{\max}$.
Step 4: attribute reduction.
Step 4.1: calculate the attribute independence probabilities and construct the independent attribute group, using the following algorithm:
(1) Initialization: data set $D = \{A, E\}$, where $A = \{A_1, A_2, \ldots, A_m\}$ and $E$ denotes a set of objects comprising $n$ tuples; $k_{\max}$; $W = \{W_1, W_2, \ldots, W_m\}$
(2) $d \leftarrow$ fractal dimension $D(A)$ of the initial data set
(3) $d \leftarrow$ smallest integer greater than or equal to $d$
(5) $k \leftarrow 0$
(6) do $k \leftarrow k + 1$
(7) select the attribute subset $S$ with the attribute subset selection function, which selects $d$ attributes from $A$ according to the probability $W_k$
(8) $d_s \leftarrow$ fractal dimension $D(S)$ of the attribute subset
(11) normalize $W_{k+1}(A)$
(12) until $k = k_{\max}$
Step 4.2: select the attribute subset based on the attribute independence probabilities:
according to $W_{k+1}(A)$, select the first $k$ attributes with the largest independence probability.
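The scale screening of Step 2 can be illustrated with a minimal sketch. It assumes that the fractal scale region for each $q$ is already available as a closed interval and that the common region is simply their intersection; the interval values and the function name `common_scale_region` are hypothetical.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]


def common_scale_region(regions: Dict[int, Interval]) -> Optional[Interval]:
    """Step 2.2: intersect the fractal scale intervals found for each q.

    Returns None when the intervals have no common part.
    """
    r_min = max(low for low, _ in regions.values())
    r_max = min(high for _, high in regions.values())
    return (r_min, r_max) if r_min <= r_max else None


# Hypothetical scale regions for q = -5, 2, 5 (Step 2.1).
regions = {-5: (0.05, 0.40), 2: (0.02, 0.50), 5: (0.04, 0.35)}

common = common_scale_region(regions)      # Step 2.3: screening result
output_scale = common[1] if common else None  # Step 2.4: r_max as output scale
```

Returning `None` for an empty intersection is a design choice of this sketch; the text does not say how that case is handled.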
The advantages of the invention are as follows: the method uses fractal technology to capture the main characteristics of the medical record library and reduces and reconstructs the historical medical record library in two respects, the number of medical records and the medical record attributes; this avoids unbounded growth of the medical record library and improves the efficiency of searching and analyzing it.
Drawings
FIG. 1 is a process for maintaining a medical records repository according to the present invention.
Detailed Description
The present invention will be described in detail below, and technical solutions in embodiments of the present invention will be clearly and completely described below. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an improved method for constructing a medical record library based on fractal technology, which can be implemented by the following steps:
Step 1: input a data set.
Input the medical record data and extract the key attributes:
$S = \{A, E\}$, where $A$ denotes the attribute set with $m$ attributes $\{A_1, A_2, \ldots, A_m\}$ and $E$ denotes a set of objects comprising $n$ tuples.
Step 2: scale screening.
Step 2.1: calculate the fractal dimensions $D_{-5}$, $D_2$, $D_5$ of $D(A)$ for $q = -5, 2, 5$ and the corresponding fractal scale regions.
Step 2.2: intersect the fractal scale intervals obtained for $q = -5, 2, 5$ to obtain the common fractal scale region.
Step 2.3: take the middle scale range $[r_{\min}, r_{\max}]$ of the common fractal scale region as the screening result.
Step 2.4: select the maximum fractal scale $r_{\max}$ as the output scale.
Step 3: sample reduction.
Step 3.1: pruning of fractal samples.
Search $P_i(r_{\min})$, $i = 1, \ldots, N$, in sequence; if $P_i(r_{\min})$ satisfies the removal condition with respect to the threshold $\tau$, remove sample point $i$.
Search $P_i(r_{\max})$, $i = 1, \ldots, N$, in sequence; if $P_i(r_{\max})$ satisfies the removal condition with respect to the threshold $\tau$, remove sample point $i$.
Step 3.2: retain the samples at scale $r_{\max}$.
Step 4: attribute reduction.
Step 4.1: calculate the attribute independence probabilities and construct the independent attribute group, using the following algorithm:
(1) Initialization: data set $D = \{A, E\}$, where $A = \{A_1, A_2, \ldots, A_m\}$ and $E$ denotes a set of objects comprising $n$ tuples; $k_{\max}$; $W = \{W_1, W_2, \ldots, W_m\}$
(2) $d \leftarrow$ fractal dimension $D(A)$ of the initial data set
(3) $d \leftarrow$ smallest integer greater than or equal to $d$
(5) $k \leftarrow 0$
(6) do $k \leftarrow k + 1$
(7) select the attribute subset $S$ with the attribute subset selection function, which selects $d$ attributes from $A$ according to the probability $W_k$
(8) $d_s \leftarrow$ fractal dimension $D(S)$ of the attribute subset
(11) normalize $W_{k+1}(A)$
(12) until $k = k_{\max}$
Step 4.2: select the attribute subset based on the attribute independence probabilities:
according to $W_{k+1}(A)$, select the first $k$ attributes with the largest independence probability.
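The loop of Step 4.1 can be sketched as follows. This is only an illustration under stated assumptions: sub-steps (4), (9) and (10) are not legible in this text, so the weight update used here (rewarding subsets whose dimension is close to that of the full attribute set) is a hypothetical placeholder, as are the uniform initial weights, the `rate` and `n_keep` parameters, and the toy function `toy_dim` that stands in for a real fractal dimension estimate.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def select_subset(weights: Sequence[float], d: int, rng: random.Random) -> List[int]:
    """Step (7): draw d distinct attribute indices, each draw weighted by W_k."""
    pool = list(range(len(weights)))
    w = list(weights)
    chosen = []
    for _ in range(d):
        pos = rng.choices(range(len(pool)), weights=w, k=1)[0]
        chosen.append(pool.pop(pos))
        w.pop(pos)
    return sorted(chosen)


def normalize(weights: Sequence[float]) -> List[float]:
    """Step (11): rescale the weights so that they sum to 1."""
    total = sum(weights)
    return [x / total for x in weights]


def attribute_reduction(m: int, fractal_dim: Callable[[List[int]], float],
                        k_max: int, n_keep: int, rate: float = 0.1,
                        seed: int = 0) -> Tuple[List[int], List[float]]:
    rng = random.Random(seed)
    weights = [1.0 / m] * m                      # (1) uniform W (assumed)
    full_dim = fractal_dim(list(range(m)))       # (2) D(A)
    d = math.ceil(full_dim)                      # (3)
    for _ in range(k_max):                       # (5), (6), (12)
        subset = select_subset(weights, d, rng)  # (7)
        d_s = fractal_dim(subset)                # (8) D(S)
        # ASSUMPTION: placeholder for the illegible steps (9)-(10).
        reward = rate / (1.0 + abs(full_dim - d_s))
        for i in subset:
            weights[i] += reward
        weights = normalize(weights)             # (11)
    ranked = sorted(range(m), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:n_keep]), weights      # Step 4.2


def toy_dim(subset: List[int]) -> float:
    """Hypothetical stand-in for D(S): counts informative attributes 0 and 2."""
    return float(len(set(subset) & {0, 2}))


selected, final_weights = attribute_reduction(m=4, fractal_dim=toy_dim,
                                              k_max=200, n_keep=2)
```

In practice `fractal_dim` would be replaced by an estimator of the fractal dimension computed on the data restricted to the chosen attributes.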
Given the diversity and complexity of real data distributions, a single fractal dimension is not enough to distinguish a monofractal set from a multifractal set; to describe the fractal characteristics of a data set more accurately, the multifractal dimension is used here.
The algorithm for computing the multifractal dimension is as follows:
The multifractal dimension $D_q$ is calculated using the generalized G-P (Grassberger-Procaccia) algorithm. For a given value of $q$, $D_q$ is computed as follows:
Step 1: starting from the initial value $r_0$ and using the increment $\Delta r$ as the step length, repeatedly compute the $q$-order correlation integral $C_q(r)$ for a series of discrete values of $r$.
For a given $r$, $C_q(r)$ is computed as follows:
Let $X$ be the data set, written $X = \{x_1, x_2, \ldots, x_N\}$, where each data item $x_i$ has $M$ attributes and can be regarded as a point in $M$-dimensional space, so that the data set forms a subset of $M$-dimensional Euclidean space.
Let $d_{ij}$ denote the distance from point $x_i$ to point $x_j$. Taking $x_i$ as the center and $r$ as the radius of a sphere, compute the probability $P_i(r)$ that a point of the data set lies inside the sphere; the computation uses the Heaviside step function $\theta(x)$ applied to $r - d_{ij}$.
The $q$-order correlation integral $C_q(r)$ is then computed from the probabilities $P_i(r)$.
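The formulas themselves are not legible in this text. For orientation, the standard forms of the generalized Grassberger-Procaccia quantities that match the verbal definitions above are given below. They are an assumption: the original may differ in normalization ($1/N$ versus $1/(N-1)$), in whether the term $j = i$ is counted, and in the convention for $\theta(0)$.

```latex
P_i(r) = \frac{1}{N} \sum_{j=1}^{N} \theta\!\left(r - d_{ij}\right),
\qquad
\theta(x) =
\begin{cases}
1, & x \ge 0 \\
0, & x < 0
\end{cases}

C_q(r) = \left[ \frac{1}{N} \sum_{i=1}^{N} P_i(r)^{\,q-1} \right]^{\frac{1}{q-1}},
\qquad q \ne 1

D_q = \lim_{r \to 0} \frac{\mathrm{d} \ln C_q(r)}{\mathrm{d} \ln r}
```

In practice the limit is replaced by the slope of $\ln C_q(r)$ against $\ln r$ over the fractal scale region $[r_{\min}, r_{\max}]$.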
Step 2: determine the fractal scale region.
From the series of $C_q(r)$ values calculated in Step 1, plot the curve of $\ln C_q(r)$ against $\ln r$. If the data set has the multifractal property, the middle section of this curve is a straight line, and this straight section corresponds to the fractal scale region, denoted $[r_{\min}, r_{\max}]$.
Step 3: calculate the generalized dimension $D_q$.
Fit the slope of the curve over the fractal scale region by the least squares method to obtain the value of $D_q$.
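The three steps can be put together in a short sketch. It assumes the standard generalized correlation integral $C_q(r) = \big[\tfrac{1}{N}\sum_i P_i(r)^{q-1}\big]^{1/(q-1)}$ with Euclidean distance and with each point counted inside its own sphere; these conventions, the function names, and the test data (200 equally spaced points on a line segment) are assumptions rather than details taken from the patent. The radii are supplied by the caller instead of being detected from the curve.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def local_probabilities(points: Sequence[Point], r: float) -> List[float]:
    """P_i(r): fraction of points within distance r of x_i (x_i included)."""
    n = len(points)
    return [sum(1 for y in points if math.dist(x, y) <= r) / n for x in points]


def correlation_integral(points: Sequence[Point], r: float, q: float) -> float:
    """q-order correlation integral C_q(r); requires q != 1."""
    probs = local_probabilities(points, r)
    mean = sum(p ** (q - 1) for p in probs) / len(probs)
    return mean ** (1.0 / (q - 1))


def generalized_dimension(points: Sequence[Point], radii: Sequence[float],
                          q: float) -> float:
    """Least-squares slope of ln C_q(r) against ln r over the given radii."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_integral(points, r, q)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical test data: points on a line segment embedded in the plane.
line = [(i / 199, 0.0) for i in range(200)]
radii = [0.05, 0.1, 0.2]  # assumed to lie inside the fractal scale region
dims = {q: generalized_dimension(line, radii, q) for q in (-5, 2, 5)}
```

For a set of points on a line the estimates should come out close to 1, with some deviation caused by the finite sample and the ends of the segment. The pairwise distance computation here is quadratic in the number of points, so it is suitable only for small illustrations.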
The method uses fractal technology to capture the main characteristics of the medical record library and reduces and reconstructs the historical medical record library in two respects, the number of medical records and the medical record attributes; this avoids unbounded growth of the medical record library and improves the efficiency of searching and analyzing it.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (1)
1. A method for constructing a medical record library based on fractal technology, characterized by comprising the following steps:
Step 1: input a data set.
Input the medical record data and extract the key attributes:
$S = \{A, E\}$, where $A$ denotes the attribute set with $m$ attributes $\{A_1, A_2, \ldots, A_m\}$ and $E$ denotes a set of objects comprising $n$ tuples.
Step 2: scale screening.
Step 2.1: calculate the fractal dimensions $D_{-5}$, $D_2$, $D_5$ of $D(A)$ for $q = -5, 2, 5$ and the corresponding fractal scale regions.
Step 2.2: intersect the fractal scale intervals obtained for $q = -5, 2, 5$ to obtain the common fractal scale region.
Step 2.3: take the middle scale range $[r_{\min}, r_{\max}]$ of the common fractal scale region as the screening result.
Step 2.4: select the maximum fractal scale $r_{\max}$ as the output scale.
Step 3: sample reduction.
Step 3.1: pruning of fractal samples.
Search $P_i(r_{\min})$, $i = 1, \ldots, N$, in sequence; if $P_i(r_{\min})$ satisfies the removal condition with respect to the threshold $\tau$, remove sample point $i$.
Search $P_i(r_{\max})$, $i = 1, \ldots, N$, in sequence; if $P_i(r_{\max})$ satisfies the removal condition with respect to the threshold $\tau$, remove sample point $i$.
Step 3.2: retain the samples at scale $r_{\max}$.
Step 4: attribute reduction.
Step 4.1: calculate the attribute independence probabilities and construct the independent attribute group, using the following algorithm:
(1) Initialization: data set $D = \{A, E\}$, where $A = \{A_1, A_2, \ldots, A_m\}$ and $E$ denotes a set of objects comprising $n$ tuples; $k_{\max}$; $W = \{W_1, W_2, \ldots, W_m\}$
(2) $d \leftarrow$ fractal dimension $D(A)$ of the initial data set
(3) $d \leftarrow$ smallest integer greater than or equal to $d$
(5) $k \leftarrow 0$
(6) do $k \leftarrow k + 1$
(7) select the attribute subset $S$ with the attribute subset selection function, which selects $d$ attributes from $A$ according to the probability $W_k$
(8) $d_s \leftarrow$ fractal dimension $D(S)$ of the attribute subset
(11) normalize $W_{k+1}(A)$
(12) until $k = k_{\max}$
Step 4.2: select the attribute subset based on the attribute independence probabilities:
according to $W_{k+1}(A)$, select the first $k$ attributes with the largest independence probability.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710206758.4A CN106991283B (en) | 2017-03-31 | 2017-03-31 | Method for constructing medical record library based on fractal technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710206758.4A CN106991283B (en) | 2017-03-31 | 2017-03-31 | Method for constructing medical record library based on fractal technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106991283A (en) | 2017-07-28 |
CN106991283B (en) | 2020-07-17 |
Family
ID=59415926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710206758.4A Active CN106991283B (en) | 2017-03-31 | 2017-03-31 | Method for constructing medical record library based on fractal technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106991283B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101881826A (en) * | 2009-05-06 | 2010-11-10 | 中国人民解放军海军航空工程学院 | Scanning-mode sea clutter local multi-fractal target detector |
WO2012144695A1 (en) * | 2011-04-20 | 2012-10-26 | Im Co., Ltd. | Prostate cancer diagnosis device using fractal dimension value |
US8892388B2 (en) * | 2010-09-30 | 2014-11-18 | Schlumberger Technology Corporation | Box counting enhanced modeling |
CN104778481A (en) * | 2014-12-19 | 2015-07-15 | 五邑大学 | Method and device for creating sample library for large-scale face mode analysis |
CN105824937A (en) * | 2016-03-17 | 2016-08-03 | 合肥工业大学 | Attribute selection method based on binary system firefly algorithm |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101881826A (en) * | 2009-05-06 | 2010-11-10 | 中国人民解放军海军航空工程学院 | Scanning-mode sea clutter local multi-fractal target detector |
US8892388B2 (en) * | 2010-09-30 | 2014-11-18 | Schlumberger Technology Corporation | Box counting enhanced modeling |
WO2012144695A1 (en) * | 2011-04-20 | 2012-10-26 | Im Co., Ltd. | Prostate cancer diagnosis device using fractal dimension value |
CN104778481A (en) * | 2014-12-19 | 2015-07-15 | 五邑大学 | Method and device for creating sample library for large-scale face mode analysis |
CN105824937A (en) * | 2016-03-17 | 2016-08-03 | 合肥工业大学 | Attribute selection method based on binary system firefly algorithm |
Non-Patent Citations (5)
Title |
---|
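Two-stage unsupervised sequential forward fractal attribute reduction algorithm; Yan Guanghui, Li Zhanhuai; Journal of Computer Research and Development; 20081231; Vol. 45 (No. 11); full text * |
Application of fractal technology in case base maintenance; Ni Zhiwei et al.; Journal of Computer Applications; 20090630; Vol. 29 (No. 6); Introduction, Sections 1-4 * |
A new machine learning method based on fractal theory: fractal learning; Ni Zhiwei et al.; Journal of University of Science and Technology of China; 20130430; Vol. 43 (No. 4); full text * |
Attribute reduction based on fractal dimension; Guo Ping et al.; Computer Science; 20071231; Vol. 34 (No. 9); full text * |
A survey of data mining techniques based on fractal dimension; Ni Liping et al.; Computer Science; 20081231; Vol. 35 (No. 1); Sections 2-3 * |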
Also Published As
Publication number | Publication date |
---|---|
CN106991283A (en) | 2017-07-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||