CN106991283B

CN106991283B - Method for constructing medical record library based on fractal technology

Info

Publication number: CN106991283B
Application number: CN201710206758.4A
Authority: CN
Inventors: 邱航; 付波; 蒲晓蓉
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2017-03-31
Filing date: 2017-03-31
Publication date: 2020-07-17
Anticipated expiration: 2037-03-31
Also published as: CN106991283A

Abstract

The invention discloses a method for constructing a medical record library based on a fractal technology. The method comprises the steps of inputting a data set, scale screening, sample reduction, attribute reduction, and outputting the medical record library. Using the fractal technology to capture the main characteristics of the medical record library, the method reduces and reconstructs the historical medical record library in terms of both the number of medical records and the medical record attributes, which avoids unbounded growth of the medical record library and improves the efficiency of searching and analyzing it. The medical record library is mainly used in hospitals for classifying, sorting and analyzing historical medical records, and helps medical staff to recognize, diagnose, treat and prevent diseases based on historical medical records.

Description

Method for constructing medical record library based on fractal technology

Technical Field

The invention relates to a medical record library construction method, in particular to a medical record library construction method based on a fractal technology.

Background

The attribute reduction means that an attribute subset is obtained from an original attribute set of a data set, the attribute subset can fully embody the main characteristics of the data set, and the attribute subset has the distinguishing capability basically equal to that of the original attribute set.

Here, attributes are also often referred to as features. There are two basic approaches to attribute reduction: feature extraction and feature selection. Feature extraction is divided into linear and nonlinear techniques. In either case the attributes of the output feature space are artificially constructed and have no obvious correspondence with the features of the original feature space, so the result of feature extraction is hard for people to interpret. Feature selection instead chooses, according to a given criterion, the subset of original features that best reflects the statistical characteristics of the pattern classes, thereby reducing the dimension of the feature space. Unlike feature extraction, the resulting feature space has not undergone abstract rotation or transformation, which makes the final result easier to analyze and understand, and feature selection is therefore the approach commonly used in practice.

Fractal theory is a very active branch of mathematics in modern nonlinear science. Its basic idea is to exploit the similarity between the whole and its parts and to regard a complex phenomenon as the result of iterating simple phenomena, thereby revealing the rules and characteristics contained in the complex phenomenon; it is therefore particularly suited to complex problems. For an object with fractal characteristics, the fractal dimension is an important index that quantitatively describes the complexity of the fractal set. Research in recent years has shown that the fractal dimension plays a special role in data mining: applying fractal technology to machine learning can better overcome shortcomings of traditional machine learning techniques and more effectively handle data modeling and analysis on high-dimensional data sets with complex structure.

The first prior art is as follows: the feature selection method FDR proposed in "Fast feature selection using fractal dimension", cited together with the patent "People counting method and system based on video monitoring" (Beijing Zhongxing Microelectronics Co., Ltd., Huang Ying), which was applied for and granted at the Chinese intellectual property office, with an application date of 7 January 2009, a publication date of 8 January 2009, and publication number CN101477641. The main idea of the FDR algorithm is to delete, at each step, the attribute with the least influence on the overall fractal dimension of the data set, and finally to keep the attribute subset whose fractal dimension differs from the overall fractal dimension of the data set by no more than a given threshold.

The first prior art has the following defects:

The best known time complexity of a fractal dimension algorithm is O(N log N), where N is the number of data points. To delete, at each step, the attribute with the smallest influence on the fractal dimension of the current attribute set, the FDR algorithm must scan the data set and calculate the fractal dimension of the current attribute subset (E − D)(E + D + 1)/2 times, where D is the number of attributes to be retained and E is the number of attributes in the data space. The total cost of the FDR algorithm is therefore (E − D)(E + D + 1)/2 such calculations at O(N log N) each. In essence, the FDR algorithm is still a feature selection algorithm that evaluates the merit of feature subsets, and it introduces so much fractal dimension calculation that it cannot be applied to feature selection on high-dimensional data.

The second prior art is as follows: in 2008, Yan Guanghui and Li Zhanhuai published the paper "Two-stage unsupervised sequential forward fractal attribute reduction algorithm" in the Journal of Computer Research and Development, which studied a fractal-based attribute reduction method. The method first uses fractals to group similar attributes and exclude redundant attributes from the attribute set, and then generates a maximum irrelevant attribute subset. Compared with the FDR algorithm, the method is more efficient.
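The scan count (E − D)(E + D + 1)/2 stated for FDR can be checked with a short calculation. The sketch below assumes, as one plausible reading of the text, that a backward-elimination round with c attributes remaining evaluates the fractal dimension once per candidate attribute, so the rounds cost E, E − 1, …, D + 1 evaluations. The function names are illustrative and do not come from the cited work.

```python
def fdr_scan_count(num_attributes: int, num_kept: int) -> int:
    """Count fractal-dimension evaluations for backward elimination.

    Assumption: with c attributes remaining, one evaluation is made per
    candidate attribute to drop, so that round costs c evaluations.
    """
    if not 0 <= num_kept <= num_attributes:
        raise ValueError("need 0 <= num_kept <= num_attributes")
    # Rounds run with c = E, E - 1, ..., D + 1 attributes remaining.
    return sum(range(num_kept + 1, num_attributes + 1))


def closed_form(num_attributes: int, num_kept: int) -> int:
    """Closed form (E - D)(E + D + 1) / 2 quoted in the text."""
    e, d = num_attributes, num_kept
    return (e - d) * (e + d + 1) // 2


# Example: E = 10 attributes in the data space, D = 3 attributes retained.
scans = fdr_scan_count(10, 3)
```

Under this assumption the count grows quadratically in E, and each evaluation is itself an O(N log N) pass, which is consistent with the efficiency concern raised in the text.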

The second prior art has the following defects:

1. On average, the method requires a large number of fractal dimension calculations:

1) when grouping similar attributes, the fractal dimension of each attribute must be calculated;

2) when excluding redundant attributes, the fractal dimension of every pair of attributes within each similar attribute group must be calculated;

3) when the forward algorithm adds attributes to the candidate maximum irrelevant attribute set, the fractal dimension must again be calculated repeatedly.

2. The algorithm cannot exclude dependencies among more than 2 attributes.

3. The algorithm performs poorly when the correlation or redundancy between data set attributes is very small or very large.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method for constructing a medical record library based on a fractal technology; the method is based on the main characteristics of capturing the medical record library by the fractal technology, reduces and reconstructs the historical medical record library from two aspects of medical record quantity and medical record attribute, can avoid infinite expansion of the medical record library, and improves the efficiency of searching and analyzing the medical record library.

The invention is realized as follows: a method for constructing a medical record library based on a fractal technology, characterized by comprising the following steps:

Step 1: inputting a data set;

inputting medical record data and extracting key attributes:

S = {A, E}, where A denotes the attribute set containing m attributes {A₁, A₂, …, A_m}, and E denotes a set of objects comprising n tuples;

Step 2: scale screening;

Step 2.1: calculating the fractal dimension D(A) for q = −5, 2, 5, i.e. D₋₅, D₂ and D₅, together with the corresponding fractal scale regions;

Step 2.2: intersecting the fractal scale regions obtained for q = −5, 2, 5 to obtain the common fractal scale region;

Step 2.3: taking the middle scale range [r_min, r_max] of the common fractal scale region as the screening result;

Step 2.4: selecting the maximum fractal scale r_max as the output scale;
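The interval intersection in steps 2.2 to 2.4 can be sketched as follows. This is a minimal illustration, assuming each fractal scale region is a closed interval (r_min, r_max) and treating the whole common region as the screening result; the function name and the numeric values are hypothetical, not taken from the patent.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]


def common_scale_region(regions: Dict[float, Interval]) -> Optional[Interval]:
    """Intersect the fractal scale regions found for each q.

    Returns None when the regions do not overlap.
    """
    if not regions:
        return None
    r_min = max(low for low, _ in regions.values())
    r_max = min(high for _, high in regions.values())
    return (r_min, r_max) if r_min < r_max else None


# Hypothetical scale regions for q = -5, 2, 5 (illustrative numbers only).
regions = {-5: (0.02, 0.40), 2: (0.01, 0.50), 5: (0.03, 0.45)}
common = common_scale_region(regions)
# Step 2.4: the upper end of the common region is used as the output scale.
output_scale = common[1] if common is not None else None
```

Returning None for disjoint regions leaves the caller to decide how to proceed when no common scale region exists, a case the text does not address.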

Step 3: sample reduction

Step 3.1: pruning of fractal samples

sequentially examining P_i(r_min), i = 1, …, N, comparing each value with the threshold τ, and removing sample point i when the threshold condition is met;

sequentially examining P_i(r_max), i = 1, …, N, comparing each value with the threshold τ, and removing sample point i when the threshold condition is met;

Step 3.2: retaining the samples at scale r_max;
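The pruning pass of step 3.1 can be sketched as below. The comparison between P_i(r) and τ is not legible in this text, so the sketch takes it as a parameter; the default (remove a point when P_i(r) is below τ, i.e. when it sits in a sparse neighborhood) is an assumption, as are the function names, the sample data, and the choice to count x_i itself inside its own sphere.

```python
import math
import operator
from typing import Callable, List, Sequence

Point = Sequence[float]


def local_probability(points: List[Point], i: int, r: float) -> float:
    """Fraction of points within distance r of points[i] (itself included)."""
    center = points[i]
    inside = sum(1 for p in points if math.dist(center, p) <= r)
    return inside / len(points)


def prune(
    points: List[Point],
    r: float,
    tau: float,
    remove_if: Callable[[float, float], bool] = operator.lt,  # assumed direction
) -> List[Point]:
    """Drop every point i for which remove_if(P_i(r), tau) holds."""
    return [
        p
        for i, p in enumerate(points)
        if not remove_if(local_probability(points, i, r), tau)
    ]


# Hypothetical data: a tight cluster of four points plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
# Prune at r_min first, then at r_max, as in step 3.1.
pruned = prune(prune(points, r=0.2, tau=0.5), r=1.0, tau=0.5)
```

Passing `remove_if=operator.gt` reverses the test, so the same function covers the opposite reading of the threshold condition.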

Step 4: attribute reduction

Step 4.1: calculating the attribute independence probabilities and constructing an independent attribute group, using the following algorithm:

(1) initialization: data set D = {A, E}, A = {A₁, A₂, …, A_m}, E denotes a set of objects comprising n tuples, k_max, W = {W₁, W₂, …, W_m}

(2) d ← fractal dimension D(A) of the initial data set

(3) d ← the smallest integer greater than or equal to d

(4)

(5)k←0

(6)do k←k+1

(7)

where the function in step (7) is the attribute subset selection function, which selects d attributes from A according to the probabilities W_k

(8) d_s ← fractal dimension D(S) of the attribute subset

(9)

(10)

(11) normalizing W_{k+1}(A)

(12) until k = k_max;

Step 4.2: selecting an attribute subset based on the attribute independence probabilities:

according to W_{k+1}(A), the k attributes with the largest independence probability are selected.

The invention has the advantages that: the method is based on the main characteristics of capturing the medical record library by the fractal technology, reduces and reconstructs the historical medical record library from two aspects of medical record quantity and medical record attribute, can avoid infinite expansion of the medical record library, and improves the efficiency of searching and analyzing the medical record library.

Drawings

FIG. 1 is a process for maintaining a medical records repository according to the present invention.

Detailed Description

The present invention will be described in detail below, and technical solutions in embodiments of the present invention will be clearly and completely described below. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides an improved method for constructing a medical record library based on a fractal technology, which can be implemented through the following steps:

Step 1: inputting a data set;

inputting medical record data and extracting key attributes:

S = {A, E}, where A denotes the attribute set containing m attributes {A₁, A₂, …, A_m}, and E denotes a set of objects comprising n tuples;

Step 2: scale screening;

Step 2.4: selecting the maximum fractal scale r_max as the output scale;

Step 3: sample reduction

Step 3.1: pruning of fractal samples

Step 3.2: retaining the samples at scale r_max;

Step 4: attribute reduction

(1) initialization: data set D = {A, E}, A = {A₁, A₂, …, A_m}, E denotes a set of objects comprising n tuples, k_max, W = {W₁, W₂, …, W_m}

(2) d ← fractal dimension D(A) of the initial data set

(3) d ← the smallest integer greater than or equal to d

(4)

(5)k←0

(6)do k←k+1

(7)

where the function in step (7) is the attribute subset selection function, which selects d attributes from A according to the probabilities W_k

(8) d_s ← fractal dimension D(S) of the attribute subset

(9)

(10)

(11) normalizing W_{k+1}(A)

(12) until k = k_max;

In consideration of the diversity and complexity of actual data distribution, it is difficult to distinguish a single fractal set from a multi-fractal set by using a certain fractal dimension as a feature, and in order to describe the fractal feature of a data set more accurately, the multi-fractal dimension is used herein.

The algorithm is as follows: computing multi-fractal dimensions

The multi-fractal dimension D_q is calculated using the generalized G-P (Grassberger-Procaccia) algorithm. Given the value of q, D_q is calculated as follows:

Step 1: with r₀ as the initial value and the increment Δr as the step length, repeatedly calculate the q-order correlation integral C_q(r) for a series of discrete values of r.

For a given r, C_q(r) is calculated as follows:

Let X be the data set, denoted X = {x₁, x₂, …, x_N}, where each data item x_i has M attributes and can be regarded as a point in M-dimensional space, so that the data set forms a subset of M-dimensional Euclidean space.

Define d_ij as the distance from point x_i to point x_j. Taking x_i as the center and r as the radius of a sphere, the probability P_i(r) that a point lies inside the sphere is calculated by the following formula:

where θ(x) is the Heaviside step function:

thus, the q-order correlation integral can be calculated by:
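The formulas for the local probability, the Heaviside step function and the q-order correlation integral are not reproduced in this text. As a hedged sketch, the code below assumes the commonly used generalized Grassberger-Procaccia form: P_i(r) = (1/N) Σ_j θ(r − d_ij), with θ(x) = 1 for x ≥ 0 and 0 otherwise, and C_q(r) = [(1/N) Σ_i P_i(r)^(q−1)]^(1/(q−1)) for q ≠ 1. Counting x_i inside its own sphere, the convention θ(0) = 1, the Euclidean distance, and the function names are all assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def local_probabilities(points: List[Point], r: float) -> List[float]:
    """P_i(r): fraction of points x_j with d_ij <= r (x_i itself included)."""
    n = len(points)
    return [
        sum(1 for other in points if math.dist(center, other) <= r) / n
        for center in points
    ]


def correlation_integral(points: List[Point], r: float, q: float) -> float:
    """q-order correlation integral C_q(r) in the assumed form, for q != 1."""
    if q == 1:
        raise ValueError("q = 1 needs a separate limiting form")
    probs = local_probabilities(points, r)
    # Including x_i in its own sphere keeps every P_i(r) > 0, so negative
    # exponents (for example q = -5) remain well defined.
    mean_power = sum(p ** (q - 1) for p in probs) / len(probs)
    return mean_power ** (1.0 / (q - 1))


# Hypothetical data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
c2 = correlation_integral(square, r=1.0, q=2)
```

This direct pairwise version costs O(N²) distance evaluations per value of r; it is meant only to make the definitions concrete, not to match the O(N log N) algorithms mentioned in the background.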

Step 2: determine the fractal scale region

From the series of C_q(r) values calculated in step 1, plot the ln C_q(r) versus ln r curve. If the data set has a multi-fractal property, the middle part of the ln C_q(r) versus ln r curve contains a straight-line segment, and the range of r covered by this segment is the fractal scale region, denoted [r_min, r_max].

Step 3: calculate the generalized dimension D_q

Fit the slope over the fractal scale region by the least squares method to obtain the value of D_q.
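The least-squares fit in step 3 can be sketched as follows: given pairs (r, C_q(r)) and a scale region [r_min, r_max], the slope of ln C_q(r) against ln r inside that region is taken as the estimate of D_q. The function name and the synthetic data are illustrative assumptions; how the scale region itself is detected is left to the caller.

```python
import math
from typing import List, Tuple


def fit_generalized_dimension(
    samples: List[Tuple[float, float]], r_min: float, r_max: float
) -> float:
    """Least-squares slope of ln C_q(r) against ln r over [r_min, r_max]."""
    xs: List[float] = []
    ys: List[float] = []
    for r, c in samples:
        if r_min <= r <= r_max and c > 0:
            xs.append(math.log(r))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need at least two samples inside the scale region")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover more than one value of r")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic check: C(r) = 0.5 * r**1.7 is a straight line of slope 1.7 in log-log axes.
samples = [(r, 0.5 * r ** 1.7) for r in (0.01, 0.02, 0.05, 0.1, 0.2, 0.5)]
d_q = fit_generalized_dimension(samples, r_min=0.02, r_max=0.2)
```

On real data the estimate depends strongly on the chosen scale region, which is why step 2 restricts the fit to the straight-line part of the curve.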

The method is based on the main characteristics of capturing the medical record library by the fractal technology, reduces and reconstructs the historical medical record library from two aspects of medical record quantity and medical record attribute, can avoid infinite expansion of the medical record library, and improves the efficiency of searching and analyzing the medical record library.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for constructing a medical record library based on a fractal technology, characterized by comprising the following steps:

Step 1: inputting a data set;

inputting medical record data and extracting key attributes

Step 2: scale screening;

Step 2.1: calculating the fractal dimension D(A) for q = −5, 2, 5, i.e. D₋₅, D₂ and D₅, together with the corresponding fractal scale regions;

Step 2.4: selecting the maximum fractal scale r_max as the output scale;

Step 3: sample reduction

Step 3.1: pruning of fractal samples

Step 3.2: retaining the samples at scale r_max;

Step 4: attribute reduction

(1) initialization: data set D = {A, E}, A = {A₁, A₂, …, A_m}, E denotes a set of objects comprising n tuples, k_max, W = {W₁, W₂, …, W_m}

(2) d ← fractal dimension D(A) of the initial data set