WO2021120934A1 - A DRGs automatic grouping method based on a convolutional neural network - Google Patents

A DRGs automatic grouping method based on a convolutional neural network

Info

Publication number
WO2021120934A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
convolutional neural
data
drgs
grouping
Prior art date
Application number
PCT/CN2020/128369
Other languages
English (en)
French (fr)
Inventor
吴健
陈晋泰
陈婷婷
应豪超
雷璧闻
刘雪晨
宋庆宇
张久成
姜晓红
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Priority to US17/627,622 (published as US20220319706A1)
Publication of WO2021120934A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Definitions

  • The invention belongs to the field of computer-aided medical technology, and in particular relates to a method for automatic grouping of DRGs based on a convolutional neural network.
  • The post-payment system of medical insurance funds tends to encourage excessive medical services, while the pre-payment system tends to lead to seriously ill patients being turned away and to reduced medical services. As a result, total health expenditure keeps rising, medical insurance fund spending has risen sharply, and medical insurance funds in many regions face the risk of insufficient funds.
  • DRGs (Diagnosis Related Groups) is a case-mix method that groups cases mainly on the principle of similar clinical processes and similar cost consumption. Payment is made according to the disease group and treatment is targeted, which avoids wasting medical resources.
  • Population structure, health status, and level of economic development differ between regions, so it is necessary to establish a grouping system adapted to local characteristics and to adjust it according to operating results.
  • In the prior-art method of CN110289088A, the proportion of the hospital's cases with relative weight RW_i > 2 among all of its cases is analyzed, and the average cost of the cases in group DRG_i represents the average cost of the i-th DRG group.
  • The Chinese patent document with publication number CN107463771A discloses a method and system for grouping cases, including: obtaining case information and assigning it to the corresponding basic group according to the main diagnosis code and operation code in the case information, to obtain the basic group code and basic group name; when the main diagnosis corresponding to the main diagnosis code is not of the length-of-stay-affected type, or the basic group is not a specific basic group, calculating the diagnostic complexity score of each diagnosis code from the basic group code and the diagnosis codes; calculating the disease complexity index of the case information from the diagnostic complexity scores; and, according to the disease complexity index, moving the case information from the basic group into a subdivision group to obtain the DRG code, DRG name, and DRG relative weight, completing the case grouping.
  • The present invention provides an automatic grouping method for DRGs based on a convolutional neural network, which can automatically classify disease types by integrating the actual information in the data.
  • A method for automatic grouping of DRGs based on a convolutional neural network includes the following steps:
  • Step (3): construct a convolutional neural network model and train it iteratively on the data obtained in step (2).
  • During training, use the k-means clustering method to cluster the feature vectors extracted by the convolutional neural network to obtain k category labels, then combine the category labels with the classifier to supervise the convolutional neural network for iterative training.
  • After training, the data to be grouped is digitally encoded and then input into the trained model for grouping.
  • The method avoids the drawbacks of manual feature selection and of additionally labeling data when new grouping categories are added, and it can automatically learn to group data whose grouping is ambiguous or difficult.
  • In step (2), during digital encoding, the pathological data is converted to numerical values and uniformly mapped into the range 0 to 1.
  • The conversion formula is as follows: $\text{converted value} = \frac{V_c - V_{\min}}{V_{\max} - V_{\min}}$
  • $V_c$ is the current value to be converted.
  • $V_{\min}$ and $V_{\max}$ are the minimum and maximum of the index values, respectively.
  • In step (3), a shallow convolutional neural network with three convolutional layers is used to extract features from the data.
  • In step (3), the training process of the convolutional neural network model is as follows:
  • (3-1) Use the convolutional neural network to extract features from the encoded data, with the convolution computed as $(f * g)(x, y) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(x - i,\, y - j)\, g(i, j)$.
  • $f(x,y)$ is the input data.
  • $g(x,y)$ is the convolution kernel function.
  • $m$ and $n$ are the length and width of the convolution kernel, respectively.
  • The purpose of feature extraction is to integrate the different pieces of information in the data and find the correlations among them.
  • (3-2) Pass the feature vectors extracted in step (3-1) into the k-means clusterer. Cosine distance is used to measure the distance between vectors, and vectors that are close together are assigned to the same cluster. The distance between two clusters is measured by the shortest distance between any member of one cluster and any member of the other; the clustering with the largest inter-cluster distance is taken as the best result, and the corresponding k is selected automatically according to the clustering effect. The cosine measure is $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$.
  • $a$ and $b$ are two different feature vectors.
  • (3-3) Use the k categories obtained in step (3-2) as data labels, use the regression model and the loss function to measure how well the network is learning, and supervise the neural network's learning until the model converges.
  • Because there may be many categories, the regression model is the softmax method, which is suitable for multi-class problems.
  • It is calculated as follows: $P(z)_j = \frac{e^{Z_j}}{\sum_{n=1}^{N} e^{Z_n}}$
  • $Z_j$ is the output of the j-th neuron.
  • $N$ is the total number of categories.
  • $P(z)_j$ is the probability value of the j-th category.
  • The model outputs a probability value for each category, so N categories yield N probability values.
  • The loss function is the cross-entropy, $L = -\frac{1}{M} \sum_{s=1}^{M} \sum_{i=1}^{N} y_i^{(s)} \log \hat{y}_i^{(s)}$, where the outer sum runs over the samples.
  • $y_i$ is the label of the i-th category and $\hat{y}_i$ is the predicted probability of the i-th category.
  • $M$ is the number of samples.
  • Compared with the prior art, the present invention has the following beneficial effects:
  • The method combines a convolutional neural network with the k-means clustering method. It exploits the network's automatic feature extraction and automatic optimization to capture the connections among features, and applies the labels generated by clustering to the network's classifier to supervise training, forming a method that automatically optimizes the grouping. Where conventional grouping rules make grouping difficult, the method can group using all the information in the actual data, and more data can be added to improve the grouping without extra labeling workload.
  • Fig. 1 is a schematic flowchart of the DRGs automatic grouping method based on a convolutional neural network according to the present invention.
  • As shown in Fig. 1, a DRGs automatic grouping method based on a convolutional neural network includes the following steps:
  • S1: Collect case data and assign the cases to their corresponding groups according to the major diagnostic categories and the core disease diagnosis related groups.
  • In this embodiment, training is performed on any one group of the core disease diagnosis related groups.
  • S2: Encode the data.
  • The actual data is structured data described in text.
  • It must be encoded in numerical form before being input into the convolutional network for learning; the data is converted to numerical values and uniformly limited to the range 0 to 1 using $\text{converted value} = \frac{V_c - V_{\min}}{V_{\max} - V_{\min}}$.
  • $V_c$ is the current value to be converted.
  • $V_{\min}$ and $V_{\max}$ are the minimum and maximum of the index values, respectively.
  • Taking blood type as an example, the blood type field generally has six values: A, B, O, AB, unknown, and unchecked.
  • These can be assigned the indices 1, 2, 3, 4, 5, and 0 respectively.
  • The index of A is then 1 and the index of B is 2,
  • so their converted values are 0.2 and 0.4 respectively.
  • S3: Construct a convolutional neural network and train it iteratively on the data obtained in S2; perform k-means clustering on the feature information output by the network to obtain k category labels, then combine the network's classifier with the category labels to supervise the training of the neural network.
  • The network structure of the first three residual blocks of ResNet is selected, with convolutions that use one-dimensional kernels.
  • The network composed of these convolution kernels extracts features from the data.
  • Convolution can combine the information of the various kinds of data and yields good semantic information.
  • The calculation formula is as follows: $(f * g)(x, y) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(x - i,\, y - j)\, g(i, j)$
  • $f(x,y)$ is the input data.
  • $g(x,y)$ is the convolution kernel function.
  • $m$ and $n$ are the length and width of the convolution kernel, respectively.
  • The purpose of feature extraction is to integrate the different pieces of information in the data and find the correlations among them.
  • S3-2: Pass the feature vectors output by S3-1 to the k-means clustering method, use cosine similarity to measure the distance between vectors, optimize the clustering, and divide the feature vectors into k categories.
  • The initial value of k is determined from the grouping rules of the core disease diagnosis related groups. For example, if the rules initially divide the "pre-grouped diseases and related operations" category into 9 groups, then k is provisionally set to 9 when training on that data. During clustering, k is then adjusted according to the clustering effect.
  • Whether two feature vectors belong to the same cluster is decided by the distance between them: if the distance is small they belong to the same cluster, otherwise to different clusters. The distance between two clusters is measured by the shortest distance between any member of one cluster and any member of the other, and the clustering with the largest inter-cluster distance is taken as the best result. Cosine distance is used to measure the distance between feature vectors: $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$
  • $a$ and $b$ are two different feature vectors.
  • S3-3: Use the k categories obtained in step S3-2 as data labels, and use a regression model and a loss function to measure how well the network is learning.
  • Because there may be many categories, the present invention selects the softmax method, which is suitable for multi-class problems: $P(z)_j = \frac{e^{Z_j}}{\sum_{n=1}^{N} e^{Z_n}}$
  • $Z_j$ is the output of the j-th neuron.
  • $N$ is the total number of categories.
  • $P(z)_j$ is the probability value of the j-th category.
  • The model outputs a probability value for each category, so N categories yield N probability values.
  • The loss function is the cross-entropy, $L = -\frac{1}{M} \sum_{s=1}^{M} \sum_{i=1}^{N} y_i^{(s)} \log \hat{y}_i^{(s)}$, where $y_i$ is the label of the i-th category and $\hat{y}_i$ is the predicted probability of the i-th category.
  • $M$ is the number of samples.
  • In application, the data to be grouped is encoded and input into the classification model, which automatically assigns it to the corresponding group.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

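A DRGs automatic grouping method based on a convolutional neural network, comprising: collecting case data and grouping it according to the major diagnostic categories and the core disease diagnosis related groups; digitally encoding the data; constructing a shallow convolutional neural network model, clustering the feature vectors extracted by the convolutional network with the k-means clustering method to obtain k category labels, and combining the category labels with a classifier to supervise the network for iterative training; and, once the model is trained, applying it to group data. The method avoids the drawbacks of manual feature selection and of additionally labeling data for newly added grouping categories, and can automatically learn to group data whose grouping is ambiguous or difficult.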

Description

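A DRGs automatic grouping method based on a convolutional neural network
Technical Field
The present invention belongs to the field of computer-aided medical technology, and in particular relates to a DRGs automatic grouping method based on a convolutional neural network.
Background
With population aging and the development of new science and technology, the post-payment system of medical insurance funds tends to encourage excessive medical services, while the pre-payment system tends to lead to seriously ill patients being turned away and to reduced medical services. As a result, total health expenditure keeps rising, medical insurance fund spending has also risen sharply, and medical insurance funds in many regions face the risk of insufficient funds.
DRGs (Diagnosis Related Groups) is a case-mix method that groups cases mainly on the principle of similar clinical processes and similar cost consumption. Payment is made according to the disease group and treatment is targeted, which avoids wasting medical resources. However, because economic development and medical standards are uneven, and population structure, health status, and level of economic development differ from region to region, it is necessary to establish a grouping system adapted to local characteristics and to adjust it according to operating results.
The Chinese patent document with publication number CN110289088A discloses a DRGs-based big-data intelligent management method and system, comprising: feeding the front-page data of the inpatient medical records of all inpatient cases of a hospital in a region over a full year into a DRG grouper, and grouping according to the DRG grouping principles (disease diagnosis, surgical procedures, complications/comorbidities, age, severity, etc.) to obtain n DRG groups together with the weight and case count of each group and the corresponding distributions of length of stay and cost; calculating the total weight of the hospital's inpatient cases; calculating the case mix index (CMI) = total weight of the hospital / total number of inpatient cases of the hospital; and calculating the relative weight RW_i of the i-th DRG group and analyzing the proportion of the hospital's cases with RW_i > 2 among all of its cases, where the average cost of the cases in group DRG_i represents the average cost of the i-th DRG group.
The Chinese patent document with publication number CN107463771A discloses a method and system for grouping cases, comprising: obtaining case information and assigning it to the corresponding basic group according to the main diagnosis code and operation code in the case information, to obtain the basic group code and basic group name; when the main diagnosis corresponding to the main diagnosis code is not of the length-of-stay-affected type, or the basic group is not a specific basic group, calculating the diagnostic complexity score of each diagnosis code from the basic group code and the diagnosis codes; calculating the disease complexity index of the case information from the diagnostic complexity scores; and, according to the disease complexity index, moving the case information from the basic group into a subdivision group to obtain the DRG code, DRG name, and DRG relative weight, completing the case grouping.
However, the grouping of certain diseases may be disputed in some regions, and conventional methods may yield different groupings. There is therefore an urgent need for a method that can integrate various kinds of actual information to classify the categories that are difficult to group.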
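Summary of the Invention
To solve the above problems in the prior art, the present invention provides a DRGs automatic grouping method based on a convolutional neural network, which can automatically classify disease types by integrating the actual information in the data.
A DRGs automatic grouping method based on a convolutional neural network comprises the following steps:
(1) collecting case data and dividing the cases according to the major diagnostic categories and the core disease diagnosis related groups, so that the case data is assigned to its corresponding groups to serve as the training data set;
(2) digitally encoding the case data in the training data set, converting text-described data into the corresponding numerical form;
(3) constructing a convolutional neural network model and training it iteratively on the data obtained in step (2), wherein, during training, the k-means clustering method is used to cluster the feature vectors extracted by the convolutional neural network to obtain k category labels, and the category labels are combined with a classifier to supervise the convolutional neural network for iterative training;
(4) after model training is complete, digitally encoding the data to be grouped and inputting it into the trained model for grouping.
The method of the present invention avoids the drawbacks of manual feature selection and of additionally labeling data for newly added grouping categories, and can automatically learn to group data whose grouping is ambiguous or difficult.
In step (2), during digital encoding, the pathological data is converted to numerical values and uniformly mapped into the range 0 to 1, using the following conversion formula:
$\text{converted value} = \frac{V_c - V_{\min}}{V_{\max} - V_{\min}}$
where $V_c$ is the current value to be converted, and $V_{\min}$ and $V_{\max}$ are the minimum and maximum of the index values, respectively.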
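Because the data carries less information than images, and popular network architectures are mostly deep and therefore prone to overfitting such data, step (3) uses a shallow convolutional neural network with three convolutional layers to extract features from the data.
In step (3), the training process of the convolutional neural network model is as follows:
(3-1) Use the convolutional neural network to extract features from the encoded data.
The convolution used for feature extraction is computed as follows:
$(f * g)(x, y) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(x - i,\, y - j)\, g(i, j)$
where $f(x,y)$ is the input data, $g(x,y)$ is the convolution kernel function, and $m$ and $n$ are the length and width of the convolution kernel, respectively. The purpose of feature extraction is to integrate the different pieces of information in the data and find the correlations among them.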
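As an illustration of the convolution described above, the following is a minimal NumPy sketch rather than the patent's implementation. The function name `conv2d_valid`, the "valid" (no padding) boundary handling, and the sample arrays are assumptions made for this example.

```python
import numpy as np

def conv2d_valid(f: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Discrete 2-D convolution of input f with an m x n kernel g.

    Only positions where the kernel fits entirely inside f are computed
    (assumed 'valid' boundary handling; the patent does not specify padding).
    """
    m, n = g.shape
    g_flipped = g[::-1, ::-1]  # a true convolution flips the kernel on both axes
    out_h = f.shape[0] - m + 1
    out_w = f.shape[1] - n + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            # Weighted sum of the m x n patch under the kernel
            out[x, y] = np.sum(f[x:x + m, y:y + n] * g_flipped)
    return out

# Hypothetical 4 x 4 input and 2 x 2 kernel
f = np.arange(16, dtype=float).reshape(4, 4)
g = np.array([[1.0, 0.0], [0.0, -1.0]])
result = conv2d_valid(f, g)
```

Deep learning frameworks typically compute cross-correlation (no kernel flip) in their convolution layers; because the kernel weights are learned, the distinction does not change what the network can represent.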
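(3-2) Pass the feature vectors extracted by the convolutional neural network in step (3-1) into the k-means clusterer. Cosine distance is used to compute the distance between vectors, and vectors that are close together are assigned to the same cluster. The distance between two clusters is measured by the shortest distance between any member of one cluster and any member of the other; the clustering with the largest inter-cluster distance is taken as the best result, and the corresponding k is selected automatically according to the clustering effect.
The cosine distance is calculated as follows:
$\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$
where $a$ and $b$ are two different feature vectors.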
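The cosine measure between two feature vectors can be sketched as below. This is an illustrative assumption, not the patent's code: the function names are hypothetical, and converting the similarity to a distance as `1 - cos(a, b)` is a common convention that the source does not state explicitly.

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """cos(a, b) = (a . b) / (|a| |b|) for two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_distance(a, b) -> float:
    """Assumed convention: distance = 1 - similarity (0 means same direction)."""
    return 1.0 - cosine_similarity(a, b)

orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # perpendicular vectors
parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])       # same direction, different length
```

Because the measure depends only on direction, two records whose feature vectors differ only in overall magnitude are treated as identical.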
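(3-3) Use the k categories obtained in step (3-2) as data labels, use the regression model and the loss function to measure how well the network is learning, and supervise the neural network's learning until the network model converges.
Because there may be many categories, the regression model is the softmax method, which is suitable for multi-class problems. It is calculated as follows:
$P(z)_j = \frac{e^{Z_j}}{\sum_{n=1}^{N} e^{Z_n}}$
where $Z_j$ is the output of the j-th neuron, $N$ is the total number of categories, and $P(z)_j$ is the probability value of the j-th category. The model outputs a probability value for each category, so N categories yield N probability values.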
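A short NumPy sketch of softmax follows, for illustration only. The function name and the sample logits are assumptions; subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(z) -> np.ndarray:
    """Map N neuron outputs Z_1..Z_N to N probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)      # avoids overflow in exp; the result is unchanged
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

# Hypothetical outputs of three neurons (N = 3 categories)
probs = softmax([2.0, 1.0, 0.1])
```

The category with the largest neuron output receives the largest probability, so the predicted group is simply the index of the maximum.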
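The loss function is the cross-entropy, calculated as follows:
$L = -\frac{1}{M} \sum_{s=1}^{M} \sum_{i=1}^{N} y_i^{(s)} \log \hat{y}_i^{(s)}$
where $y_i$ is the label of the i-th category, $\hat{y}_i$ is the predicted probability of the i-th category, and $M$ is the number of samples; the outer sum runs over the samples.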
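The cross-entropy loss can be sketched as follows. This assumes one-hot labels and averaging over the M samples, which is the usual form but is an interpretation of the symbol definitions above; the function name, the `eps` clipping constant, and the sample values are hypothetical.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps: float = 1e-12) -> float:
    """Mean cross-entropy over M samples.

    y_true: (M, N) one-hot labels; y_pred: (M, N) predicted probabilities.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)  # avoid log(0)
    per_sample = -np.sum(y_true * np.log(y_pred), axis=1)
    return float(np.mean(per_sample))

# Two samples, three categories (illustrative values)
labels = [[1, 0, 0], [0, 1, 0]]
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
loss = cross_entropy(labels, predictions)
```

Only the probability assigned to the true category contributes for each sample, so the loss falls toward zero as that probability approaches 1.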
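Compared with the prior art, the present invention has the following beneficial effects:
The method of the present invention combines a convolutional neural network with the k-means clustering method. It exploits the network's automatic feature extraction and automatic optimization to capture the connections among features, and applies the labels generated by clustering to the network's classifier to supervise training, forming a method that automatically optimizes the grouping. Where conventional grouping rules make grouping difficult, the method can group using all the information in the actual data, and more data can be added to improve the grouping without extra workload.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the DRGs automatic grouping method based on a convolutional neural network according to the present invention.
Detailed Description of the Embodiments
The present invention is described in further detail below with reference to the drawing and an embodiment. Note that the embodiment is intended to aid understanding of the invention and does not limit it in any way.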
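As shown in Fig. 1, a DRGs automatic grouping method based on a convolutional neural network comprises the following steps:
S1: Collect case data and assign the cases to their corresponding groups according to the major diagnostic categories and the core disease diagnosis related groups. In this embodiment, training is performed on any one group of the core disease diagnosis related groups.
S2: Encode the data. The actual data is structured data described in text; it must be encoded in numerical form before being input into the convolutional network for learning, so the data is converted to numerical values and uniformly limited to the range 0 to 1.
S2-1: In this embodiment, the presence or absence of a disease is encoded as 0 or 1.
S2-2: For data with existing standards, such as department visited, blood type, surgery level, and operation name, the categories are sorted and labeled with the indices 0, 1, …, n, and each index is then converted to a value between 0 and 1 using the following formula:
$\text{converted value} = \frac{V_c - V_{\min}}{V_{\max} - V_{\min}}$
where $V_c$ is the current value to be converted, and $V_{\min}$ and $V_{\max}$ are the minimum and maximum of the index values, respectively.
Taking blood type as an example, the blood type field generally has six values: A, B, O, AB, unknown, and unchecked, which can be assigned the indices 1, 2, 3, 4, 5, and 0 respectively. The index of A is then 1 and the index of B is 2, so their converted values are 0.2 and 0.4 respectively.
S2-3: For data such as age and treatment cost, the formula in S2-2 also applies; the difference is that the minimum and maximum are taken from the data set to be trained on.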
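The three encoding rules of S2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the `BLOOD_TYPE_INDEX` mapping keys, and the sample ages are hypothetical, while the index assignment for blood types follows the example in the text (A=1, B=2, O=3, AB=4, unknown=5, unchecked=0).

```python
def min_max(value: float, v_min: float, v_max: float) -> float:
    """Scale value into [0, 1]; a constant column maps to 0.0 (assumed convention)."""
    if v_max == v_min:
        return 0.0
    return (value - v_min) / (v_max - v_min)

def encode_flag(present: bool) -> float:
    """S2-1: presence or absence of a disease becomes 1 or 0."""
    return 1.0 if present else 0.0

def encode_category(label: str, index_map: dict) -> float:
    """S2-2: look up the category's index, then scale by the index range."""
    indices = index_map.values()
    return min_max(index_map[label], min(indices), max(indices))

def encode_numeric(value: float, column: list) -> float:
    """S2-3: scale using the minimum and maximum found in the training data."""
    return min_max(value, min(column), max(column))

BLOOD_TYPE_INDEX = {"unchecked": 0, "A": 1, "B": 2, "O": 3, "AB": 4, "unknown": 5}
ages = [20, 35, 60, 80]  # hypothetical training-set ages

record = [
    encode_flag(True),
    encode_category("A", BLOOD_TYPE_INDEX),
    encode_numeric(35, ages),
]
```

With this index assignment, blood types A and B encode to 0.2 and 0.4, matching the worked example in the text.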
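S3: Construct a convolutional neural network and train it iteratively on the data obtained in S2; perform k-means clustering on the feature information output by the network to obtain k category labels, then combine the network's classifier with the category labels to supervise the training of the neural network.
S3-1: Because the data carries less information than images, and popular network architectures are mostly deep and therefore prone to overfitting such data, this example selects the network structure of the first three residual blocks of ResNet and extracts features from the data with a network built from one-dimensional convolution kernels. Convolution can combine the information of the various kinds of data and yields good semantic information. It is calculated as follows:
$(f * g)(x, y) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(x - i,\, y - j)\, g(i, j)$
where $f(x,y)$ is the input data, $g(x,y)$ is the convolution kernel function, and $m$ and $n$ are the length and width of the convolution kernel, respectively. The purpose of feature extraction is to integrate the different pieces of information in the data and find the correlations among them.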
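S3-2: Pass the feature vectors output by S3-1 to the k-means clustering method, use cosine similarity to measure the distance between vectors, optimize the clustering, and divide the feature vectors into k categories.
The initial value of k is determined from the grouping rules of the core disease diagnosis related groups. For example, if the rules initially divide the "pre-grouped diseases and related operations" category into 9 groups, then k is provisionally set to 9 when training on that data. During clustering, k is then adjusted according to the clustering effect.
During clustering, whether two feature vectors belong to the same cluster is decided by the distance between them: if the distance is small they belong to the same cluster, otherwise to different clusters. The distance between two clusters is measured by the shortest distance between any member of one cluster and any member of the other, and the clustering with the largest inter-cluster distance is taken as the best result. Cosine distance is used to measure the distance between feature vectors, calculated as follows:
$\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$
where $a$ and $b$ are two different feature vectors.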
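The clustering step and the inter-cluster distance criterion can be sketched as below. This is an assumption-laden illustration, not the patent's algorithm: the deterministic farthest-first initialization, the fixed iteration count, choosing k from a small candidate list, and all function names are hypothetical. Cosine distance is taken as one minus cosine similarity.

```python
import numpy as np

def normalize(X: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def farthest_first_init(Xn: np.ndarray, k: int) -> np.ndarray:
    """Deterministic seeding: repeatedly pick the point least similar to chosen centers."""
    centers = [Xn[0]]
    while len(centers) < k:
        nearest_sim = (Xn @ np.array(centers).T).max(axis=1)
        centers.append(Xn[np.argmin(nearest_sim)])
    return np.array(centers)

def cosine_kmeans(X: np.ndarray, k: int, n_iter: int = 20) -> np.ndarray:
    """k-means on unit vectors: assign by highest cosine similarity, re-average centers."""
    Xn = normalize(X)
    centers = farthest_first_init(Xn, k)
    for _ in range(n_iter):
        labels = np.argmax(Xn @ centers.T, axis=1)
        for c in range(k):
            members = Xn[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
        centers = normalize(centers)
    return labels

def min_intercluster_distance(X: np.ndarray, labels: np.ndarray) -> float:
    """Shortest cosine distance between members of different clusters."""
    Xn = normalize(X)
    dist = 1.0 - Xn @ Xn.T
    different = labels[:, None] != labels[None, :]
    return float(dist[different].min()) if different.any() else 0.0

def select_k(X: np.ndarray, candidates: list):
    """Pick the candidate k whose clustering has the largest inter-cluster distance."""
    scores = {k: min_intercluster_distance(X, cosine_kmeans(X, k)) for k in candidates}
    return max(scores, key=scores.get), scores

# Hypothetical feature vectors: two tight groups pointing in different directions
X = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
labels_k2 = cosine_kmeans(X, 2)
best_k, scores = select_k(X, candidates=[2, 3])
```

On this toy data, splitting into three clusters separates two nearly identical vectors, which collapses the inter-cluster distance, so two clusters score higher. Note that this criterion tends to favor smaller k in general, so in practice the candidate range would likely be kept close to the initial value taken from the grouping rules.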
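S3-3: Use the k categories obtained in step S3-2 as data labels, and use a regression model and a loss function to measure how well the network is learning.
Because there may be many categories, the present invention selects the softmax method, which is suitable for multi-class problems. It is calculated as follows:
$P(z)_j = \frac{e^{Z_j}}{\sum_{n=1}^{N} e^{Z_n}}$
where $Z_j$ is the output of the j-th neuron, $N$ is the total number of categories, and $P(z)_j$ is the probability value of the j-th category. The model outputs a probability value for each category, so N categories yield N probability values.
The loss function is the cross-entropy, calculated as follows:
$L = -\frac{1}{M} \sum_{s=1}^{M} \sum_{i=1}^{N} y_i^{(s)} \log \hat{y}_i^{(s)}$
where $y_i$ is the label of the i-th category, $\hat{y}_i$ is the predicted probability of the i-th category, and $M$ is the number of samples; the outer sum runs over the samples. The network is trained iteratively in the direction that minimizes the loss function, so that it reaches the best classification performance.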
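Training in the direction that minimizes the loss can be illustrated with plain gradient descent on a single softmax layer. This is a simplified stand-in, not the patent's network: the features are fixed rather than produced by a CNN, the pseudo-labels are assumed to come from the clustering step, and the function name, learning rate, and epoch count are hypothetical.

```python
import numpy as np

def train_classifier(features, pseudo_labels, n_classes, lr=0.5, epochs=200):
    """Fit a softmax layer to cluster-derived labels by gradient descent on cross-entropy."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[pseudo_labels]
    losses = []
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)      # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Cross-entropy of the probability assigned to each sample's pseudo-label
        losses.append(float(-np.mean(np.log(probs[np.arange(n), pseudo_labels] + 1e-12))))
        grad = (probs - onehot) / n                      # d(loss)/d(logits)
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, losses

# Hypothetical features with pseudo-labels assumed to come from k-means
features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
pseudo_labels = np.array([0, 0, 1, 1])
W, b, losses = train_classifier(features, pseudo_labels, n_classes=2)
predictions = np.argmax(features @ W + b, axis=1)
```

In the method described here, the same loss signal would also update the convolutional layers, after which features are re-extracted and re-clustered in the next iteration.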
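In application, the data to be grouped is encoded and input into the classification model, which automatically assigns it to the corresponding group.
The embodiment described above explains the technical solution and beneficial effects of the present invention in detail. It should be understood that the above is only a specific embodiment of the invention and is not intended to limit it; any modification, addition, or equivalent substitution made within the scope of the principles of the present invention shall fall within its scope of protection.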

Claims (8)

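  1. A DRGs automatic grouping method based on a convolutional neural network, characterized by comprising:
    (1) collecting case data and dividing the cases according to the major diagnostic categories and the core disease diagnosis related groups, so that the case data is assigned to its corresponding groups to serve as the training data set;
    (2) digitally encoding the case data in the training data set, converting text-described data into the corresponding numerical form;
    (3) constructing a convolutional neural network model and training it iteratively on the data obtained in step (2), wherein, during training, the k-means clustering method is used to cluster the feature vectors extracted by the convolutional neural network to obtain k category labels, and the category labels are combined with a classifier to supervise the convolutional neural network for iterative training;
    (4) after model training is complete, digitally encoding the data to be grouped and inputting it into the trained model for grouping.
  2. The DRGs automatic grouping method based on a convolutional neural network according to claim 1, characterized in that, in step (2), during digital encoding, the pathological data is converted to numerical values and uniformly mapped into the range 0 to 1, using the following conversion formula:
    $\text{converted value} = \frac{V_c - V_{\min}}{V_{\max} - V_{\min}}$
    where $V_c$ is the current value to be converted, and $V_{\min}$ and $V_{\max}$ are the minimum and maximum of the index values, respectively.
  3. The DRGs automatic grouping method based on a convolutional neural network according to claim 1, characterized in that, in step (3), a shallow convolutional neural network with three convolutional layers is used to extract features from the data.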
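  4. The DRGs automatic grouping method based on a convolutional neural network according to claim 1, characterized in that, in step (3), the training process of the convolutional neural network model is as follows:
    (3-1) using the convolutional neural network to extract features from the encoded data;
    (3-2) passing the feature vectors extracted by the convolutional neural network in step (3-1) into the k-means clusterer, using cosine distance to compute the distance between vectors, assigning vectors that are close together to the same cluster, measuring the distance between two clusters by the shortest distance between any member of one cluster and any member of the other, taking the clustering with the largest inter-cluster distance as the best result, and automatically selecting the corresponding k according to the clustering effect;
    (3-3) using the k categories obtained in step (3-2) as data labels, using the regression model and the loss function to measure how well the network is learning, and supervising the neural network's learning until the network model converges.
  5. The DRGs automatic grouping method based on a convolutional neural network according to claim 4, characterized in that, in step (3-1), the convolution used for feature extraction is computed as follows:
    $(f * g)(x, y) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} f(x - i,\, y - j)\, g(i, j)$
    where $f(x,y)$ is the input data, $g(x,y)$ is the convolution kernel function, and $m$ and $n$ are the length and width of the convolution kernel, respectively.
  6. The DRGs automatic grouping method based on a convolutional neural network according to claim 4, characterized in that, in step (3-2), the cosine distance is calculated as follows:
    $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$
    where $a$ and $b$ are two different feature vectors.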
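  7. The DRGs automatic grouping method based on a convolutional neural network according to claim 4, characterized in that, in step (3-3), the regression model is the softmax method, calculated as follows:
    $P(z)_j = \frac{e^{Z_j}}{\sum_{n=1}^{N} e^{Z_n}}$
    where $Z_j$ is the output of the j-th neuron, $N$ is the total number of categories, and $P(z)_j$ is the probability value of the j-th category; the model outputs a probability value for each category, so N categories yield N probability values.
  8. The DRGs automatic grouping method based on a convolutional neural network according to claim 4, characterized in that, in step (3-3), the loss function is the cross-entropy, calculated as follows:
    $L = -\frac{1}{M} \sum_{s=1}^{M} \sum_{i=1}^{N} y_i^{(s)} \log \hat{y}_i^{(s)}$
    where $y_i$ is the label of the i-th category, $\hat{y}_i$ is the predicted probability of the i-th category, and $M$ is the number of samples; the outer sum runs over the samples.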
PCT/CN2020/128369 2019-12-18 2020-11-12 A DRGs automatic grouping method based on a convolutional neural network WO2021120934A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/627,622 US20220319706A1 (en) 2019-12-18 2020-11-12 A drgs automatic grouping method based on a convolutional neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911310269.9A CN111161814A (zh) 2019-12-18 2019-12-18 A DRGs automatic grouping method based on a convolutional neural network
CN201911310269.9 2019-12-18

Publications (1)

Publication Number Publication Date
WO2021120934A1 (zh) 2021-06-24

Family

ID=70557684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128369 WO2021120934A1 (zh) 2019-12-18 2020-11-12 A DRGs automatic grouping method based on a convolutional neural network

Country Status (3)

Country Link
US (1) US20220319706A1 (zh)
CN (1) CN111161814A (zh)
WO (1) WO2021120934A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
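CN114637263A (zh) * 2022-03-15 2022-06-17 中国石油大学(北京) Real-time monitoring method, apparatus, device and storage medium for abnormal working conditions
CN114677071A (zh) * 2022-05-31 2022-06-28 创智和宇信息技术股份有限公司 Medical order data quality control method, system and storage medium based on probability analysis
CN115934661A (zh) * 2023-03-02 2023-04-07 浪潮电子信息产业股份有限公司 Graph neural network compression method, apparatus, electronic device and storage medium
CN116150698A (zh) * 2022-09-08 2023-05-23 天津大学 DRG automatic grouping method and system based on semantic information fusion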

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
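CN111161814A (zh) * 2019-12-18 2020-05-15 浙江大学 A DRGs automatic grouping method based on a convolutional neural network
CN113688854A (zh) * 2020-05-19 2021-11-23 阿里巴巴集团控股有限公司 Data processing method, apparatus and computing device
CN112885481A (zh) * 2021-03-09 2021-06-01 联仁健康医疗大数据科技股份有限公司 Case grouping method, apparatus, electronic device and storage medium
CN113109869A (zh) * 2021-03-30 2021-07-13 成都理工大学 Automatic picking method for first arrivals of shale ultrasonic test waveforms
CN113705700A (zh) * 2021-08-31 2021-11-26 平安科技(深圳)有限公司 Medical data classification method, apparatus, device and storage medium
CN113729715A (zh) * 2021-10-11 2021-12-03 山东大学 Intelligent diagnosis system for Parkinson's disease based on finger pressure
CN116127402B (zh) * 2022-09-08 2023-08-22 天津大学 DRG automatic grouping method and system fusing ICD hierarchical features
CN117093920B (zh) * 2023-10-20 2024-01-23 四川互慧软件有限公司 User DRGs grouping method
CN118351340B (zh) * 2024-06-17 2024-08-20 中国海洋大学 Dual-branch unsupervised object re-identification method and system based on sample mining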

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
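CN109920501A (zh) * 2019-01-24 2019-06-21 西安交通大学 Electronic medical record classification method and system based on convolutional neural network and active learning
CN110164519A (zh) * 2019-05-06 2019-08-23 北京工业大学 Classification method for processing mixed electronic medical record data based on a crowd intelligence network
CN111161814A (zh) * 2019-12-18 2020-05-15 浙江大学 A DRGs automatic grouping method based on a convolutional neural network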

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
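CN106874923A (zh) * 2015-12-14 2017-06-20 阿里巴巴集团控股有限公司 Method and apparatus for determining the style classification of a commodity
CN106203330A (zh) * 2016-07-08 2016-12-07 西安理工大学 Vehicle classification method based on convolutional neural network
CN109934719A (zh) * 2017-12-18 2019-06-25 北京亚信数据有限公司 Detection method and detection apparatus for medical insurance violations, and medical insurance cost control system
CN109411082B (zh) * 2018-11-08 2022-01-04 西华大学 Medical quality evaluation and medical visit recommendation method
CN109817339B (zh) * 2018-12-14 2023-07-04 平安医疗健康管理股份有限公司 Patient grouping method and apparatus based on big data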

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FAN YANG; GONGSHEN LIU; KUI MENG; ZHAOYING SUN: "Neural Feedback Text Clustering With BiLSTM-CNN-Kmeans", IEEE ACCESS, IEEE, USA, vol. 6, 1 January 1900 (1900-01-01), USA, pages 57460 - 57469, XP011703483, DOI: 10.1109/ACCESS.2018.2873327 *
SWATHI B, LAKSHMI, SINI P, RAJ, VIKRAM R RAJ: "Sentiment Analysis Using Deep Learning Technique CNN with KMeans", INTERNATIONAL JOURNAL OF PURE AND APPLIED MATHEMATICS VOLUME, vol. 114, no. 11, 1 January 2017 (2017-01-01), pages 47 - 57, XP055821584 *
XU, SI, SUN REN-CHENG: "Semi-supervised Classification Method Combined with Clustering", JOURNAL OF QINGDAO UNIVERSITY(NATURAL SCIENCE EDITION), vol. 31, no. 4, 1 November 2018 (2018-11-01), pages 49 - 53, XP055821613 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
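CN114637263A (zh) * 2022-03-15 2022-06-17 中国石油大学(北京) Real-time monitoring method, apparatus, device and storage medium for abnormal working conditions
CN114637263B (zh) * 2022-03-15 2024-01-12 中国石油大学(北京) Real-time monitoring method, apparatus, device and storage medium for abnormal working conditions
CN114677071A (zh) * 2022-05-31 2022-06-28 创智和宇信息技术股份有限公司 Medical order data quality control method, system and storage medium based on probability analysis
CN114677071B (zh) * 2022-05-31 2022-08-02 创智和宇信息技术股份有限公司 Medical order data quality control method, system and storage medium based on probability analysis
CN116150698A (zh) * 2022-09-08 2023-05-23 天津大学 DRG automatic grouping method and system based on semantic information fusion
CN116150698B (zh) * 2022-09-08 2023-08-22 天津大学 DRG automatic grouping method and system based on semantic information fusion
CN115934661A (zh) * 2023-03-02 2023-04-07 浪潮电子信息产业股份有限公司 Graph neural network compression method, apparatus, electronic device and storage medium
CN115934661B (zh) * 2023-03-02 2023-07-14 浪潮电子信息产业股份有限公司 Graph neural network compression method, apparatus, electronic device and storage medium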

Also Published As

Publication number Publication date
CN111161814A (zh) 2020-05-15
US20220319706A1 (en) 2022-10-06

Similar Documents

Publication Publication Date Title
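WO2021120934A1 (zh) A DRGs automatic grouping method based on a convolutional neural network
CN109411082B (zh) Medical quality evaluation and medical visit recommendation method
Chen et al. Deep feature learning for medical image analysis with convolutional autoencoder neural network
Li et al. Medical data stream distribution pattern association rule mining algorithm based on density estimation
Aslan et al. Multi-classification deep CNN model for diagnosing COVID-19 using iterative neighborhood component analysis and iterative ReliefF feature selection techniques with X-ray images
CN112434718B (zh) COVID-19 multimodal feature extraction and fusion method and system based on depth maps
Jiang et al. A hybrid intelligent model for acute hypotensive episode prediction with large-scale data
CN111061700A (zh) Medical treatment migration plan recommendation method and system based on similarity learning
CN107704888A (zh) Data recognition method based on a joint clustering deep learning neural network
CN112256866A (zh) Fine-grained text sentiment analysis method based on deep learning
Chen et al. A deep learning method for judicial decision support
Wang et al. Multiple valued logic approach for matching patient records in multiple databases
CN117954090A (zh) Mortality prediction method and system for patients with multimodal missing data
CN110930030A (zh) Method for rating doctors' technical level
CN107480426A (zh) Self-iterating medical record file cluster analysis system
CN110335160A (zh) Medical treatment migration behavior prediction method and system based on Bi-GRU improved with grouping and attention
CN112990270B (zh) Automatic fusion method for traditional features and deep features
Huang et al. Evaluating and boosting uncertainty quantification in classification
Krawczyk et al. Breast thermogram analysis using a cost-sensitive multiple classifier system
TWI768577B (zh) Method for predicting the optimal pressure of a sleep positive airway pressure device from waist circumference and body mass index
CN112287665B (zh) Chronic disease data analysis method and system based on natural language processing and ensemble training
Zhang et al. Cost-sensitive ensemble classification algorithm for medical image
CN109800384B (zh) Basic probability assignment calculation method based on a rough set information decision table
CN114822734A (zh) Traditional Chinese medicine case record analysis method based on recurrent convolutional neural network
CN114171206A (zh) Model training and infectious disease prediction method, apparatus, device and storage medium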

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20903197

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20903197

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/01/2023)
