WO2021204038A1 - Multi-scale clinical pathway mining method and apparatus, computer device, and storage medium

Info

Publication number
WO2021204038A1
WO2021204038A1 PCT/CN2021/084255 CN2021084255W WO2021204038A1 WO 2021204038 A1 WO2021204038 A1 WO 2021204038A1 CN 2021084255 W CN2021084255 W CN 2021084255W WO 2021204038 A1 WO2021204038 A1 WO 2021204038A1
Authority
WO
WIPO (PCT)
Prior art keywords
day
user
matrix
users
days
Prior art date
Application number
PCT/CN2021/084255
Other languages
French (fr)
Chinese (zh)
Inventor
蒋雪涵
唐蕊
孙行智
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021204038A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Definitions

  • FIG. 1 is a schematic flowchart of a multi-scale clinical path mining method provided by an embodiment of the application
  • the S101 includes steps S201 to S203:
  • the items used by user a across all of their hospitalization days are n1, n2, n3, n4, and n5, the items used by user b across all of their hospitalization days are n1, n3, n5, n7, and n8, and the items used by user c across all of their hospitalization days are n4, n6, n7, n9, and n10; then n is 10, representing the 10 items n1, n2, n3, n4, n5, n6, n7, n8, n9, and n10.
  • the S102 includes steps S301 to S302:
  • the Jaccard distance (based on the Jaccard coefficient) can be used to calculate the similarity, and the calculation formula is as follows: where
  • the S302 includes:
  • S502. Represent all words in each user·day as vectors through word-vector-based representation learning to obtain the corresponding word vectors;
  • v_day = dot(V_I, TFIDF), where v_day represents the sentence vector of the user·day, V_I denotes the matrix representation of the items within the user·day, I is the set of items of the user·day, each row of V_I represents the word vector of an item, dot represents the element-wise inner product operation, and TFIDF represents the term frequency-inverse document frequency matrix;
  • the TFIDF calculation formula for item i is: where D_i represents the total number of user·days that include item i, D represents the total number of all user·days, A_i represents the total number of users that include item i, and A represents the total number of users;
  • by analogy with a language model, each user·day can be treated as a sentence and each item in that day as a word, and sentence-based representation learning can then be performed.
  • a user·day is {item a, item b, item c}, which means that three items, "item a", "item b", and "item c", occurred during that day; converting it into a sentence gives "item a item b item c", which consists of the 3 words "item a", "item b", and "item c"; word-vector-based representation learning then yields the vector representation of each word, that is, the corresponding word vectors, and the word vectors of all items in the day are then weighted to obtain the corresponding user·day sentence vector.
  • the embodiment of the present application obtains the sentence vector by word-frequency weighting. Compared with directly averaging all word vectors to obtain the final representation, this improves the accuracy of the sentence vector representation and better fits the application scenario of the embodiment.
  • S103. Use the cores of the clusters to represent the medical treatment path of each user, serialize the medical treatment path of each user, then mine frequent sequences from the serialized paths, and use the frequent sequences as the main clinical pathways.
  • the step S103 includes S601 to S605:
  • the medical treatment path sequence is <c1,c2,...,ci,...>, where ci represents the cluster core of the i-th user·day. That is, after clustering, the user's medical treatment path can be represented by cluster classes: each user·day is represented by a number, replacing the prior-art representation of a user·day by a set S_i.
  • the medical treatment path sequence obtained in the foregoing step S603 is <1,1,1,3,3,3,3,6,8,8,9>; the redundant repeated elements can then be deleted, retaining one of each, which gives the simplified medical treatment path sequence <1,3,6,8,9>.
  • S605. Use a sequence mining algorithm to mine frequent sequences from the medical treatment path sequences, and use the frequent sequences as the main clinical pathways.
  • the items that frequently appear in class-a user·days are: hospitalization examination fee, routine blood test, urine test, and blood coagulation function test; the items that frequently appear in class-b user·days are: ventilator, anesthesia fee, gauze, surgery fee, blood transfusion fee, etc.
  • temporal association rules can also be obtained. That is, the main clinical pathways yield the temporal order of user visits, namely that one class of user·day must occur before another class of user·day; in addition, frequent itemset mining within each class of user·day yields the items that frequently appear in that class of user·day.
  • the two can be combined to obtain a temporal association rule such as "item 1, which frequently appears in class-a user·days, must occur before item 2, which frequently appears in class-b user·days".
  • the rule that "a preoperative coagulation function test must occur before an intraoperative blood transfusion" can be obtained and applied to actual quality control. For example, if a patient has an intraoperative blood transfusion, the above test must have been done before the operation.
  • the filling unit 803 is used to fill each row element of the item usage matrix according to the items used by each user every day.
  • the arranging unit 1003 is configured to arrange the calculated similarities in order to construct the distance matrix, and record the distance matrix as m*m, where the element d_ij in the i-th row and j-th column of the distance matrix represents the distance between the i-th user·day and the j-th user·day.
  • the distance matrix clustering unit 902 includes:
  • the word extracting unit 1101 is configured to obtain the items used by each user·day, and use the obtained items as words;
  • the word vector representation unit 1102 is configured to perform vector representation on all words in each user·day through word-vector-based representation learning to obtain corresponding word vectors;
  • the core representation unit 1201 is used to represent the core of each cluster with a different number;
  • the number representation unit 1202 is used to represent the user·day in each cluster using the number of the core of the corresponding cluster;
  • the above-mentioned multi-scale clinical path mining device 700 can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 13.
  • FIG. 13 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 1300 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
  • the computer device 1300 includes a processor 1302, a memory, and a network interface 1305 connected through a system bus 1301, where the memory may include a non-volatile storage medium 1303 and an internal memory 1304.
  • the processor 1302 is used to provide computing and control capabilities, and support the operation of the entire computer device 1300.
  • the internal memory 1304 provides an environment for the operation of the computer program 13032 in the non-volatile storage medium 1303.
  • the processor 1302 can execute the multi-scale clinical path mining method.
  • the network interface 1305 is used for network communication, such as providing data information transmission.
  • the structure shown in FIG. 13 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 1300 to which the solution of the present application is applied.
  • the specific computer device 1300 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the embodiment of the computer device shown in FIG. 13 does not constitute a limitation on the specific structure of the computer device.
  • the computer device may include more or fewer components than those shown in the figure, or combine certain components, or arrange the components differently.
  • the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 13 and will not be repeated here.
  • the processor 1302 may be a central processing unit (Central Processing Unit, CPU), and the processor 1302 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • a computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, where the computer program is executed by the processor to implement the following steps: convert the item usage data of the items used by multiple users each day into an item usage matrix, and record the item usage matrix as m*n, where m represents the sum of all hospitalization days of all the users, n represents the number of all items, and each row in the item usage matrix represents the items used by one user in one day; take each row in the item usage matrix as a user·day, and cluster similar user·days according to the similarity between the user·days; use the cores of the clusters to represent the medical treatment path of each user, serialize the medical treatment path of each user, then mine frequent sequences from the serialized paths, and use the frequent sequences as the main clinical pathways.
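
The serialization steps quoted in the list above (numbering each user·day by its cluster core, deleting repeated elements, then mining frequent sequences) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the application: the text names no specific sequence mining algorithm, so a naive enumeration of order-preserving subsequences stands in for one. The names `simplify`, `frequent_subsequences`, `min_support`, and `max_len`, and the second and third sample paths, are hypothetical. The sketch also assumes that "repeated elements" means consecutive repeats, as in the example <1,1,1,3,3,3,3,6,8,8,9>.

```python
from collections import Counter
from itertools import combinations, groupby

def simplify(path):
    """Collapse each run of repeated cluster numbers into a single element."""
    return [label for label, _ in groupby(path)]

def frequent_subsequences(paths, min_support, max_len=3):
    """Naive stand-in for a sequence mining algorithm (assumption).

    Counts order-preserving subsequences (gaps allowed) of length <= max_len
    by the number of paths that contain them.
    """
    support = Counter()
    for path in paths:
        seen = set()
        for k in range(1, max_len + 1):
            # combinations() keeps the original order of the elements
            seen.update(combinations(path, k))
        support.update(seen)  # each path counts at most once per pattern
    return {seq: n for seq, n in support.items() if n >= min_support}

raw_paths = [
    [1, 1, 1, 3, 3, 3, 3, 6, 8, 8, 9],  # example sequence from the text
    [1, 1, 3, 3, 8, 9, 9],              # hypothetical
    [1, 3, 3, 6, 6, 9],                 # hypothetical
]
paths = [simplify(p) for p in raw_paths]
frequent = frequent_subsequences(paths, min_support=3)
```

Enumerating all subsequences grows quickly with path length, so a real deployment would more plausibly use a dedicated sequential pattern mining algorithm; the simplification step keeps paths short, which is what makes even this naive version workable on small inputs.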

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Primary Health Care (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A multi-scale clinical pathway mining method and apparatus, a computer device and a storage medium. The method comprises: converting usage data of items used by a plurality of users each day into an item usage matrix, and denoting the item usage matrix as m*n, wherein m represents the sum of all hospital stays of all the users, n represents the number of all items, and each row in the item usage matrix represents the items used by one user in one day (S101); taking each row in the item usage matrix as a user·day, and clustering similar users·days according to the similarities between the various users·days (S102); and using cores of clusters to represent doctor-visiting pathways of the various users, performing serialization representation on the doctor-visiting pathways of the various users, then mining frequent sequences from doctor-visiting pathway sequences, and taking the frequent sequences as the main clinical pathways (S103). By means of the present method, the rationality and variability of an actual clinical operation can be better reflected.

Description

Multi-scale clinical pathway mining method and apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 202011260888.4, filed with the China Patent Office on November 12, 2020 and entitled "Multi-scale clinical pathway mining method, apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data mining, and in particular to a multi-scale clinical pathway mining method and apparatus, a computer device, and a storage medium.
Background
With the advance of medical informatization, electronic medical records have gradually replaced paper records, and mining latent medical information from them with data analysis and artificial intelligence methods has become a trend. Understanding patients' care-seeking behavior from their time-ordered medical data is essential for summarizing patients' main clinical pathways, extracting temporal clinical rules, and performing quality control.
One approach to standardizing patients' medical treatment and performing quality control is the clinical pathway. A clinical pathway is a model of medical service management: by formulating a procedural, standardized diagnosis and treatment plan for a given disease or major operation, it aims to standardize medical practice and reduce the waste of medical resources. Thousands of clinical pathways have been formulated. However, the inventors realized that performing quality control of medical practice strictly according to the established clinical pathways poses many problems. For example, clinical pathways are formulated for the general case and do not consider each patient's specific circumstances, so quality control that follows them strictly is overly rigid and of little value. In other words, existing clinical pathway mining approaches lack flexibility and variability.
Summary of the Application
The purpose of this application is to provide a multi-scale clinical pathway mining method and apparatus, a computer device, and a storage medium, aiming to solve the problem that existing clinical pathway mining approaches lack flexibility and variability.
In a first aspect, an embodiment of the present application provides a multi-scale clinical pathway mining method, including:
converting the item usage data of the items used by multiple users each day into an item usage matrix, and denoting the item usage matrix as m*n, where m represents the sum of all hospitalization days of all the users, n represents the number of all items, and each row of the item usage matrix represents the items used by one user on one day;
taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days;
using the cores of the clusters to represent the medical treatment path of each user, serializing the medical treatment path of each user, then mining frequent sequences from the serialized paths, and taking the frequent sequences as the main clinical pathways.
In a second aspect, an embodiment of the present application provides a multi-scale clinical pathway mining apparatus, including:
a conversion unit, configured to convert the item usage data of the items used by multiple users each day into an item usage matrix, and denote the item usage matrix as m*n, where m represents the sum of all hospitalization days of all the users, n represents the number of all items, and each row of the item usage matrix represents the items used by one user on one day;
a clustering unit, configured to take each row of the item usage matrix as a user·day, and cluster similar user·days according to the similarity between the user·days;
a mining unit, configured to use the cores of the clusters to represent the medical treatment path of each user, serialize the medical treatment path of each user, then mine frequent sequences from the serialized paths, and take the frequent sequences as the main clinical pathways.
In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the multi-scale clinical pathway mining method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the multi-scale clinical pathway mining method described in the first aspect.
The embodiments of the present application enable pattern mining of time-series clinical data and derive real clinical pathways from the data, which better reflects the rationality and variability of actual clinical practice; the serialized representation also resolves the high time and space complexity caused by an excessive number of unordered itemsets.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the multi-scale clinical pathway mining method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a sub-process of the multi-scale clinical pathway mining method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of another sub-process of the multi-scale clinical pathway mining method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another sub-process of the multi-scale clinical pathway mining method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another sub-process of the multi-scale clinical pathway mining method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of another sub-process of the multi-scale clinical pathway mining method provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of the multi-scale clinical pathway mining apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of a subunit of the multi-scale clinical pathway mining apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of another subunit of the multi-scale clinical pathway mining apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic block diagram of another subunit of the multi-scale clinical pathway mining apparatus provided by an embodiment of the present application;
FIG. 11 is a schematic block diagram of another subunit of the multi-scale clinical pathway mining apparatus provided by an embodiment of the present application;
FIG. 12 is a schematic block diagram of another subunit of the multi-scale clinical pathway mining apparatus provided by an embodiment of the present application;
FIG. 13 is a schematic block diagram of the computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
Please refer to FIG. 1, which is a schematic flowchart of a multi-scale clinical pathway mining method provided by an embodiment of the present application, including steps S101 to S103:
S101. Convert the item usage data of the items used by multiple users each day into an item usage matrix, and denote the item usage matrix as m*n, where m represents the sum of all hospitalization days of all the users, n represents the number of all items, and each row of the item usage matrix represents the items used by one user on one day;
In this step, the items a user uses on each day (that is, each hospitalization day) may or may not repeat. To give every user's item usage data a uniform format, the data needs to be converted into the item usage matrix.
In an embodiment, as shown in FIG. 2, S101 includes steps S201 to S203:
S201. Construct the item usage matrix in advance, where the item usage matrix has m rows and n columns;
Here m represents the sum of all hospitalization days of all the users. For example, if user a is hospitalized for m1 days, user b for m2 days, and user s for ms days, then m=m1+m2+...+ms. n represents the number of all items. Note that n does not count duplicate items; that is, each item counted in n is unique. For example, suppose there are currently 3 users: user a, user b, and user c. The items used by user a across all of their hospitalization days are n1, n2, n3, n4, and n5, the items used by user b are n1, n3, n5, n7, and n8, and the items used by user c are n4, n6, n7, n9, and n10. Then n is 10, representing the 10 items n1, n2, n3, n4, n5, n6, n7, n8, n9, and n10. Alternatively, the number of all items offered by the hospital can be obtained in advance and used as n; with this approach, some item counted in n may not have been used by any user.
S202. Obtain the items used by each user on each day;
The items used by each user on each day are the items that the user was charged for on each day of the hospitalization. They can be represented as an ordered sequence: <{item a, item b, item c, …}, {item b, item d, …}, …>, where the elements inside "<…>" are ordered and the length of "<…>" is the number of days the user was hospitalized, while the elements inside "{…}" are unordered. If the set of all chargeable items of the hospital is S, then each "{…}" is a subset of S;
S203. Fill the elements of each row of the item usage matrix according to the items used by each user on each day.
Each row of the item usage matrix represents the item usage of one user on one hospitalization day, and each column represents the usage of one item across the different user·days. The elements of the item usage matrix can be 0 or 1, where 0 indicates that the item was not used in the corresponding user·day and 1 indicates that it was used.
Here "user·day" denotes "one user on one day" in the item usage matrix. The rows of the item usage matrix can be arranged first by user and then, within each user, by day. That is, the first row is the item usage of the first user on the first day, the second row is the item usage of the first user on the second day, and so on. For example, if the first user was hospitalized for 10 days in total, then the tenth row represents the item usage of the first user on the tenth day, the eleventh row represents the item usage of the second user on the first day, and the twelfth row represents the item usage of the second user on the second day. The number of hospitalization days may differ from user to user.
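
Steps S201 to S203 can be sketched in a few lines. This is a minimal illustration, assuming the per-user data is already available as an ordered list of daily item sets: the `stays` dictionary, the function name `build_item_usage_matrix`, and the way the items of users a, b, and c are split across days are hypothetical. Only the overall item sets per user follow the example in the text.

```python
import numpy as np

# Hypothetical input: each user maps to an ordered list of days,
# and each day is an unordered set of item names.
stays = {
    "a": [{"n1", "n2"}, {"n3", "n4", "n5"}],
    "b": [{"n1", "n3"}, {"n5", "n7"}, {"n8"}],
    "c": [{"n4", "n6", "n7"}, {"n9", "n10"}],
}

def build_item_usage_matrix(stays):
    """Return (matrix, row_index, items) for the m*n item usage matrix."""
    # n: every distinct item, in a fixed column order.
    items = sorted({item for days in stays.values() for day in days for item in day})
    col = {item: j for j, item in enumerate(items)}
    # m: one row per user-day, ordered by user first, then by day.
    row_index = [(user, d) for user, days in stays.items() for d in range(len(days))]
    matrix = np.zeros((len(row_index), len(items)), dtype=np.int8)
    for i, (user, d) in enumerate(row_index):
        for item in stays[user][d]:
            matrix[i, col[item]] = 1  # 1 = item used in that user-day
    return matrix, row_index, items

matrix, row_index, items = build_item_usage_matrix(stays)
```

Here m is 7 (2 + 3 + 2 hospitalization days) and n is 10, so `matrix` has shape (7, 10). Keeping `row_index` alongside the matrix makes it possible to map each row back to its user and day later, when cluster numbers are assembled into per-user paths.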
S102. Take each row of the item usage matrix as a user·day, and cluster similar user·days according to the similarity between the user·days;
This step clusters the user·days in the item usage matrix. The embodiments of the present application provide two ways of clustering the user·days. The first is described below.
In an embodiment, as shown in FIG. 3, S102 includes steps S301 to S302:
S301. Calculate the similarity between the user·days according to the item usage matrix, construct the distance matrix of the user·days according to the similarity between the user·days, and denote the distance matrix as m*m;
S302. Cluster similar user·days according to the distance matrix.
In this embodiment, the similarity between user·days is calculated from the data of each user·day in the item usage matrix, the distance matrix is constructed from the similarities, and clustering is then performed on the distance matrix.
In an embodiment, as shown in FIG. 4, S301 includes steps S401 to S403:
S401. Extract the data of each row from the item usage matrix;
This step extracts the data of each row from the item usage matrix. Each row represents one user's item usage on one day, for example {1,0,0,1,0,...,0}, where a 1 means the user used the corresponding item on that day and a 0 means the user did not use the corresponding item on that day.
S402. Calculate, in order, the similarity between the data of each row and the data of all rows;
For example, if one row is {1,0,0,1,0,...,0} and another row is {0,0,1,0,0,...,1}, the similarity between these two rows can be calculated. In this way the similarity between each row and all rows can be obtained. To keep the subsequent distance matrix regular, "all rows" includes the row itself; that is, each row is compared with every row, including itself.
In addition, the similarities are preferably calculated in order: first the similarity between the first row and all rows, then the similarity between the second row and all rows, and so on, until the similarity between the last row and all rows has been calculated.
The comparison of one row against all rows is likewise performed in order. For example, for the third row, first calculate the similarity between the third row and the first row, then between the third row and the second row, then between the third row and itself, then between the third row and the fourth row, and so on, until the similarity between the third row and the last row has been calculated.
In the embodiments of the present application, the Jaccard distance (based on the Jaccard coefficient) may be used to calculate the similarity, with the following formula:
Figure PCTCN2021084255-appb-000001
where |·| denotes the length of ·, S_i denotes the set of items used on the i-th user·day (here i refers to the data of the i-th row, not to the i-th user), and S_j denotes the set of items used on the j-th user·day (likewise, j refers to the data of the j-th row, not to the j-th user).
S403. Arrange the calculated similarities in order to construct the distance matrix, and denote the distance matrix as m*m, where the element d_ij in row i and column j of the distance matrix represents the distance between the i-th user·day and the j-th user·day.
In this step, the previously calculated similarities are inserted into a matrix in order to construct the distance matrix. The distance matrix may be arranged as follows. The elements of its first row represent the similarity between the first row of the item usage matrix and each row in turn: the element in row 1, column 1 is the similarity between the first row of the item usage matrix and itself, the element in row 1, column 2 is the similarity between the first and second rows of the item usage matrix, and so on, up to the element in row 1, column m, which is the similarity between the first and last rows of the item usage matrix. The elements of the second row represent the similarity between the second row of the item usage matrix and each row in turn, and so on; the last row represents the similarity between the last row of the item usage matrix and each row in turn.
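Steps S401 to S403 can be sketched in Python as follows. Because the formula itself is only available here as an image placeholder, the sketch assumes the standard Jaccard distance, d_ij = 1 - |S_i ∩ S_j| / |S_i ∪ S_j|; treating two empty rows as distance 0 is a further assumption, and the function names are illustrative.

```python
def jaccard_distance(row_a, row_b):
    """Standard Jaccard distance between two 0/1 rows (assumed form)."""
    set_a = {c for c, used in enumerate(row_a) if used}
    set_b = {c for c, used in enumerate(row_b) if used}
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two days with no items count as identical
    return 1.0 - len(set_a & set_b) / len(union)


def distance_matrix(matrix):
    """Compare every row with every row (itself included), in row order."""
    m = len(matrix)
    return [[jaccard_distance(matrix[i], matrix[j]) for j in range(m)]
            for i in range(m)]


usage = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
]
d = distance_matrix(usage)
```

The result is an m*m matrix whose diagonal is 0 and which is symmetric, so an implementation could compute only the upper triangle; the full double loop is kept here to mirror the ordering described in the text.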
In an embodiment, S302 includes:
Using hierarchical clustering, merge the two closest elements in the distance matrix into one class, and traverse all elements to achieve a global clustering.
This step clusters similar user·days, that is, it groups different user·days according to the similarity of the items used. Hierarchical clustering may be used. Clustering reveals which elements of the distance matrix are more similar and can therefore be assigned to the same class.
Because the elements of the distance matrix represent the similarity between different rows of the item usage matrix, that is, between different user·days, clustering the elements of the distance matrix in effect clusters the user·days of the item usage matrix, that is, the data of each of its rows. The original item usage matrix contains m user·days in total. Assuming that clustering yields x classes, the user·days fall into x classes, where m is greater than x; in practice m may be far greater than x.
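A minimal sketch of agglomerative (bottom-up) hierarchical clustering on a precomputed distance matrix is given below. The text only says "hierarchical clustering", so the average linkage criterion and the distance threshold used as a stopping rule are assumptions, as is the function name.

```python
def agglomerative_clusters(dist, threshold):
    """Repeatedly merge the two closest clusters until the closest pair is
    farther apart than `threshold`.

    Average linkage and the threshold stop rule are illustrative choices,
    not taken from the source. Returns one class number (1..x) per row.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                gap = linkage(clusters[x], clusters[y])
                if best is None or gap < best[0]:
                    best = (gap, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
labels = agglomerative_clusters(dist, threshold=0.5)
```

This naive version rescans all cluster pairs on every merge, which is only practical for small m; for real data a library implementation would normally be preferred.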
In addition to the distance matrix method described above for calculating similarity and clustering, the embodiments of the present application provide a second way of computing the clustering: applying a language model to learn a representation of each day's items. The advantage is that the high-dimensional sparse matrix is reduced to a low-dimensional dense matrix, which not only improves the performance of the method but also represents each user·day more precisely, yielding a better clustering result.
In an embodiment, as shown in FIG. 5, S102 includes steps S501 to S504:
S501. Obtain the items used in each user·day, and treat the obtained items as words;
S502. Obtain a vector representation of every word in each user·day through word-vector-based representation learning, yielding the corresponding word vectors;
S503. Weight the word vectors of all words in each user·day by a term-frequency weighting method to obtain the sentence vector of each user·day, where the weighting formula is v_day = dot(V_I, TFIDF), in which v_day denotes the sentence vector of the user·day; V_I denotes the matrix of item representations within the user·day, where I is the set of items in the user·day and each row of V_I is the word vector of one item; dot denotes the inner product operation; and TFIDF denotes the term frequency and document specificity (TF-IDF) matrix. The TFIDF of item i is calculated as:
Figure PCTCN2021084255-appb-000002
where D_i denotes the total number of user·days that contain item i, D denotes the total number of user·days, A_i denotes the total number of users whose records contain item i, and A denotes the total number of users;
S504. Cluster similar user·days according to the distance between the sentence vectors of the user·days.
When a language model is applied for representation learning, each user·day can, by analogy, be treated as a sentence and each item in that day as a word, and sentence-based representation learning is performed. For example, a user·day {item a, item b, item c} means that three items, "item a", "item b" and "item c", occurred on that day. It is converted into the sentence "item a item b item c", which consists of three words: "item a", "item b" and "item c". Word-vector-based representation learning then yields a vector for each word, and the word vectors of all items in the day are weighted to obtain the sentence vector of the user·day. Finally, similar user·days are clustered according to the distances between the sentence vectors. The distance between sentence vectors may be the Euclidean distance, cosine distance, Manhattan distance, Chebyshev distance, and so on; clustering is performed on the calculated distances, that is, vectors that are close to each other are grouped together. In the embodiments of the present application, the language model may be word2vec, a method that learns a representation for each word based on the co-occurrence probability of other words within the word's neighbor window.
Furthermore, when the language model is applied, the items within a day have no order in the item usage data; that is, every item in a day should be regarded as a neighbor of every other item in that day. In practice, therefore, the maximum sliding window may be set to the largest number of items appearing in any single day, which yields a representation for each item. For example, "item a", "item b" and "item c" are represented after representation learning by V_a, V_b and V_c respectively.
In addition, the embodiments of the present application obtain the sentence vector by term-frequency weighting. Compared with simply averaging all word vectors to obtain the final representation, this improves the accuracy of the sentence vector and better fits the application scenario of the embodiments.
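The weighting v_day = dot(V_I, TFIDF) can be read as a weighted sum of the word vectors of the items in one user·day. The sketch below follows that reading. Since the TFIDF formula is only available here as an image placeholder, the weights are passed in as given values rather than computed; the dictionaries word_vectors and weights and all numbers in them are made up for illustration.

```python
import math


def sentence_vector(word_vectors, weights, items):
    """Weighted sum of the word vectors of the items in one user·day.

    word_vectors: item -> list of floats (e.g. learned by a word2vec model).
    weights: item -> precomputed weight (stands in for the TFIDF value,
             whose exact formula is not reproduced here).
    """
    dim = len(next(iter(word_vectors.values())))
    v_day = [0.0] * dim
    for item in items:
        w = weights[item]
        for k, value in enumerate(word_vectors[item]):
            v_day[k] += w * value
    return v_day


# Hypothetical 2-dimensional word vectors and weights.
word_vectors = {
    "item a": [1.0, 0.0],
    "item b": [0.0, 1.0],
    "item c": [1.0, 1.0],
}
weights = {"item a": 0.5, "item b": 2.0, "item c": 1.0}

v1 = sentence_vector(word_vectors, weights, ["item a", "item b", "item c"])
v2 = sentence_vector(word_vectors, weights, ["item a", "item c"])
# Euclidean distance between the two user·day vectors (one of the listed options).
gap = math.dist(v1, v2)
```

An unweighted average would give "item b" the same influence as the other two items; with the weights above it dominates the second coordinate of v1.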
S103. Represent the medical treatment path of each user using the cluster cores, serialize the medical treatment path of each user, mine frequent sequences from the result, and take the frequent sequences as the main clinical pathways.
In an embodiment, as shown in FIG. 6, step S103 includes S601 to S605:
S601. Represent the core of each cluster with a different number;
For example, the number 1 represents the first class, the number 2 represents the second class, ..., and the number x represents the x-th class.
S602. Represent each user·day in a cluster by the number of the core of that cluster;
Because the user·days in the item usage matrix have already been clustered in the preceding steps, each user·day can now be represented directly by the corresponding number. For example, if the 1st user·day belongs to the first class, it is represented simply by the number 1; if the 2nd user·day belongs to the third class, it is represented simply by the number 3.
S603. Serialize the numerically represented user·days to obtain a medical treatment path sequence;
After the preceding steps, every user·day is represented by a single number, so this step can serialize them in user·day order, that is, in the order of the rows of the item usage matrix, to obtain the medical treatment path sequence.
For example, the medical treatment path sequence is <c1,c2,…,ci,…>, where ci denotes the cluster core of the i-th user·day. In other words, after clustering, a user's treatment path can be expressed using the x classes: one number represents one day of a user, replacing the prior-art practice of representing a user's day with the set S_i.
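Steps S601 to S603 can be sketched as follows. The sketch assumes that the rows are already ordered by user and then by day, as described for the item usage matrix, and that one sequence is kept per user; the names row_keys and labels are illustrative.

```python
def medical_path_sequences(row_keys, labels):
    """Group per-row class numbers into one ordered sequence per user.

    row_keys[r] is the (user_id, day) of row r and labels[r] is the class
    number assigned to that row by clustering. Rows are assumed to be
    ordered by user and then by day.
    """
    sequences = {}
    for (user, _day), label in zip(row_keys, labels):
        sequences.setdefault(user, []).append(label)
    return sequences


row_keys = [("u1", 1), ("u1", 2), ("u1", 3), ("u2", 1), ("u2", 2)]
labels = [1, 1, 3, 1, 2]
paths = medical_path_sequences(row_keys, labels)
```

Each value in paths is the sequence <c1,c2,…> for one user, with one number per hospitalization day.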
S604. Delete consecutive identical elements in the medical treatment path sequence, keeping only one of them, to obtain a simplified medical treatment path sequence;
For example, if the medical treatment path sequence obtained in step S603 is <1,1,1,3,3,3,3,3,6,8,8,9>, the redundant repeated elements can be deleted while one of each run is kept, giving the simplified medical treatment path sequence <1,3,6,8,9>.
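The simplification in S604 amounts to collapsing each run of consecutive identical elements into one element. A minimal Python sketch using the standard library's itertools.groupby (the function name simplify_path is illustrative):

```python
from itertools import groupby


def simplify_path(path):
    """Keep one element from each run of consecutive identical elements."""
    return [label for label, _run in groupby(path)]


simplified = simplify_path([1, 1, 1, 3, 3, 3, 3, 3, 6, 8, 8, 9])
```

Only adjacent repeats are removed, so a class that recurs later in the stay, as in <1,3,1>, is preserved.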
Of course, the medical treatment path sequence may also be left unsimplified after it is obtained, in which case statistics on the duration of each class of user·day can be derived. In the example <1,1,1,3,3,3,3,3,6,8,8,9>, class-1 user·days lasted 3 days and class-3 user·days lasted 5 days. Such statistics show how long each class of user·day typically lasts. For example, by taking as the threshold the duration that covers 95% of cases, a rule can be obtained such as: class-e user·days should last fewer than y days. If, in actual data, class-e user·days last more than y days, the case is considered likely to involve overtreatment or even insurance fraud.
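The duration statistics described above can be sketched as follows. The nearest-rank definition of the 95% quantile is an assumption (the text does not say how the threshold is computed), as are the function names and the sample paths.

```python
from itertools import groupby
import math


def run_lengths(path):
    """Return (class, consecutive days) for each run in an unsimplified path."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(path)]


def duration_thresholds(paths, quantile=0.95):
    """Per-class duration threshold over many paths.

    Uses the nearest-rank quantile, one of several possible conventions.
    """
    durations = {}
    for path in paths:
        for label, length in run_lengths(path):
            durations.setdefault(label, []).append(length)
    thresholds = {}
    for label, values in durations.items():
        values.sort()
        rank = max(1, math.ceil(quantile * len(values)))
        thresholds[label] = values[rank - 1]
    return thresholds


runs = run_lengths([1, 1, 1, 3, 3, 3, 3, 3, 6, 8, 8, 9])
thresholds = duration_thresholds([[1, 1, 1, 3], [1, 3, 3], [1, 1, 3]])
```

A stay whose run of class e exceeds thresholds[e] would then be flagged for review under the rule described in the text.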
S605. Use a sequence mining algorithm to mine frequent sequences from the medical treatment path sequences, and take the frequent sequences as the main clinical pathways.
This step may use a sequence mining algorithm such as PrefixSpan (prefix-projected sequential pattern mining) to mine the frequent sequences.
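Because every element of a medical treatment path sequence is a single class number, a simplified PrefixSpan suffices. The sketch below is a minimal single-item variant written for illustration, not a full implementation of the algorithm for itemset sequences; the minimum support value and the sample sequences are made up.

```python
def prefixspan(sequences, min_support):
    """Mine frequent subsequences (gaps allowed) from single-item sequences.

    Returns {pattern_tuple: support}, where support is the number of
    sequences that contain the pattern as a subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs that
        # describe the suffix left over after matching `prefix`.
        counts = {}
        for seq_idx, start in projected:
            seen = set()
            for label in sequences[seq_idx][start:]:
                if label not in seen:
                    seen.add(label)
                    counts[label] = counts.get(label, 0) + 1
        for label, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (label,)
            results[pattern] = support
            next_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                for pos in range(start, len(seq)):
                    if seq[pos] == label:
                        next_projected.append((seq_idx, pos + 1))
                        break
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


sequences = [list("abde"), list("abe"), list("ae"), list("bde")]
patterns = prefixspan(sequences, min_support=2)
```

Patterns that reach the chosen support, together with any further requirements such as a minimum length, would then be taken as candidate main clinical pathways.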
Through clustering, the present application divides the m patient·days into x classes of user·days and mines each class separately to understand what that class of user·day actually involves. This understanding is then mapped onto the frequent sequences to interpret them; such frequent sequences are the main clinical pathways mined from the data.
Take a certain major operation as an example. First, the daily item usage data of all users who underwent the operation is obtained. Clustering by user·day yields 5 cluster cores, denoted a/b/c/d/e. These 5 cluster cores are mapped back onto the users' medical treatment path sequences, the sequences are simplified, and frequent sequences are mined from them; the frequent sequences that meet the requirements are ab, bde and ae. Next, frequent itemset mining is performed separately on the user·days belonging to each of the 5 cluster cores. For example, the items that frequently appear in class-a user·days are: inpatient examination fee, routine blood test, urine test and coagulation function test; those in class-b user·days are: ventilator, anesthesia fee, gauze, surgery fee, blood transfusion fee, etc.; those in class-d user·days are: nutritional infusion, routine blood test, urine test, C-reactive protein measurement, etc.; and those in class-e user·days are: rehabilitation training, antibiotics, etc. Classes a/b/d/e can therefore be understood as preoperative preparation / surgery / postoperative examination / postoperative rehabilitation respectively. Mapped back onto the mined frequent sequences, ab means preoperative preparation followed by surgery, ae means preoperative preparation followed by postoperative rehabilitation, and so on. Through such analysis, the main clinical pathways can be obtained from the data and each user's treatment trajectory can be given a basic interpretation. The embodiments of the present application analyze frequent sequences at the scale of user·days, then mine frequent items within each class of user·day to understand each class, and then map that understanding back onto the users' frequent sequences, thereby obtaining an all-round understanding.
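The per-class analysis in the example can be approximated by counting how often each item appears within each class of user·day. The sketch below reports single items whose frequency within a class reaches a chosen ratio; it is a simplification of frequent itemset mining, which would also consider combinations of items, and the ratio, names and sample data are assumptions.

```python
def frequent_items_by_cluster(matrix, items, labels, min_ratio):
    """For each class, list the items used on at least `min_ratio` of its days.

    matrix is the 0/1 item usage matrix, items names its columns, and
    labels gives the class of each row.
    """
    totals, counts = {}, {}
    for row, label in zip(matrix, labels):
        totals[label] = totals.get(label, 0) + 1
        per_item = counts.setdefault(label, [0] * len(items))
        for c, used in enumerate(row):
            per_item[c] += used
    return {
        label: [items[c] for c, n in enumerate(per_item)
                if n / totals[label] >= min_ratio]
        for label, per_item in counts.items()
    }


items = ["exam fee", "blood test", "anesthesia", "surgery fee"]
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
labels = ["a", "a", "a", "b", "b"]
frequent = frequent_items_by_cluster(matrix, items, labels, min_ratio=0.6)
```

Reading the resulting item lists is what allows a class such as a or b to be interpreted as, for instance, preoperative preparation or surgery.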
Based on the above analysis, temporal association rules can also be obtained. The main clinical pathways give the temporal relationship of a user's treatment, that is, a certain class of user·day must occur before another class of user·day. In addition, frequent itemset mining on each class of user·day gives the items that frequently appear in each class of day. Combining the two yields temporal association rules such as "item 1, which frequently appears in class-a patient·days, must occur before item 2, which frequently appears in class-b patient·days". For example, for a certain major operation, the rule "a preoperative coagulation function test must occur before intraoperative blood transfusion" can be obtained and applied in actual quality control: if a patient received an intraoperative blood transfusion, the above test must have been performed before the operation.
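A temporal association rule of this kind can be checked against a single hospital stay as sketched below. The representation of a stay as a list of per-day item sets, the function name, and the requirement that the earlier item appear on a strictly earlier day are assumptions made for illustration.

```python
def violates_rule(stay, before_item, after_item):
    """Return True if `after_item` occurs without `before_item` on an earlier day.

    stay is a list of sets of items, one set per hospitalization day, in
    day order. A same-day occurrence of before_item does not count as
    "before" in this sketch.
    """
    seen_before = False
    for day_items in stay:
        if after_item in day_items and not seen_before:
            return True
        if before_item in day_items:
            seen_before = True
    return False


stay_ok = [{"coagulation test", "exam fee"}, {"blood transfusion", "surgery fee"}]
stay_bad = [{"exam fee"}, {"blood transfusion", "surgery fee"}]
ok_flag = violates_rule(stay_ok, "coagulation test", "blood transfusion")
bad_flag = violates_rule(stay_bad, "coagulation test", "blood transfusion")
```

A stay that never includes the later item, such as one without any transfusion, is not flagged.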
In real scenarios, the data of a single user·day is already very complex; the user·days are ordered relative to one another, while the data within each user·day is unordered. Existing frequent sequence mining methods require a very large amount of computation on such data. The embodiments of the present application solve the problem of mining sequential patterns that contain multiple itemsets: based on the itemset appearing in each row (in the business scenario, one row corresponds to the items one user used in one day), all user·days of all users are clustered; after clustering, each user·day is represented by the number of its class, so that each row is replaced by a single number. One hospitalization can thus be represented as a sequence of class numbers, and frequent sequences can be mined quickly.
Thus, the core of the multi-scale clinical pathway mining scheme proposed here is to classify hospitalization days, so that not only frequent pathways can be mined from hospitalization data, but also the frequent itemsets of each class of day; this cannot be achieved by directly applying existing pattern mining techniques.
Please refer to FIG. 7, which is a schematic block diagram of a multi-scale clinical pathway mining apparatus according to an embodiment of the present application. The multi-scale clinical pathway mining apparatus 700 includes:
A conversion unit 701, configured to convert the item usage data of multiple users for each day into an item usage matrix, and denote the item usage matrix as m*n, where m is the sum of all hospitalization days of all the users, n is the number of all items, and each row of the item usage matrix represents the items used by one user in one day;
A clustering unit 702, configured to treat each row of the item usage matrix as a user·day, and cluster similar user·days according to the similarity between the user·days;
A mining unit 703, configured to represent the medical treatment path of each user using the cluster cores, serialize the medical treatment path of each user, mine frequent sequences from the result, and take the frequent sequences as the main clinical pathways.
In an embodiment, as shown in FIG. 8, the conversion unit 701 includes:
An item usage matrix construction unit 801, configured to construct the item usage matrix in advance, where the item usage matrix has m rows and n columns;
An obtaining unit 802, configured to obtain the items used by each user on each day;
A filling unit 803, configured to fill in the elements of each row of the item usage matrix according to the items used by each user on each day.
In an embodiment, as shown in FIG. 9, the clustering unit 702 includes:
A distance matrix construction unit 901, configured to calculate the similarity between the user·days according to the item usage matrix, construct a distance matrix of the user·days according to those similarities, and denote the distance matrix as m*m;
A distance matrix clustering unit 902, configured to cluster similar user·days according to the distance matrix.
In an embodiment, as shown in FIG. 10, the distance matrix construction unit 901 includes:
An extraction unit 1001, configured to extract the data of each row from the item usage matrix;
A similarity calculation unit 1002, configured to calculate, in order, the similarity between the data of each row and the data of all rows;
An arrangement unit 1003, configured to arrange the calculated similarities in order to construct the distance matrix, and denote the distance matrix as m*m, where the element d_ij in row i and column j of the distance matrix represents the distance between the i-th user·day and the j-th user·day.
In an embodiment, the distance matrix clustering unit 902 includes:
A hierarchical clustering unit, configured to use hierarchical clustering to merge the two closest elements in the distance matrix into one class, and traverse all elements to achieve a global clustering.
In an embodiment, as shown in FIG. 11, the clustering unit 702 includes:
A word extraction unit 1101, configured to obtain the items used in each user·day, and treat the obtained items as words;
A word vector representation unit 1102, configured to obtain a vector representation of every word in each user·day through word-vector-based representation learning, yielding the corresponding word vectors;
A term-frequency weighting unit 1103, configured to weight the word vectors of all words in each user·day by a term-frequency weighting method to obtain the sentence vector of each user·day, where the weighting formula is v_day = dot(V_I, TFIDF), in which v_day denotes the sentence vector of the user·day; V_I denotes the matrix of item representations within the user·day, where I is the set of items in the user·day and each row of V_I is the word vector of one item; dot denotes the inner product operation; and TFIDF denotes the term frequency and document specificity (TF-IDF) matrix. The TFIDF of item i is calculated as:
Figure PCTCN2021084255-appb-000003
Figure PCTCN2021084255-appb-000004
where D_i denotes the total number of user·days that contain item i, D denotes the total number of user·days, A_i denotes the total number of users whose records contain item i, and A denotes the total number of users;
A distance clustering unit 1104, configured to cluster similar user·days according to the distance between the sentence vectors of the user·days.
In an embodiment, as shown in FIG. 12, the mining unit 703 includes:
A core representation unit 1201, configured to represent the core of each cluster with a different number;
A numerical representation unit 1202, configured to represent each user·day in a cluster by the number of the core of that cluster;
A serialization unit 1203, configured to serialize the numerically represented user·days to obtain a medical treatment path sequence;
A simplification unit 1204, configured to delete consecutive identical elements in the medical treatment path sequence, keeping only one of them, to obtain a simplified medical treatment path sequence;
A sequence mining unit 1205, configured to use a sequence mining algorithm to mine frequent sequences from the medical treatment path sequences, and take the frequent sequences as the main clinical pathways.
The details of the above apparatus embodiment correspond one-to-one to those of the above method embodiment. For specific implementation details of the apparatus embodiment, refer to the description of the method embodiment; they are not repeated here.
本申请实施例提供的装置,可以实现对时序临床数据的模式挖掘,从数据中得到真实的临床路径,能更好的反应临床的实际操作的合理性和多变性,且通过序列化表示解决了无序项集过多带来的时间和空间复杂度高的问题。The device provided by the embodiment of the application can realize the pattern mining of time series clinical data, obtain the real clinical path from the data, can better reflect the rationality and variability of clinical actual operation, and solve the problem through serialized representation. The problem of high time and space complexity caused by too many unordered itemsets.
The above multi-scale clinical pathway mining apparatus 700 may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 13.
Please refer to FIG. 13, which is a schematic block diagram of a computer device provided by an embodiment of this application. The computer device 1300 is a server, which may be a standalone server or a server cluster composed of multiple servers.
Referring to FIG. 13, the computer device 1300 includes a processor 1302, a memory, and a network interface 1305 connected through a system bus 1301, where the memory may include a non-volatile storage medium 1303 and an internal memory 1304.
The non-volatile storage medium 1303 can store an operating system 13031 and a computer program 13032. When the computer program 13032 is executed, it causes the processor 1302 to perform the multi-scale clinical pathway mining method.
The processor 1302 provides computing and control capabilities and supports the operation of the entire computer device 1300.
The internal memory 1304 provides an environment for running the computer program 13032 stored in the non-volatile storage medium 1303. When the computer program 13032 is executed by the processor 1302, it causes the processor 1302 to perform the multi-scale clinical pathway mining method.
The network interface 1305 is used for network communication, such as transmitting data. Those skilled in the art will understand that the structure shown in FIG. 13 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device 1300 to which the solution is applied; a specific computer device 1300 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 1302 is configured to run the computer program 13032 stored in the memory to implement the following functions: converting the item usage data of multiple users for each day into an item usage matrix, and denoting the item usage matrix as m*n, where m is the sum of the hospitalization days of all the users, n is the number of all items, and each row of the item usage matrix represents the items used by one user on one day; taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days; and using the cluster cores to represent the medical treatment path of each user, serializing the medical treatment path of each user, mining frequent sequences from the serialized paths, and taking the frequent sequences as the main clinical pathways.
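The first two functions above (building the m*n item usage matrix and clustering similar user·days) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the record format `(user, day, item)`, the function names, the 0/1 matrix encoding, the Jaccard distance, the single-linkage merge rule, and the `threshold` stopping parameter are all assumptions made for the example, since this passage does not fix a similarity measure or a stopping criterion.

```python
from itertools import combinations


def build_item_usage_matrix(records):
    """Build a 0/1 matrix with one row per user-day and one column per item.

    `records` is a list of hypothetical (user, day, item) tuples.
    """
    row_keys = sorted({(user, day) for user, day, _ in records})
    items = sorted({item for _, _, item in records})
    row_index = {key: i for i, key in enumerate(row_keys)}
    col_index = {item: j for j, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in row_keys]
    for user, day, item in records:
        matrix[row_index[(user, day)]][col_index[item]] = 1
    return row_keys, items, matrix


def jaccard_distance(a, b):
    """Assumed distance: 1 - |intersection| / |union| of the used items."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(matrix):
    """m*m matrix whose (i, j) entry is the distance between user-days i and j."""
    m = len(matrix)
    return [[jaccard_distance(matrix[i], matrix[j]) for j in range(m)]
            for i in range(m)]


def agglomerate(dist, threshold):
    """Single-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `threshold`. Returns one cluster label per row.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


records = [
    ("u1", 1, "blood_test"), ("u1", 1, "ct_scan"),
    ("u1", 2, "surgery"),
    ("u2", 1, "blood_test"), ("u2", 1, "ct_scan"),
    ("u2", 2, "surgery"), ("u2", 2, "anesthesia"),
]
row_keys, items, matrix = build_item_usage_matrix(records)
labels = agglomerate(distance_matrix(matrix), threshold=0.6)
print(labels)  # → [0, 1, 0, 1]
```

In this toy data each user stays two days, so m = 4, and four distinct items are used, so n = 4. The two admission days fall into one cluster and the two surgery days into another.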
Those skilled in the art will understand that the embodiment of the computer device shown in FIG. 13 does not limit the specific configuration of the computer device. In other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 13 and are not repeated here.
It should be understood that, in the embodiments of this application, the processor 1302 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
Another embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: converting the item usage data of multiple users for each day into an item usage matrix, and denoting the item usage matrix as m*n, where m is the sum of the hospitalization days of all the users, n is the number of all items, and each row of the item usage matrix represents the items used by one user on one day; taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days; and using the cluster cores to represent the medical treatment path of each user, serializing the medical treatment path of each user, mining frequent sequences from the serialized paths, and taking the frequent sequences as the main clinical pathways.
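The final step (representing each user's path by cluster labels, serializing it, and mining frequent sequences) can be sketched as follows. This is an illustration under stated assumptions: cluster labels are taken as already computed, the function names are hypothetical, and the brute-force counter of contiguous subsequences stands in for whatever sequence mining algorithm is actually used, which this passage does not name.

```python
from collections import Counter, defaultdict
from itertools import groupby


def medical_paths(row_keys, labels):
    """Turn per-user-day cluster labels into one day-ordered sequence per user.

    `row_keys` is a list of (user, day) pairs aligned with `labels`.
    """
    by_user = defaultdict(list)
    for (user, day), label in zip(row_keys, labels):
        by_user[user].append((day, label))
    return {user: [label for _, label in sorted(pairs)]
            for user, pairs in by_user.items()}


def collapse_runs(seq):
    """Keep one element from each run of consecutive identical elements."""
    return [key for key, _ in groupby(seq)]


def frequent_sequences(paths, min_support, max_len=3):
    """Count contiguous subsequences of length 1..max_len.

    Support is the number of paths containing the subsequence at least once.
    Restricting to contiguous subsequences is a simplification for the sketch.
    """
    support = Counter()
    for seq in paths:
        seen = set()
        for n in range(1, max_len + 1):
            for i in range(len(seq) - n + 1):
                seen.add(tuple(seq[i:i + n]))
        support.update(seen)
    return {s: c for s, c in support.items() if c >= min_support}


row_keys = [("u1", 1), ("u1", 2), ("u1", 3), ("u2", 1), ("u2", 2)]
labels = [0, 0, 1, 0, 1]
paths = [collapse_runs(p) for p in medical_paths(row_keys, labels).values()]
print(frequent_sequences(paths, min_support=2, max_len=2))
```

Collapsing runs first means a patient who spends several consecutive days in the same cluster contributes a single symbol, so paths of different lengths of stay can still share a frequent sequence.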
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. It should be noted that those of ordinary skill in the art may make improvements and modifications to this application without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of this application.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Claims (20)

  1. A multi-scale clinical pathway mining method, comprising:
    converting the item usage data of multiple users for each day into an item usage matrix, and denoting the item usage matrix as m*n, where m is the sum of the hospitalization days of all the users, n is the number of all items, and each row of the item usage matrix represents the items used by one user on one day;
    taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days;
    using the cluster cores to represent the medical treatment path of each user, serializing the medical treatment path of each user, mining frequent sequences from the serialized paths, and taking the frequent sequences as the main clinical pathways.
  2. The multi-scale clinical pathway mining method according to claim 1, wherein converting the item usage data of multiple users for each day into an item usage matrix, and denoting the item usage matrix as m*n, comprises:
    constructing the item usage matrix in advance, where the item usage matrix has m rows and n columns;
    obtaining the items used by each user on each day;
    filling in the elements of each row of the item usage matrix according to the items used by each user on each day.
  3. The multi-scale clinical pathway mining method according to claim 1, wherein taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days, comprises:
    calculating the similarity between the user·days from the item usage matrix, constructing a distance matrix of the user·days according to the similarity between the user·days, and denoting the distance matrix as m*m;
    clustering similar user·days according to the distance matrix.
  4. The multi-scale clinical pathway mining method according to claim 3, wherein calculating the similarity between the user·days from the item usage matrix, constructing a distance matrix of the user·days according to the similarity between the user·days, and denoting the distance matrix as m*m, comprises:
    extracting the data of each row from the item usage matrix;
    calculating, in order, the similarity between the data of each row and the data of all rows;
    arranging the calculated similarities in order to construct the distance matrix, and denoting the distance matrix as m*m, where the element d_ij in row i and column j of the distance matrix represents the distance between the i-th user·day and the j-th user·day.
  5. The multi-scale clinical pathway mining method according to claim 3, wherein clustering similar user·days according to the distance matrix comprises:
    using hierarchical clustering to group the two closest elements in the distance matrix into one cluster, and traversing all elements to achieve global clustering.
  6. The multi-scale clinical pathway mining method according to claim 1, wherein taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days, comprises:
    obtaining the items used in each user·day, and treating the obtained items as words;
    representing all words in each user·day as vectors through word-vector-based representation learning, to obtain the corresponding word vectors;
    weighting the word vectors of all words in each user·day by a term-frequency weighting method to obtain a sentence vector for each user·day, where the term-frequency weighting formula is: v_day = dot(V_I, TFIDF), where v_day denotes the sentence vector of the user·day, V_I denotes the matrix of representations of the items in the user·day, I is the set of items in the user·day, each row of V_I is the word vector of one item, dot denotes the inner product operation on the elements, and TFIDF denotes the term frequency-document specificity matrix; the TFIDF of item i is calculated as:
    Figure PCTCN2021084255-appb-100001
    where D_i denotes the total number of user·days that contain item i, D denotes the total number of all user·days, A_i denotes the total number of users that contain item i, and A denotes the total number of users;
    clustering similar user·days according to the distance between the sentence vectors of the user·days.
  7. The multi-scale clinical pathway mining method according to claim 1, wherein using the cluster cores to represent the medical treatment path of each user, serializing the medical treatment path of each user, mining frequent sequences from the serialized paths, and taking the frequent sequences as the main clinical pathways, comprises:
    representing the core of each cluster with a different number;
    representing the user·days in each cluster with the number of the core of the corresponding cluster;
    serializing the numerically represented user·days to obtain a medical treatment path sequence;
    removing consecutive identical elements in the medical treatment path sequence while retaining only one of them, to obtain a simplified medical treatment path sequence;
    mining frequent sequences from the medical treatment path sequence using a sequence mining algorithm, and taking the frequent sequences as the main clinical pathways.
  8. A multi-scale clinical pathway mining apparatus, comprising:
    a conversion unit, configured to convert the item usage data of multiple users for each day into an item usage matrix, and to denote the item usage matrix as m*n, where m is the sum of the hospitalization days of all the users, n is the number of all items, and each row of the item usage matrix represents the items used by one user on one day;
    a clustering unit, configured to take each row of the item usage matrix as a user·day, and to cluster similar user·days according to the similarity between the user·days;
    a mining unit, configured to use the cluster cores to represent the medical treatment path of each user, serialize the medical treatment path of each user, mine frequent sequences from the serialized paths, and take the frequent sequences as the main clinical pathways.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the multi-scale clinical pathway mining method according to claim 1.
  10. The computer device according to claim 9, wherein converting the item usage data of multiple users for each day into an item usage matrix, and denoting the item usage matrix as m*n, comprises:
    constructing the item usage matrix in advance, where the item usage matrix has m rows and n columns;
    obtaining the items used by each user on each day;
    filling in the elements of each row of the item usage matrix according to the items used by each user on each day.
  11. The computer device according to claim 9, wherein taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days, comprises:
    calculating the similarity between the user·days from the item usage matrix, constructing a distance matrix of the user·days according to the similarity between the user·days, and denoting the distance matrix as m*m;
    clustering similar user·days according to the distance matrix.
  12. The computer device according to claim 11, wherein calculating the similarity between the user·days from the item usage matrix, constructing a distance matrix of the user·days according to the similarity between the user·days, and denoting the distance matrix as m*m, comprises:
    extracting the data of each row from the item usage matrix;
    calculating, in order, the similarity between the data of each row and the data of all rows;
    arranging the calculated similarities in order to construct the distance matrix, and denoting the distance matrix as m*m, where the element d_ij in row i and column j of the distance matrix represents the distance between the i-th user·day and the j-th user·day.
  13. The computer device according to claim 11, wherein clustering similar user·days according to the distance matrix comprises:
    using hierarchical clustering to group the two closest elements in the distance matrix into one cluster, and traversing all elements to achieve global clustering.
  14. The computer device according to claim 9, wherein taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days, comprises:
    obtaining the items used in each user·day, and treating the obtained items as words;
    representing all words in each user·day as vectors through word-vector-based representation learning, to obtain the corresponding word vectors;
    weighting the word vectors of all words in each user·day by a term-frequency weighting method to obtain a sentence vector for each user·day, where the term-frequency weighting formula is: v_day = dot(V_I, TFIDF), where v_day denotes the sentence vector of the user·day, V_I denotes the matrix of representations of the items in the user·day, I is the set of items in the user·day, each row of V_I is the word vector of one item, dot denotes the inner product operation on the elements, and TFIDF denotes the term frequency-document specificity matrix; the TFIDF of item i is calculated as:
    Figure PCTCN2021084255-appb-100002
    where D_i denotes the total number of user·days that contain item i, D denotes the total number of all user·days, A_i denotes the total number of users that contain item i, and A denotes the total number of users;
    clustering similar user·days according to the distance between the sentence vectors of the user·days.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the multi-scale clinical pathway mining method according to claim 1.
  16. The computer-readable storage medium according to claim 15, wherein converting the item usage data of multiple users for each day into an item usage matrix, and denoting the item usage matrix as m*n, comprises:
    constructing the item usage matrix in advance, where the item usage matrix has m rows and n columns;
    obtaining the items used by each user on each day;
    filling in the elements of each row of the item usage matrix according to the items used by each user on each day.
  17. The computer-readable storage medium according to claim 15, wherein taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days, comprises:
    calculating the similarity between the user·days from the item usage matrix, constructing a distance matrix of the user·days according to the similarity between the user·days, and denoting the distance matrix as m*m;
    clustering similar user·days according to the distance matrix.
  18. The computer-readable storage medium according to claim 17, wherein calculating the similarity between the user·days from the item usage matrix, constructing a distance matrix of the user·days according to the similarity between the user·days, and denoting the distance matrix as m*m, comprises:
    extracting the data of each row from the item usage matrix;
    calculating, in order, the similarity between the data of each row and the data of all rows;
    arranging the calculated similarities in order to construct the distance matrix, and denoting the distance matrix as m*m, where the element d_ij in row i and column j of the distance matrix represents the distance between the i-th user·day and the j-th user·day.
  19. The computer-readable storage medium according to claim 17, wherein clustering similar user·days according to the distance matrix comprises:
    using hierarchical clustering to group the two closest elements in the distance matrix into one cluster, and traversing all elements to achieve global clustering.
  20. The computer-readable storage medium according to claim 15, wherein taking each row of the item usage matrix as a user·day, and clustering similar user·days according to the similarity between the user·days, comprises:
    obtaining the items used in each user·day, and treating the obtained items as words;
    representing all words in each user·day as vectors through word-vector-based representation learning, to obtain the corresponding word vectors;
    weighting the word vectors of all words in each user·day by a term-frequency weighting method to obtain a sentence vector for each user·day, where the term-frequency weighting formula is: v_day = dot(V_I, TFIDF), where v_day denotes the sentence vector of the user·day, V_I denotes the matrix of representations of the items in the user·day, I is the set of items in the user·day, each row of V_I is the word vector of one item, dot denotes the inner product operation on the elements, and TFIDF denotes the term frequency-document specificity matrix; the TFIDF of item i is calculated as:
    Figure PCTCN2021084255-appb-100003
    where D_i denotes the total number of user·days that contain item i, D denotes the total number of all user·days, A_i denotes the total number of users that contain item i, and A denotes the total number of users;
    clustering similar user·days according to the distance between the sentence vectors of the user·days.
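
The weighting v_day = dot(V_I, TFIDF) recited in claims 6, 14 and 20 can be read as a weighted sum of the word vectors of the items in one user·day. The sketch below illustrates only that reading. The per-item TFIDF expression is given in a figure that is not reproduced in the text, so the weights are passed in as precomputed inputs rather than derived from D_i, D, A_i and A; the function name, dictionary layout, and example values are hypothetical.

```python
def sentence_vector(item_vectors, weights, items):
    """Weighted sum of item (word) vectors for one user-day.

    item_vectors: dict mapping item -> list of floats (rows of V_I)
    weights:      dict mapping item -> precomputed TFIDF weight (assumed given)
    items:        the set I of items used in this user-day
    """
    dim = len(next(iter(item_vectors.values())))
    v_day = [0.0] * dim
    for item in items:
        w = weights[item]
        for k, x in enumerate(item_vectors[item]):
            v_day[k] += w * x
    return v_day


# Hypothetical 2-dimensional word vectors and weights.
item_vectors = {"blood_test": [1.0, 0.0], "surgery": [0.0, 2.0]}
weights = {"blood_test": 0.5, "surgery": 2.0}
print(sentence_vector(item_vectors, weights, ["blood_test", "surgery"]))  # → [0.5, 4.0]
```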
PCT/CN2021/084255 2020-11-12 2021-03-31 Multi-scale clinical pathway mining method and apparatus, computer device, and storage medium WO2021204038A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011260888.4A CN112382398B (en) 2020-11-12 2020-11-12 Multi-scale clinical path mining method and device, computer equipment and storage medium
CN202011260888.4 2020-11-12

Publications (1)

Publication Number Publication Date
WO2021204038A1 true WO2021204038A1 (en) 2021-10-14

Family

ID=74583277

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084255 WO2021204038A1 (en) 2020-11-12 2021-03-31 Multi-scale clinical pathway mining method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112382398B (en)
WO (1) WO2021204038A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112382398B (en) * 2020-11-12 2022-08-30 平安科技(深圳)有限公司 Multi-scale clinical path mining method and device, computer equipment and storage medium
CN114418008A (en) * 2022-01-21 2022-04-29 平安国际智慧城市科技股份有限公司 Medical treatment behavior identification method and device, terminal equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228023A (en) * 2016-08-01 2016-12-14 清华大学 A kind of clinical path method for digging based on body and topic model
CN111145910A (en) * 2019-12-12 2020-05-12 平安医疗健康管理股份有限公司 Abnormal case identification method and device based on artificial intelligence and computer equipment
CN111192644A (en) * 2019-12-11 2020-05-22 平安医疗健康管理股份有限公司 Construction method and device of clinical path, computer equipment and storage medium
CN112382398A (en) * 2020-11-12 2021-02-19 平安科技(深圳)有限公司 Multi-scale clinical path mining method and device, computer equipment and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007249251A (en) * 2004-04-14 2007-09-27 Philips Electronics Japan Ltd Clinical communication device and hospital information system
CN103455578A (en) * 2013-08-23 2013-12-18 华南师范大学 Association rule and bi-clustering-based airline customer data mining method
CN105095281B (en) * 2014-05-13 2018-12-25 南京理工大学 A kind of web catalogue method for optimization analysis based on Web log mining
US10839947B2 (en) * 2016-01-06 2020-11-17 International Business Machines Corporation Clinically relevant medical concept clustering
CN106339587B (en) * 2016-08-23 2019-04-23 浙江工业大学 A kind of clinical path modeling method based on sequential network
CN110019809B (en) * 2018-01-02 2021-11-19 中国移动通信有限公司研究院 Classification determination method and device and network equipment
CN108615560A (en) * 2018-03-19 2018-10-02 安徽锐欧赛智能科技有限公司 A kind of clinical medical data analysis method based on data mining
US11069447B2 (en) * 2018-09-29 2021-07-20 Intego Group, LLC Systems and methods for topology-based clinical data mining
CN110135450B (en) * 2019-03-26 2020-06-23 中电莱斯信息系统有限公司 Hot spot path analysis method based on density clustering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN XU, ZHUANG YAN ,CHENG SHAO-YIN: "Mining Clinical Pathways Algorithm Based on Prefix Constraints", COMPUTER SYSTEMS & APPLICATIONS, 15 November 2017 (2017-11-15), pages 1 - 6, XP055855919, ISSN: 1003-3254, DOI: 10.15888/j.cnki.csa.006073 *

Also Published As

Publication number Publication date
CN112382398A (en) 2021-02-19
CN112382398B (en) 2022-08-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21784284

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21784284

Country of ref document: EP

Kind code of ref document: A1