WO2021051938A1 - Data anomaly analysis method, system, and computer device based on graph analysis - Google Patents
- Publication number
- WO2021051938A1 (PCT/CN2020/099235)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- node data
- doctor
- community
- patient
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
Definitions
- The embodiments of the present application relate to the field of big data analysis, and in particular to a method, system, computer device, and computer-readable storage medium for analyzing data anomalies based on graph analysis.
- An embodiment of the present application provides a data anomaly analysis method based on graph analysis, the method comprising the following steps:
- obtaining the medical insurance data to be analyzed from a medical insurance database, and extracting node data and association relationship data from the medical insurance data by keyword extraction and semantic analysis;
- the node data includes multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data is data characterizing the association relationships between the node data;
- An embodiment of the present application also provides a data anomaly analysis system based on graph analysis, including:
- a receiving module, configured to receive a data anomaly analysis request sent by a user terminal;
- a response module, configured to obtain, in response to the data anomaly analysis request, the medical insurance data to be analyzed from the medical insurance database, and to extract node data and association relationship data from the medical insurance data by keyword extraction and semantic analysis;
- the node data includes multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data is data characterizing the association relationships between the node data;
- a construction module, configured to construct a relationship heterogeneous graph based on the node data and the association relationships, the relationship heterogeneous graph using the multiple node data as nodes and the association relationships between them as edges;
- an obtaining module, configured to obtain the feature data of multiple features of each community C_i, the feature data including the number of node data, the community density, and/or the average medical amount;
- a calculation module, configured to calculate the anomaly detection coefficient of each community C_i according to the feature data of its multiple features;
- a judgment module, configured to determine the abnormal patient node data in each community according to the anomaly detection coefficient of that community C_i;
- an output module, configured to output the abnormal patient node data to the user terminal.
- An embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the following steps are implemented when the processor executes the computer-readable instructions:
- obtaining the medical insurance data to be analyzed from the medical insurance database, and extracting node data and association relationship data from the medical insurance data by keyword extraction and semantic analysis;
- the node data includes multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data is data characterizing the association relationships between the node data;
- calculating the anomaly detection coefficient of each community C_i according to the feature data of its multiple features;
- An embodiment of the present application also provides a computer-readable storage medium storing computer-readable instructions, the computer-readable instructions being executable by at least one processor to cause the at least one processor to perform the following steps:
- obtaining the medical insurance data to be analyzed from the medical insurance database, and extracting node data and association relationship data from the medical insurance data by keyword extraction and semantic analysis;
- the node data includes multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data is data characterizing the association relationships between the node data;
- calculating the anomaly detection coefficient of each community C_i according to the feature data of its multiple features;
- The data anomaly analysis method, system, computer device, and computer-readable storage medium provided by the embodiments of the present application offer an effective way to analyze anomalies in medical insurance data. They analyze the relationship heterogeneous graph between entities to mine fraud efficiently and to locate fraudulent entities accurately. This further improves the accuracy and flexibility of medical insurance anomaly analysis.
- FIG. 1 is a schematic flowchart of a method for analyzing data anomalies based on graph analysis according to an embodiment of the present application.
- FIG. 2 is a schematic diagram of the program modules of Embodiment 2 of the data anomaly analysis system based on graph analysis of this application.
- FIG. 3 is a schematic diagram of the hardware structure of Embodiment 3 of the computer device of this application.
- FIG. 1 shows a flowchart of the method for analyzing data anomalies based on graph analysis according to an embodiment of the present application. It can be understood that the flowchart in this method embodiment does not limit the order in which the steps are executed.
- The steps are described below with the computer device 2 as the execution subject:
- Step S100: receive a data anomaly analysis request sent by the user terminal.
- Step S102: in response to the data anomaly analysis request, obtain the medical insurance data to be analyzed from the medical insurance database. Then extract node data and association relationship data from it by keyword extraction and semantic analysis. The node data includes multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data characterizes the association relationships between the node data.
- The medical insurance data to be analyzed is obtained from a database.
- The medical insurance data to be analyzed includes insurance, bank, securities, payment, trust, and futures information; the medical insurance database covers the insurance, banking, securities, payment, trust, futures, and other fields.
- Node data and association relationships are extracted from the medical insurance data by keyword extraction and semantic analysis. The association relationships are generated from characteristics or relationships shared between patients, doctors, and pharmacies. For example, when a doctor sees a patient, the doctor prescribes a medication order for that patient. Prescribing medication orders to patients can therefore be one of the common features of doctors, and such features can be obtained from the medical insurance data by keyword extraction and semantic analysis.
- Step S104: construct a relationship heterogeneous graph based on the node data and the association relationships, using the node data as nodes and the association relationships between the node data as edges.
- The relationship heterogeneous graph includes a first bipartite graph, a second bipartite graph, and a third bipartite graph; step S104 may further include:
- Step S104a: acquire multiple entity features corresponding to multiple entities according to the node data, the entity features including multiple patient features of multiple patients, multiple doctor features of multiple doctors, and multiple pharmacy features of multiple pharmacies.
- The multiple entities include multiple patients, doctors, and pharmacies. Acquiring the entity features according to the node data means extracting the patient features, doctor features, and pharmacy features from the patient node data, doctor node data, and pharmacy node data, respectively.
- Step S104b: construct a first bipartite graph between the patient node data and the doctor node data according to the patient features and doctor features; construct a second bipartite graph between the patient node data and the pharmacy node data according to the patient features and pharmacy features; and construct a third bipartite graph between the doctor node data and the pharmacy node data according to the doctor features and pharmacy features.
- A bipartite graph is generated for each pairwise relationship: patient-doctor, patient-pharmacy, and doctor-pharmacy.
- The bipartite graphs are constructed using patient visit and medication records from the financial and social insurance field as the data set.
- The bipartite graphs include, for example, the following graphs: patient-medical insurance card, patient-ID card, patient-birth city, patient-doctor, patient-bill, doctor-department, doctor-doctor's order item, bill-doctor's order item, and doctor's order item-subcategory.
- The bipartite graphs are merged to construct a relationship heterogeneous graph based on the relationships among patients, doctors, and pharmacies.
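The following is a minimal sketch of steps S104a and S104b. It assumes each visit record has already been reduced to a hypothetical `(patient_id, doctor_id, pharmacy_id, amount)` tuple. The sketch builds the three pairwise bipartite graphs as weighted edge maps and unions them into one adjacency map. Using the summed amount as the edge weight and prefixing node ids with their entity type are illustrative choices, not details from the source.

```python
from collections import defaultdict

# Hypothetical visit record: (patient_id, doctor_id, pharmacy_id, amount).
# The field layout is an assumption for illustration, not the patent's schema.
Record = tuple[str, str, str, float]


def build_bipartite_graphs(records: list[Record]) -> dict[str, dict[tuple[str, str], float]]:
    """Build the three pairwise bipartite graphs as weighted edge maps.

    Edge weight = summed amount over all records linking the pair (assumption).
    """
    graphs: dict[str, dict[tuple[str, str], float]] = {
        "patient-doctor": defaultdict(float),
        "patient-pharmacy": defaultdict(float),
        "doctor-pharmacy": defaultdict(float),
    }
    for patient, doctor, pharmacy, amount in records:
        # Prefix ids with their entity type so nodes stay distinguishable
        # once the bipartite graphs are merged into one heterogeneous graph.
        p, d, h = f"patient:{patient}", f"doctor:{doctor}", f"pharmacy:{pharmacy}"
        graphs["patient-doctor"][(p, d)] += amount
        graphs["patient-pharmacy"][(p, h)] += amount
        graphs["doctor-pharmacy"][(d, h)] += amount
    return {name: dict(edges) for name, edges in graphs.items()}


def merge_into_heterogeneous(graphs: dict[str, dict[tuple[str, str], float]]) -> dict[str, dict[str, float]]:
    """Union the bipartite edge sets into one undirected adjacency map."""
    adjacency: dict[str, dict[str, float]] = defaultdict(dict)
    for edges in graphs.values():
        for (u, v), w in edges.items():
            adjacency[u][v] = w
            adjacency[v][u] = w
    return dict(adjacency)
```

The sketch keeps the three bipartite graphs separate before the union. That makes it easy to reweight or filter one relationship type, such as patient-doctor edges only, without touching the others.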
- The step of constructing the relationship heterogeneous graph based on the relationships among patients, doctors, and pharmacies includes:
- Step S104b1: split the two complementary vertex sets of each bipartite graph to obtain separate single vertex sets.
- Step S104b2: gather the separate vertex sets from different bipartite graphs according to the features of each vertex, merging vertices with high similarity and updating the features of the new vertices at the same time.
- Step S104b3: merge the edges to obtain the relationships among patients, doctors, and pharmacies and form the relationship heterogeneous graph. Edge merging covers three cases:
- Case 1: if both node data connected by the edge are new (fused) node data, the attributes of the multiple edges between them are directly accumulated and averaged. Each new node data is produced by fusing several node data, so there are multiple such edges.
- Case 2: if one of the two node data connected by the edge is new node data and the other is original node data, the edges of the new node data are first accumulated and averaged. The averaged result is then accumulated and averaged with the edge of the original node data.
- Case 3: if both node data connected by the edge are original node data, the edge between the two points remains unchanged.
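The vertex and edge merging of steps S104b2 and S104b3 can be sketched as follows. Several parts of this sketch are assumptions: the greedy grouping, the cosine-similarity threshold of 0.95, averaging member features to update a fused vertex, and the `merge_vertices`/`merge_edges` names. The edge step also simplifies case 2. It takes a single mean over every edge that collapses onto the same endpoint pair, instead of the two-stage average described in the text.

```python
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_vertices(features: dict[str, list[float]], threshold: float = 0.95):
    """Greedily fuse vertices whose feature vectors are highly similar (step S104b2).

    Returns (mapping original -> representative, features of representatives).
    A fused vertex's features are the mean of its members' features.
    """
    groups: list[list[str]] = []
    for v in sorted(features):
        for group in groups:
            # Compare against the group's first member (a simplifying assumption).
            if cosine(features[v], features[group[0]]) >= threshold:
                group.append(v)
                break
        else:
            groups.append([v])
    mapping: dict[str, str] = {}
    merged: dict[str, list[float]] = {}
    for group in groups:
        rep = group[0] if len(group) == 1 else "+".join(group)
        dim = len(features[group[0]])
        merged[rep] = [sum(features[m][i] for m in group) / len(group) for i in range(dim)]
        for m in group:
            mapping[m] = rep
    return mapping, merged


def merge_edges(edges: dict[tuple[str, str], float], mapping: dict[str, str]):
    """Collapse edges onto fused vertices (step S104b3).

    Edges landing on the same endpoint pair are averaged, which covers cases 1
    and 2 with a single mean; an edge between two original vertices maps to
    itself and keeps its weight (case 3).
    """
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for (u, v), w in edges.items():
        buckets[(mapping[u], mapping[v])].append(w)
    return {pair: sum(ws) / len(ws) for pair, ws in buckets.items()}
```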
- Step S106 may further include:
- Step S106a: cluster the multiple patient node data in the relationship heterogeneous graph to obtain multiple clusters, each cluster corresponding to a cluster center.
- The patient nodes are clustered according to the doctor nodes in the relationship heterogeneous graph to obtain multiple clusters. Each doctor corresponds to one cluster, and each cluster corresponds to one cluster center.
- Step S106b: perform multiple extractions from the cluster centers according to the doctor nodes, extracting one cluster center per doctor node each time. Establish one community from the relationship heterogeneous graph according to each extracted cluster center to obtain multiple communities. Each of these communities is a close community, that is, a community that intersects with other communities.
- Each patient's illnesses and medical visits are recorded, so all patients who have visited the same doctor share the same medical-visit feature.
- The patient nodes can therefore be clustered by doctor node, and most doctor nodes can serve as centers for clustering the patient nodes.
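Read literally, clustering patients by doctor gives one cluster per doctor, with the doctor acting as the cluster center. A patient who saw several doctors ends up in several clusters, which is what makes the resulting communities overlap. The minimal sketch below assumes the visits are available as hypothetical `(patient_id, doctor_id)` pairs. It treats a community as close when it shares at least one patient with another community.

```python
from collections import defaultdict


def communities_by_doctor(visits: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group patients into one cluster per doctor (steps S106a/S106b).

    The doctor plays the role of the cluster center; a patient who saw
    several doctors lands in several clusters, so communities may overlap.
    """
    clusters: dict[str, set[str]] = defaultdict(set)
    for patient, doctor in visits:
        clusters[doctor].add(patient)
    return dict(clusters)


def close_communities(clusters: dict[str, set[str]]) -> set[str]:
    """Return the doctors whose community intersects at least one other community."""
    doctors = list(clusters)
    close: set[str] = set()
    for i, a in enumerate(doctors):
        for b in doctors[i + 1:]:
            if clusters[a] & clusters[b]:
                close.update((a, b))
    return close
```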
- The multiple patient node data in the relationship heterogeneous graph are clustered, and a patient relationship network is obtained from the clustering. The cosine similarity between two adjacent node data A_i and B_i in the patient relationship network is then calculated:

$$\cos(A_i, B_i) = \frac{\sum_{j=1}^{3} A_{ij} B_{ij}}{\sqrt{\sum_{j=1}^{3} A_{ij}^{2}}\;\sqrt{\sum_{j=1}^{3} B_{ij}^{2}}}$$

- Here A_ij is the j-th component of the i-th node data vector A, and B_ij is the j-th component of the i-th node data vector B. Both i and j are positive integers, and j ranges over [1, 3]. The weight of each edge in the patient relationship network is then updated according to this cosine similarity.
- Suspect-gang mining based on community clustering divides patients into communities according to their medical behavior. It computes each community's average similarity from the pairwise similarity of medical behavior between its patients. This average similarity measures how consistent the community's overall behavior is, which helps confirm whether the behavior is fraudulent.
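The cosine similarity over the three components, the edge reweighting, and the community's average similarity can be sketched as below. The exact update rule is not given in the text, so two conventions here are assumptions. Each edge weight is set directly to the cosine similarity, and a single-patient community scores 1.0. The three-component patient vectors are also hypothetical.

```python
import math
from itertools import combinations

Vec = tuple[float, float, float]


def cosine_similarity(a: Vec, b: Vec) -> float:
    """cos(A_i, B_i) summed over the three components j = 1..3."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for all-zero vectors (assumption)
    return dot / (norm_a * norm_b)


def reweight_edges(edges: list[tuple[str, str]], vectors: dict[str, Vec]) -> dict[tuple[str, str], float]:
    """Set each patient-patient edge weight to its endpoints' cosine similarity (assumed rule)."""
    return {(u, v): cosine_similarity(vectors[u], vectors[v]) for u, v in edges}


def average_similarity(members: set[str], vectors: dict[str, Vec]) -> float:
    """Mean pairwise cosine similarity of the patients in one community."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 1.0  # a single-patient community is trivially consistent (assumption)
    return sum(cosine_similarity(vectors[u], vectors[v]) for u, v in pairs) / len(pairs)
```

A community whose average similarity stays close to 1 behaves very uniformly. That uniformity is the signal the text proposes to check against fraud.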
- Patients are assigned to different communities according to their different medical treatment behaviors.
- Patients who have seen the same doctor or the same type of doctor will have the same or similar medical behavior, and the same medical behavior can be
- A patient's different medical treatments can be divided into different medical experiences; for example, a patient may see the same doctor for different diseases.
- The different medical experiences of patients can be judged by the similarity of their medical treatment behaviors. This yields each patient's normal medical experience, against which abnormal medical experience can be identified. For example, a doctor may prescribe anesthetic orders to only a single patient, or mainly prescribe large quantities of anesthetics to a single patient. In that case, the patient can be judged to exhibit abnormal medical treatment behavior.
- A close community consists of multiple strongly connected structures in which multiple vertices form a closed loop. The vertices are the patient node data corresponding to the doctor node data, and there is an edge between the doctor node data and each patient node data in the closed loop.
- Each patient in the closed loop has seen the same doctor or the same type of doctor; doctors of the same type are doctors associated with similar medical-visit behavior.
- The close community is established by clustering on doctors and comprises multiple strongly connected structures in the patient relationship network. In each structure, multiple vertices form a closed loop, and any two node data in the loop are connected by an edge.
- Each strongly connected structure is a closed loop of the community.
- Different closed-loop structures represent different communities.
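The requirement that "any two node data in the loop are connected by an edge" can be read as describing a clique. Under that reading, the closed-loop structures can be enumerated as maximal cliques of the patient relationship network. The sketch below uses the Bron-Kerbosch algorithm with pivoting, a standard clique-enumeration method swapped in here as an assumption, since the source does not name an algorithm. The `min_size=3` cutoff for a loop is likewise illustrative. The adjacency map must be symmetric and free of self-loops.

```python
def maximal_cliques(adjacency: dict[str, set[str]], min_size: int = 3) -> list[set[str]]:
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    cliques: list[set[str]] = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        # r: current clique, p: candidates that extend it, x: already-explored vertices.
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(r)
            return
        # Pivot on the vertex with the most neighbours in P to prune branches.
        pivot = max(p | x, key=lambda u: len(adjacency[u] & p))
        for v in list(p - adjacency[pivot]):
            expand(r | {v}, p & adjacency[v], x & adjacency[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adjacency), set())
    return cliques
```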
- A community is a collection of doctors, patients, and pharmacies built around doctors with similar behaviors or characteristics. The doctors in a community are strongly similar to one another, and its patients and pharmacies are strongly associated with those doctors.
- The "residents" of a community are its patients who have seen the same doctor or the same type of doctor, that is, who have similar medical behaviors. This can be used to investigate possible crimes committed by organized groups.
- Step S108: obtain the feature data of multiple features of each community C_i, the feature data including the number of node data, the community density, and/or the average medical amount.
- Step S110: calculate the anomaly detection coefficient of each community C_i according to the feature data of its multiple features.
- Step S112: determine the abnormal patient node data in each community according to the anomaly detection coefficient of that community C_i.
- The ratio of the total number of node data in community C_i to the total number of node data in the relationship heterogeneous graph is calculated to check whether the extracted community is abnormal. Here the node data ratio refers to the ratio of a node data's degree to the sum of the degrees of all node data in the community.
- When the ratio is less than a preset threshold, the relationship heterogeneous graph is essentially a network without communities, so all communities in it can be considered abnormal.
- When the ratio is greater than the preset threshold, not all communities in the relationship heterogeneous graph are abnormal; the preset threshold can be adjusted according to the anomaly detection results.
- The community features of a given community C_i are extracted to find abnormal communities associated with those features. The following features characterize any community C_i in the recommendation network:
  - the community size feature, that is, the number of node data in C_i;
  - the community density feature, that is, the ratio of the total number of edges in C_i to the number of node data;
  - the average amount feature, that is, the ratio of the total amount in C_i to the number of node data;
  - the average anomaly score of the doctors in C_i, where each doctor's anomaly score is calculated from edge statistics.

  Anomaly detection is then performed on community C_i according to these community features.
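The sketch below covers the community features listed above, plus the node-count ratio used for the threshold check. The `Community` container, its per-node `amounts`, and the precomputed `doctor_scores` are hypothetical stand-ins for data the source derives from the graph. Communities are assumed to be non-empty.

```python
from dataclasses import dataclass


@dataclass
class Community:
    nodes: set[str]
    edges: set[tuple[str, str]]
    amounts: dict[str, float]        # medical amount per node (hypothetical field)
    doctor_scores: dict[str, float]  # precomputed anomaly score per doctor node (hypothetical)


def community_features(c: Community) -> list[float]:
    """[size, density, average amount, average doctor anomaly score] for one C_i."""
    n = len(c.nodes)  # assumed > 0
    size = float(n)
    density = len(c.edges) / n
    avg_amount = sum(c.amounts.get(v, 0.0) for v in c.nodes) / n
    scores = [c.doctor_scores[v] for v in c.nodes if v in c.doctor_scores]
    avg_doctor_score = sum(scores) / len(scores) if scores else 0.0
    return [size, density, avg_amount, avg_doctor_score]


def coverage_ratio(c: Community, total_nodes: int) -> float:
    """Share of the whole graph's node data that falls inside C_i."""
    return len(c.nodes) / total_nodes


def flag_by_ratio(c: Community, total_nodes: int, threshold: float) -> bool:
    """True when the ratio falls below the preset threshold (treated as abnormal)."""
    return coverage_ratio(c, total_nodes) < threshold
```

The resulting feature vectors are what an anomaly detector such as iForest would consume, one row per community.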
- Anomalous points are easier to isolate than normal points.
- This method uses the iForest (Isolation Forest) anomaly detection algorithm. iForest detects anomalies by using randomly generated trees to isolate each point from the rest of the data.
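The sketch below implements Isolation Forest scoring from scratch. It could be applied, for example, to the community feature vectors. Each tree splits on a random feature at a random value, so anomalies reach a leaf after fewer splits. The anomaly score is 2^(-E[h(x)] / c(psi)), and scores close to 1 indicate anomalies. The tree count of 100, the subsample size of 256, and the seed are assumptions (commonly used defaults), not values from the source.

```python
import math
import random


def _c(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def _build(points, depth, max_depth, rng):
    """Grow one isolation tree by random feature/value splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        # Unresolved leaves add the expected remaining depth c(size).
        return depth + _c(tree[1])
    _, dim, split, left, right = tree
    return _path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score per point; values near 1 indicate anomalies."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 1
    trees = [_build(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = _c(psi)
    scores = []
    for x in points:
        mean_h = sum(_path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm) if norm else 0.5)
    return scores
```

For production use, a maintained implementation is preferable to this illustration.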
- Step S114: output the abnormal patient node data to the user terminal.
- The method further includes:
- Step S300: extract multiple aggregated features of the relationship heterogeneous graph, including the degree, weight, and entropy ratio between different entities.
- Step S302: determine abnormal entities according to the multiple aggregated features.
- The aggregated features include the degree, that is, the number of neighbor node data:

$$\mathrm{Degree}(n) = |N|$$

- Here, for given node data n, N is its 1-hop neighbor set; a 1-hop neighbor is node data directly connected to n by a single edge.
- The entropy ratio of node data n is:

$$\mathrm{EntropyRatio}(n) = \frac{-\sum_{k \in N} p_k \log p_k}{\log |N|}$$

- Here p_k is the share of node data n's total business that is conducted between n and neighbor k.
- The summation term is the empirical entropy, which measures how unevenly n's business is distributed across its neighbor set N. If n's business is evenly distributed across all node data in N, the entropy ratio is 1. Conversely, if n conducts most of its business with one neighbor, the distribution is highly skewed and the entropy ratio is close to 0.
- In other words, the empirical entropy of node data n is divided by log(|N|), its maximum possible value.
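The degree and entropy ratio can be computed per node from the business volume it shares with each 1-hop neighbor. The dictionary input shape is a hypothetical choice. A node with fewer than two neighbors has log|N| = 0, so returning 0.0 in that case is an assumed convention.

```python
import math


def degree(business: dict[str, float]) -> int:
    """Degree(n) = |N|: number of 1-hop neighbours with recorded business."""
    return len(business)


def entropy_ratio(business: dict[str, float]) -> float:
    """Empirical entropy of n's business over its neighbours, divided by log|N|.

    `business` maps each neighbour k to the business volume between n and k
    (hypothetical input shape). Returns 1.0 for a perfectly even spread and
    values near 0 when one neighbour dominates.
    """
    total = sum(business.values())
    n_neighbours = len(business)
    if n_neighbours < 2 or total <= 0:
        return 0.0  # log|N| = 0 for a single neighbour; treated as fully concentrated (assumption)
    entropy = 0.0
    for volume in business.values():
        p_k = volume / total
        if p_k > 0:
            entropy -= p_k * math.log(p_k)
    return entropy / math.log(n_neighbours)
```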
- Abnormal entities correspond to multiple anomaly types, including personal-level anomalies, relationship-level anomalies, and abnormal medical behavior.
- For example, personal-level anomalies include:
  - which patients are the main consumers of narcotics, and where the narcotics come from;
  - to whom a doctor prescribes narcotics;
  - which pharmacy sells large amounts of narcotics, and to whom.
- A relationship-level anomaly is an overly concentrated relationship.
- For example:
  - a pharmacy sells anesthetics to only a very small number of patients and doctors;
  - a doctor writes a large number of anesthetic prescriptions and instructs patients to buy them at a few pharmacies;
  - a doctor prescribes anesthetics to only a few patients.
- Strong connections between node data can be regarded as potential collusion. A "shopping patient" is a patient who visits many doctors to obtain more anesthetic prescriptions.
- The entropy ratio captures this difference in how business is distributed. For example, suppose a doctor prescribes large quantities of medicine to a small number of people. That doctor's entropy ratio will be close to 0, which indicates a problem in the doctor-patient relationship. Conversely, if a doctor prescribes evenly distributed medicines to most patients, the entropy ratio will be close to 1, and that doctor node is relatively more reliable.
- Abnormal medical behavior is behavior that cannot be justified by medical practice. Examples include patients who consume only anesthetics, and patients and doctors who focus only on anesthetics.
- These two indicators can be obtained directly as the following ratios:
  - personal anesthetic consumption / personal total drug consumption;
  - personal anesthetic consumption / total anesthetic consumption.

  Both can be calculated separately by selecting the anesthetic attribute once the graph has been constructed.
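Both anesthetic indicators are simple ratios. The sketch below assumes hypothetical `(person_id, drug_category, amount)` consumption records. The category label `"anesthetic"` stands in for the anesthetic attribute selected from the graph.

```python
from collections import defaultdict


def anesthetic_indicators(records: list[tuple[str, str, float]],
                          anesthetic: str = "anesthetic") -> dict[str, tuple[float, float]]:
    """Per person: (anesthetic / own total drugs, anesthetic / all anesthetic consumption)."""
    per_person_total: dict[str, float] = defaultdict(float)
    per_person_anes: dict[str, float] = defaultdict(float)
    for person, category, amount in records:
        per_person_total[person] += amount
        if category == anesthetic:
            per_person_anes[person] += amount
    all_anes = sum(per_person_anes.values())
    result = {}
    for person, total in per_person_total.items():
        anes = per_person_anes.get(person, 0.0)
        own_share = anes / total if total else 0.0          # "only consumes anesthetics"
        market_share = anes / all_anes if all_anes else 0.0  # "focuses on anesthetics"
        result[person] = (own_share, market_share)
    return result
```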
- The method further includes the following steps for determining personal-level anomalies of abnormal entities:
- Step S400: using the PageRank algorithm, calculate the degree, out-degree, and in-degree of each patient node data and of each doctor node data in the first bipartite graph.
- Step S402: connect the patient node data and doctor node data with directed edges according to these degrees, out-degrees, and in-degrees to obtain a directed relationship graph between patients and doctors. Then generate a directed graph matrix from that graph.
- Step S404: repeatedly multiply the directed graph matrix (two-dimensional matrix multiplication), updating the weights at each iteration, to obtain the PageRank values.
- Step S406: determine personal-level anomalies according to the PageRank values.
- The two-dimensional matrix is multiplied and the weights are updated; the matrix is then multiplied again and the weights updated again. After repeated iterations, the final weights are the PageRank values.
- The PageRank value of a node data is the sum of the weights of all links pointing to that node.
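The iteration in steps S402 to S404 is the standard power-iteration form of PageRank. The sketch below multiplies by the link matrix implicitly, through out-link lists, instead of building a dense two-dimensional matrix. Several details are assumptions rather than details from the source: the damping factor of 0.85, the patient-to-doctor edge direction, uniform redistribution from dangling nodes, and the convergence tolerance.

```python
def pagerank(edges: list[tuple[str, str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over a directed edge list.

    Each pass is equivalent to multiplying the weight vector by the
    column-stochastic link matrix; iteration stops once the weights settle.
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    out_links: dict[str, list[str]] = {node: [] for node in nodes}
    for u, v in edges:
        out_links[u].append(v)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Dangling nodes (no out-links) spread their weight uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        for node in nodes:
            new_rank[node] += damping * dangling / n
        # Each node passes its weight evenly along its out-links.
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

A doctor node that accumulates an unusually high PageRank value relative to its peers would be a candidate for the personal-level check in step S406.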
- The data anomaly analysis system 20 may include or be divided into one or more program modules.
- The one or more program modules are stored in a storage medium and executed by one or more processors to complete the present application and implement the above data anomaly analysis method based on graph analysis.
- A program module in the embodiments of the present application is a series of computer program instruction segments capable of performing specific functions. It is better suited than the program itself to describing how the data anomaly analysis system 20 executes in the storage medium. The functions of each program module in this embodiment are described below:
- The receiving module 200 is configured to receive a data anomaly analysis request sent by a user terminal.
- The response module 202 is configured to obtain, in response to the data anomaly analysis request, the medical insurance data to be analyzed from the medical insurance database. It then extracts node data and association relationship data from that data by keyword extraction and semantic analysis. The node data includes multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data characterizes the association relationships between the node data.
- The construction module 204 is configured to construct a relationship heterogeneous graph based on the node data and the association relationships, using the multiple node data as nodes and the association relationships between them as edges.
- The obtaining module 208 is configured to obtain the feature data of multiple features of each community C_i, the feature data including the number of node data, the community density, and/or the average medical amount.
- The calculation module 210 is configured to calculate the anomaly detection coefficient of each community C_i according to the feature data of its multiple features.
- The judgment module 212 is configured to determine the abnormal patient node data in each community according to the anomaly detection coefficient of that community C_i.
- The construction module 204 is further configured to:
  - obtain multiple entity features corresponding to multiple entities according to the node data, the entity features including multiple patient features of multiple patients, multiple doctor features of multiple doctors, and multiple pharmacy features of multiple pharmacies;
  - construct a first bipartite graph between the patient node data and the doctor node data according to the patient features and doctor features;
  - construct a second bipartite graph between the patient node data and the pharmacy node data according to the patient features and pharmacy features;
  - construct a third bipartite graph between the doctor node data and the pharmacy node data according to the doctor features and pharmacy features.
- The extraction module 206 is further configured to cluster the multiple patient node data in the relationship heterogeneous graph to obtain multiple clusters, each cluster corresponding to a cluster center.
- It also performs multiple extractions from the cluster centers according to the doctor node data, extracting one cluster center per doctor node data each time, and establishes a community from the relationship heterogeneous graph according to each extracted cluster center.
- A close community consists of multiple strongly connected structures in which multiple vertices form a closed loop. The vertices are the patient node data corresponding to the doctor node data, and there is an edge between the doctor node data and each patient node data in the closed loop.
- The judgment module 212 is further configured to extract multiple aggregated features of the relationship heterogeneous graph, including the degree, weight, and entropy ratio between different entities. It then determines abnormal entities according to these aggregated features.
- Abnormal entities correspond to multiple anomaly types, including personal-level anomalies, relationship-level anomalies, and abnormal medical behavior.
- The judgment module 212 is further configured to:
  - use the PageRank algorithm to calculate the degree, out-degree, and in-degree of each patient node data and of each doctor node data in the first bipartite graph;
  - connect the patient node data and the doctor node data with directed edges according to these degrees, out-degrees, and in-degrees to obtain a directed relationship graph between patients and doctors, and generate a directed graph matrix from it;
  - repeatedly multiply the directed graph matrix (two-dimensional matrix multiplication), updating the weights at each iteration, to obtain the PageRank values;
  - determine personal-level anomalies according to the PageRank values.
- The output module 214 is configured to output the abnormal patient node data to the user terminal.
- The computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions.
- The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (an independent server or a server cluster composed of multiple servers).
- The computer device 2 includes at least, but is not limited to, a memory 21, a processor 22, a network interface 23, and a data anomaly analysis system 20, which can communicate with each other through a system bus.
- The memory 21 includes at least one type of computer-readable storage medium, such as:
  - flash memory, a hard disk, a multimedia card, or card-type memory (for example, SD or DX memory);
  - random access memory (RAM) or static random access memory (SRAM);
  - read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), or programmable read-only memory (PROM);
  - magnetic memory, a magnetic disk, or an optical disk.
- The memory 21 may be an internal storage unit of the computer device 2, for example, a hard disk or memory of the computer device 2.
- The memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card.
- The memory 21 may also include both the internal storage unit and the external storage device of the computer device 2.
- The memory 21 is generally used to store the operating system and the application software installed on the computer device 2, such as the program code of the data anomaly analysis system 20 based on graph analysis of Embodiment 2.
- The memory 21 can also be used to temporarily store various types of data that have been or will be output.
- the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
- the processor 22 is generally used to control the overall operation of the computer device 2.
- the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the data anomaly analysis system 20 based on graph analysis, so as to implement the data anomaly analysis method based on graph analysis of the first embodiment.
- the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the computer device 2 and other electronic devices.
- the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
- the network may be an intranet, the Internet, Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
- FIG. 3 only shows the computer device 2 with components 20-23, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
- the data anomaly analysis system 20 based on graph analysis stored in the memory 21 can also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the present application.
- FIG. 2 shows a schematic diagram of program modules for implementing the graph analysis-based data anomaly analysis system 20 according to the second embodiment of the present application.
- the graph analysis-based data anomaly analysis system 20 can be divided into the receiving module 200, the response module 202, the construction module 204, the extraction module 206, the acquisition module 208, the calculation module 210, the judgment module 212, and the output module 214.
- a program module referred to in this application is a series of computer program instruction segments that can complete specific functions, and is better suited than the program itself to describing the execution process of the graph analysis-based data anomaly analysis system 20 in the computer device 2.
- the specific functions of the program modules 200-214 have been described in detail in the second embodiment, and will not be repeated here.
- the computer-readable storage medium may be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, app stores, etc., on which computer programs are stored; the corresponding functions are realized when the programs are executed by a processor.
- the computer-readable storage medium of this embodiment is used in the data anomaly analysis system 20 based on graph analysis, and the processor executes the following steps:
- the medical insurance data to be analyzed is obtained from the medical insurance database, and node data and association relationship data are extracted from the medical insurance data according to keyword extraction and semantic analysis.
- the node data includes multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data is data that characterizes the association relationships between the node data;
- the anomaly detection coefficient of each community $C_i$ is calculated according to the multiple feature data of the multiple features of that community $C_i$.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Development Economics (AREA)
- General Business, Economics & Management (AREA)
- Technology Law (AREA)
- Software Systems (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
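A graph-analysis-based data anomaly analysis method, comprising: obtaining medical insurance data to be analyzed and constructing a relational heterogeneous graph from it, the relational heterogeneous graph being constructed from multiple node data, with the association relationships between the node data as edges; clustering the patient node data in the relational heterogeneous graph so as to extract multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data (S106); obtaining multiple feature data of multiple features of each community $C_i$; calculating an anomaly detection coefficient for each community $C_i$ according to those feature data (S110); and determining, according to the anomaly detection coefficient of each community $C_i$, whether a fraud event has occurred. By analyzing the relational heterogeneous graph formed from medical insurance data, the method efficiently mines fraud situations and precisely locates fraudulent entities, improving the accuracy and flexibility of medical insurance data anomaly analysis.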
Description
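This application claims priority to Chinese patent application No. 201910871381.3, filed on September 16, 2019 and entitled "Data anomaly analysis method, system and computer device based on graph analysis", the entire content of which is incorporated herein by reference.
The embodiments of the present application relate to the field of big data analysis, and in particular to a graph-analysis-based data anomaly analysis method, system, computer device, and computer-readable storage medium.
As the medical insurance market expands, data analysis technology in the medical insurance field has developed rapidly. An authoritative McKinsey report based on big data analysis identified medical insurance as the most promising future application field. Fraud, waste, and abuse (FWA) in the medical insurance field cause enormous losses of medical insurance funds, and research teams in the medical insurance and data analysis industries have put great effort into solving the fraud problem. Although research on medical insurance anti-fraud has high economic value, fraud detection remains unsolved because of many technical difficulties. Medical data is usually large, diverse, and changes dynamically over time, so it must be analyzed from multiple perspectives to uncover fraud.
Traditional fraud detection methods start from domain expertise, design a set of fraud detection rules, and focus on finding behavior that violates those rules. However, the inventors realized that although such methods are effective, they are limited by the knowledge of domain experts, which may lack accuracy and completeness. In addition, fraudulent behavior keeps evolving and can evade predefined detection rules. Data-driven machine learning methods can identify normal patterns from real data and detect deviations; they are more flexible, but the large search space leads to a heavy computational load.
Therefore, how to efficiently mine fraud situations and precisely locate fraudulent entities, so as to further improve the accuracy and flexibility of medical insurance data anomaly analysis, has become one of the technical problems to be solved.
Summary of the Invention
In view of this, it is necessary to provide a graph-analysis-based data anomaly analysis method, system, computer device, and computer-readable storage medium to solve technical problems such as the lack of accuracy and completeness of current anti-fraud measures and the difficulty of precisely locating fraud situations.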
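To achieve the above objective, an embodiment of the present application provides a graph-analysis-based data anomaly analysis method, which includes the following steps:
receiving a data anomaly analysis request sent by a user terminal;
in response to the data anomaly analysis request, obtaining medical insurance data to be analyzed from a medical insurance database, and extracting node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data including multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data that characterizes the association relationships between the node data;
constructing a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from the node data, with the association relationships between the node data as edges;
extracting multiple communities from the relational heterogeneous graph according to each doctor node data;
obtaining multiple feature data of multiple features of each community, the multiple features including a node-count feature, a community density feature, and/or an average medical amount feature;
calculating an anomaly detection coefficient for each community according to the multiple feature data of the multiple features of that community;
determining, according to the anomaly detection coefficient of each community, the abnormal patient node data in the community; and
outputting the abnormal patient node data to the user terminal.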
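To achieve the above objective, an embodiment of the present application further provides a graph-analysis-based data anomaly analysis system, including:
a receiving module, configured to receive a data anomaly analysis request sent by a user terminal;
a response module, configured to, in response to the data anomaly analysis request, obtain medical insurance data to be analyzed from a medical insurance database, and extract node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data including multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data that characterizes the association relationships between the node data;
a construction module, configured to construct a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from multiple node data, with the association relationships between the node data as edges;
an extraction module, configured to extract multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data;
an acquisition module, configured to obtain multiple feature data of multiple features of each community $C_i$, the multiple feature data including the node count, the community density, and/or the average medical amount;
a calculation module, configured to calculate an anomaly detection coefficient for each community $C_i$ according to the multiple feature data of the multiple features of that community; and
a judgment module, configured to determine, according to the anomaly detection coefficient of each community $C_i$, the abnormal patient node data in the community;
an output module, configured to output the abnormal patient node data to the user terminal.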
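To achieve the above objective, an embodiment of the present application further provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the computer-readable instructions, when executed by the processor, implement the following steps:
receiving a data anomaly analysis request sent by a user terminal;
in response to the data anomaly analysis request, obtaining medical insurance data to be analyzed from a medical insurance database, and extracting node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data including multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data that characterizes the association relationships between the node data;
constructing a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from the node data, with the association relationships between the node data as edges;
extracting multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data;
obtaining multiple feature data of multiple features of each community $C_i$, the multiple features including a node-count feature, a community density feature, and/or an average medical amount feature;
calculating an anomaly detection coefficient for each community $C_i$ according to the multiple feature data of the multiple features of that community;
determining, according to the anomaly detection coefficient of each community $C_i$, the abnormal patient node data in the community; and
outputting the abnormal patient node data to the user terminal.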
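To achieve the above objective, an embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions, which can be executed by at least one processor to cause the at least one processor to perform the following steps:
receiving a data anomaly analysis request sent by a user terminal;
in response to the data anomaly analysis request, obtaining medical insurance data to be analyzed from a medical insurance database, and extracting node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data including multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data that characterizes the association relationships between the node data;
constructing a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from the node data, with the association relationships between the node data as edges;
extracting multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data;
obtaining multiple feature data of multiple features of each community $C_i$, the multiple features including a node-count feature, a community density feature, and/or an average medical amount feature;
calculating an anomaly detection coefficient for each community $C_i$ according to the multiple feature data of the multiple features of that community;
determining, according to the anomaly detection coefficient of each community $C_i$, the abnormal patient node data in the community; and
outputting the abnormal patient node data to the user terminal.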
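The graph-analysis-based data anomaly analysis method, system, computer device, and computer-readable storage medium provided by the embodiments of the present application offer an effective data anomaly analysis method for medical insurance: by analyzing the relational heterogeneous graph formed by entities, they efficiently mine fraud situations and precisely locate fraudulent entities, further improving the accuracy and flexibility of medical insurance data anomaly analysis.
FIG. 1 is a schematic flowchart of the graph-analysis-based data anomaly analysis method according to an embodiment of the present application.
FIG. 2 is a schematic diagram of the program modules of Embodiment 2 of the graph-analysis-based data anomaly analysis system of the present application.
FIG. 3 is a schematic diagram of the hardware structure of Embodiment 3 of the computer device of the present application.
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only on the basis that a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, the combination shall be deemed not to exist and falls outside the protection scope claimed by this application.
In the following embodiments, the computer device 2 is used as the execution subject for exemplary description.
Embodiment 1
Referring to FIG. 1, a flowchart of the steps of the graph-analysis-based data anomaly analysis method according to an embodiment of the present application is shown. It can be understood that the flowchart in this method embodiment does not limit the order in which the steps are executed. The computer device 2 is used as the execution subject for exemplary description below. Details are as follows.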
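Step S100: receive a data anomaly analysis request sent by a user terminal.
Step S102: in response to the data anomaly analysis request, obtain medical insurance data to be analyzed from the medical insurance database, and extract node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis; the node data includes multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data is data that characterizes the association relationships between the node data.
Exemplarily, the medical insurance data to be analyzed is obtained from the database and includes insurance information, banking information, securities information, payment information, trust information, and futures information; the medical insurance database covers the insurance, banking, securities, payment, trust, and futures fields.
Exemplarily, node data and association relationships are extracted from the medical insurance data through keyword extraction and semantic analysis, where the association relationships are generated from features or relationships shared among patients, doctors, and pharmacies. For example, a doctor who treats a patient issues a prescription to that patient, so the act of prescribing to patients can serve as one of the shared features of doctors; all such features can be obtained from the medical insurance data through keyword extraction and semantic analysis.
Step S104: construct a relational heterogeneous graph according to the node data and the association relationships; the relational heterogeneous graph is constructed from the node data, with the association relationships between the node data as edges.
Specifically, the relational heterogeneous graph includes a first bipartite graph, a second bipartite graph, and a third bipartite graph; step S104 may further include:
Step S104a: obtain, according to the node data, multiple entity features corresponding to multiple entities, the entity features including multiple patient features of multiple patients, multiple doctor features of multiple doctors, and multiple pharmacy features of multiple pharmacies.
Exemplarily, the multiple entities include multiple patients, multiple doctors, and multiple pharmacies. Obtaining the entity features according to the node data means extracting the patient features, doctor features, and pharmacy features from the patient node data, doctor node data, and pharmacy node data, respectively.
Step S104b: construct a first bipartite graph between the patient node data and the doctor node data according to the patient features and doctor features; construct a second bipartite graph between the patient node data and the pharmacy node data according to the patient features and pharmacy features; and construct a third bipartite graph between the doctor node data and the pharmacy node data according to the doctor features and pharmacy features.
A bipartite graph is generated for each pairwise relationship: patient and doctor, patient and pharmacy, and doctor and pharmacy.
Exemplarily, bipartite graphs are constructed using patient visit and medication-collection records from the financial social security field as the data set. The bipartite graphs include those between patient and medical insurance card nodes, patient and ID card nodes, patient and city-of-birth nodes, patient and doctor nodes, patient and bill nodes, doctor and department nodes, doctor and medical order item nodes, bill and medical order item nodes, medical order item and subcategory nodes, and so on.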
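The pairwise construction in step S104b can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each record is a hypothetical `(patient, doctor, pharmacy, amount)` tuple and that an edge's attribute is the summed amount, since the source does not specify the record schema or edge attributes. `build_bipartite_graphs` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (assumption): (patient_id, doctor_id, pharmacy_id, amount).
Record = Tuple[str, str, str, float]
Edges = Dict[Tuple[str, str], float]


def build_bipartite_graphs(records: Iterable[Record]) -> Dict[str, Edges]:
    """Build the three pairwise bipartite graphs as weighted edge maps.

    Each graph maps a (left, right) node pair to the accumulated amount of
    the records linking them; using the amount as the edge attribute is an
    assumption made for illustration.
    """
    graphs: Dict[str, Edges] = {
        "patient_doctor": defaultdict(float),    # first bipartite graph
        "patient_pharmacy": defaultdict(float),  # second bipartite graph
        "doctor_pharmacy": defaultdict(float),   # third bipartite graph
    }
    for patient, doctor, pharmacy, amount in records:
        graphs["patient_doctor"][(patient, doctor)] += amount
        graphs["patient_pharmacy"][(patient, pharmacy)] += amount
        graphs["doctor_pharmacy"][(doctor, pharmacy)] += amount
    # Return plain dicts so callers get ordinary mappings.
    return {name: dict(edges) for name, edges in graphs.items()}


if __name__ == "__main__":
    demo = [
        ("p1", "d1", "ph1", 120.0),
        ("p2", "d1", "ph1", 80.0),
        ("p1", "d2", "ph2", 50.0),
    ]
    for name, edges in build_bipartite_graphs(demo).items():
        print(name, edges)
```

Keeping the three graphs separate mirrors the description, which builds one bipartite graph per pairwise relationship before fusing them.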
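The bipartite graphs are then fused, and the relational heterogeneous graph is constructed according to the relationships among patients, doctors, and pharmacies.
Exemplarily, the step of constructing the relational heterogeneous graph according to the relationships among patients, doctors, and pharmacies includes:
Step S104b1: split the two complementary vertex sets of each bipartite graph to obtain separate vertex sets.
Step S104b2: aggregate the separate vertex sets from different bipartite graphs according to the features of each vertex, where highly similar vertices are merged and the features of the resulting new vertex are updated at the same time.
Step S104b3: fuse the edges to obtain the relational heterogeneous graph formed by the relationships among the patients, the doctors, and the pharmacies. Edge fusion may involve three cases:
Case 1: if both nodes connected by an edge are new (fused) nodes, the attributes of the multiple edges are directly accumulated and averaged; because a new node is generated by fusing multiple nodes, multiple edges exist between such nodes.
Case 2: if one of the two nodes connected by an edge is a new node and the other is an original node, the edges of the new node are first accumulated and averaged, and the averaged result is then accumulated and averaged with the edge of the original node.
Case 3: if both nodes connected by an edge are original nodes, the edge between them remains unchanged.
By fusing multiple bipartite graphs in this way, the relationships among patients, doctors, and pharmacies automatically form the relational heterogeneous graph.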
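Step S106: extract multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data.
Specifically, step S106 may further include:
Step S106a: cluster the multiple patient node data in the relational heterogeneous graph to obtain multiple clusters, each cluster corresponding to a cluster center.
Exemplarily, the patient nodes in the relational heterogeneous graph are clustered according to the doctor nodes, yielding multiple clusters in which each doctor corresponds to one cluster and each cluster corresponds to one cluster center.
Step S106b: perform multiple extractions from the cluster centers according to the doctor nodes, extracting one cluster center per doctor node each time, and build one community from the relational heterogeneous graph according to each extracted cluster center, thereby obtaining multiple communities; each of these communities is a tight community, a tight community being a community that intersects other communities.
Exemplarily, every patient visit is recorded, so all patients who have visited the same doctor share a common medical-visit feature; based on this feature, the patient nodes can be clustered through that doctor node, and every doctor can be used to cluster the patient nodes in this way.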
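Grouping patients by the doctor they visited, as in steps S106a and S106b, can be sketched as below. The sketch assumes the first bipartite graph is available as `(patient, doctor)` pairs and treats each doctor's patient set directly as that doctor's cluster; the source does not say how cluster centers are computed, so `doctor_communities` and `overlapping_pairs` are hypothetical helpers for illustration only.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def doctor_communities(visits: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each doctor to the set of patients who visited that doctor."""
    communities: Dict[str, Set[str]] = defaultdict(set)
    for patient, doctor in visits:
        communities[doctor].add(patient)
    return dict(communities)


def overlapping_pairs(communities: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return doctor pairs whose communities share at least one patient.

    The description calls a community "tight" when it intersects other
    communities; this lists those intersections.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(communities), 2)
        if communities[a] & communities[b]
    ]


if __name__ == "__main__":
    visits = [("p1", "d1"), ("p2", "d1"), ("p2", "d2"), ("p3", "d3")]
    comms = doctor_communities(visits)
    print(comms)
    print(overlapping_pairs(comms))  # d1 and d2 share patient p2
```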
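Exemplarily, the multiple patient node data in the relational heterogeneous graph are clustered, and a patient relationship network is obtained from the clustering. The cosine similarity between two adjacent node data $A_i$ and $B_i$ in the patient relationship network is calculated as
$$\cos(A_i, B_i) = \frac{\sum_{j=1}^{3} A_{ij} B_{ij}}{\sqrt{\sum_{j=1}^{3} A_{ij}^{2}}\,\sqrt{\sum_{j=1}^{3} B_{ij}^{2}}},$$
where $A_{ij}$ is the $j$-th component of the $i$-th node data vector $A$, $B_{ij}$ is the $j$-th component of the $i$-th node data vector $B$, $i$ and $j$ are positive integers, and $j$ ranges over $[1, 3]$. The weight of each edge in the patient relationship network is then updated according to the cosine similarity formula.
Based on the average similarity formula, the average similarity of each closed community loop is calculated from the updated weight coefficient $w_i$ of each edge in the patient relationship network, where $w_i$ is the weight coefficient of the $i$-th edge and $N$ denotes the total number of closed community loops.
Exemplarily, in this suspected-gang mining technique based on community clustering, communities are divided according to patients' visit behavior, and the average similarity of each community is calculated from the visit-behavior similarity between the patients within it. The average similarity measures how consistent the community's overall behavior is, which helps confirm whether insurance fraud is occurring.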
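The cosine similarity above, together with a per-community average, can be sketched as follows. The source does not reproduce its weight-update or average-similarity formulas, so this sketch assumes that each edge weight $w_i$ is simply the cosine similarity of its two endpoints and that a community's average similarity is the mean of its edge weights; the three-component visit-behavior vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_similarity(
    vectors: Dict[str, Sequence[float]], edges: List[Tuple[str, str]]
) -> float:
    """Assumed update w_i = cosine similarity; return the mean edge weight."""
    if not edges:
        return 0.0
    weights = [cosine_similarity(vectors[u], vectors[v]) for u, v in edges]
    return sum(weights) / len(weights)


if __name__ == "__main__":
    # Hypothetical 3-component visit-behavior vectors (j = 1..3).
    vecs = {"p1": [1.0, 0.0, 2.0], "p2": [2.0, 0.0, 4.0], "p3": [0.0, 3.0, 0.0]}
    print(average_similarity(vecs, [("p1", "p2")]))  # parallel vectors
    print(average_similarity(vecs, [("p1", "p3")]))  # orthogonal vectors
```

A high average (close to 1) means the community's patients behave almost identically, which is the consistency signal the description uses to flag possible collusion.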
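Exemplarily, patients in different communities are distinguished by their medical-visit behavior. Patients who have visited the same doctor, or the same type of doctor, exhibit the same or similar visit behavior, and that behavior can be further divided into different medical experiences, for example being treated for different illnesses by the same doctor. Visit-behavior similarity reveals patients' different medical experiences, from which their normal medical experiences can be established and abnormal ones identified. For example, if a doctor issues narcotic prescriptions to only a single patient, or issues large quantities of narcotics mainly to a single patient, the doctor and that patient can be judged to exhibit abnormal medical behavior.
Step S106c: extract a set of communities $C=\{C_1, C_2, \ldots, C_k\}$ from the multiple communities.
Exemplarily, a tight community consists of multiple strongly connected structures, each formed by multiple vertices that constitute a closed-loop structure; the vertices are the patient node data corresponding to the doctor node data, and an edge exists between each doctor node data and patient node data within the closed loop.
Exemplarily, every patient in the closed loop has visited the same doctor or the same type of doctor, where doctors of the same type are doctors with similar medical behavior.
Exemplarily, the tight communities established after clustering by doctor comprise multiple strongly connected structures in the patient relationship network, in which multiple vertices form a closed-loop structure and an edge exists between any two node data within the loop. A strongly connected structure is a closed community loop, and different closed-loop structures represent different communities. A community is the combined set of doctors, patients, and pharmacies gathered around doctors with similar behavior or features: the doctors in a community are strongly similar, and its patients and pharmacies are strongly associated with those doctors. The "residents" of a community are the patients who have visited the same doctor or the same type of doctor, that is, who share similar medical behavior, which can be used to detect possible group fraud.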
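Step S108: obtain multiple feature data of multiple features of each community $C_i$, the multiple feature data including the node count, the community density, and/or the average medical amount.
Step S110: calculate the anomaly detection coefficient of each community $C_i$ according to the multiple feature data of the multiple features of that community.
Step S112: determine, according to the anomaly detection coefficient of each community $C_i$, the abnormal patient node data in the community.
Exemplarily, the ratio of the total number of node data in community $C_i$ to the total number of node data in the relational heterogeneous graph is calculated to check whether the extracted community is abnormal; here, the node ratio refers to the degree of a node divided by the sum of the degrees of all nodes in the community.
Exemplarily, when the ratio is less than a preset threshold, the relational heterogeneous graph is essentially a network without communities, so all communities in it may be considered abnormal. Conversely, when the ratio is greater than the preset threshold, not all communities in the relational heterogeneous graph are abnormal. The preset threshold can be adjusted in a controlled manner according to the anomaly detection results.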
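Exemplarily, when the ratio is greater than the preset threshold, the community features of a given community $C_i$ are extracted to find abnormal communities related to those features. Any given community $C_i$ in the network is characterized by the following groups of features: a community size feature, the number of node data in $C_i$; a community density feature, the ratio of the total number of edges in $C_i$ to the number of node data; and an average amount feature, the ratio of the total amount in $C_i$ to the number of node data. Anomaly scores of all doctors are also calculated from edge statistics, and the average anomaly score of community $C_i$ is computed; anomaly detection is then performed according to the community features of $C_i$.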
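The three community features named above (size, density, and average amount) can be computed as in this sketch. It assumes a community is given as a node set, the graph as a list of hypothetical `(u, v, amount)` edges, and counts only edges whose two endpoints lie inside the community; `community_features` is a hypothetical helper, and the per-doctor anomaly score is omitted because the source does not define it.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (node_u, node_v, amount): assumed layout


def community_features(nodes: Set[str], edges: Iterable[Edge]) -> Dict[str, float]:
    """Size, density (edges / nodes), and average amount (amount / nodes)."""
    size = len(nodes)
    if size == 0:
        return {"size": 0.0, "density": 0.0, "avg_amount": 0.0}
    # Keep only edges whose two endpoints both lie inside the community.
    internal = [(u, v, amt) for u, v, amt in edges if u in nodes and v in nodes]
    total_amount = sum(amt for _, _, amt in internal)
    return {
        "size": float(size),
        "density": len(internal) / size,
        "avg_amount": total_amount / size,
    }


if __name__ == "__main__":
    graph_edges = [("d1", "p1", 100.0), ("d1", "p2", 300.0), ("d2", "p3", 50.0)]
    print(community_features({"d1", "p1", "p2"}, graph_edges))
```

Each community then becomes one feature vector, which is the input an anomaly detector such as iForest works on.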
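Exemplarily, because anomalies are easier to isolate than normal points, this method uses the iForest (isolation forest) anomaly detection algorithm, which detects anomalies by isolating points from the rest using randomly generated partitioning trees.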
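The isolation idea can be illustrated with a small from-scratch isolation forest over community feature vectors. This is a simplified sketch of the standard iForest scoring $s(x) = 2^{-E[h(x)]/c(\psi)}$, not the implementation used in the source; the tree count, subsample size, and seed are arbitrary assumptions, and production code would normally rely on a library implementation.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def _c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # Euler-Mascheroni approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n


class _Node:
    def __init__(self, size: int, feature: int = -1, split: float = 0.0,
                 left: Optional["_Node"] = None, right: Optional["_Node"] = None):
        self.size, self.feature, self.split = size, feature, split
        self.left, self.right = left, right


def _build(points: List[Point], depth: int, limit: int, rng: random.Random) -> _Node:
    if depth >= limit or len(points) <= 1:
        return _Node(len(points))
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # a constant feature cannot be split; treat as a leaf
        return _Node(len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return _Node(len(points), feature, split,
                 _build(left, depth + 1, limit, rng),
                 _build(right, depth + 1, limit, rng))


def _path_length(x: Point, node: _Node, depth: int = 0) -> float:
    if node.left is None or node.right is None:
        return depth + _c(node.size)  # adjust for points left in the leaf
    child = node.left if x[node.feature] < node.split else node.right
    return _path_length(x, child, depth + 1)


def iforest_scores(data: List[Point], n_trees: int = 100,
                   sample_size: int = 256, seed: int = 0) -> List[float]:
    """Anomaly score in (0, 1]; values near 1 mark easily isolated points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    limit = math.ceil(math.log2(psi)) if psi > 1 else 1
    trees = [_build(rng.sample(data, psi), 0, limit, rng) for _ in range(n_trees)]
    norm = _c(psi)
    scores = []
    for x in data:
        mean_h = sum(_path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm) if norm > 0 else 0.5)
    return scores


if __name__ == "__main__":
    # Hypothetical community feature vectors: (size, density, avg_amount).
    features = [[10, 1.2, 150.0], [12, 1.1, 140.0], [11, 1.3, 160.0], [3, 2.5, 900.0]]
    for f, s in zip(features, iforest_scores(features, seed=42)):
        print(f, round(s, 3))
```

Points that are isolated after only a few random splits get short average path lengths and therefore scores near 1, which is exactly the "easier to isolate" property the description relies on.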
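Step S114: output the abnormal patient node data to the user terminal.
Exemplarily, the method further includes:
Step S300: extract multiple aggregate features of the relational heterogeneous graph, including the degree, the weight, and the entropy ratio between different entities.
Step S302: determine abnormal entities according to the aggregate features.
Exemplarily, the aggregate features include the degree, that is, the number of neighbor nodes $|S|$, where $S$ is the neighbor set of the node, and the entropy ratio:
$$H_{\mathrm{ratio}}(n) = \frac{-\sum_{k \in N} p_k \log p_k}{\log |N|},$$
where, given a node $n$ and its set of 1-hop neighbors $N$ (here a 1-hop neighbor means that the two nodes are connected or reachable with exactly one intermediate node between them), $p_k$ is the percentage of node $n$'s total business that occurs with neighbor $k$. The summation is the empirical entropy, which measures how unevenly $n$'s business volume is spread across the different nodes in its neighborhood $N$. If $n$'s business is uniformly distributed across all nodes in $N$, the entropy ratio is 1; conversely, if $n$ conducts most of its business with one neighbor, the distribution is highly skewed and the entropy ratio approaches 0. Dividing the empirical entropy of $n$ by $\log(|N|)$ normalizes it, where $|N|$ is the number of nodes in the neighbor set $N$.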
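The entropy ratio defined above can be computed directly from a node's per-neighbor business volumes. In this minimal sketch, `entropy_ratio` is a hypothetical helper, and returning 0.0 for a node with fewer than two neighbors (where $\log|N| = 0$ and the ratio is undefined) is an assumption the source does not address.

```python
import math
from typing import Dict


def entropy_ratio(business: Dict[str, float]) -> float:
    """Normalized empirical entropy of a node's business across its neighbors.

    `business` maps each neighbor k to the volume node n transacted with k.
    Returns 1.0 for a perfectly uniform spread and values near 0.0 when one
    neighbor dominates.
    """
    total = sum(business.values())
    neighbors = len(business)
    if total <= 0 or neighbors < 2:
        # log(|N|) is 0 for a single neighbor, so the ratio is undefined;
        # returning 0.0 (maximally concentrated) is an assumption.
        return 0.0
    entropy = 0.0
    for volume in business.values():
        p = volume / total  # p_k: share of n's business with neighbor k
        if p > 0:
            entropy -= p * math.log(p)
    return entropy / math.log(neighbors)


if __name__ == "__main__":
    # A doctor spreading prescriptions evenly vs. concentrating on one patient.
    print(entropy_ratio({"p1": 40, "p2": 40, "p3": 40}))
    print(entropy_ratio({"p1": 97, "p2": 1, "p3": 1, "p4": 1}))
```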
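Exemplarily, the abnormal entities correspond to multiple anomaly types, including individual-level anomalies, relationship-level anomalies, and medical-behavior anomalies.
Exemplarily, individual-level anomalies include: who the main consumers of narcotics are and where they obtain them; to whom doctors prescribe narcotics; and which pharmacies sell large quantities of narcotics and to whom.
Exemplarily, a relationship-level anomaly is an overly concentrated relationship, for example: a pharmacy sells narcotics to very few patients and doctors; a doctor writes a large number of narcotic prescriptions and directs patients to buy them at a few pharmacies; or a doctor prescribes narcotics to only a few patients. Strong ties between node data can be regarded as potential collusion. Another example is the "shopping patient", who visits many doctors to obtain more narcotic prescriptions.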
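Exemplarily, the entropy ratio captures an information gap. For example, if a doctor prescribes large quantities of drugs to a few people (a sharply concentrated rather than uniform distribution), the entropy ratio of that doctor's prescriptions is close to 0 and the gap from a uniform spread is large, indicating a problematic doctor-patient relationship; conversely, if a doctor prescribes uniformly distributed quantities of drugs to most patients, the entropy ratio is close to 1, and that doctor node is relatively trustworthy.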
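Exemplarily, medical-behavior anomalies are behavioral anomalies that cannot be justified by medical practice, including patients who consume only narcotics and patient-doctor relationships that focus solely on narcotics.
Exemplarily, to quantify these indicators, the percentage of narcotics in the consumption amount and in the total volume of medical orders is calculated. For an abnormal individual patient, the two indicators are obtained directly as personal narcotic spending / personal total drug spending and personal narcotic spending / total narcotic spending; both can be calculated after the graph is built by selecting the narcotic attribute alone.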
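The two per-patient indicators can be sketched as plain ratios. The sketch assumes a hypothetical order layout of `(patient, drug_category, amount)` and a category label `"narcotic"`; neither is specified in the source, and returning 0.0 for a zero denominator is likewise an assumption.

```python
from typing import List, Tuple

# Hypothetical order layout (assumption): (patient_id, drug_category, amount).
Order = Tuple[str, str, float]


def narcotic_shares(orders: List[Order], patient: str,
                    narcotic: str = "narcotic") -> Tuple[float, float]:
    """Return the two narcotic indicators for one patient.

    First value: personal narcotic spending / personal total drug spending.
    Second value: personal narcotic spending / total narcotic spending.
    Zero denominators yield 0.0.
    """
    personal_total = sum(a for p, _, a in orders if p == patient)
    personal_narc = sum(a for p, c, a in orders if p == patient and c == narcotic)
    all_narc = sum(a for _, c, a in orders if c == narcotic)
    share_of_own = personal_narc / personal_total if personal_total else 0.0
    share_of_all = personal_narc / all_narc if all_narc else 0.0
    return share_of_own, share_of_all


if __name__ == "__main__":
    demo = [
        ("p1", "narcotic", 80.0),
        ("p1", "antibiotic", 20.0),
        ("p2", "narcotic", 20.0),
        ("p2", "vitamin", 100.0),
    ]
    print(narcotic_shares(demo, "p1"))
```

A patient with both ratios high, such as `p1` in the demo, matches the "consumes mainly narcotics" and "main narcotic consumer" patterns described above.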
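Exemplarily, the method further includes a step of determining individual-level anomalies of the abnormal entities:
Step S400: calculate, through the PageRank algorithm, the degree, out-degree, and in-degree of each patient node data and of each doctor node data in the first bipartite graph.
Step S402: connect the patient node data and the doctor node data through directed edges according to the degree, out-degree, and in-degree of each patient node data and of each doctor node data, obtain a directed relationship graph of the patients and the doctors, and generate a directed graph matrix from it.
Step S404: perform repeated two-dimensional matrix multiplication on the directed graph matrix while iteratively updating the weights, to obtain the PageRank values.
Step S406: determine individual-level anomalies according to the PageRank values.
Exemplarily, for this directed graph matrix, two-dimensional matrix multiplication is performed, the weights are updated, the product is computed again, and the weights are updated again; after many iterations, the converged weights are the PageRank values. Likewise, if a node is linked to by many other nodes, it is widely recognized and trusted; links from different nodes carry different weights; and the importance of a node, that is, its PageRank value, should be the sum of the weights of all links pointing to it.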
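The iterative multiply-and-reweight procedure described above is the power-iteration form of PageRank. The sketch below runs it on a small directed patient-doctor graph in pure Python. The damping factor 0.85, the convergence tolerance, and the uniform redistribution of rank held by nodes without out-links are standard PageRank conventions assumed here, since the source does not state them; `pagerank` is a hypothetical helper.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank over a directed adjacency list.

    `links[u]` lists the nodes u points to. Every target must also appear
    as a key (possibly with an empty list). Damping 0.85 is an assumption.
    """
    nodes = sorted(links)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not links[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = links[u]
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        # Stop once the L1 change between iterations falls below tol.
        if sum(abs(new[u] - rank[u]) for u in nodes) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    # Assumed orientation: each patient points to the doctors they visited.
    graph = {"p1": ["d1"], "p2": ["d1"], "p3": ["d1", "d2"], "d1": [], "d2": []}
    for node, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

With patients pointing to the doctors they visited, a doctor that attracts links from many patients receives a high PageRank value; the source does not state which direction its directed edges take, so the orientation here is illustrative.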
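Embodiment 2
FIG. 2 is a schematic diagram of the program modules of Embodiment 2 of the graph-analysis-based data anomaly analysis system of the present application. The data anomaly analysis system 20 may include, or be divided into, one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the present application and implement the graph-analysis-based data anomaly analysis method described above. A program module in the embodiments of the present application is a series of computer program instruction segments capable of completing specific functions, and is better suited than the program itself to describing the execution process of the data anomaly analysis system 20 in the storage medium. The functions of the program modules of this embodiment are described below:
Receiving module 200, configured to receive a data anomaly analysis request sent by a user terminal.
Response module 202, configured to, in response to the data anomaly analysis request, obtain medical insurance data to be analyzed from a medical insurance database, and extract node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data including multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data that characterizes the association relationships between the node data.
Construction module 204, configured to construct a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from multiple node data, with the association relationships between the node data as edges.
Extraction module 206, configured to extract multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data.
Acquisition module 208, configured to obtain multiple feature data of multiple features of each community $C_i$, the multiple feature data including the node count, the community density, and/or the average medical amount.
Calculation module 210, configured to calculate the anomaly detection coefficient of each community $C_i$ according to the multiple feature data of the multiple features of that community.
Judgment module 212, configured to determine, according to the anomaly detection coefficient of each community $C_i$, the abnormal patient node data in the community.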
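Exemplarily, the construction module 204 is further configured to: obtain, according to the node data, multiple entity features corresponding to multiple entities, the entity features including multiple patient features of multiple patients, multiple doctor features of multiple doctors, and multiple pharmacy features of multiple pharmacies; and construct a first bipartite graph between the patient node data and the doctor node data according to the patient features and doctor features, construct a second bipartite graph between the patient node data and the pharmacy node data according to the patient features and pharmacy features, and construct a third bipartite graph between the doctor node data and the pharmacy node data according to the doctor features and pharmacy features.
Exemplarily, the extraction module 206 is further configured to: cluster the multiple patient node data in the relational heterogeneous graph to obtain multiple clusters, each cluster corresponding to a cluster center; perform multiple extractions from the cluster centers according to multiple doctor node data, extracting one cluster center according to one doctor node data each time, and build one community from the relational heterogeneous graph according to each extracted cluster center to obtain multiple communities, where each community is a tight community, a tight community being a community that intersects other communities, and the number of tight communities equals the number of cluster centers; and extract a set of communities $C=\{C_1, C_2, \ldots, C_k\}$ from the multiple communities.
Exemplarily, a tight community consists of multiple strongly connected structures, each formed by multiple vertices that constitute a closed-loop structure; the vertices are the patient node data corresponding to the doctor node data, and an edge exists between each doctor node data and patient node data within the closed loop.
Exemplarily, the judgment module 212 is further configured to: extract multiple aggregate features of the relational heterogeneous graph, including the degree, the weight, and the entropy ratio between different entities; and determine abnormal entities according to the aggregate features. The abnormal entities correspond to multiple anomaly types, including individual-level anomalies, relationship-level anomalies, and medical-behavior anomalies.
Exemplarily, the judgment module 212 is further configured to: calculate, through the PageRank algorithm, the degree, out-degree, and in-degree of each patient node data and of each doctor node data in the first bipartite graph; connect the patient node data and the doctor node data through directed edges according to those degrees, out-degrees, and in-degrees to obtain a directed relationship graph of the patients and the doctors, and generate a directed graph matrix from it; perform two-dimensional matrix multiplication on the directed graph matrix while iteratively updating the weights, to obtain the PageRank values; and determine individual-level anomalies according to the PageRank values.
Output module 214, configured to output the abnormal patient node data to the user terminal.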
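Embodiment 3
Referring to FIG. 3, a schematic diagram of the hardware architecture of the computer device according to Embodiment 3 of the present application is shown. In this embodiment, the computer device 2 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers), among others. As shown in the figure, the computer device 2 includes at least, but is not limited to, a memory 21, a processor 22, a network interface 23, and a data anomaly analysis system 20 that are communicatively connected to each other through a system bus.
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as its hard disk or memory. In other embodiments, the memory 21 may be an external storage device of the computer device 2, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device 2. Of course, the memory 21 may include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the computer device 2, such as the program code of the graph-analysis-based data anomaly analysis system 20 of Embodiment 2. In addition, the memory 21 may be used to temporarily store various types of data that have been or will be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the graph-analysis-based data anomaly analysis system 20 so as to implement the graph-analysis-based data anomaly analysis method of Embodiment 1.
The network interface 23 may include a wireless or wired network interface and is generally used to establish a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network and to establish a data transmission channel and communication connection between them. The network may be an intranet, the Internet, Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It should be noted that FIG. 3 only shows the computer device 2 with components 20-23, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In this embodiment, the graph-analysis-based data anomaly analysis system 20 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to complete the present application.
For example, FIG. 2 shows a schematic diagram of the program modules implementing the graph-analysis-based data anomaly analysis system 20 according to Embodiment 2 of the present application. In that embodiment, the system 20 may be divided into a receiving module 200, a response module 202, a construction module 204, an extraction module 206, an acquisition module 208, a calculation module 210, a judgment module 212, and an output module 214. A program module in this application is a series of computer program instruction segments capable of completing specific functions, and is better suited than the program itself to describing the execution process of the graph-analysis-based data anomaly analysis system 20 in the computer device 2. The specific functions of program modules 200-214 have been described in detail in Embodiment 2 and are not repeated here.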
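Embodiment 4
This embodiment further provides a computer-readable storage medium, which may be non-volatile or volatile, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, servers, app stores, and the like, on which a computer program is stored; the program implements the corresponding functions when executed by a processor. The computer-readable storage medium of this embodiment is used for the graph-analysis-based data anomaly analysis system 20 and, when executed by a processor, performs the following steps:
receiving a data anomaly analysis request sent by a user terminal;
in response to the data anomaly analysis request, obtaining medical insurance data to be analyzed from a medical insurance database, and extracting node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data including multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data that characterizes the association relationships between the node data;
constructing a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from the node data, with the association relationships between the node data as edges;
extracting multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data;
obtaining multiple feature data of multiple features of each community $C_i$, the multiple features including a node-count feature, a community density feature, and/or an average medical amount feature;
calculating an anomaly detection coefficient for each community $C_i$ according to the multiple feature data of the multiple features of that community;
determining, according to the anomaly detection coefficient of each community $C_i$, the abnormal patient node data in the community; and
outputting the abnormal patient node data to the user terminal.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.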
Claims (20)
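- A graph-analysis-based data anomaly analysis method, wherein the method comprises: receiving a data anomaly analysis request sent by a user terminal; in response to the data anomaly analysis request, obtaining medical insurance data to be analyzed from a medical insurance database, and extracting node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data comprising multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data characterizing the association relationships between the node data; constructing a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from the node data, with the association relationships between the node data as edges; extracting multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data; obtaining multiple feature data of multiple features of each community $C_i$, the multiple features comprising a node-count feature, a community density feature, and/or an average medical amount feature; calculating an anomaly detection coefficient of each community $C_i$ according to the multiple feature data of the multiple features of each community $C_i$; determining, according to the anomaly detection coefficient of each community $C_i$, abnormal patient node data in the community; and outputting the abnormal patient node data to the user terminal.
- The graph-analysis-based data anomaly analysis method according to claim 1, wherein the relational heterogeneous graph comprises a first bipartite graph, a second bipartite graph, and a third bipartite graph; and the step of constructing the relational heterogeneous graph according to the node data and the association relationships comprises: obtaining, according to the node data, multiple entity features corresponding to multiple entities, the entity features comprising multiple patient features of multiple patients, multiple doctor features of multiple doctors, and multiple pharmacy features of multiple pharmacies; constructing a first bipartite graph between the patient node data and the doctor node data according to the multiple patient features and the multiple doctor features; constructing a second bipartite graph between the patient node data and the pharmacy node data according to the multiple patient features and the multiple pharmacy features; and constructing a third bipartite graph between the doctor node data and the pharmacy node data according to the multiple doctor features and the multiple pharmacy features.
- The graph-analysis-based data anomaly analysis method according to claim 2, wherein the method further comprises: extracting multiple aggregate features of the relational heterogeneous graph, the multiple aggregate features comprising the degree, the weight, and the entropy ratio between different entities; and determining abnormal entities according to the multiple aggregate features; wherein the abnormal entities correspond to multiple anomaly types, the multiple anomaly types comprising individual-level anomalies, relationship-level anomalies, and medical-behavior anomalies.
- The graph-analysis-based data anomaly analysis method according to claim 3, wherein the method further comprises a step of determining individual-level anomalies of the abnormal entities: calculating, through the PageRank algorithm, the degree, out-degree, and in-degree of each patient node data and the degree, out-degree, and in-degree of each doctor node data in the first bipartite graph; connecting the patient node data and the doctor node data through directed edges according to the degree, out-degree, and in-degree of each patient node data and of each doctor node data to obtain a directed relationship graph of the patients and the doctors, and generating a directed graph matrix according to the directed relationship graph; performing two-dimensional matrix multiplication on the directed graph matrix and iteratively changing the weights to obtain PageRank values; and determining individual-level anomalies according to the PageRank values.
- The graph-analysis-based data anomaly analysis method according to claim 1, wherein the step of extracting multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data comprises: clustering the multiple patient node data in the relational heterogeneous graph to obtain multiple clusters, each cluster corresponding to a cluster center; performing multiple extractions from the multiple cluster centers according to multiple doctor node data, extracting one cluster center according to one doctor node data each time, and building one community from the relational heterogeneous graph according to each extracted cluster center to obtain multiple communities, wherein each of the multiple communities is a tight community, a tight community being a community that intersects other communities; and extracting a set of communities $C=\{C_1, C_2, \ldots, C_k\}$ from the multiple communities.
- The graph-analysis-based data anomaly analysis method according to claim 5, wherein the tight community consists of multiple strongly connected structures, the multiple strongly connected structures being closed-loop structures formed by multiple vertices; the multiple vertices are the multiple patient node data corresponding to the doctor node data, and an edge exists between each doctor node data and patient node data within the closed loop.
- A graph-analysis-based data anomaly analysis system, comprising: a receiving module, configured to receive a data anomaly analysis request sent by a user terminal; a response module, configured to, in response to the data anomaly analysis request, obtain medical insurance data to be analyzed from a medical insurance database, and extract node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data comprising multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data characterizing the association relationships between the node data; a construction module, configured to construct a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from multiple node data, with the association relationships between the multiple node data as edges; an extraction module, configured to extract multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data; an acquisition module, configured to obtain multiple feature data of multiple features of each community $C_i$, the multiple feature data comprising the node count, the community density, and/or the average medical amount; a calculation module, configured to calculate an anomaly detection coefficient of each community $C_i$ according to the multiple feature data of the multiple features of each community $C_i$; a judgment module, configured to determine, according to the anomaly detection coefficient of each community $C_i$, abnormal patient node data in the community; and an output module, configured to output the abnormal patient node data to the user terminal.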
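- The graph-analysis-based data anomaly analysis system according to claim 7, wherein the extraction module is further configured to: cluster the multiple patient node data in the relational heterogeneous graph to obtain multiple clusters, each cluster corresponding to a cluster center; perform multiple extractions from the multiple cluster centers according to multiple doctor node data, extracting one cluster center according to one doctor node data each time, and build one community from the relational heterogeneous graph according to each extracted cluster center to obtain multiple communities, wherein each of the multiple communities is a tight community, a tight community being a community that intersects other communities, and the number of the multiple tight communities equals the number of the multiple cluster centers; and extract a set of communities $C=\{C_1, C_2, \ldots, C_k\}$ from the multiple communities.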
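- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement the following steps: receiving a data anomaly analysis request sent by a user terminal; in response to the data anomaly analysis request, obtaining medical insurance data to be analyzed from a medical insurance database, and extracting node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data comprising multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data characterizing the association relationships between the node data; constructing a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from the node data, with the association relationships between the node data as edges; extracting multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data; obtaining multiple feature data of multiple features of each community $C_i$, the multiple features comprising a node-count feature, a community density feature, and/or an average medical amount feature; calculating an anomaly detection coefficient of each community $C_i$ according to the multiple feature data of the multiple features of each community $C_i$; determining, according to the anomaly detection coefficient of each community $C_i$, abnormal patient node data in the community; and outputting the abnormal patient node data to the user terminal.
- The computer device according to claim 9, wherein the computer-readable instructions, when executed by the processor, further implement the following steps: obtaining, according to the node data, multiple entity features corresponding to multiple entities, the entity features comprising multiple patient features of multiple patients, multiple doctor features of multiple doctors, and multiple pharmacy features of multiple pharmacies; constructing a first bipartite graph between the patient node data and the doctor node data according to the multiple patient features and the multiple doctor features; constructing a second bipartite graph between the patient node data and the pharmacy node data according to the multiple patient features and the multiple pharmacy features; and constructing a third bipartite graph between the doctor node data and the pharmacy node data according to the multiple doctor features and the multiple pharmacy features.
- The computer device according to claim 10, wherein the computer-readable instructions, when executed by the processor, further implement the following steps: extracting multiple aggregate features of the relational heterogeneous graph, the multiple aggregate features comprising the degree, the weight, and the entropy ratio between different entities; and determining abnormal entities according to the multiple aggregate features; wherein the abnormal entities correspond to multiple anomaly types, the multiple anomaly types comprising individual-level anomalies, relationship-level anomalies, and medical-behavior anomalies.
- The computer device according to claim 11, wherein the computer-readable instructions, when executed by the processor, further implement the following steps: calculating, through the PageRank algorithm, the degree, out-degree, and in-degree of each patient node data and the degree, out-degree, and in-degree of each doctor node data in the first bipartite graph; connecting the patient node data and the doctor node data through directed edges according to the degree, out-degree, and in-degree of each patient node data and of each doctor node data to obtain a directed relationship graph of the patients and the doctors, and generating a directed graph matrix according to the directed relationship graph; performing two-dimensional matrix multiplication on the directed graph matrix and iteratively changing the weights to obtain PageRank values; and determining individual-level anomalies according to the PageRank values.
- The computer device according to claim 9, wherein the computer-readable instructions, when executed by the processor, further implement the following steps: clustering the multiple patient node data in the relational heterogeneous graph to obtain multiple clusters, each cluster corresponding to a cluster center; performing multiple extractions from the multiple cluster centers according to multiple doctor node data, extracting one cluster center according to one doctor node data each time, and building one community from the relational heterogeneous graph according to each extracted cluster center to obtain multiple communities, wherein each of the multiple communities is a tight community, a tight community being a community that intersects other communities; and extracting a set of communities $C=\{C_1, C_2, \ldots, C_k\}$ from the multiple communities.
- The computer device according to claim 13, wherein the tight community consists of multiple strongly connected structures, the multiple strongly connected structures being closed-loop structures formed by multiple vertices; the multiple vertices are the multiple patient node data corresponding to the doctor node data, and an edge exists between each doctor node data and patient node data within the closed loop.
- A computer-readable storage medium, wherein the computer-readable storage medium stores computer-readable instructions executable by at least one processor to cause the at least one processor to perform the following steps: receiving a data anomaly analysis request sent by a user terminal; in response to the data anomaly analysis request, obtaining medical insurance data to be analyzed from a medical insurance database, and extracting node data and association relationship data from the medical insurance data through keyword extraction and semantic analysis, the node data comprising multiple patient node data, multiple doctor node data, and multiple pharmacy node data, and the association relationship data being data characterizing the association relationships between the node data; constructing a relational heterogeneous graph according to the node data and the association relationships, the relational heterogeneous graph being constructed from the node data, with the association relationships between the node data as edges; extracting multiple communities $C=\{C_1, C_2, \ldots, C_k\}$ from the relational heterogeneous graph according to each doctor node data; obtaining multiple feature data of multiple features of each community $C_i$, the multiple features comprising a node-count feature, a community density feature, and/or an average medical amount feature; calculating an anomaly detection coefficient of each community $C_i$ according to the multiple feature data of the multiple features of each community $C_i$; determining, according to the anomaly detection coefficient of each community $C_i$, abnormal patient node data in the community; and outputting the abnormal patient node data to the user terminal.
- The computer-readable storage medium according to claim 15, wherein the computer-readable instructions are further executable by the at least one processor to cause the at least one processor to perform the following steps: obtaining, according to the node data, multiple entity features corresponding to multiple entities, the entity features comprising multiple patient features of multiple patients, multiple doctor features of multiple doctors, and multiple pharmacy features of multiple pharmacies; constructing a first bipartite graph between the patient node data and the doctor node data according to the multiple patient features and the multiple doctor features; constructing a second bipartite graph between the patient node data and the pharmacy node data according to the multiple patient features and the multiple pharmacy features; and constructing a third bipartite graph between the doctor node data and the pharmacy node data according to the multiple doctor features and the multiple pharmacy features.
- The computer-readable storage medium according to claim 16, wherein the computer-readable instructions are further executable by the at least one processor to cause the at least one processor to perform the following steps: extracting multiple aggregate features of the relational heterogeneous graph, the multiple aggregate features comprising the degree, the weight, and the entropy ratio between different entities; and determining abnormal entities according to the multiple aggregate features; wherein the abnormal entities correspond to multiple anomaly types, the multiple anomaly types comprising individual-level anomalies, relationship-level anomalies, and medical-behavior anomalies.
- The computer-readable storage medium according to claim 17, wherein the computer-readable instructions are further executable by the at least one processor to cause the at least one processor to perform the following steps: calculating, through the PageRank algorithm, the degree, out-degree, and in-degree of each patient node data and the degree, out-degree, and in-degree of each doctor node data in the first bipartite graph; connecting the patient node data and the doctor node data through directed edges according to the degree, out-degree, and in-degree of each patient node data and of each doctor node data to obtain a directed relationship graph of the patients and the doctors, and generating a directed graph matrix according to the directed relationship graph; performing two-dimensional matrix multiplication on the directed graph matrix and iteratively changing the weights to obtain PageRank values; and determining individual-level anomalies according to the PageRank values.
- The computer-readable storage medium according to claim 15, wherein the computer-readable instructions are further executable by the at least one processor to cause the at least one processor to perform the following steps: clustering the multiple patient node data in the relational heterogeneous graph to obtain multiple clusters, each cluster corresponding to a cluster center; performing multiple extractions from the multiple cluster centers according to multiple doctor node data, extracting one cluster center according to one doctor node data each time, and building one community from the relational heterogeneous graph according to each extracted cluster center to obtain multiple communities, wherein each of the multiple communities is a tight community, a tight community being a community that intersects other communities; and extracting a set of communities $C=\{C_1, C_2, \ldots, C_k\}$ from the multiple communities.
- The computer-readable storage medium according to claim 19, wherein the tight community consists of multiple strongly connected structures, the multiple strongly connected structures being closed-loop structures formed by multiple vertices; the multiple vertices are the multiple patient node data corresponding to the doctor node data, and an edge exists between each doctor node data and patient node data within the closed loop.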
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910871381.3A CN110766557B (zh) | 2019-09-16 | 2019-09-16 | Data anomaly analysis method, system and computer device based on graph analysis |
CN201910871381.3 | 2019-09-16 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021051938A1 true WO2021051938A1 (zh) | 2021-03-25 |
Family
ID=69330045
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/099235 WO2021051938A1 (zh) | 2019-09-16 | 2020-06-30 | Data anomaly analysis method, system and computer device based on graph analysis |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110766557B (zh) |
WO (1) | WO2021051938A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
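CN117764759A (zh) * | 2023-12-29 | 2024-03-26 | 北京度友信息技术有限公司 | Method, apparatus, device, and medium for mining subject sets |
CN118378201A (zh) * | 2024-06-25 | 2024-07-23 | 浙江大学 | Method and apparatus for detecting abnormal group behavior in medical insurance |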
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
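CN110766557B (zh) * | 2019-09-16 | 2024-03-19 | 平安科技(深圳)有限公司 | Data anomaly analysis method, system and computer device based on graph analysis |
CN111427926B (zh) * | 2020-03-23 | 2023-02-03 | 平安医疗健康管理股份有限公司 | Abnormal medical insurance group identification method, apparatus, computer device, and storage medium |
CN111428198B (zh) * | 2020-03-23 | 2023-02-07 | 平安医疗健康管理股份有限公司 | Method, apparatus, device, and storage medium for determining abnormal medical lists |
CN112837078B (zh) * | 2021-03-03 | 2023-11-03 | 万商云集(成都)科技股份有限公司 | Cluster-based method for detecting abnormal user behavior |
CN113239240A (zh) * | 2021-03-15 | 2021-08-10 | 北京大学 | Method and apparatus for discovering medical insurance violators |
CN113361093A (zh) * | 2021-06-01 | 2021-09-07 | 宿迁学院产业技术研究院 | Control method and system for a Mangde chessboard transceiver device based on the phonon fireworks algorithm |
CN113553446B (zh) * | 2021-07-28 | 2022-05-24 | 厦门国际银行股份有限公司 | Financial anti-fraud method and apparatus based on heterogeneous graph deconstruction |
CN113657549B (zh) * | 2021-08-31 | 2024-09-27 | 深圳平安医疗健康科技服务有限公司 | Medical data auditing method, apparatus, device, and storage medium |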
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
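CN107145587A (zh) * | 2017-05-11 | 2017-09-08 | 成都四方伟业软件股份有限公司 | Medical insurance anti-fraud system based on big data mining |
CN109903169A (zh) * | 2019-01-23 | 2019-06-18 | 平安科技(深圳)有限公司 | Claims anti-fraud method, apparatus, device, and storage medium based on graph computing technology |
CN109919780A (zh) * | 2019-01-23 | 2019-06-21 | 平安科技(深圳)有限公司 | Claims anti-fraud method, apparatus, device, and storage medium based on graph computing technology |
CN110766557A (zh) * | 2019-09-16 | 2020-02-07 | 平安科技(深圳)有限公司 | Data anomaly analysis method, system and computer device based on graph analysis |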
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080172257A1 (en) * | 2007-01-12 | 2008-07-17 | Bisker James H | Health Insurance Fraud Detection Using Social Network Analytics |
US8612169B2 (en) * | 2011-04-26 | 2013-12-17 | International Business Machines Corporation | Method and system for detecting anomalies in a bipartite graph |
US20140278479A1 (en) * | 2013-03-15 | 2014-09-18 | Palantir Technologies, Inc. | Fraud detection in healthcare |
EP3327727A3 (en) * | 2016-11-23 | 2018-08-22 | Optum, Inc. | Data processing systems and methods implementing improved analytics platform and networked information systems |
CN107153713B (zh) * | 2017-05-27 | 2018-02-23 | 合肥工业大学 | Overlapping community detection method and system based on inter-node similarity in social networks |
2019
- 2019-09-16 CN CN201910871381.3A patent/CN110766557B/zh active Active
2020
- 2020-06-30 WO PCT/CN2020/099235 patent/WO2021051938A1/zh active Application Filing
Non-Patent Citations (2)
Title |
---|
LIU JUAN, BIER ERIC, WILSON AARON, GUERRA-GOMEZ JOHN ALEXIS, HONDA TOMONORI, SRICHARAN KUMAR, GILPIN LEILANI, DAVIES DANIEL: "Graph analysis for detecting fraud, waste, and abuse in health-care data", AI MAGAZINE., AMERICAN ASSOCIATION FOR ARTIFICIAL INTELLIGENCE, LA CANADA, CA, vol. 37, no. 2, 22 June 2016 (2016-06-22), CA, pages 33 - 46, XP055793055, ISSN: 0738-4602 * |
SEO JIWON; MENDELEVITCH OFER: "Identifying frauds and anomalies in Medicare-B dataset", 2017 39TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), IEEE, 11 July 2017 (2017-07-11), pages 3664 - 3667, XP033152827, DOI: 10.1109/EMBC.2017.8037652 * |
Also Published As
Publication number | Publication date |
---|---|
CN110766557A (zh) | 2020-02-07 |
CN110766557B (zh) | 2024-03-19 |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20866670; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 20866670; Country of ref document: EP; Kind code of ref document: A1 |