CN110852895A

CN110852895A - Knowledge-graph-based method for discovering medical insurance fraud

Info

Publication number: CN110852895A
Application number: CN201911108926.1A
Authority: CN
Inventors: 卢洪满; 郭骁昌; 江秋; 姚历强; 潘才色; 肖永懿
Original assignee: Fujian Yilianzhong Baoruitong Information Technology Co Ltd; Yi Lianzhong Information Technology Co Ltd
Current assignee: Fujian Yilianzhong Baoruitong Information Technology Co Ltd; Yi Lianzhong Information Technology Co Ltd
Priority date: 2019-11-13
Filing date: 2019-11-13
Publication date: 2020-02-28

Abstract

The invention provides a knowledge-graph-based method for discovering medical insurance fraud, comprising the following steps: constructing a domain ontology library for each domain and building a global ontology library through mapping, thereby completing the schema layer of the knowledge graph; extracting information for the knowledge graph schema layer; establishing knowledge graph entity relationships according to the entity types and the ontology, thereby completing the knowledge graph; taking the person as the center and merging time and space into the obtained knowledge graph, to construct a medical-visit behavior trajectory knowledge graph that integrates the three dimensions of space, time and person; and, based on the resulting trajectory knowledge graph, clustering similar persons, recommending suspected fraudulent insured persons and mining potential fraudsters. The method solves the problem that traditional medical insurance supervision cannot effectively mine potential fraud, and provides a new way of discovering suspected fraud.

Description

Knowledge-graph-based method for discovering medical insurance fraud

Technical Field

The invention relates to the field of medical insurance, and in particular to a knowledge-graph-based method for discovering medical insurance fraud.

Background

Medical insurance funds bear directly on people's lives and health. As China's medical insurance system has improved, service models such as card-based medical visits, instant settlement and mobile payment have steadily advanced, and insured persons' sense of benefit has grown. However, as reform of the system deepens, insurance coverage keeps widening and the funds become harder to supervise. One of the main problems is the frequent occurrence of medical insurance fraud, which tends to involve large cases, organized groups and concealed methods, and which seriously endangers the safety of the funds.

Traditional fund auditing based on a medical knowledge rule base has difficulty identifying insured persons who defraud the fund. For example, some fraudsters collude with doctors: each individual prescription record is reasonable and passes the rule checks of the medical insurance system, yet the medical visit as a whole is fabricated. Such fraudsters are hard to detect.

Disclosure of Invention

To solve the problems described in the background, the invention provides a knowledge-graph-based method for discovering medical insurance fraud, comprising the following steps:

S10, constructing a domain ontology library for each domain, and constructing a global ontology library through mapping, to complete the schema layer of the knowledge graph;

S20, extracting information for the knowledge graph schema layer;

S30, establishing knowledge graph entity relationships according to the entity types and the ontology, to complete the knowledge graph;

S40, based on the knowledge graph obtained in step S30, taking the person as the center and merging time and space into the graph, to construct a medical-visit behavior trajectory knowledge graph that integrates the three dimensions of space, time and person;

S50, based on the resulting trajectory knowledge graph, clustering similar persons, recommending suspected fraudulent insured persons, and mining potential fraudsters.

Further, the ontology in step S10 comprises knowledge graph entity types and entity relationship types. The entity types include insured person, insured unit, visit time, attending doctor, hospital visited, household registration place, medicine and the like; the entity relationship types include affiliated insured unit, family membership, hospital visited, affiliated doctor, place of residence and visit time.

Further, the information extraction in step S20 comprises entity extraction, relationship extraction and attribute extraction; that is, the basic information, household registration information and visit information of insured persons are extracted from the medical insurance database and the public security household registration database.

Further, step S30 further comprises:

S31, eliminating conceptual ambiguity and removing redundant and erroneous concepts through knowledge merging, entity alignment and entity disambiguation, to ensure the quality of the knowledge graph.

Further, step S40 comprises:

S41, constructing a simultaneous-visit relationship: sorting out the relationship between hospital visited and visit time from the perspective of the insured person, and mining the visit behavior relationships between insured persons;

S42, constructing a relationship graph of visits to the same doctor at the same time: sorting out the relationship between visit time and attending doctor from the perspective of the insured person, and mining the visit behavior relationships between insured persons;

S43, finding the relationships between insured persons in the relationship space and assigning weights to them;

S44, traversing all insured persons according to steps S41, S42 and S43, and constructing a graph in which insured persons are the entities and the new visit behavior relationships carry weights;

S45, deleting redundant entities and entity relationships from the graph constructed in step S44.

Further, in step S43 the relationships between insured persons in the relationship space are found through the A* algorithm and weights are assigned to them; the A* formula is:

f(n) = g(n) + h(n);

where f(n) is the estimated distance, i.e. the weight, from the initial insured person to the target insured person via insured person n; g(n) is the actual distance in the relationship space from the initial insured person to insured person n; and h(n) is the estimated distance of the best path from insured person n to the target insured person.

Further, clustering similar persons in step S50 comprises finding the persons who belong to the same community by analyzing the strength of the relationship weights, to form a clustered population graph. This is implemented by the following clustering algorithm:

a relationship vector x(i) is generated for each insured person entity, so that the relationship vectors of all insured persons can be written as {x(1), …, x(m)}, and the insured persons are grouped into k clusters. The algorithm is as follows:

randomly select k cluster centroids μ1, μ2, …, μk ∈ Rⁿ;

repeat the following until convergence or until N iterations have been performed:

for each sample i, compute the class it should belong to;

for each class j, recompute the centroid of that class.

Further, recommending suspected fraudulent insured persons specifically comprises:

recommending persons with similar behavior trajectories, based on the confirmed fraudsters in the knowledge graph; cosine similarity measures the difference between two individuals by the cosine of the angle between two vectors in the vector space;

the similarity between the vectors is measured by computing the cosine of the angle between them; the cosine similarity formula is:

cos(θ) = (A · B) / (‖A‖ ‖B‖);

where A and B are the relationship vectors of two insured persons in the relationship space.

Further, mining potential fraudsters specifically comprises:

inferring new relationships through the TransE algorithm. TransE represents entities and relationships as vectors; in each triple instance (insured person h, behavior relationship r, insured person t), the behavior relationship r is used to infer, starting from the suspected fraudster h, whether insured person t has committed fraud, and h and r are adjusted so that (h + r) is as close as possible to t.

The knowledge-graph-based method for discovering medical insurance fraud provided by the invention introduces the knowledge graph into the field of medical insurance. It extracts knowledge from the social relationships, family relationships and visit behavior relationships of insured persons to construct a relationship graph of suspicious fraud, uses visualization technology to describe knowledge resources and their carriers, and mines, analyzes, constructs, draws and displays the knowledge and the interconnections among them. A list of insured persons suspected of fraud is thereby found and provided to the medical insurance management agency, giving it a new way of discovering suspected fraud that differs from the traditional method.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flow chart of the knowledge-graph-based method for discovering medical insurance fraud provided by the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An embodiment of the invention provides a knowledge-graph-based method for discovering medical insurance fraud, comprising the following steps:

S10, constructing a domain ontology library for each domain, and constructing a global ontology library through mapping, to complete the schema layer of the knowledge graph. This step builds the knowledge graph schema layer. Because the medical insurance knowledge graph covers a wide range of content and contains much cross-domain information, and because ontology construction methods differ between domains, a domain ontology library is first constructed for each domain and a global ontology library is then built through mapping. The ontology comprises knowledge graph entity types and entity relationship types. The entity types include insured person, insured unit, visit time, attending doctor, hospital visited, household registration place, medicine and the like; the entity relationship types include affiliated insured unit, family membership, hospital visited, affiliated doctor, place of residence, visit time and the like;
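As a rough illustration of the schema layer described in S10, the entity types and relationship types could be declared as a small ontology against which extracted triples are checked. This is only a sketch: the names `EntityType`, `SCHEMA` and `is_valid_triple` are hypothetical, and the subject/object pairing chosen for each relationship is an assumption rather than something stated in the text.

```python
from enum import Enum


class EntityType(Enum):
    """Entity types listed in step S10."""
    INSURED_PERSON = "insured person"
    INSURED_UNIT = "insured unit"
    VISIT_TIME = "visit time"
    DOCTOR = "attending doctor"
    HOSPITAL = "hospital visited"
    HOUSEHOLD_PLACE = "household registration place"
    MEDICINE = "medicine"


# relationship name -> (subject type, object type).
# The pairings are assumptions made for this sketch.
SCHEMA = {
    "affiliated_unit": (EntityType.INSURED_PERSON, EntityType.INSURED_UNIT),
    "family_member": (EntityType.INSURED_PERSON, EntityType.INSURED_PERSON),
    "visited_hospital": (EntityType.INSURED_PERSON, EntityType.HOSPITAL),
    "affiliated_doctor": (EntityType.INSURED_PERSON, EntityType.DOCTOR),
    "place_of_residence": (EntityType.INSURED_PERSON, EntityType.HOUSEHOLD_PLACE),
    "visit_time": (EntityType.INSURED_PERSON, EntityType.VISIT_TIME),
}


def is_valid_triple(subject_type, relation, object_type):
    """Return True if the triple conforms to the schema layer."""
    return SCHEMA.get(relation) == (subject_type, object_type)
```

Checking extracted triples against such a schema before loading them is one simple way to keep the data layer consistent with the ontology.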

S20, information extraction, including entity extraction, relationship extraction, attribute extraction and the like; that is, the data corresponding to the knowledge graph ontology, such as insured persons' basic information, household registration information and visit information, are extracted from the medical insurance database and the public security household registration database;

S30, constructing the knowledge graph, which comprises the following steps:

S31, constructing knowledge graph entities according to the ontology of S10;

S32, establishing knowledge graph entity relationships according to the entity types and the ontology, and completing the entity relationships. Some relationships hold in both directions: if Zhang San is a colleague of Li Si, then Li Si is also a colleague of Zhang San. When such a relationship is established in the knowledge graph, both directions are created at the same time, which reduces the complexity of the graph and makes relationship construction more efficient;

S33, the constructed knowledge graph may contain considerable redundancy and a small amount of erroneous information; conceptual ambiguity is eliminated and redundant and erroneous concepts are removed through knowledge merging, entity alignment, entity disambiguation and the like, to ensure the quality of the knowledge graph;
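The relationship completion in S32, where both directions of a symmetric relationship are created in one step, might look like the following minimal sketch. The adjacency-set representation and the names `add_symmetric_relation` and `colleague_of` are assumptions chosen for illustration.

```python
from collections import defaultdict


def add_symmetric_relation(graph, a, relation, b):
    """Store a symmetric relationship in both directions at once."""
    graph[a].add((relation, b))
    graph[b].add((relation, a))


# graph maps an entity to the set of (relation, neighbour) pairs it takes part in
graph = defaultdict(set)
add_symmetric_relation(graph, "Zhang San", "colleague_of", "Li Si")
```

Writing both directions together means later traversals never have to look up the reverse edge separately.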

S40, based on the knowledge graph obtained in the previous step, further constructing a medical-visit behavior trajectory knowledge graph that integrates the three dimensions of space, time and person. The person is taken as the center, and time and space are merged into the graph to form a relationship network with the person at its core;

S41, constructing a simultaneous-visit relationship. The relationship between the hospital visited (space) and the visit time (time) is sorted out from the perspective of the insured person, and the visit behavior relationships between insured persons are mined. Persons who share the same time and space are merged into a relationship, and the relationship is given a weight;

S42, constructing a relationship graph of visits to the same doctor at the same time. The relationship between the visit time (time) and the doctor (the doctor's place of practice is the hospital, i.e. a spatial relationship) is sorted out from the perspective of the insured person, and the visit behavior relationships between insured persons are mined. Persons who share the same time and space are merged into a relationship, and the relationship is given a weight;
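Steps S41 and S42 can be pictured as grouping visit records by a shared (place, time) key and counting how often each pair of insured persons falls into the same group. The sketch below makes several assumptions not stated in the text: the record fields (`person`, `hospital`, `doctor`, `time`), the hourly time bucket, and the use of the co-occurrence count as the weight.

```python
from collections import defaultdict
from itertools import combinations


def co_visit_weights(visits, key_fields):
    """Count, for each pair of persons, how often they share the same key.

    visits: list of dicts with the fields 'person', 'hospital', 'doctor', 'time'.
    key_fields: the fields that define "same time and space".
    """
    groups = defaultdict(set)
    for visit in visits:
        key = tuple(visit[field] for field in key_fields)
        groups[key].add(visit["person"])

    weights = defaultdict(int)
    for persons in groups.values():
        for a, b in combinations(sorted(persons), 2):
            weights[(a, b)] += 1
    return dict(weights)


visits = [
    {"person": "P1", "hospital": "H1", "doctor": "D1", "time": "2019-01-05 09"},
    {"person": "P2", "hospital": "H1", "doctor": "D1", "time": "2019-01-05 09"},
    {"person": "P3", "hospital": "H1", "doctor": "D2", "time": "2019-01-05 09"},
    {"person": "P1", "hospital": "H1", "doctor": "D1", "time": "2019-02-10 14"},
    {"person": "P2", "hospital": "H1", "doctor": "D1", "time": "2019-02-10 14"},
]

same_hospital = co_visit_weights(visits, ("hospital", "time"))  # S41
same_doctor = co_visit_weights(visits, ("doctor", "time"))      # S42
```

Using one function with different key fields keeps the two relationship types comparable, since both weights are counts over the same visit records.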

S43, finding the relationships between insured persons in the relationship space through the A* algorithm and assigning weights to them. The A* algorithm is an effective search method for finding the shortest path, and its formula is:

f(n) = g(n) + h(n);

where f(n) is the estimated distance, i.e. the weight, from the initial insured person to the target insured person via insured person n; g(n) is the actual distance in the relationship space from the initial insured person to insured person n; and h(n) is the estimated distance of the best path from insured person n to the target insured person.

The key to guaranteeing that the shortest path (the optimal solution) is found lies in the choice of the evaluation function f(n), and in particular of h(n). Writing d(n) for the actual distance from insured person n to the target insured person, the choice of h(n) falls roughly into three cases:

1. if h(n) < d(n), i.e. the estimate is smaller than the actual distance to the target insured person, more persons are searched and the relationship range is large, so efficiency is low, but the optimal solution can be obtained;

2. if h(n) = d(n), i.e. the estimate equals the shortest distance, the search proceeds strictly along the shortest path, and search efficiency is highest;

3. if h(n) > d(n), fewer persons are searched and the relationship range is small, so efficiency is high, but the optimal solution is not guaranteed.
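The search in S43 can be sketched as a standard A* over a weighted graph of insured persons, ordering candidates by f(n) = g(n) + h(n). The graph layout, the edge costs and the function name `a_star` are assumptions for illustration; in particular the text does not say how relationship weights map to distances. The example uses h(n) = 0, which falls under case 1 above and therefore returns the optimal path.

```python
import heapq


def a_star(graph, start, goal, h):
    """Return (distance, path) of the shortest path from start to goal, or None.

    graph: {person: {neighbour: cost}}, with non-negative costs.
    h: function giving the estimated remaining distance from a person to goal.
    """
    # heap entries are (f, g, person, path so far)
    open_heap = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_heap:
        _f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # a cheaper route to this person was already expanded
        for neighbour, cost in graph.get(node, {}).items():
            new_g = g + cost
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(
                    open_heap,
                    (new_g + h(neighbour), new_g, neighbour, path + [neighbour]),
                )
    return None


# Hypothetical relationship space; the costs are made up for the example.
graph = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0},
    "C": {"A": 4.0, "B": 2.0},
}
result = a_star(graph, "A", "C", lambda person: 0.0)
```

Here the direct edge A to C costs 4.0, while the route through B costs 3.0, so the search returns the two-hop path.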

S44, traversing all insured persons with the above algorithm, and constructing a graph in which insured persons are the entities and the new visit behavior relationships carry weights.

S45, deleting redundant entities and entity relationships. The entity types deleted are visit time, doctor and hospital; the entity relationship types deleted are hospital visited, affiliated doctor and visit time. After these steps the graph contains fewer entities: only insured persons, insured units and household registration places remain, and the entity relationships gain the new visit behavior relationships, namely simultaneous visit and simultaneous visit to the same doctor, so that the social relationships and visit behavior relationships between insured persons become clearer.
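The pruning in S45 amounts to dropping entities of the removed types together with every relationship that touches them. A minimal sketch follows, assuming entities are stored as an id-to-type mapping and relationships as (head, relation, tail) triples; the type labels and the function name `prune` are hypothetical.

```python
# Entity types removed in S45 (labels are assumptions for this sketch).
REMOVED_ENTITY_TYPES = {"visit_time", "doctor", "hospital"}


def prune(entities, relations):
    """Drop removed entity types and any relationship that refers to them.

    entities: {entity_id: entity_type}
    relations: list of (head, relation, tail) triples
    """
    kept_entities = {
        entity: kind
        for entity, kind in entities.items()
        if kind not in REMOVED_ENTITY_TYPES
    }
    kept_relations = [
        (head, relation, tail)
        for head, relation, tail in relations
        if head in kept_entities and tail in kept_entities
    ]
    return kept_entities, kept_relations


entities = {
    "P1": "insured_person",
    "P2": "insured_person",
    "U1": "insured_unit",
    "D1": "doctor",
}
relations = [
    ("P1", "affiliated_doctor", "D1"),
    ("P1", "affiliated_unit", "U1"),
    ("P1", "same_doctor_same_time", "P2"),
]
kept_entities, kept_relations = prune(entities, relations)
```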

S50, based on the relationship space graph generated in the previous step, clustering similar persons, recommending suspected fraudulent insured persons, and mining potential fraudsters.

S51, clustering similar persons. The persons who belong to the same community are found by analyzing the strength of the relationship weights, forming a clustered population graph. The clustering algorithm is implemented as follows: a relationship vector x⁽ⁱ⁾ is generated for each insured person entity, so that the relationship vectors of all insured persons can be written as {x⁽¹⁾, …, x⁽ᵐ⁾}, and the insured persons are grouped into k clusters. The algorithm is as follows:

Randomly select k cluster centroids μ₁, μ₂, …, μ_k ∈ Rⁿ;

Repeat the following until convergence or until N iterations have been performed:

For each sample i, compute the class it should belong to;

For each class j, recompute the centroid of that class;
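The loop described in S51 matches the usual k-means procedure, which can be sketched in plain Python as below. Two points are assumptions, since the text does not spell them out: samples are assigned to the nearest centroid by squared Euclidean distance, and the initial centroids are drawn from the samples themselves. The function names and the toy vectors are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster points into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct samples chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, labels) if c == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy relationship vectors for four insured persons.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = k_means(points, k=2)
```

The `max_iter` argument plays the role of the iteration cap N in the description.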

S52, recommending suspected fraudulent insured persons. Persons with similar behavior trajectories are recommended based on the confirmed fraudsters in the knowledge graph. Cosine similarity measures the difference between two individuals by the cosine of the angle between two vectors in the vector space.

The smaller the angle between two vectors, the more similar they are considered to be. The similarity between the vectors is measured by computing the cosine of the angle between them; the cosine similarity formula is:

cos(θ) = (A · B) / (‖A‖ ‖B‖);

where A and B are the relationship vectors of two insured persons in the relationship space.
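The recommendation in S52 can be sketched by scoring every candidate's relationship vector against that of a confirmed fraudster and keeping the closest ones. The threshold of 0.9, the function names and the sample vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(confirmed_vector, candidates, threshold=0.9):
    """Return (name, similarity) pairs at or above threshold, most similar first.

    candidates: {name: relationship vector}
    """
    scored = [
        (name, cosine_similarity(confirmed_vector, vector))
        for name, vector in candidates.items()
    ]
    return sorted(
        (item for item in scored if item[1] >= threshold),
        key=lambda item: -item[1],
    )


suggested = recommend([1.0, 2.0], {"P1": [2.0, 4.0], "P2": [2.0, -1.0]})
```

P1's vector points in the same direction as the confirmed fraudster's, so it is recommended; P2's vector is orthogonal to it and is filtered out.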

S53, inferring new relationships and mining potential fraud groups. New relationships are inferred through the TransE algorithm. TransE represents entities and relationships as vectors. In each triple instance (insured person h, behavior relationship r, insured person t), the behavior relationship r can be regarded as inferring, from the suspected fraudster h, whether insured person t has committed fraud, and h and r are adjusted so that (h + r) is as close as possible to t. In this embodiment, because the visit behaviors of insured persons have been cleaned and sorted into the simultaneous-visit relationship and the simultaneous-visit-to-the-same-doctor relationship, under these specific behavior relationships the algorithm in effect reduces to determining the suspected fraud relationship between suspected fraudster h and insured person t: if there is a path between them that passes through the specific behavior relationships r₁, r₂, …, r_n, with each r_n ∈ set, and if

a new relationship exists between the suspected fraudster h and insured person t; otherwise no new relationship exists.
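The idea that (h + r) should land as close as possible to t is usually expressed in TransE as a distance score, where a smaller value means a more plausible triple. The sketch below shows only that scoring step, under the assumption of an L2 distance; the embedding values are made up and the training procedure that adjusts the vectors is omitted.

```python
import math


def transe_score(h, r, t):
    """L2 distance between (h + r) and t; smaller means more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


# Hypothetical 2-dimensional embeddings.
h = [1.0, 0.0]          # suspected fraudster h
r = [0.0, 1.0]          # behavior relationship r
t_close = [1.0, 1.0]    # an insured person that h + r reaches exactly
t_far = [3.0, 3.0]      # an insured person far from h + r

close_score = transe_score(h, r, t_close)
far_score = transe_score(h, r, t_far)
```

Ranking candidate persons t by this score for a given h and r is one way to propose new triples.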

Suppose the following entities and relationships exist. The entities are suspected fraudster A, suspected fraudster B and insured person C. The relationship is seeing the same doctor at the same time, written as (relationship r: number of occurrences). The triples are (suspected fraudster A, r1:90, suspected fraudster B) and (suspected fraudster B, r2:80, insured person C), where r1:90 comprises seeing Doctor A at the same time 50 times (r11:50) and seeing Doctor B at the same time 40 times (r12:40), and r2:80 comprises seeing Doctor B at the same time 45 times (r21:45) and seeing Doctor C at the same time 35 times (r22:35). The task is to infer whether a behavior relationship exists between suspected fraudster A and insured person C.

By continually adjusting the relationships r, r11, r12, r21 and r22, a path is eventually found: suspected fraudster A → r12:40 → suspected fraudster B → r21:45 → insured person C. Set subtraction on r12 and r21 gives a result n (n ≤ 40). If n = 0, there is no new relationship between suspected fraudster A and insured person C; if n > 0, a new relationship may be formed,

namely r3:n, giving the triple (suspected fraudster A, r3:n, insured person C). Insured person C's visit behavior is then similar not only to that of suspected fraudster B but also to that of suspected fraudster A. The more suspected fraudsters whose visit behavior resembles that of insured person C, the more suspicious insured person C becomes.
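The closing observation, that an insured person becomes more suspicious as more suspected fraudsters share visit behavior with them, can be sketched as a simple count over the triples. Everything in this example beyond the A, B, C triples is hypothetical: the derived count n = 12 for r3, the extra person D, and the function name `suspicion_ranking`.

```python
def suspicion_ranking(triples, suspected):
    """Rank non-suspected persons by how many suspected fraudsters link to them.

    triples: list of (head, relation, count, tail); links with count <= 0 are ignored.
    suspected: set of persons already suspected of fraud.
    """
    linked = {}
    for head, _relation, count, tail in triples:
        if count <= 0:
            continue
        # Visit relationships are symmetric, so check both directions.
        for a, b in ((head, tail), (tail, head)):
            if a in suspected and b not in suspected:
                linked.setdefault(b, set()).add(a)
    return sorted(
        ((person, len(sources)) for person, sources in linked.items()),
        key=lambda item: (-item[1], item[0]),
    )


triples = [
    ("A", "r1", 90, "B"),
    ("B", "r2", 80, "C"),
    ("A", "r3", 12, "C"),  # hypothetical derived relationship with n = 12
    ("B", "r4", 3, "D"),   # hypothetical extra person for contrast
]
ranking = suspicion_ranking(triples, suspected={"A", "B"})
```

In this toy data, C is linked to both suspected fraudsters and D to only one, so C ranks first.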

The knowledge-graph-based method provided by this embodiment differs from the traditional technical architecture in two respects. First, it uses knowledge extraction to reorganize data of different structures and sources into structured data, which reduces the complexity of the code needed to handle complex relationships and improves retrieval efficiency. Second, it takes the visit behavior of insured persons as the entry point for detecting fraud, breaking with the traditional approach of discovering fraud through medical knowledge rules; it digs out insured persons who may be committing fraud even when every prescription is reasonable, and presents the analysis of their visit behavior to the medical insurance management agency in visual form.

Tests show that introducing the knowledge graph into the field of medical insurance solves the problem that traditional rule-based methods cannot effectively mine potential fraud. As knowledge graph inference proceeds, the potential visit behavior relationships between insured persons are continually refined and supplemented, so that insured persons increasingly appear in visit behavior clusters, that is, groups of persons with identical visit behavior. Common sense suggests that the probability of five or more persons seeing the same doctor at the same time within one year is very small; the most likely explanation is that one person holds five or more social security cards and swipes them. In addition, the knowledge graph visually displays the visit trajectories and behaviors of insured persons and the relationships between them, making it easier for medical insurance agency staff to spot suspected fraudsters and improving the efficiency of the medical insurance management department.

Taking a certain city as an example, the traditional fund auditing system based on a medical knowledge rule base identified about 20 million insured persons suspected of fraud, while knowledge graph inference identified about 1.1 million. Of the latter, about 92% had also been identified by the traditional system and the remaining 8% or so had not. After confirmation by the customer, this remaining 8% was also added to the list of insured persons suspected of fraud, and all 1.1 million persons were placed under focused monitoring.

The knowledge-graph-based method for discovering medical insurance fraud provided by this embodiment introduces the knowledge graph into the field of medical insurance. It extracts knowledge from the social relationships, family relationships and visit behavior relationships of insured persons to construct a relationship graph of suspicious fraud, uses visualization technology to describe knowledge resources and their carriers, and mines, analyzes, constructs, draws and displays the knowledge and the interconnections among them. A list of insured persons suspected of fraud is thereby found and provided to the medical insurance management agency, giving it a new way of discovering suspected fraud that differs from the traditional method.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A knowledge-graph-based method for discovering medical insurance fraud, characterized in that the method comprises the following steps:

S20, extracting information for the knowledge graph schema layer;

2. The knowledge-graph-based method for discovering medical insurance fraud according to claim 1, wherein the ontology in step S10 comprises knowledge graph entity types and entity relationship types; the entity types include insured person, insured unit, visit time, attending doctor, hospital visited, household registration place, medicine and the like; the entity relationship types include affiliated insured unit, family membership, hospital visited, affiliated doctor, place of residence and visit time.

3. The knowledge-graph-based method for discovering medical insurance fraud according to claim 1, wherein the information extraction in step S20 comprises entity extraction, relationship extraction, attribute extraction and the like; that is, the basic information, household registration information and visit information of insured persons are extracted from the medical insurance database and the public security household registration database.

4. The knowledge-graph-based method for discovering medical insurance fraud according to claim 1, wherein step S30 further comprises:

5. The knowledge-graph-based method for discovering medical insurance fraud according to claim 1, wherein step S40 comprises:

6. The method according to claim 5, wherein in step S43 the relationships between insured persons in the relationship space are found through the A* algorithm and weights are assigned to them; the A* formula is:

f(n) = g(n) + h(n);

7. The knowledge-graph-based method for discovering medical insurance fraud according to claim 4 or 5, wherein clustering similar persons in step S50 comprises finding the persons who belong to the same community by analyzing the strength of the relationship weights, to form a clustered population graph; this is implemented by the following clustering algorithm:

a relationship vector x(i) is generated for each insured person entity, so that the relationship vectors of all insured persons can be written as {x(1), …, x(m)}, and the insured persons are grouped into k clusters; the algorithm is as follows:

repeat the following until convergence or until N iterations have been performed:

for each sample i, compute the class it should belong to;

for each class j, recompute the centroid of that class;

8. The method according to claim 7, wherein recommending suspected fraudulent insured persons specifically comprises:

9. The method according to claim 8, wherein mining potential fraudsters specifically comprises: