CN115859199A - Medical insurance fraud detection method and embedded vector generation method, device and medium thereof - Google Patents


Info

Publication number
CN115859199A
CN115859199A (application number CN202211253233.3A)
Authority
CN
China
Prior art keywords
graph
embedding
semantic
node
feature
Prior art date
Legal status
Pending
Application number
CN202211253233.3A
Other languages
Chinese (zh)
Inventor
林开标
洪彬升
张杨
卢萍
Current Assignee
Xiamen University of Technology
Original Assignee
Xiamen University of Technology
Priority date
Filing date
Publication date
Application filed by Xiamen University of Technology

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

An embodiment of the invention provides a medical insurance fraud detection method, together with an embedded vector generation method, device and medium, relating to the technical field of medical insurance. The embedded vector generation method comprises: S1, obtaining medical insurance data and constructing a medical insurance heterogeneous graph G from the data. S2, obtaining a feature graph from the medical insurance data and the heterogeneous graph. S3, obtaining a topology graph from the heterogeneous graph. S4, obtaining a semantic graph from the heterogeneous graph. S5, inputting the feature graph, topology graph and semantic graph separately into a single-graph convolutional neural network model to obtain the feature-space node embedding Z_F, topology-space node embedding Z_T and semantic-space node embedding Z_S. S6, combining the feature graph, topology graph and semantic graph in pairs and inputting each pair into a parameter-sharing common convolutional neural network model to obtain the topology-feature node embedding Z_CTF, feature-semantic node embedding Z_CFS and topology-semantic node embedding Z_CTS. S7, fusing the node embeddings obtained in steps S5 and S6 to obtain the final embedded vector representation.

Description

Medical insurance fraud detection method and embedded vector generation method, device and medium thereof
Technical Field
The invention relates to the technical field of medical insurance, in particular to a medical insurance fraud detection method, an embedded vector generation method, a device and a medium thereof.
Background
With the development of artificial intelligence, abnormal samples in medical insurance data can be identified automatically; that is, medical insurance fraud detection has become practical. Existing detection methods for medical insurance fraud mainly comprise methods based on outlier detection, supervised learning, unsupervised clustering, and graph neural networks.
Detecting medical insurance fraud with a computer program generally requires the following steps: first, the medical insurance data is processed to obtain a medical insurance heterogeneous graph; then, data extraction is performed on the heterogeneous graph to obtain embedded vector representations of all nodes; finally, the embedded vector representations are classified by a classification model to identify abnormal samples, completing the medical insurance fraud detection.
However, the information captured by the embedded vector representation in the prior art is relatively limited, so the nodes cannot be expressed accurately, abnormal medical insurance data cannot be screened out reliably in the subsequent classification step, and the accuracy of medical insurance fraud detection is reduced.
In view of the above, the applicant has specifically proposed the present application after studying the existing technology.
Disclosure of Invention
The invention provides a medical insurance fraud detection method and an embedded vector generation method, device and medium thereof, so as to solve at least one of the above technical problems.
The first aspect,
The embodiment of the invention provides an embedded vector generation method, which comprises steps S1 to S7.
S1, acquiring medical insurance data, and constructing a medical insurance heterogeneous graph G from the data, where G = (A, X), A being the adjacency matrix and X the feature matrix.
S2, acquiring a feature graph based on the similarity between node features, according to the medical insurance data and the medical insurance heterogeneous graph.
And S3, acquiring a topological graph based on the topological relation among the nodes according to the medical insurance heterogeneous graph.
And S4, extracting meta-paths of different node relations according to the medical insurance heterogeneous graph to obtain a semantic graph.
S5, respectively inputting the feature graph, the topology graph and the semantic graph into a single-graph convolutional neural network model to obtain the feature-space node embedding Z_F, topology-space node embedding Z_T and semantic-space node embedding Z_S.

S6, combining the feature graph, the topology graph and the semantic graph in pairs, inputting each pair into a parameter-sharing common convolutional neural network model, and obtaining the topology-feature node embedding Z_CTF, feature-semantic node embedding Z_CFS and topology-semantic node embedding Z_CTS.

S7, fusing the feature-space node embedding Z_F, topology-space node embedding Z_T, semantic-space node embedding Z_S, topology-feature node embedding Z_CTF, feature-semantic node embedding Z_CFS and topology-semantic node embedding Z_CTS through an attention mechanism to obtain the final embedded vector representation Z of the medical insurance heterogeneous graph.
The second aspect,
The embodiment of the invention provides an embedded vector generating device, which comprises:
The heterogeneous graph acquisition module is used for acquiring the medical insurance data and constructing a medical insurance heterogeneous graph G from the data, where G = (A, X), A being the adjacency matrix and X the feature matrix.
The feature graph acquisition module is used for acquiring a feature graph based on the similarity between node features, according to the medical insurance data and the medical insurance heterogeneous graph.
And the topological graph acquisition module is used for acquiring a topological graph based on the topological relation among the nodes according to the medical insurance heterogeneous graph.
And the semantic graph acquisition module is used for extracting meta-paths of different node relationships according to the medical insurance heterogeneous graph to acquire the semantic graph.
The single-graph convolution module is used for respectively inputting the feature graph, the topology graph and the semantic graph into the single-graph convolutional neural network model to obtain the feature-space node embedding Z_F, topology-space node embedding Z_T and semantic-space node embedding Z_S.

The shared convolution module is used for combining the feature graph, the topology graph and the semantic graph in pairs and inputting each pair into the parameter-sharing common convolutional neural network model to obtain the topology-feature node embedding Z_CTF, feature-semantic node embedding Z_CFS and topology-semantic node embedding Z_CTS.

The fusion module is used for fusing the feature-space node embedding Z_F, topology-space node embedding Z_T, semantic-space node embedding Z_S, topology-feature node embedding Z_CTF, feature-semantic node embedding Z_CFS and topology-semantic node embedding Z_CTS through an attention mechanism to obtain the final embedded vector representation Z of the medical insurance heterogeneous graph.
The third aspect,
An embodiment of the present invention provides a computer-readable storage medium, which includes a stored computer program, where, when the computer program runs, a device in which the computer-readable storage medium is located is controlled to execute the method for generating an embedded vector according to any one of the paragraphs of the first aspect.
The fourth aspect,
The embodiment of the invention provides a medical insurance fraud detection method which comprises a step A1 and a step A2.
A1, obtaining a final embedding vector Z according to the embedding vector generating method of any one of the paragraphs of the first aspect.
And A2, classifying each node through a pre-trained classification model according to the final embedded vector Z so as to identify medical insurance fraud information in the medical insurance data.
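Step A2 can be sketched as follows. This is a minimal illustration only: the patent does not fix a particular classification model, so the linear (logistic-style) classifier, all function names and the toy weights below are assumptions.

```python
import numpy as np

def detect_fraud(Z, W, b, threshold=0.5):
    """Score each node embedding with a pre-trained linear classifier.

    Z: (N, d) final embedding matrix; W: (d,) weight vector; b: bias.
    Returns a boolean mask marking suspected fraud nodes (step A2).
    """
    logits = Z @ W + b
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid score per node
    return probs >= threshold

# toy example: 4 node embeddings scored with hand-set weights
Z = np.array([[0.1, 0.2], [2.0, 1.5], [-1.0, 0.3], [1.8, 2.2]])
W = np.array([1.0, 1.0])
b = -2.0
mask = detect_fraud(Z, W, b)
```

In practice the classifier would be trained on labeled fraud/non-fraud nodes; only the scoring step is shown here.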
By adopting the technical scheme, the invention can obtain the following technical effects:
The embedded vector generation method provided by the embodiment of the invention can simultaneously learn the differences of the same type of graph in different spaces and the commonalities of different types of graphs in the same space, thereby mining more and richer information from the medical insurance data. The final embedded vector obtained by this method therefore yields a more accurate classification result.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flow chart diagram of an embedding vector generation method.
FIG. 2 is an example of a heterogeneous graph, meta-path, network schema, and meta-graph in medical insurance data.
Fig. 3 is a network configuration diagram of an embedding vector generation method.
Figure 4 is a schematic diagram of meta-paths in the medical insurance heterogeneous graph.
Fig. 5 is a network configuration diagram of common meta-path sampling.
FIG. 6 is a network architecture diagram of a single graph convolutional neural network model.
Fig. 7 is a network structure diagram of a common convolutional neural network model.
Fig. 8 is a schematic configuration diagram of an embedded vector generation apparatus.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment,
Referring to fig. 1 to 7, a first embodiment of the present invention provides an embedded vector generating method, which can be executed by an embedded vector generating apparatus. In particular, by one or more processors in the embedded vector generation device to implement steps S1 to S7.
S1, acquiring medical insurance data, and constructing a medical insurance heterogeneous graph G from the data, where G = (A, X), A being the adjacency matrix and X the feature matrix.

As shown in fig. 2, the top left corner of fig. 2 shows a heterogeneous graph built from the medical insurance data, the top right corner shows meta-paths in the heterogeneous graph, and the bottom shows the network schema of the medical insurance data. Constructing a heterogeneous graph from medical insurance data is prior art and is not repeated here.
It is to be understood that the embedded vector generation device may be an electronic device with computing capabilities, such as a laptop computer, a desktop computer, a server, a smartphone, or a tablet computer.
S2, acquiring a feature graph based on the similarity between node features, according to the medical insurance data and the medical insurance heterogeneous graph. On the basis of the foregoing embodiment, in an optional embodiment of the present invention, step S2 specifically includes steps S21 to S23.
And S21, calculating a first similarity matrix S according to the medical insurance data.
S22, according to the first similarity matrix S, selecting the top k most similar nodes of each node as its neighbourhood to obtain a second similarity matrix A_f.

S23, combining the second similarity matrix A_f with the feature matrix X to obtain the feature graph G_f = (A_f, X).
Specifically, in order to fully exploit the useful information in the feature space, the original medical insurance data is first turned into a graph with a k-nearest-neighbour method, generating a feature structure graph from the node features of the medical insurance data.

In addition to the given medical insurance heterogeneous graph G = (A, X), in order to capture the underlying structure of the node features, this embodiment constructs a KNN graph G_f = (A_f, X) based on the similarity between node features, where A_f is the similarity matrix of the KNN graph.
Specifically, this embodiment calculates the similarity matrix S ∈ R^{N×N} of the medical insurance samples. There are various ways to calculate S; this embodiment uses a cosine similarity method based on the vector space. The cosine similarity measures the cosine of the angle between the feature vectors x_i and x_j of nodes i and j:

$$S_{ij} = \frac{x_i \cdot x_j}{\|x_i\|\,\|x_j\|}$$

After the similarity matrix S is calculated, the top k most similar nodes are selected for each node as its neighbourhood, giving the KNN similarity matrix A_f. Finally, A_f is combined with the features X to form the KNN feature graph G_f = (A_f, X).
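The KNN feature-graph construction of steps S21 to S23 can be sketched as follows. This is a minimal numpy illustration; the function name and the toy data are assumptions, not part of the patent.

```python
import numpy as np

def knn_feature_graph(X, k):
    """Build the KNN adjacency A_f from node features X (steps S21-S22).

    S is the cosine-similarity matrix; each node keeps its top-k most
    similar other nodes as its neighbourhood.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)   # unit-normalize rows
    S = Xn @ Xn.T                          # cosine similarity matrix
    np.fill_diagonal(S, -np.inf)           # exclude self-similarity
    A_f = np.zeros_like(S)
    idx = np.argsort(-S, axis=1)[:, :k]    # top-k neighbours per node
    rows = np.arange(X.shape[0])[:, None]
    A_f[rows, idx] = 1.0
    return A_f

# 4 toy nodes with 2-dimensional features; k = 1
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
A_f = knn_feature_graph(X, k=1)
```

Combining the returned A_f with X then gives the feature graph G_f = (A_f, X) of step S23.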
And S3, acquiring a topological graph based on the topological relation among the nodes according to the medical insurance heterogeneous graph. On the basis of the foregoing embodiment, in an optional embodiment of the present invention, step S3 specifically includes step S31 to step S32.
S31, extracting the relationships between patient nodes and department nodes from the medical insurance heterogeneous graph, and constructing the adjacency matrix A_t.

S32, combining the adjacency matrix A_t with the feature matrix X to obtain the topology graph G_t = (A_t, X).

Specifically, in order to keep the relationships between nodes in the original graph, the topological relationships between nodes in the medical insurance data are extracted to construct a node topology graph. Preferably, the patient-department relationships are extracted to construct the adjacency matrix A_t, and A_t is combined with the features X into the topology graph G_t = (A_t, X).
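A minimal sketch of building A_t from patient-department visit records follows; the single shared index space (patients first, then departments) and the symmetric edges are assumptions not fixed by the patent.

```python
import numpy as np

def topology_graph(edges, num_patients, num_departments):
    """Build adjacency A_t from patient-department relations (step S31).

    edges: list of (patient_id, department_id) pairs. Nodes are indexed
    with patients first, then departments, in one homogeneous space.
    """
    n = num_patients + num_departments
    A_t = np.zeros((n, n))
    for p, k in edges:
        A_t[p, num_patients + k] = 1.0   # patient -> department edge
        A_t[num_patients + k, p] = 1.0   # symmetric counterpart
    return A_t

# 3 patients, 2 departments: patients 0 and 1 visit dept 0, patient 2 visits dept 1
A_t = topology_graph([(0, 0), (1, 0), (2, 1)], num_patients=3, num_departments=2)
```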
And S4, extracting meta-paths of different node relations according to the medical insurance heterogeneous graph to obtain a semantic graph.
Specifically, in order to fully utilize the semantic relationships between different types of nodes, this embodiment extracts meta-paths of different node relationships from the heterogeneous information network generated from the medical insurance data, and constructs a semantic graph rich in node semantic information from these meta-paths using a common sampling method.
On the basis of the foregoing embodiment, in an optional embodiment of the present invention, step S4 specifically includes step S41 to step S44.
S41, extracting different types of meta-paths from the medical insurance heterogeneous graph, the meta-paths including patient-department-patient P_PKP, patient-time-patient P_PTP and patient-drug-patient P_PMP.

S42, generating a corresponding semantic adjacency matrix for each meta-path, the semantic adjacency matrices including the department semantic adjacency matrix A_PKP, the time semantic adjacency matrix A_PTP and the drug semantic adjacency matrix A_PMP.

S43, sampling in sequence the common paths among the multiple meta-paths of the same patient node, obtaining the multi-path meta-paths of the patient nodes, and generating the corresponding multi-path semantic adjacency matrix A_s.

S44, combining the multi-path semantic adjacency matrix A_s with the feature matrix X to obtain the semantic graph G_s = (A_s, X).
Specifically, as shown in fig. 3, in order to capture the semantic information hidden in the medical insurance heterogeneous graph G, this embodiment samples different types of meta-paths P_PKP, P_PTP, P_PMP from G. The meta-path P_PKP, for example, denotes the composite relation patient → department → patient. Because different meta-paths contain semantic information of different node types, corresponding semantic adjacency matrices A_PKP, A_PTP, A_PMP are generated for the respective meta-paths.

As shown in fig. 4, in order to obtain the semantic information jointly carried between patient nodes by the two different meta-paths P_PKP and P_PTP, common meta-path sampling is performed on the semantic adjacency matrices A_PKP and A_PTP, generating the corresponding meta-path semantic adjacency matrix A_PKTP and the multi-path P_PKTP = (P_PKP, P_PTP), which contains the information of both meta-paths simultaneously. Then, in order to obtain more complex semantic information between patient nodes, common path sampling is further performed on A_PMP and A_PKTP, yielding multi-path semantic information in which the three meta-paths connect patient nodes simultaneously, and generating the corresponding multi-path semantic adjacency matrix A_PKTMP and multi-path P_PKTMP = (P_PKP, P_PTP, P_PMP).

The multi-path semantic adjacency matrix A_s = A_PKTMP is combined with the features X into the semantic graph G_s = (A_s, X).
In this embodiment, the common path sampling method is applied to the different types of meta-paths sampled from the medical insurance data, so as to obtain semantic information in which multiple different meta-paths connect patient nodes simultaneously, generating a semantic graph with rich, complex semantic information.
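The meta-path semantic matrices (S42) and common sampling (S43) can be illustrated as below. Treating "common meta-path sampling" as an elementwise intersection of the semantic adjacency matrices is an assumption, since the patent does not give the exact operator; the incidence matrices and names are toy constructions.

```python
import numpy as np

def metapath_adjacency(A_rel):
    """Patient-to-patient semantic adjacency along one meta-path, e.g.
    A_PKP from the patient-department incidence matrix A_PK (step S42)."""
    A = (A_rel @ A_rel.T) > 0        # patients linked via a shared entity
    np.fill_diagonal(A, False)       # drop self-loops
    return A.astype(float)

def common_sampling(*semantic_adjs):
    """Common meta-path sampling (step S43, assumed reading): keep a
    patient pair only if it is connected under every meta-path."""
    A_s = semantic_adjs[0].astype(bool)
    for A in semantic_adjs[1:]:
        A_s &= A.astype(bool)        # elementwise intersection
    return A_s.astype(float)

# 3 patients x 2 departments / 2 time slots (toy incidence matrices)
A_PK = np.array([[1, 0], [1, 0], [0, 1]])
A_PT = np.array([[1, 0], [1, 0], [1, 0]])
A_PKP = metapath_adjacency(A_PK)    # patients 0 and 1 share a department
A_PTP = metapath_adjacency(A_PT)    # all three patients share a time slot
A_PKTP = common_sampling(A_PKP, A_PTP)
```

Repeating `common_sampling` with A_PMP would yield the multi-path matrix A_PKTMP = A_s of step S44.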
S5, respectively inputting the feature graph, the topology graph and the semantic graph into the single-graph convolutional neural network model to obtain the feature-space node embedding Z_F, topology-space node embedding Z_T and semantic-space node embedding Z_S.

Specifically, after obtaining the feature graph, topology graph and semantic graph, in order to learn the features that the different types of graphs propagate in different spaces, this embodiment inputs the generated graphs into specific convolution modules, learning each type of graph and extracting its specific embedding in its own space.
Preferably, the single graph convolutional neural network model is a two-layer graph convolutional network. On the basis of the foregoing embodiment, in an optional embodiment of the present invention, step S5 specifically includes step S51 to step S53.
S51, inputting the feature graph G_f into the single-graph convolutional neural network model, whose l-th layer output embedding is

$$Z_F^{(l)} = \mathrm{ReLU}\big(\tilde{D}_f^{-1/2}\tilde{A}_f\tilde{D}_f^{-1/2}\, Z_F^{(l-1)} W_f^{(l)}\big)$$

and recording the embedding output by the last layer as the feature-space node embedding Z_F. In the formula, ReLU is a nonlinear activation function, \tilde{D}_f^{-1/2}\tilde{A}_f\tilde{D}_f^{-1/2} is a quantity depending on the network topology, Z_F^{(l-1)} is the node feature of layer l-1, and W_f^{(l)} is the weight matrix of the l-th layer of the graph convolutional network.

S52, inputting the topology graph G_t into the single-graph convolutional neural network model, whose l-th layer output embedding is

$$Z_T^{(l)} = \mathrm{ReLU}\big(\tilde{D}_t^{-1/2}\tilde{A}_t\tilde{D}_t^{-1/2}\, Z_T^{(l-1)} W_t^{(l)}\big)$$

and recording the embedding output by the last layer as the topology-space node embedding Z_T. The symbols are defined as in step S51, with the normalized adjacency and weights taken from the topology graph.

S53, inputting the semantic graph G_s into the single-graph convolutional neural network model, whose l-th layer output embedding is

$$Z_S^{(l)} = \mathrm{ReLU}\big(\tilde{D}_s^{-1/2}\tilde{A}_s\tilde{D}_s^{-1/2}\, Z_S^{(l-1)} W_s^{(l)}\big)$$

and recording the embedding output by the last layer as the semantic-space node embedding Z_S. The symbols are defined as in step S51, with the normalized adjacency and weights taken from the semantic graph.
Specifically, in order to learn the embedding of specific node information in each specific space, this embodiment inputs the topology graph G_t = (A_t, X), the feature graph G_f = (A_f, X) and the semantic graph G_s = (A_s, X) each into its own space to learn an embedding.

The process of learning the node embedding Z_T in the topology space is shown in fig. 6.
In particular, the topology graph G_t and the features X are input into the topology space, and two GCN layers are connected to learn the embedding of the topology graph in this space. The l-th layer output embedding Z_T^{(l)} is calculated as follows:

$$Z_T^{(l)} = \mathrm{ReLU}\big(\tilde{D}_t^{-1/2}\tilde{A}_t\tilde{D}_t^{-1/2}\, Z_T^{(l-1)} W_t^{(l)}\big)$$

In the formula, \tilde{D}_t^{-1/2}\tilde{A}_t\tilde{D}_t^{-1/2} is a quantity depending on the network topology, Z_T^{(0)} = X represents the features of the initial nodes, Z_T^{(l-1)} is the feature of the layer l-1 nodes, W_t^{(l)} is the weight matrix of the l-th GCN layer, and ReLU is a nonlinear activation function used in artificial neural networks.

The output embedding of the last layer is then denoted Z_T. This is the node embedding with specific topology information that the topology graph G_t = (A_t, X) learns in the specific topology space.
Similarly, the feature graph G_f = (A_f, X) is input into the feature space in the above manner to learn the node embedding Z_F with specific feature information, calculated as follows:

$$Z_F^{(l)} = \mathrm{ReLU}\big(\tilde{D}_f^{-1/2}\tilde{A}_f\tilde{D}_f^{-1/2}\, Z_F^{(l-1)} W_f^{(l)}\big)$$

And the generated semantic graph G_s = (A_s, X) is input into the semantic space to learn the node embedding Z_S with specific semantic information, calculated as follows:

$$Z_S^{(l)} = \mathrm{ReLU}\big(\tilde{D}_s^{-1/2}\tilde{A}_s\tilde{D}_s^{-1/2}\, Z_S^{(l-1)} W_s^{(l)}\big)$$
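The two-layer single-graph GCN of step S5 can be sketched as follows. Symmetric adjacency normalization with self-loops is assumed here, matching the standard GCN propagation rule; the function names and toy inputs are illustrative only.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} assumed for the GCN."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_two_layer(A, X, W1, W2):
    """Two-layer GCN (step S5): Z^(l) = ReLU(A_norm @ Z^(l-1) @ W^(l))."""
    A_norm = normalize_adj(A)
    Z1 = np.maximum(A_norm @ X @ W1, 0.0)   # layer 1 with ReLU
    Z2 = np.maximum(A_norm @ Z1 @ W2, 0.0)  # layer 2 with ReLU
    return Z2

# toy 2-node graph; identity weights so the propagation is easy to follow
A = np.array([[0.0, 1.0], [1.0, 0.0]])
X = np.array([[1.0, 0.0], [0.0, 1.0]])
W1 = np.eye(2)
W2 = np.eye(2)
Z = gcn_two_layer(A, X, W1, W2)
```

Running this once per graph (G_f, G_t, G_s) yields the three specific embeddings Z_F, Z_T and Z_S.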
s6, combining the feature graph, the topological graph and the semantic graph in pairs, respectively inputting the combined feature graph, topological graph and semantic graph into a public convolutional neural network model sharing parameters, and acquiring topological feature nodes embedded in Z CTF Feature semantic node embedding Z CFS And topology semantic node embedding Z CTS
Specifically, considering that the same characteristics may exist between different graphs, this embodiment adds a parameter-sharing common convolution module to learn the characteristics shared between graphs, in addition to the different embeddings that the different graphs learn in their own spaces.

Preferably, the common convolutional neural network model consists of two double-layer graph convolutional networks with a parameter-sharing strategy, i.e. the weight matrices of the two networks are shared. On the basis of the foregoing embodiment, in an optional embodiment of the present invention, step S6 specifically includes steps S61 to S63.

S61, inputting the topology graph G_t and the feature graph G_f respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain the topology embedding Z_CT and the feature embedding Z_CF; then averaging Z_CT and Z_CF to obtain the topology-feature node embedding Z_CTF.

S62, inputting the feature graph G_f and the semantic graph G_s respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain the feature embedding Z_CF and the semantic embedding Z_CS; then averaging Z_CF and Z_CS to obtain the feature-semantic node embedding Z_CFS.

S63, inputting the topology graph G_t and the semantic graph G_s respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain the topology embedding Z_CT and the semantic embedding Z_CS; then averaging Z_CT and Z_CS to obtain the topology-semantic node embedding Z_CTS.
Specifically, step S5 only inputs the topology graph G_t = (A_t, X), the feature graph G_f = (A_f, X) and the semantic graph G_s = (A_s, X) into their own specific spaces to learn embeddings, yet the same feature information may exist between different graphs.

The embeddings learned in the topology, feature and semantic spaces may share many commonalities, and downstream tasks may depend on the information common to these spaces. In order to extract both the specific embeddings learned by the nodes in different spaces and the common information between different spaces, this embodiment learns the features common to two spaces through a parameter-shared GCN, as shown in fig. 7.
This embodiment inputs the topology graph G_t = (A_t, X) into the common GCN module to learn node embeddings, and denotes the output embedding of the last layer as Z_CT. The calculation process is:

$$Z_{CT}^{(l)} = \mathrm{ReLU}\big(\tilde{D}_t^{-1/2}\tilde{A}_t\tilde{D}_t^{-1/2}\, Z_{CT}^{(l-1)} W_c^{(l)}\big)$$

In the formula, Z_CT^{(0)} = X represents the features of the initial nodes in the topology space, Z_CT^{(l-1)} is the feature of the layer l-1 nodes, and W_c^{(l)} is the shared weight matrix of the l-th layer in the common GCN module.
While the common GCN module learns the node embedding of the topology graph G_t = (A_t, X), the feature graph G_f = (A_f, X) is simultaneously input into the same common GCN module, and the output embedding of its last layer is denoted Z_CF. The calculation process is:

$$Z_{CF}^{(l)} = \mathrm{ReLU}\big(\tilde{D}_f^{-1/2}\tilde{A}_f\tilde{D}_f^{-1/2}\, Z_{CF}^{(l-1)} W_c^{(l)}\big)$$

In the formula, Z_CF^{(0)} = X represents the features of the initial nodes in the feature space, and Z_CF^{(l-1)} is the feature of the layer l-1 nodes.
While the feature graph learns its embedding in the GCN, the weight matrix of each layer is shared with the weight matrix of the topology graph at the same layer in the common GCN module; the shared weight matrix is W_c^{(l)}. The common GCN module thus takes two different graphs as input, the topology graph G_t = (A_t, X) and the feature graph G_f = (A_f, X), and outputs two embeddings of different spaces, Z_CT and Z_CF. In this common GCN over the topology and feature graphs, the two embeddings Z_CT and Z_CF are averaged to obtain the embedding Z_CTF learned in the module:

Z_CTF = Mean(Z_CT, Z_CF)

Similarly, the topology graph G_t = (A_t, X) and the semantic graph G_s = (A_s, X) are input into a common GCN module in the same way; the two embeddings Z_CT and Z_CS of different spaces are learned through the shared weight matrix, and the final embedding Z_CTS is obtained.

And the feature graph G_f = (A_f, X) and the semantic graph G_s = (A_s, X) are input into a common GCN module; the two embeddings Z_CF and Z_CS of different spaces are learned through the shared weight matrix, and the final embedding Z_CFS is obtained.
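The parameter-sharing common GCN and the averaging step (as in S61) can be sketched as follows: the same weights W1, W2 propagate over two different graphs, and the two resulting embeddings are averaged. The pre-normalized toy adjacencies and names are assumptions for illustration.

```python
import numpy as np

def common_gcn(A_norm1, A_norm2, X, W1, W2):
    """Parameter-sharing common GCN (step S6): the SAME weight matrices
    W1, W2 are used for both (pre-normalized) graphs, and the two
    embeddings are averaged into one cross-graph embedding."""
    def forward(A_norm):
        Z1 = np.maximum(A_norm @ X @ W1, 0.0)   # layer 1 with ReLU
        return np.maximum(A_norm @ Z1 @ W2, 0.0)  # layer 2 with ReLU
    Z_c1 = forward(A_norm1)      # e.g. Z_CT from the topology graph
    Z_c2 = forward(A_norm2)      # e.g. Z_CF from the feature graph
    return (Z_c1 + Z_c2) / 2.0   # e.g. Z_CTF = Mean(Z_CT, Z_CF)

# toy normalized adjacencies for two 2-node graphs, shared identity weights
A1 = np.array([[1.0, 0.0], [0.0, 1.0]])
A2 = np.array([[0.5, 0.5], [0.5, 0.5]])
X = np.array([[2.0, 0.0], [0.0, 2.0]])
W = np.eye(2)
Z_CTF = common_gcn(A1, A2, X, W, W)
```

Calling the same function on the other two graph pairs yields Z_CFS and Z_CTS.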
Preferably, as shown in fig. 3, a disparity constraint is applied between the topology-space node embedding Z_T and the topology embedding Z_CT, between the feature-space node embedding Z_F and the feature embedding Z_CF, and between the semantic-space node embedding Z_S and the semantic embedding Z_CS, so that the differences of the same type of graph in different spaces can be effectively learned during propagation and the independence of the embeddings is ensured. A consistency constraint is applied between the topology embedding Z_CT and the feature embedding Z_CF, between the feature embedding Z_CF and the semantic embedding Z_CS, and between the topology embedding Z_CT and the semantic embedding Z_CS, so that the commonality of different types of graphs in the same space can be effectively learned during propagation and the common information is enhanced.
S7, fusing the feature-space node embedding Z_F, topology-space node embedding Z_T, semantic-space node embedding Z_S, topology-feature node embedding Z_CTF, feature-semantic node embedding Z_CFS and topology-semantic node embedding Z_CTS through an attention mechanism to obtain the final embedded vector representation Z of the medical insurance heterogeneous graph.
Specifically, since steps S5 and S6 are performed by different modules, the results obtained by all the modules are finally weighted and merged together to obtain the final embedded representation, on which the downstream classification task and anomaly detection task can be performed.
In this embodiment, learned embeddings are adaptively assigned different sized weights using an attention mechanism, and all embedded information is fused.
By learning embeddings from the topology graph, feature graph and semantic graph in both the specific graph convolution module and the parameter-sharing graph convolution module, we now have three specific embeddings Z_T, Z_F and Z_S obtained from the specific GCN module, and three embeddings Z_CTF, Z_CTS and Z_CFS obtained from the parameter-sharing GCN module.
Considering that there may be some correlation between the embeddings learned from these different graphs, this embodiment uses an attention mechanism a_Attention to adaptively assign weights and learn the relative importance of the embeddings, calculated as follows:
a_Attention = (a_T, a_F, a_S, a_CTF, a_CTS, a_CFS) = att(Z_T, Z_F, Z_S, Z_CTF, Z_CTS, Z_CFS)
In the formula, a_T, a_F, a_S, a_CTF, a_CTS and a_CFS are the attention weight values of the embeddings Z_T, Z_F, Z_S, Z_CTF, Z_CTS and Z_CFS, respectively.
To fuse the information of the multiple graphs, the six embeddings and their corresponding attention weights are combined into the final embedding Z as follows:
Z_dep = a_T·Z_T + a_F·Z_F + a_S·Z_S
Z_com = a_CTF·Z_CTF + a_CTS·Z_CTS + a_CFS·Z_CFS
Z = Z_dep + Z_com
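The fusion above can be sketched in numpy as follows. The tanh-projection scoring with parameters w and q is a standard attention formulation assumed here for illustration, since the text only specifies the att(·) interface; both parameters would be learned in practice.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(embeddings, w, q):
    """Fuse a list of (n, d) embeddings with per-node attention weights.

    w (d, h) and q (h,) stand in for learned attention parameters; the
    scoring rule tanh(Z @ w) @ q is an assumed, standard choice.
    """
    scores = np.stack([np.tanh(z @ w) @ q for z in embeddings], axis=1)  # (n, 6)
    alpha = softmax(scores, axis=1)   # a_T, a_F, a_S, a_CTF, a_CTS, a_CFS per node
    z = sum(alpha[:, i:i + 1] * e for i, e in enumerate(embeddings))
    return z, alpha
```

Inspecting `alpha` afterwards shows which space's embedding dominates the fused representation, matching the analysis described below.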
According to the embodiment of the invention, the topological structure, feature information and semantic information of the sample nodes are extracted from the medical insurance data and used to generate the corresponding topology graph, feature graph and semantic graph. The embedded information of these graphs in different spaces is then learned through two different GCN modules, and the embeddings are fused together through adaptive attention weight assignment to obtain the final embedded representation. The final embedded representation can be used for downstream classification and anomaly detection tasks to detect medical insurance fraud.
Specifically, in order to obtain rich node information and edge information, this embodiment generates a topology graph, a feature graph and a semantic graph corresponding to the complex topological structure, feature information and semantic information in the heterogeneous graph, and uses the specific convolution module to learn the deeper information within each graph and the common information that exists between different types of graphs.
By exploiting the commonality and difference between the graphs, the two propagation modules of the embodiment effectively learn in different spaces, and the learned embeddings are assigned weights and fused through the attention mechanism. The weights are assigned adaptively, and the larger a weight is, the more important the corresponding embedding; we can thus analyze which space's learned embedding has the greatest impact on the final anomaly detection result.
By simultaneously learning the difference of the same type of graph in different spaces and the commonality of different types of graphs in the same space, more and richer information is mined from the medical insurance data. The final embedded vector obtained by the embedded vector generation method of this embodiment therefore yields a more accurate classification result.
Embodiment 2
The embodiment of the invention provides an embedded vector generating device, which comprises:
A heterogeneous graph acquisition module 1, used for acquiring medical insurance data and constructing a medical insurance heterogeneous graph G from the medical insurance data; wherein G = (A, X), A being an adjacency matrix and X being a feature matrix.
A feature map acquisition module 2, used for acquiring a feature map based on the similarity between node features according to the medical insurance data and the medical insurance heterogeneous graph.
A topology map acquisition module 3, used for acquiring a topology map based on the topological relations between nodes according to the medical insurance heterogeneous graph.
A semantic map acquisition module 4, used for extracting meta-paths of different node relationships from the medical insurance heterogeneous graph to acquire a semantic map.
A single-graph convolution module 5, used for inputting the feature map, the topology map and the semantic map respectively into a single-graph convolutional neural network model to obtain a feature space node embedding Z_F, a topology space node embedding Z_T and a semantic space node embedding Z_S.
A shared convolution module 6, used for combining the feature map, the topology map and the semantic map in pairs and inputting each pair into a common convolutional neural network model with shared parameters to obtain a topology feature node embedding Z_CTF, a feature semantic node embedding Z_CFS and a topology semantic node embedding Z_CTS.
A fusion module 7, used for fusing the feature space node embedding Z_F, the topology space node embedding Z_T, the semantic space node embedding Z_S, the topology feature node embedding Z_CTF, the feature semantic node embedding Z_CFS and the topology semantic node embedding Z_CTS through an attention mechanism to obtain the final embedded vector representation Z of the medical insurance heterogeneous graph.
The embedded vector generation device provided by this embodiment of the invention can simultaneously learn the difference of the same type of graph in different spaces and the commonality of different types of graphs in the same space, thereby mining more and richer information from the medical insurance data. The final embedded vector obtained by the device therefore yields a more accurate classification result.
On the basis of the foregoing embodiment, in an optional embodiment of the present invention, the feature map obtaining module 2 specifically includes:
A first similarity matrix calculation unit, used for calculating a first similarity matrix S from the medical insurance data.
A second similarity matrix calculation unit, used for selecting, according to the first similarity matrix S, the first k most similar nodes of each node as its neighborhood to obtain a second similarity matrix A_f.
A feature map combining unit, used for combining the second similarity matrix A_f with the feature matrix X to obtain the feature map G_f; wherein G_f = (A_f, X).
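A small numpy sketch of such a feature-map construction, assuming cosine similarity for the first similarity matrix S (the text does not fix the similarity measure) and a symmetric top-k neighborhood rule; both choices are illustrative assumptions.

```python
import numpy as np

def knn_feature_graph(x, k):
    # Cosine similarity between node feature rows gives the first
    # similarity matrix S; keeping each node's top-k most similar
    # neighbours yields the sparse second similarity matrix A_f.
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    s = xn @ xn.T                      # first similarity matrix S
    np.fill_diagonal(s, -np.inf)       # exclude self-similarity
    a_f = np.zeros_like(s)
    for i in range(s.shape[0]):
        nbrs = np.argsort(s[i])[-k:]   # indices of the k most similar nodes
        a_f[i, nbrs] = 1.0
    return np.maximum(a_f, a_f.T)      # symmetrise so G_f is undirected
```

The feature map is then G_f = (A_f, X) with `a_f = knn_feature_graph(x, k)`.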
On the basis of the foregoing embodiment, in an optional embodiment of the present invention, the topological graph obtaining module 3 specifically includes:
An adjacency matrix construction unit, used for extracting the relationships between patient nodes and department nodes from the medical insurance heterogeneous graph to construct an adjacency matrix A_t.
A topology map combining unit, used for combining the adjacency matrix A_t with the feature matrix X to obtain the topology map G_t; wherein G_t = (A_t, X).
On the basis of the foregoing embodiment, in an optional embodiment of the present invention, the semantic graph obtaining module 4 specifically includes:
A meta-path extraction unit, used for extracting different types of meta-paths from the medical insurance heterogeneous graph; wherein the meta-paths include patient-department-patient (P_PKP), patient-time-patient (P_PTP) and patient-drug-patient (P_PMP).
A semantic adjacency matrix generation unit, used for generating the corresponding semantic adjacency matrices from the different meta-paths; wherein the semantic adjacency matrices include a department semantic adjacency matrix A_PKP, a temporal semantic adjacency matrix A_PTP and a drug semantic adjacency matrix A_PMP.
A common path sampling unit, used for sequentially sampling the common paths among the multiple meta-paths of the same patient node to acquire the multi-path meta-paths of the patient node and generate the corresponding multi-path semantic adjacency matrix A_s.
A semantic map combining unit, used for combining the multi-path semantic adjacency matrix A_s with the feature matrix X to obtain the semantic map G_s; wherein G_s = (A_s, X).
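A small numpy sketch of meta-path adjacency construction under the scheme above; the incidence-matrix representation and the element-wise AND reading of "common paths" are assumptions made for illustration.

```python
import numpy as np

def metapath_adjacency(b):
    # b: binary patient-by-entity incidence matrix, e.g. patients x departments
    # for the P-K-P meta-path. Two patients become meta-path neighbours when
    # they share at least one intermediate entity.
    a = (b @ b.T) > 0
    np.fill_diagonal(a, False)
    return a.astype(float)

def multipath_adjacency(*metapath_mats):
    # One reading of "sampling common paths": keep only patient pairs that
    # are neighbours under every meta-path simultaneously (element-wise AND),
    # giving the multi-path semantic adjacency A_s.
    a = np.ones_like(metapath_mats[0])
    for m in metapath_mats:
        a = a * (m > 0)
    return a
```

The semantic map is then G_s = (A_s, X) with `a_s = multipath_adjacency(a_pkp, a_ptp, a_pmp)`.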
Based on the above embodiments, in an optional embodiment of the present invention, the single graph convolutional neural network model is a two-layer graph convolutional network.
On the basis of the foregoing embodiment, in an optional embodiment of the present invention, the single-graph convolution module 5 specifically includes:
A feature space node embedding acquisition unit, used for inputting the feature map G_f into the single-graph convolutional neural network model, where the embedding output by the l-th layer of the model is Z_f^(l) = ReLU(Ã_f · Z_f^(l-1) · W_f^(l)), and the embedding output by the last layer is recorded as the feature space node embedding Z_F. In the formula, ReLU is the nonlinear activation function, Ã_f is a quantity dependent on the network topology, Z_f^(l-1) is the node feature of layer l-1, and W_f^(l) is the weight matrix of the l-th layer of the graph convolutional network.
A topology space node embedding acquisition unit, used for inputting the topology map G_t into the single-graph convolutional neural network model, where the embedding output by the l-th layer of the model is Z_t^(l) = ReLU(Ã_t · Z_t^(l-1) · W_t^(l)), and the embedding output by the last layer is recorded as the topology space node embedding Z_T. In the formula, ReLU is the nonlinear activation function, Ã_t is a quantity dependent on the network topology, Z_t^(l-1) is the node feature of layer l-1, and W_t^(l) is the weight matrix of the l-th layer of the graph convolutional network.
A semantic space node embedding acquisition unit, used for inputting the semantic map G_s into the single-graph convolutional neural network model, where the embedding output by the l-th layer of the model is Z_s^(l) = ReLU(Ã_s · Z_s^(l-1) · W_s^(l)), and the embedding output by the last layer is recorded as the semantic space node embedding Z_S. In the formula, ReLU is the nonlinear activation function, Ã_s is a quantity dependent on the network topology, Z_s^(l-1) is the node feature of layer l-1, and W_s^(l) is the weight matrix of the l-th layer of the graph convolutional network.
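The three units above apply the same two-layer propagation rule with different adjacencies. A self-contained numpy sketch follows; the symmetric normalization Ã = D^(-1/2)(A + I)D^(-1/2) is the standard GCN choice and is assumed here, since the text only describes Ã as a quantity dependent on the network topology.

```python
import numpy as np

def normalize_adj(a):
    # A-tilde: symmetrically normalised adjacency with self-loops,
    # D^(-1/2) (A + I) D^(-1/2) -- the usual topology-dependent quantity.
    a_hat = a + np.eye(a.shape[0])
    d = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d[:, None] * d[None, :]

def gcn_embed(a, x, weights):
    # Z^(l) = ReLU(A_tilde @ Z^(l-1) @ W^(l)) with Z^(0) = X; the last
    # layer's output is the space-specific node embedding (Z_F, Z_T or Z_S).
    z, a_hat = x, normalize_adj(a)
    for w in weights:
        z = np.maximum(a_hat @ z @ w, 0.0)
    return z
```

Calling `gcn_embed` with A_f, A_t or A_s (and per-graph weight lists) yields Z_F, Z_T and Z_S respectively.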
On the basis of the above embodiments, in an optional embodiment of the present invention, the common convolutional neural network model is two double-layer graph convolutional networks with a parameter-sharing strategy; wherein the weight matrices of the two double-layer graph convolutional networks are shared.
On the basis of the foregoing embodiment, in an optional embodiment of the present invention, the shared convolution module 6 specifically includes:
A topology feature node embedding acquisition unit, used for inputting the topology map G_t and the feature map G_f respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain a topology embedding Z_CT and a feature embedding Z_CF, and then averaging Z_CT and Z_CF to obtain the topology feature node embedding Z_CTF.
A feature semantic node embedding acquisition unit, used for inputting the feature map G_f and the semantic map G_s respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain a feature embedding Z_CF and a semantic embedding Z_CS, and then averaging Z_CF and Z_CS to obtain the feature semantic node embedding Z_CFS.
A topology semantic node embedding acquisition unit, used for inputting the topology map G_t and the semantic map G_s respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain a topology embedding Z_CT and a semantic embedding Z_CS, and then averaging Z_CT and Z_CS to obtain the topology semantic node embedding Z_CTS.
On the basis of the above embodiments, in an optional embodiment of the invention, the attention mechanism a_Attention is:
a_Attention = (a_T, a_F, a_S, a_CTF, a_CTS, a_CFS) = att(Z_T, Z_F, Z_S, Z_CTF, Z_CTS, Z_CFS)
Z_dep = a_T·Z_T + a_F·Z_F + a_S·Z_S
Z_com = a_CTF·Z_CTF + a_CTS·Z_CTS + a_CFS·Z_CFS
Z = Z_dep + Z_com
wherein Z is the final embedded vector, and a_T, a_F, a_S, a_CTF, a_CTS and a_CFS are the attention weight values corresponding to the topology space node embedding Z_T, the feature space node embedding Z_F, the semantic space node embedding Z_S, the topology feature node embedding Z_CTF, the feature semantic node embedding Z_CFS and the topology semantic node embedding Z_CTS, respectively.
Embodiment 3
An embodiment of the present invention provides a computer-readable storage medium, which includes a stored computer program, where when the computer program runs, the device on which the computer-readable storage medium is located is controlled to execute the embedded vector generation method described in Embodiment 1.
Embodiment 4
The embodiment of the invention provides a medical insurance fraud detection method which can be executed by medical insurance fraud detection equipment. In particular, the steps A1 and A2 are performed by one or more processors in the medical insurance fraud detection apparatus.
A1, obtaining a final embedded vector Z according to the embedded vector generation method described in Embodiment 1.
And A2, classifying each node through a pre-trained classification model according to the final embedded vector Z so as to identify medical insurance fraud information in the medical insurance data.
Specifically, the present invention does not limit the specific algorithm of the classification model. Embedded vectors are obtained from labeled training data according to the embedded vector generation method described in Embodiment 1 and are then input into the classification model for training. The specific training steps of the classification model are known in the art and are not repeated here.
It should be noted that the medical insurance fraud detection device may be an electronic device with computing capability, such as a portable notebook computer, a desktop computer, a server, a smart phone, or a tablet computer.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein is merely one type of association that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The word "if," as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection," depending on context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
In the embodiments, the references to "first \ second" are merely to distinguish similar objects and do not represent a specific ordering for the objects, and it is to be understood that "first \ second" may be interchanged with a specific order or sequence, where permitted. It should be understood that "first \ second" distinct objects may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced in sequences other than those illustrated or described herein.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An embedded vector generation method, comprising:
acquiring medical insurance data, and constructing a medical insurance heterogeneous graph G according to the medical insurance data; wherein G = (A, X), where A is an adjacency matrix and X is a feature matrix;
acquiring a feature map based on the similarity between the node features according to the medical insurance data and the medical insurance heterogeneous map;
acquiring a topological graph based on a topological relation among nodes according to the medical insurance heterogeneous graph;
extracting meta-paths of different node relations according to the medical insurance heterogeneous graph to obtain a semantic graph;
inputting the feature map, the topology map and the semantic map respectively into a single-graph convolutional neural network model to obtain a feature space node embedding Z_F, a topology space node embedding Z_T and a semantic space node embedding Z_S;
combining the feature map, the topology map and the semantic map pairwise and inputting each pair into a common convolutional neural network model with shared parameters to obtain a topology feature node embedding Z_CTF, a feature semantic node embedding Z_CFS and a topology semantic node embedding Z_CTS;
fusing the feature space node embedding Z_F, the topology space node embedding Z_T, the semantic space node embedding Z_S, the topology feature node embedding Z_CTF, the feature semantic node embedding Z_CFS and the topology semantic node embedding Z_CTS through an attention mechanism to obtain a final embedded vector representation Z of the medical insurance heterogeneous graph.
2. The method of claim 1, wherein the single graph convolutional neural network model is a two-layer graph convolutional network;
respectively inputting the feature map, the topology map and the semantic map into the single-graph convolutional neural network model to obtain the feature space node embedding Z_F, the topology space node embedding Z_T and the semantic space node embedding Z_S specifically comprises:
inputting the feature map G_f into the single-graph convolutional neural network model, wherein the embedding output by the l-th layer of the model is Z_f^(l) = ReLU(Ã_f · Z_f^(l-1) · W_f^(l)), and the embedding output by the last layer is recorded as the feature space node embedding Z_F; wherein ReLU is the nonlinear activation function, Ã_f is a quantity dependent on the network topology, Z_f^(l-1) is the node feature of layer l-1, and W_f^(l) is the weight matrix of the l-th layer of the graph convolutional network;
inputting the topology map G_t into the single-graph convolutional neural network model, wherein the embedding output by the l-th layer of the model is Z_t^(l) = ReLU(Ã_t · Z_t^(l-1) · W_t^(l)), and the embedding output by the last layer is recorded as the topology space node embedding Z_T; wherein ReLU is the nonlinear activation function, Ã_t is a quantity dependent on the network topology, Z_t^(l-1) is the node feature of layer l-1, and W_t^(l) is the weight matrix of the l-th layer of the graph convolutional network;
inputting the semantic map G_s into the single-graph convolutional neural network model, wherein the embedding output by the l-th layer of the model is Z_s^(l) = ReLU(Ã_s · Z_s^(l-1) · W_s^(l)), and the embedding output by the last layer is recorded as the semantic space node embedding Z_S; wherein ReLU is the nonlinear activation function, Ã_s is a quantity dependent on the network topology, Z_s^(l-1) is the node feature of layer l-1, and W_s^(l) is the weight matrix of the l-th layer of the graph convolutional network.
3. The method of claim 1, wherein the common convolutional neural network model is two double-layer graph convolutional networks with parameter sharing strategy; wherein the weight matrix of the two double-layer graph convolutional networks is shared;
combining the feature map, the topology map and the semantic map pairwise and respectively inputting each pair into the common convolutional neural network model with shared parameters to obtain the topology feature node embedding Z_CTF, the feature semantic node embedding Z_CFS and the topology semantic node embedding Z_CTS specifically comprises:
inputting the topology map G_t and the feature map G_f respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain a topology embedding Z_CT and a feature embedding Z_CF; and then averaging Z_CT and Z_CF to obtain the topology feature node embedding Z_CTF;
inputting the feature map G_f and the semantic map G_s respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain a feature embedding Z_CF and a semantic embedding Z_CS; and then averaging Z_CF and Z_CS to obtain the feature semantic node embedding Z_CFS;
inputting the topology map G_t and the semantic map G_s respectively into the two double-layer graph convolutional networks of the common convolutional neural network model to obtain a topology embedding Z_CT and a semantic embedding Z_CS; and then averaging Z_CT and Z_CS to obtain the topology semantic node embedding Z_CTS.
4. The embedded vector generation method of any one of claims 1 to 3, wherein the attention mechanism a_Attention is:
a_Attention = (a_T, a_F, a_S, a_CTF, a_CTS, a_CFS) = att(Z_T, Z_F, Z_S, Z_CTF, Z_CTS, Z_CFS)
Z_dep = a_T·Z_T + a_F·Z_F + a_S·Z_S
Z_com = a_CTF·Z_CTF + a_CTS·Z_CTS + a_CFS·Z_CFS
Z = Z_dep + Z_com
wherein Z is the final embedded vector, and a_T, a_F, a_S, a_CTF, a_CTS and a_CFS are the attention weight values corresponding to the topology space node embedding Z_T, the feature space node embedding Z_F, the semantic space node embedding Z_S, the topology feature node embedding Z_CTF, the feature semantic node embedding Z_CFS and the topology semantic node embedding Z_CTS, respectively.
5. The method for generating the embedded vector according to any one of claims 1 to 3, wherein the obtaining of the feature map based on the similarity between the node features according to the medical insurance data and the medical insurance heterogeneous map specifically includes:
calculating a first similarity matrix S according to the medical insurance data;
selecting, according to the first similarity matrix S, the first k most similar nodes of each node as its neighborhood to obtain a second similarity matrix A_f;
combining the second similarity matrix A_f with the feature matrix X to obtain the feature map G_f; wherein G_f = (A_f, X).
6. The method for generating the embedded vector according to any one of claims 1 to 3, wherein the obtaining of the topological graph based on the topological relation between the nodes according to the medical insurance heterogeneous graph specifically includes:
extracting the relationships between patient nodes and department nodes according to the medical insurance heterogeneous graph to construct an adjacency matrix A_t;
combining the adjacency matrix A_t with the feature matrix X to obtain the topology map G_t; wherein G_t = (A_t, X).
7. The method for generating the embedded vector according to any one of claims 1 to 3, wherein the extracting meta-paths of different node relationships according to the medical insurance heterogeneous graph to obtain a semantic graph specifically comprises:
extracting different types of meta-paths according to the medical insurance heterogeneous graph; wherein the meta-paths include patient-department-patient (P_PKP), patient-time-patient (P_PTP) and patient-drug-patient (P_PMP);
generating the corresponding semantic adjacency matrices according to the different meta-paths; wherein the semantic adjacency matrices include a department semantic adjacency matrix A_PKP, a temporal semantic adjacency matrix A_PTP and a drug semantic adjacency matrix A_PMP;
sequentially sampling the common paths among the multiple meta-paths of the same patient node to acquire the multi-path meta-paths of the patient node and generate a corresponding multi-path semantic adjacency matrix A_s;
combining the multi-path semantic adjacency matrix A_s with the feature matrix X to obtain the semantic map G_s; wherein G_s = (A_s, X).
8. An embedded vector generation apparatus, comprising:
a heterogeneous graph acquisition module, used for acquiring medical insurance data and constructing a medical insurance heterogeneous graph G according to the medical insurance data; wherein G = (A, X), where A is an adjacency matrix and X is a feature matrix;
the characteristic diagram acquisition module is used for acquiring a characteristic diagram based on the similarity between the node characteristics according to the medical insurance data and the medical insurance heterogeneous diagram;
the topological graph acquisition module is used for acquiring a topological graph based on topological relation among nodes according to the medical insurance heterogeneous graph;
the semantic graph acquisition module is used for extracting meta-paths of different node relationships according to the medical insurance heterogeneous graph to acquire a semantic graph;
the single-graph convolution module is used for respectively inputting the feature graph, the topological graph and the semantic graph into a single-graph convolution neural network model to obtain feature space node embedding Z F Topology space node embedding Z T And semantic space node embedding Z S
a shared convolution module, used for combining the feature map, the topology map and the semantic map pairwise and inputting each pair into a common convolutional neural network model with shared parameters to obtain a topology feature node embedding Z_CTF, a feature semantic node embedding Z_CFS and a topology semantic node embedding Z_CTS;
a fusion module, used for fusing the feature space node embedding Z_F, the topology space node embedding Z_T, the semantic space node embedding Z_S, the topology feature node embedding Z_CTF, the feature semantic node embedding Z_CFS and the topology semantic node embedding Z_CTS through an attention mechanism to obtain a final embedded vector representation Z of the medical insurance heterogeneous graph.
9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the embedded vector generation method of any one of claims 1 to 7.
10. A medical insurance fraud detection method, comprising:
obtaining a final embedded vector Z by the embedded vector generation method of any one of claims 1 to 7;
and classifying each node through a pre-trained classification model according to the final embedded vector Z, so as to identify medical insurance fraud information in the medical insurance data.
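The classification step of claim 10 might look like the following sketch, where a placeholder linear softmax classifier stands in for the unspecified pre-trained model; the parameter values and threshold here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def classify_nodes(Z, W, b, threshold=0.5):
    """Score each node embedding with a linear softmax classifier
    (standing in for the pre-trained model) and flag the fraud class."""
    probs = softmax(Z @ W + b, axis=1)  # [N, 2]: [normal, fraud]
    return probs[:, 1] >= threshold     # boolean fraud mask per node

# hypothetical inputs: final embedding Z for 5 nodes, dimension 8
rng = np.random.default_rng(1)
Z = rng.normal(size=(5, 8))                     # would come from claims 1-7
W, b = rng.normal(size=(8, 2)), np.zeros(2)     # would be pre-trained
flags = classify_nodes(Z, W, b)
print(flags.shape)  # (5,)
```

Nodes flagged True would then be surfaced as suspected medical insurance fraud records.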
CN202211253233.3A 2023-01-17 2023-01-17 Medical insurance fraud detection method and embedded vector generation method, device and medium thereof Pending CN115859199A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211253233.3A CN115859199A (en) 2023-01-17 2023-01-17 Medical insurance fraud detection method and embedded vector generation method, device and medium thereof


Publications (1)

Publication Number Publication Date
CN115859199A true CN115859199A (en) 2023-03-28

Family

ID=85661460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211253233.3A Pending CN115859199A (en) 2023-01-17 2023-01-17 Medical insurance fraud detection method and embedded vector generation method, device and medium thereof

Country Status (1)

Country Link
CN (1) CN115859199A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116668192A (en) * 2023-07-26 2023-08-29 国网山东省电力公司信息通信公司 Network user behavior anomaly detection method and system
CN116668192B (en) * 2023-07-26 2023-11-10 国网山东省电力公司信息通信公司 Network user behavior anomaly detection method and system
CN117078441A (en) * 2023-10-16 2023-11-17 之江实验室 Method, apparatus, computer device and storage medium for identifying claims fraud
CN117078441B (en) * 2023-10-16 2024-02-06 之江实验室 Method, apparatus, computer device and storage medium for identifying claims fraud
CN117352189A (en) * 2023-12-06 2024-01-05 中南大学 Abnormal behavior evaluation method, system and equipment based on high-order topological structure
CN117352189B (en) * 2023-12-06 2024-03-15 中南大学 Abnormal behavior evaluation method, system and equipment based on high-order topological structure

Similar Documents

Publication Publication Date Title
CN113822494B (en) Risk prediction method, device, equipment and storage medium
CN109544306B (en) Cross-domain recommendation method and device based on user behavior sequence characteristics
CN115859199A (en) Medical insurance fraud detection method and embedded vector generation method, device and medium thereof
CN112529168B (en) GCN-based attribute multilayer network representation learning method
Wu et al. Clustering of the self-organizing map using a clustering validity index based on inter-cluster and intra-cluster density
CN112084331A (en) Text processing method, text processing device, model training method, model training device, computer equipment and storage medium
CN110825904A (en) Image matching method and device, electronic equipment and storage medium
Zhao et al. Scale-aware crowd counting via depth-embedded convolutional neural networks
CN109033107A (en) Image search method and device, computer equipment and storage medium
CN112231592B (en) Graph-based network community discovery method, device, equipment and storage medium
Chang [Retracted] Neural Reversible Steganography with Long Short‐Term Memory
CN110866489B (en) Image recognition method, device, equipment and storage medium
Liu et al. Learning explicit shape and motion evolution maps for skeleton-based human action recognition
Zhang et al. Image composition assessment with saliency-augmented multi-pattern pooling
CN113254927A (en) Model processing method and device based on network defense and storage medium
CN114970447A (en) Chinese character font conversion method, device, equipment and storage medium
Ma et al. Robust face alignment by dual-attentional spatial-aware capsule networks
CN114330476A (en) Model training method for media content recognition and media content recognition method
EP3832542A1 (en) Device and method with sensor-specific image recognition
CN113205072A (en) Object association method and device and electronic equipment
Zhong A convolutional neural network based online teaching method using edge-cloud computing platform
CN115910232A (en) Multi-view drug pair response prediction method, device, equipment and storage medium
CN113762648B Method, device, equipment and medium for predicting public health black swan events
Wu et al. Spatial-Channel Attention Transformer with Pseudo Regions for Remote Sensing Image-Text Retrieval
CN114581250A (en) Method, device, equipment and storage medium for identifying medical insurance fraud

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination