CN111210356B

CN111210356B - Medical insurance data analysis method and device, computer equipment and storage medium

Info

Publication number: CN111210356B
Application number: CN202010038527.9A
Authority: CN
Inventors: 张旭
Original assignee: Ping An Medical and Healthcare Management Co Ltd
Current assignee: Ping An Medical and Healthcare Management Co Ltd
Priority date: 2020-01-14
Filing date: 2020-01-14
Publication date: 2023-03-21
Anticipated expiration: 2040-01-14
Also published as: CN111210356A

Abstract

The application relates to the field of big data, and in particular to a medical insurance data analysis method and device, computer equipment and a storage medium. The method comprises the following steps: receiving medical insurance settlement data for a preset period sent by a database, wherein the medical insurance settlement data carries an insurance participation identifier, a settlement date and a settlement address; grouping the medical insurance settlement data according to the settlement date to obtain data groups; acquiring an operation thread channel and performing multi-thread parallel processing on the data groups through the operation thread channel to obtain encounter insurance participation identifiers, wherein the operation thread channel performs concurrent thread analysis on the date data corresponding to the insurance participation identifiers in the data groups and screens out the encounter insurance participation identifiers; classifying the encounter insurance participation identifiers according to the settlement address to obtain a classification result; and comparing the classification result along a time axis of the medical insurance settlement data, identifying violating insurance participation identifiers, and sending the violating insurance participation identifiers to the medical insurance terminal. With this method, large batches of medical insurance data can be checked for anomalies quickly.

Description

Medical insurance data analysis method and device, computer equipment and storage medium

Technical Field

The application relates to the technical field of data cleaning, in particular to a medical insurance data analysis method and device, computer equipment and a storage medium.

Background

After medical settlement data of insured persons is obtained, the server needs to extract the content of the medical settlement data and verify it. Before the settlement data is extracted and checked, the computer needs to identify abnormal settlement data in the medical insurance settlement data and remove it from the large volume of settlement data. However, the medical insurance settlement data can correspond to upwards of ten thousand insured persons, so the computer has to spend a large amount of computing time on anomaly checking and operating efficiency is low. Moreover, when a drug reseller uses medical insurance cards in bulk to purchase medicines at a low price, the computer cannot determine the user identifiers corresponding to the abnormal medical insurance cards, so the abnormal data can be neither extracted nor classified and analyzed.

Disclosure of Invention

In view of the above, it is necessary to provide a medical insurance data analysis method, device, computer device and storage medium capable of rapidly performing exception checking on large quantities of medical insurance data.

A method of medical insurance data analysis, the method comprising:

receiving medical insurance settlement data in a preset period sent by a database, wherein the medical insurance settlement data carries an insurance participation identifier, a settlement date and a settlement address;

grouping the medical insurance settlement data according to the settlement date to obtain data groups;

acquiring an operation thread channel, and performing multi-thread parallel processing on the data groups through the operation thread channel to obtain encounter insurance participation identifiers, wherein the operation thread channel is used to perform concurrent thread analysis on the date data corresponding to the insurance participation identifiers in the data groups, and the encounter insurance participation identifiers are screened out;

classifying the encounter insurance participation identifiers obtained by the multi-thread processing according to the settlement address to obtain a classification result;

and comparing the classification result along the time axis of the medical insurance settlement data, identifying violating insurance participation identifiers, and sending the violating insurance participation identifiers to the medical insurance terminal.

In one embodiment, the performing multi-thread parallel processing on the data groups through the operation thread channel to obtain encounter insurance participation identifiers includes:

assigning the data groups to the operation thread channel, and generating a group queue corresponding to the operation thread channel;

setting, according to the medical insurance settlement data, a fixed-size buffer array stored at a single address as the shared space of the operation thread channel, and initializing a counting signal;

creating an input thread and an operation thread in the operation thread channel, wherein the operation thread channel updates the counting signal when the input thread writes data into the buffer, and synchronization between the input thread and the operation thread is determined through the counting signal;

and using the input thread to store the data groups into the buffer array in group-queue order by data prefetching, wherein the operation thread processes the data groups written into the buffer array to obtain the encounter insurance participation identifiers.

In one embodiment, the performing multi-thread parallel processing on the data groups through the operation thread channel to obtain encounter insurance participation identifiers includes:

acquiring the date pivot parameters corresponding to the insurance participation identifiers in the data groups, and constructing a first matrix corresponding to the date pivot parameters;

acquiring, according to a preset rule, the date pivot parameters of the insurance participation identifiers in a preset sequence to construct a second matrix, wherein the number of rows and columns of the second matrix is not more than the number of rows and columns of the first matrix;

calculating, in a multi-thread manner, the product of the transpose of the second matrix and the first matrix to obtain a third matrix, wherein the third matrix expresses the number of encounters between the insured persons in the preset sequence and all insured persons in the medical insurance settlement data;

and acquiring the insurance participation identifiers of different insured persons whose number of encounters in the third matrix is greater than a preset encounter threshold, to obtain the encounter insurance participation identifiers.

In one embodiment, the grouping the medical insurance settlement data according to the settlement date to obtain a data group includes:

parsing and extracting the medical insurance settlement data to obtain a settlement detail table;

checking for abnormal values in the settlement detail table and deleting the abnormal values found;

determining the medical insurance settlement data parameter most correlated with each variable that has missing values in the settlement detail table, and filling the missing values with that parameter to obtain a cleaned settlement detail table;

and grouping the cleaned settlement detail table according to the settlement date to obtain the data group corresponding to each settlement date.

determining a matrix layout according to the insurance participation identifier and the settlement address, and obtaining a correlation coefficient matrix according to the matrix layout and the address pivot parameters of the medical insurance settlement data;

calculating the correlation coefficients between insurance participation identifiers according to the correlation coefficient matrix;

and classifying different insured persons whose correlation coefficient is greater than a preset threshold to obtain data groups.
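
The correlation step above can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the text does not name the correlation measure, so Pearson correlation (`np.corrcoef`) is assumed here, and the matrix values, the identifiers `A`, `B`, `C`, and the threshold `0.8` are hypothetical.

```python
import numpy as np

ids = ["A", "B", "C"]  # hypothetical insurance participation identifiers

# Assumed layout: rows are insured persons, columns are settlement addresses.
# Each cell counts settlements at that address (illustrative values only).
address_matrix = np.array([
    [3, 0, 1],   # A
    [2, 0, 1],   # B
    [0, 4, 0],   # C
])

# corr[i, j] is the Pearson correlation between the address profiles
# of person i and person j (np.corrcoef treats each row as a variable).
corr = np.corrcoef(address_matrix)

THRESHOLD = 0.8  # assumed preset threshold

# Keep each unordered pair whose correlation exceeds the threshold.
related = [
    (ids[i], ids[j])
    for i in range(len(ids))
    for j in range(i + 1, len(ids))
    if corr[i, j] > THRESHOLD
]
```

A row with zero variance would make the coefficient undefined, so a real pipeline would need to filter such rows first.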

In one embodiment, the comparing the classification result along the time axis of the medical insurance settlement data to identify violating insurance participation identifiers includes:

constructing a time axis according to the medical insurance settlement data, and dividing the time axis into a plurality of time periods;

mapping onto the time axis the settlement moments corresponding to the encounter insurance participation identifiers in the classification result;

and analyzing the encounter insurance participation identifiers in each time period on the time axis to obtain the violating insurance participation identifiers corresponding to the violating insured persons.

A medical insurance data analysis apparatus, the apparatus comprising:

the medical insurance data receiving module is used for receiving medical insurance settlement data in a preset period sent by a database, and the medical insurance settlement data carries an insurance participation identifier, a settlement date and a settlement address;

the data grouping module is used for grouping the medical insurance settlement data according to the settlement date to obtain data groups;

the data processing module is used for acquiring an operation thread channel, and performing multi-thread parallel processing on the data groups through the operation thread channel to obtain encounter insurance participation identifiers, wherein the operation thread channel is used to perform concurrent thread analysis on the date data corresponding to the insurance participation identifiers in the data groups, and the encounter insurance participation identifiers are screened out;

the data classification module is used for classifying the encounter insurance participation identifiers obtained by the multi-thread processing according to the settlement address to obtain a classification result;

and the violation identification module is used for comparing the classification result along the time axis of the medical insurance settlement data, identifying violating insurance participation identifiers and sending the violating insurance participation identifiers to the medical insurance terminal.

In one embodiment, the data processing module includes:

the queue distribution unit is used for assigning the data groups to the operation thread channel and generating a group queue corresponding to the operation thread channel;

the shared space setting unit is used for setting, according to the medical insurance settlement data, a fixed-size buffer array stored at a single address as the shared space of the operation thread channel, and for initializing a counting signal;

the thread creating unit is used for creating an input thread and an operation thread in the operation thread channel, wherein the operation thread channel updates the counting signal when the input thread writes data into the buffer, and synchronization between the input thread and the operation thread is determined through the counting signal;

and the thread running unit is used for storing, by the input thread, the data groups into the buffer array in group-queue order by data prefetching, and for processing, by the operation thread, the data groups written into the buffer array to obtain the encounter insurance participation identifiers.

A computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.

With the medical insurance data analysis method, device, computer equipment and storage medium described above, the medical insurance settlement data is processed in parallel by multiple threads through the operation thread channel to obtain encounter insurance participation identifiers; the collected encounter insurance participation identifiers are classified according to settlement address to obtain a classification result; and the violating insurance participation identifiers are determined from the classification result along the time axis. The medical insurance settlement data is thus analyzed at multiple levels, idle server resources are put to reasonable use during computation-heavy processing, and anomaly analysis of the medical insurance settlement data is completed quickly, which improves the efficiency of the model.

Drawings

FIG. 1 is a diagram illustrating an application environment of a method for analyzing medical insurance data according to one embodiment;

FIG. 2 is a schematic flow chart diagram illustrating a method for analyzing medical insurance data according to one embodiment;

FIG. 3 is a schematic flow chart illustrating the steps of analyzing the medical insurance data according to one embodiment;

FIG. 4 is a schematic flow chart diagram illustrating the steps of the medical insurance data analysis in another embodiment;

FIG. 5 is a schematic flow chart showing the steps of cleaning and grouping medical insurance settlement data in another embodiment;

FIG. 6 is a block diagram of a medical insurance data analysis apparatus according to an embodiment;

FIG. 7 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The medical insurance data analysis method provided by the application can be applied to the application environment shown in fig. 1, in which the medical insurance terminal 102 communicates with the server 104 through a network. The server 104 receives medical insurance settlement data for a preset period sent by a database, where the database can be located on the medical insurance terminal 102 or on the server 104. The medical insurance settlement data carries an insurance participation identifier, a settlement date and a settlement address. The server 104 groups the medical insurance settlement data according to the settlement date to obtain data groups. The server 104 acquires an operation thread channel and performs multi-thread parallel processing on the data groups through the operation thread channel to obtain encounter insurance participation identifiers: the server 104 uses the operation thread channel to perform concurrent thread analysis on the date data corresponding to the insurance participation identifiers in the data groups, and screens out the encounter insurance participation identifiers. The server 104 classifies the encounter insurance participation identifiers obtained by the multi-thread processing according to the settlement address to obtain a classification result. The server 104 compares the classification result along the time axis of the medical insurance settlement data, identifies the violating insurance participation identifiers, and sends them to the medical insurance terminal 102. The medical insurance terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer or portable smart device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.

In one embodiment, as shown in fig. 2, a method for analyzing medical insurance data is provided, which is described by taking the method as an example of being applied to the server in fig. 1, and includes the following steps:

Step 202, receiving medical insurance settlement data for a preset period sent by the database, wherein the medical insurance settlement data carries an insurance participation identifier, a settlement date and a settlement address.

The medical insurance settlement data is the medical insurance information of the insured person corresponding to an insurance participation identifier, and can carry the insurance participation identifier, the settlement date and the settlement address. The preset period is a preset verification period, for example 1 month or 1 year. The server receives the medical insurance settlement data for the preset period sent by the database.

Step 204, grouping the medical insurance settlement data according to the settlement date to obtain data groups.

The server groups the medical insurance settlement data according to the settlement date to obtain data groups. The server can group the data according to the memory size of the medical insurance settlement data and the settlement date; it can also group the data according to the number of operation thread channels and the settlement date. The server can also generate, from the insurance participation identifiers and the medicine purchase information corresponding to the settlement dates, a pivot table that represents the medicine purchasing behavior of the insured persons; the server then groups the medical insurance settlement data by the settlement dates in the pivot table, according to the memory size of the data, to obtain data groups. The pivot table can take the insurance participation identifier as the vertical axis and the medicine purchase date in the medicine purchase information as the horizontal axis, with each cell recording, as a pivot parameter, whether the insured person purchased medicine on that day: the pivot parameter is 1 if the insured person purchased medicine on that day and 0 otherwise.
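
The 0/1 table described above can be sketched with pandas as follows. This is a minimal illustration rather than the implementation of the application; the column names `insured_id` and `settle_date` and the sample records are assumptions made for the example.

```python
import pandas as pd

# Hypothetical settlement records; column names are illustrative assumptions.
records = pd.DataFrame({
    "insured_id": ["A", "A", "B", "B", "C"],
    "settle_date": ["2020-01-01", "2020-01-02",
                    "2020-01-01", "2020-01-02",
                    "2020-01-02"],
})

# Rows: insurance participation identifiers; columns: purchase dates.
# crosstab counts the records per cell; the comparison turns counts into
# a 1 (purchased on that day) or 0 (did not purchase) flag.
pivot = (pd.crosstab(records["insured_id"], records["settle_date"]) > 0).astype(int)

# One data group per settlement date: the identifiers active on that date.
groups = {date: list(pivot.index[pivot[date] == 1]) for date in pivot.columns}
```

Here `groups` maps each settlement date to the identifiers that settled on it, which is one simple way to obtain per-date data groups from the table.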

Step 206, acquiring an operation thread channel, and performing multi-thread parallel processing on the data groups through the operation thread channel to obtain encounter insurance participation identifiers, wherein the operation thread channel is used to perform concurrent thread analysis on the date data corresponding to the insurance participation identifiers in the data groups, so as to screen out the encounter insurance participation identifiers.

An operation thread channel is a schedulable execution unit, smaller than a process, in operating-system scheduling, and is the smallest unit of a program's execution flow. Each operation thread channel can execute an independent logic path, so a tedious or very time-consuming task can be decomposed into a plurality of threads. Threads of the same data group execute asynchronously, and threads of different data groups can execute synchronously. The server acquires an operation thread channel and performs multi-thread parallel processing on the data groups through the operation thread channel to obtain the encounter insurance participation identifiers. The server can associate the data groups with operation thread channels, associate each operation thread channel with a CPU core of the server, and then further split each data group into threads corresponding to the medical insurance settlement data. The server processes the threads of the same data group concurrently, performs anomaly analysis on the data in each thread in turn, and screens out the encounter insurance participation identifiers in the data group. An encounter insurance participation identifier refers to at least two insurance participation identifiers that appear together on a preset date, where the preset date can be a specific date set by the server or a settlement date extracted from the medical insurance settlement data. The data computed by each thread is date data obtained by summarizing the medical insurance settlement data by settlement date. The date data can be date pivot parameters that represent the medicine purchasing behavior of the insured persons, derived from the medicine purchase information corresponding to the settlement date and the insurance participation identifier.
The server can also use a first thread to standardize the data in a data group and then use a second thread to perform abnormal value analysis on the standardized data group. All threads in the same group execute concurrently and asynchronously, and different operation thread channels execute in parallel. The server may assign the data groups to the operation thread channel and generate a group queue corresponding to the operation thread channel. The server processes the data groups in the group queue through the operation thread channel and obtains the encounter insurance participation identifiers. The server can also read and parse the data with python through the operation thread channel, and compute on the data with numpy to obtain the encounter insurance participation identifiers.
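
As a rough sketch of the step above, the per-date groups can be handed to a thread pool and the per-group results merged afterwards. The use of `ThreadPoolExecutor`, the helper `pairs_in_group`, the sample groups and the threshold value are all assumptions for illustration; the text itself only states that python and numpy may be used.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical data groups keyed by settlement date (illustrative values only).
groups = {
    "2020-01-01": ["A", "B"],
    "2020-01-02": ["A", "B", "C"],
    "2020-01-03": ["C"],
}

def pairs_in_group(insured_ids):
    """Return every pair of distinct identifiers that appear in one date group."""
    return list(combinations(sorted(set(insured_ids)), 2))

# Each date group is analyzed by a worker thread; map preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    per_group = list(pool.map(pairs_in_group, groups.values()))

# Merge the per-group results: how many dates did each pair share?
encounters = Counter(pair for pairs in per_group for pair in pairs)

ENCOUNTER_THRESHOLD = 1  # assumed value
encounter_ids = {
    insured_id
    for pair, count in encounters.items()
    if count > ENCOUNTER_THRESHOLD
    for insured_id in pair
}
```

In CPython, threads mainly help when the per-group work releases the interpreter lock (as large numpy operations do) or waits on I/O; for pure-Python work a process pool may be the better fit.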

Step 208, classifying the encounter insurance participation identifiers obtained by the multi-thread processing according to the settlement address to obtain a classification result.

The server classifies the encounter insurance participation identifiers obtained by the multi-thread processing according to the settlement address to obtain a classification result. The server can look up the settlement address corresponding to each insurance participation identifier among the encounter insurance participation identifiers, and then classify those identifiers by settlement address to obtain the classification result.

Step 210, comparing the classification result along the time axis of the medical insurance settlement data, identifying violating insurance participation identifiers, and sending the violating insurance participation identifiers to the medical insurance terminal.

The server compares the classification result along the time axis of the medical insurance settlement data, identifies the violating insurance participation identifiers, and sends them to the medical insurance terminal. The server can use pandas to map the settlement times of different insured persons at the same settlement address onto a time axis, determine the violating insurance participation identifiers of the violating insured persons from the time axis, and send the violating insurance participation identifiers to the medical insurance terminal.
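
The address classification and time-axis comparison can be sketched with pandas as below. The text does not spell out the exact violation rule, so this sketch assumes one: identifiers are flagged when two or more distinct identifiers settle at the same address within the same time period. The column names, the one-hour period length and the sample data are likewise assumptions.

```python
import pandas as pd

# Hypothetical settlements of encounter identifiers; column names are assumptions.
df = pd.DataFrame({
    "insured_id": ["A", "B", "A", "B", "C"],
    "address": ["P1", "P1", "P1", "P1", "P2"],
    "settle_time": pd.to_datetime([
        "2020-01-01 09:00", "2020-01-01 09:05",
        "2020-01-02 14:00", "2020-01-02 14:10",
        "2020-01-02 18:00",
    ]),
})

# Divide the time axis into 60-minute periods (the period length is assumed).
df["period"] = df["settle_time"].dt.floor("60min")

# For every row, count the distinct identifiers sharing its (address, period) slot.
distinct = df.groupby(["address", "period"])["insured_id"].transform("nunique")

# Assumed rule: a slot shared by at least two identifiers is suspicious.
suspects = sorted(df.loc[distinct >= 2, "insured_id"].unique())
```

With this sample data, `A` and `B` settle together at address `P1` in two separate periods, while `C` settles alone at `P2`.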

With the medical insurance data analysis method described above, the medical insurance settlement data is processed in parallel by multiple threads through the operation thread channel to obtain encounter insurance participation identifiers; the collected encounter insurance participation identifiers are classified according to settlement address to obtain a classification result; and the violating insurance participation identifiers are determined from the classification result along the time axis. The medical insurance settlement data is thus analyzed at multiple levels, idle server resources are put to reasonable use during computation-heavy processing, the operating efficiency and data processing efficiency of the computer are steadily improved, and anomaly analysis of the medical insurance settlement data is completed quickly, which improves the efficiency of the model.

In one embodiment, as shown in fig. 3, performing multi-thread parallel processing on the data groups through the operation thread channel to obtain the encounter insurance participation identifiers includes:

Step 302, assigning the data groups to the operation thread channel, and generating a group queue corresponding to the operation thread channel.

The server assigns the data groups to the operation thread channel and generates a group queue corresponding to the operation thread channel. The server can obtain the idle operation thread channels, estimate the time each operation thread channel would need to process each data group, assign the data groups to the operation thread channels according to the estimated times, and generate a group queue corresponding to each operation thread channel.
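
One way to turn estimated times into per-channel queues is a greedy balance: take the groups from longest to shortest and give each to the channel with the smallest accumulated time. The text does not specify the assignment rule, so this rule, the function name `assign_groups` and the time estimates below are assumptions for illustration.

```python
import heapq

def assign_groups(estimated_times, n_channels):
    """Greedy assignment: longest groups first, each to the least-loaded channel."""
    # Heap entries are (accumulated estimated time, channel index).
    heap = [(0.0, ch) for ch in range(n_channels)]
    heapq.heapify(heap)
    queues = {ch: [] for ch in range(n_channels)}
    for group, seconds in sorted(estimated_times.items(), key=lambda kv: -kv[1]):
        load, ch = heapq.heappop(heap)      # least-loaded channel so far
        queues[ch].append(group)            # append to that channel's group queue
        heapq.heappush(heap, (load + seconds, ch))
    return queues

# Hypothetical estimated processing time (seconds) per date group.
estimated = {
    "2020-01-01": 5.0,
    "2020-01-02": 3.0,
    "2020-01-03": 2.0,
    "2020-01-04": 1.0,
}
queues = assign_groups(estimated, n_channels=2)
```

For these estimates the two channels end up with 6.0 and 5.0 seconds of work respectively.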

Step 304, setting, according to the medical insurance settlement data, a fixed-size buffer array stored at a single address as the shared space of the operation thread channel, and initializing a counting signal.

According to the medical insurance settlement data, the server sets a fixed-size buffer array stored at a single address as the shared space of the operation thread channel, and initializes a counting signal that is used for synchronization between threads. The shared space is the single address space shared by the input thread and the operation thread within the operation thread channel.

Step 306, creating an input thread and an operation thread in the operation thread channel, wherein the operation thread channel updates the counting signal when the input thread writes data into the buffer, and synchronization between the input thread and the operation thread is determined through the counting signal.

The server creates an input thread and an operation thread in the operation thread channel. The input thread executes the memory-access stage of the algorithm: it stores the uniformly partitioned data to be measured into the buffer array by data prefetching. The operation thread executes the computation stage of the algorithm: it processes the data written into the buffer array in sequence. When the input thread writes data into the buffer, the operation thread channel updates the counting signal, and synchronization between the input thread and the operation thread is determined through the counting signal. When the input thread writes data into the buffer, the operation thread channel increments the counting signal of the input thread; when the operation thread processes data taken from the buffer, the operation thread channel increments the counting signal of the operation thread.

Step 308, using the input thread to store the data groups into the buffer array in group-queue order by data prefetching, wherein the operation thread processes the data groups written into the buffer array to obtain the encounter insurance participation identifiers.

When the input thread writes data into the buffer, the operation thread processes the data written into the buffer in sequence. The server uses the input thread to store the data groups into the buffer array in group-queue order by data prefetching, and the operation thread processes the data groups written into the buffer array to obtain the encounter insurance participation identifiers. That is, the input thread writes a data group into the buffer, after which the operation thread reads that data group from the buffer and processes it to obtain the encounter insurance participation identifiers.
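
The input-thread and operation-thread arrangement resembles a classic bounded-buffer producer and consumer. The sketch below models the counting signal with two counting semaphores, which is an assumption about how the signal might be realized; the buffer size, the sample group queue and the placeholder computation are also hypothetical.

```python
import threading

BUFFER_SIZE = 2                            # assumed fixed buffer size
buffer = [None] * BUFFER_SIZE              # shared fixed-size buffer array
filled = threading.Semaphore(0)            # counts slots written by the input thread
empty = threading.Semaphore(BUFFER_SIZE)   # counts slots free for writing

group_queue = [["A", "B"], ["A", "B", "C"], ["C"]]  # hypothetical group queue
results = []

def input_thread():
    for i, group in enumerate(group_queue):
        empty.acquire()                    # wait for a free slot
        buffer[i % BUFFER_SIZE] = group    # prefetch the next group into the buffer
        filled.release()                   # signal that one more slot is ready

def operation_thread():
    for i in range(len(group_queue)):
        filled.acquire()                   # wait until the input thread has written
        group = buffer[i % BUFFER_SIZE]
        results.append(len(set(group)))    # placeholder computation on the group
        empty.release()                    # hand the slot back for reuse

threads = [
    threading.Thread(target=input_thread),
    threading.Thread(target=operation_thread),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the buffer holds two slots, the input thread can load the next group while the operation thread is still working on the current one, which is the point of the prefetching scheme.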

In the medical insurance data analysis method above, thread-level parallelism and memory-level parallelism are introduced to optimize the measurement hash algorithm in parallel, which reduces the measurement time and improves the measurement efficiency.

In one embodiment, as shown in fig. 4, performing multi-thread parallel processing on the data groups through the operation thread channel to obtain the encounter insurance participation identifiers includes the following steps:

Step 402, acquiring the date pivot parameters corresponding to the insurance participation identifiers in the data groups, and constructing a first matrix corresponding to the date pivot parameters.

The server generates, from the insurance participation identifiers and the medicine purchase information corresponding to the settlement dates, a pivot table that represents the medicine purchasing behavior of the insured persons. The pivot table can take the insurance participation identifier as the vertical axis and the medicine purchase date in the medicine purchase information as the horizontal axis, with each cell recording, as a pivot parameter, whether the insured person purchased medicine on that day: the pivot parameter is 1 if the insured person purchased medicine on that day and 0 otherwise. The server acquires the date pivot parameters corresponding to the insurance participation identifiers in the data groups and constructs a first matrix corresponding to the date pivot parameters. In the layout of the first matrix, the horizontal axis is the settlement date and the vertical axis is the insurance participation identifier.

Step 404, acquiring, according to a preset rule, the date pivot parameters of the insurance participation identifiers in a preset sequence to construct a second matrix, wherein the number of rows and columns of the second matrix is not more than the number of rows and columns of the first matrix.

The server acquires, according to a preset rule, the date pivot parameters of the insurance participation identifiers in a preset sequence to construct a second matrix, whose number of rows and columns is not more than that of the first matrix. The server can divide the first matrix into a plurality of second matrices by insurance participation identifier, or divide it into a plurality of second matrices by settlement date.

Step 406, calculating, in a multi-thread manner, the product of the transpose of the second matrix and the first matrix to obtain a third matrix, wherein the third matrix expresses the number of encounters between the insured persons in the preset sequence and all insured persons in the medical insurance settlement data.

The server assigns the plurality of second matrices to a plurality of threads in the operation thread channel. Using multiple threads, the server calculates the product of the transpose of each second matrix and the first matrix to obtain a third matrix, which expresses the number of encounters between the insured persons in the preset sequence and all insured persons in the medical insurance settlement data.

Step 408, acquiring the insurance participation identifiers of different insured persons whose number of encounters in the third matrix is greater than a preset encounter threshold, to obtain the encounter insurance participation identifiers.

The preset encounter threshold represents the minimum number of times that violating insured persons appear together. The server classifies the different insured persons whose number of encounters is greater than the preset encounter threshold to obtain the encounter insurance participation identifiers.
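
The matrix product behind the encounter count can be shown on a tiny example with NumPy. The sketch makes two assumptions that should be read with care: the matrices are stored with dates as rows so that the product of the transposed second matrix with the first matrix is well defined (with insured persons as rows, the equivalent product is `second @ first.T`), and the threshold and sample data are hypothetical. Threading is omitted to keep the arithmetic visible.

```python
import numpy as np

ids = ["A", "B", "C"]  # hypothetical insurance participation identifiers

# Pivot table: rows are insured persons, columns are three settlement dates.
pivot = np.array([
    [1, 1, 0],   # A
    [1, 1, 0],   # B
    [0, 1, 1],   # C
])

first = pivot.T         # assumed storage: dates x all insured persons
second = pivot[:2].T    # dates x preset subset (here A and B)

# third[i, j] = number of dates on which subset person i and person j both settled.
third = second.T @ first

ENCOUNTER_THRESHOLD = 1  # assumed value

# Collect pairs of different persons whose encounter count exceeds the threshold.
# The subset is the first two rows, so subset index i matches ids[i].
rows, cols = np.nonzero(third > ENCOUNTER_THRESHOLD)
pairs = {(ids[i], ids[j]) for i, j in zip(rows, cols) if i != j}
```

Each entry of the product is a dot product of two 0/1 date vectors, so it counts exactly the dates on which both persons have a 1.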

With the medical insurance data analysis method above, the data groups are analyzed through multiple operation thread channels, which improves the processing efficiency of the computer and shortens the data analysis time.

In some embodiments, grouping the medical insurance settlement data according to the settlement date to obtain a data group comprises the following steps:

Step 502, parsing the medical insurance settlement data and extracting data from it to obtain a settlement statement.

The server parses the medical insurance settlement data, extracts data from it, and generates a settlement statement from the parsed data. The server may obtain preset parsing keywords and extract the corresponding parsed data according to them. A parsing keyword may be a variable in the medical insurance settlement data, for example the insurance participation identifier, the medicine purchase date, the medicine purchase address, or the medicine purchase cost. The server extracts the corresponding parsed data according to the parsing keywords and generates the settlement statement organized by, for example, the insurance participation identifier. The settlement statement may include the detailed data of each medical insurance settlement performed by an insured person, including the insurance participation identifier and the settlement contents of each visit record; for example, it may include the insurance participation identifier, the settlement time, the settlement address, and the total amount.

Step 504, checking for abnormal values in the settlement statement and deleting the abnormal values found.

The server checks for abnormal values in the settlement statement and deletes the abnormal values found. When the data corresponding to a parsing keyword is numerical, the server may check for abnormal values according to the 3σ principle of the standard deviation. For example, when the parsing keyword is the medicine purchase cost, the server may calculate the mean μ and the standard deviation σ of the settlement cost, and then determine that a settlement cost falling within (μ - σ, μ + σ) is a normal value and that a settlement cost falling outside (μ - σ, μ + σ) is an abnormal value. Alternatively, the server may determine that a settlement cost falling within (μ - 2σ, μ + 2σ) is a normal value and that a settlement cost falling outside (μ - 2σ, μ + 2σ) is an abnormal value. When the data corresponding to a parsing keyword consists of Chinese characters or other characters, the server may classify the data into groups, calculate the share of each group in the data, and treat any group whose share is smaller than a preset share as an abnormal value.
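The two checks in step 504 can be sketched as follows, assuming the settlement statement is a pandas DataFrame with hypothetical columns `fee` and `address`. The function names, the multiplier `k`, and the `min_share` parameter are assumptions made for illustration. The sketch uses the population standard deviation, which is also an assumption because the text does not say which estimator is meant.

```python
import pandas as pd


def drop_fee_outliers(statement, column="fee", k=2.0):
    """Keep rows whose value lies inside (mu - k*sigma, mu + k*sigma)."""
    mu = statement[column].mean()
    sigma = statement[column].std(ddof=0)  # population standard deviation
    inside = (statement[column] > mu - k * sigma) & (statement[column] < mu + k * sigma)
    return statement[inside]


def drop_rare_categories(statement, column, min_share=0.05):
    """Drop rows whose category makes up less than `min_share` of the data."""
    share = statement[column].map(statement[column].value_counts(normalize=True))
    return statement[share >= min_share]


statement = pd.DataFrame({
    "insured_id": list("ABCDEF"),
    "fee": [100, 110, 90, 105, 95, 1000],
    "address": ["X", "X", "X", "X", "X", "Y"],
})
cleaned = drop_fee_outliers(statement, "fee", k=2.0)
common = drop_rare_categories(statement, "address", min_share=0.2)
```

With these sample values the fee of 1000 lies outside the 2σ interval and is removed, and address Y accounts for one record in six, below the 0.2 share, so it is removed as well. A single extreme value inflates σ, so on small samples a narrow interval may be needed to catch it.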

Step 506, determining the medical insurance settlement data parameter most related to the variable in which a missing value is located in the settlement statement, and substituting that parameter for the missing value to obtain the cleaned settlement statement.

The server determines the medical insurance settlement data parameter most related to the variable in which a missing value is located in the settlement statement, and substitutes that parameter for the missing value to obtain the cleaned medical insurance settlement data. For example, when the variable in which the missing value is located is the settlement cost, the server may first determine the other variable most related to it for the insurance participation identifier corresponding to the missing value, and then substitute the medical insurance settlement data parameter determined from that variable for the missing value to obtain the cleaned medical insurance settlement data. The server may use a correlation coefficient matrix to determine which variable (e.g., variable Y) is most correlated with the variable in which the missing value is located (e.g., variable X), and then sort all records by the value of Y. The missing value of variable X may then be replaced with the value of X from the insurance participation identifier that immediately precedes it in that order. The server may also fill the missing value using nearest-distance decision filling, regression filling, multiple imputation, the K-nearest-neighbor method, the ordered nearest-neighbor method, a Bayes-based method, or the like.
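The correlation-based filling described above can be sketched as follows. The sketch assumes a pandas DataFrame of numeric variables; the function name `fill_from_most_correlated` and the column names are hypothetical. It picks the variable Y with the largest absolute Pearson correlation to X, sorts the records by Y, and carries the preceding value of X forward.

```python
import numpy as np
import pandas as pd


def fill_from_most_correlated(statement, target):
    """Fill missing values of `target` from the preceding record after
    sorting by the variable most correlated with `target`."""
    numeric = statement.select_dtypes("number")
    # Column of the correlation coefficient matrix that belongs to the target.
    corr = numeric.corr()[target].drop(target).abs()
    helper = corr.idxmax()  # variable Y, most correlated with X
    ordered = statement.sort_values(helper, kind="stable")
    filled = ordered[target].ffill()  # take the preceding record's value
    out = statement.copy()
    out[target] = filled  # assignment aligns on the original index
    return out, helper


statement = pd.DataFrame({
    "fee": [10.0, 20.0, np.nan, 40.0, 50.0],
    "days": [1, 2, 3, 4, 5],
    "noise": [5, 1, 4, 2, 3],
})
filled, helper = fill_from_most_correlated(statement, "fee")
```

Here `days` is selected as the helper variable and the missing fee takes the value 20.0 from the record just before it. If the missing value falls on the first record in sort order, there is no preceding value and it stays missing, so a fallback method would still be needed in that case.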

Step 508, grouping the cleaned settlement statement according to the settlement date to obtain a data group corresponding to each settlement date.

The server groups the cleaned settlement statement according to the settlement date to obtain a data group corresponding to each settlement date.

In the medical insurance data analysis method, the abnormal values in the medical insurance settlement data are deleted and the missing values are supplemented by cleaning the medical insurance settlement data, so that the consistency of the data is ensured.

In some embodiments, grouping the medical insurance settlement data according to the settlement date to obtain data groups comprises the following steps: determining a matrix layout according to the insurance participation identifier and the settlement address, and obtaining a correlation coefficient matrix according to the matrix layout and the address perspective parameters of the medical insurance settlement data; calculating the correlation coefficients between insurance participation identifiers according to the correlation coefficient matrix; and classifying the different insured persons whose correlation coefficient is greater than a preset threshold to obtain the data groups.

The server determines a matrix layout according to the insurance participation identifier and the settlement address; for example, the rows of the matrix layout may correspond to the insurance participation identifiers, and the columns may correspond to the settlement time and/or the settlement address. The server obtains a correlation coefficient matrix according to the matrix layout and the address perspective parameters of the medical insurance settlement data. The correlation coefficient matrix is used to determine, quickly and accurately, how strongly two different insured persons are correlated in appearing at the same address at the same time. The server calculates the correlation coefficients between insurance participation identifiers according to the correlation coefficient matrix. The server acquires a preset threshold, which is a minimum threshold set to ensure the accuracy of the insured persons in violation that are finally screened out, and its value may range from 0.6 to 0.9. The server classifies the different insured persons whose correlation coefficient is greater than the preset threshold to obtain the data groups.
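As an illustration of this embodiment, the sketch below builds the matrix layout with one row per insured person and one column per observed combination of settlement date and settlement address, and then computes Pearson correlation coefficients between persons. The column names `insured_id`, `settle_date`, and `address`, the function name `correlated_pairs`, and the choice of Pearson correlation are assumptions; the text does not specify the coefficient.

```python
import pandas as pd


def correlated_pairs(records, threshold=0.8):
    """Return pairs of insured persons whose presence vectors over
    (date, address) slots correlate above `threshold`."""
    # Matrix layout: rows are insured persons, columns are (date, address) slots.
    layout = pd.crosstab(records["insured_id"],
                         [records["settle_date"], records["address"]])
    layout = (layout > 0).astype(int)
    # DataFrame.corr works column-wise, so transpose to compare persons.
    corr = layout.T.corr()
    ids = list(corr.index)
    return [(p, q) for i, p in enumerate(ids) for q in ids[i + 1:]
            if corr.loc[p, q] > threshold]


records = pd.DataFrame({
    "insured_id": ["A", "B", "A", "B", "A", "B", "C", "C", "C"],
    "settle_date": ["d1", "d1", "d2", "d2", "d3", "d3", "d4", "d5", "d6"],
    "address": ["X", "X", "X", "X", "Y", "Y", "Y", "Z", "Z"],
})
groups = correlated_pairs(records, threshold=0.8)
```

In the sample, A and B appear in exactly the same slots and C never overlaps with them, so only the pair (A, B) exceeds the threshold. A person present in every slot has zero variance and yields an undefined coefficient, which this sketch simply treats as not exceeding the threshold.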

In one embodiment, comparing the classification results according to a time axis of the medical insurance settlement data and identifying the violation participation identifiers comprises the following steps: constructing a time axis according to the medical insurance settlement data, and dividing the time axis into a plurality of time periods; mapping the settlement times corresponding to the encounter participation identifiers in the classification result onto the time axis; and analyzing the encounter participation identifiers in each time period on the time axis to obtain the violation participation identifiers corresponding to the insured persons in violation.

The server constructs a time axis according to the medical insurance settlement data and divides the time axis into a plurality of time periods. The server may construct a daily or monthly time axis. The minimum unit of the time axis may be 2 hours, 1 hour, 0.5 hour, or the like. The server maps the settlement times corresponding to the encounter participation identifiers in the classification result onto the time axis. Specifically, the server uses the pandas component to map the settlement times of each insured person among the encounter participation identifiers onto the time axis, so that the settlement records of that insured person on different settlement dates are reflected on the time axis. The server then analyzes the encounter participation identifiers in each time period on the time axis to obtain the violation participation identifiers corresponding to the insured persons in violation. To do so, the server determines which encounter participation identifiers appear in each time period and compares the identifiers determined for different time periods; when the number of times that encounter participation identifiers overlap is greater than a preset comparison threshold, the server judges the corresponding encounter participation identifiers to be violation participation identifiers and acquires them.
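The time-axis comparison can be sketched with pandas as follows. The sketch assumes a DataFrame of records for the encounter participation identifiers with hypothetical columns `insured_id` and `settle_time`, uses one-hour periods, and counts in how many periods each pair of identifiers appears together. The function name `flag_violations` and the parameter `min_overlaps` (standing in for the preset comparison threshold) are illustrative.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def flag_violations(encounters, period="60min", min_overlaps=2):
    """Flag identifiers that share a time period with the same partner
    more than `min_overlaps` times."""
    # Map each settlement time onto the time axis by flooring to the period.
    slots = pd.to_datetime(encounters["settle_time"]).dt.floor(period)
    overlaps = Counter()
    for _, ids in encounters.groupby(slots)["insured_id"]:
        # Every pair of identifiers seen in the same period overlaps once.
        for pair in combinations(sorted(set(ids)), 2):
            overlaps[pair] += 1
    flagged = set()
    for pair, count in overlaps.items():
        if count > min_overlaps:
            flagged.update(pair)
    return flagged


encounters = pd.DataFrame({
    "insured_id": ["A", "B", "C", "A", "B", "A", "B"],
    "settle_time": [
        "2020-01-01 09:05", "2020-01-01 09:40", "2020-01-01 09:50",
        "2020-01-02 14:10", "2020-01-02 14:30",
        "2020-01-03 10:00", "2020-01-03 10:59",
    ],
})
violations = flag_violations(encounters, period="60min", min_overlaps=2)
```

In the sample, A and B fall into the same one-hour period on three different days, while C shares a period with them only once, so only A and B are flagged. Fixed periods can separate two settlements that are minutes apart but on either side of a boundary, so the period length needs to be chosen with that in mind.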

It should be understood that although the steps in the flow charts of fig. 2-5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2-5 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 6, there is provided a medical insurance data analysis apparatus including: a medical insurance data receiving module 602, a data grouping module 604, a data processing module 606, a data classification module 608 and a violation identification recognition module 610, wherein:

The medical insurance data receiving module 602 is configured to receive medical insurance settlement data within a preset period sent by the database, where the medical insurance settlement data carries an insurance participation identifier, a settlement date, and a settlement address.

The data grouping module 604 is configured to group the medical insurance settlement data according to the settlement date to obtain data groups.

The data processing module 606 is configured to acquire an operation thread channel and perform multi-threaded parallel processing on the data groups through the operation thread channel to obtain encounter participation identifiers, where the operation thread channel is used to perform concurrent thread analysis on the date data corresponding to the insurance participation identifiers in the data groups and to screen out the encounter participation identifiers.

The data classification module 608 is configured to classify the encounter participation identifiers obtained by the multi-threaded processing according to the settlement address to obtain a classification result.

The violation identification recognition module 610 is configured to compare the classification results according to a time axis of the medical insurance settlement data, identify the violation participation identifiers, and send the violation participation identifiers to the medical insurance terminal.

In some embodiments, the data processing module comprises a queue allocation unit, a shared space setting unit, a thread creation unit, and a thread execution unit, wherein:

The queue allocation unit is configured to distribute the data groups to the operation thread channel and generate a group queue corresponding to the operation thread channel.

The shared space setting unit is configured to set a fixed-size buffer array, according to the medical insurance settlement data stored at the same address, as the shared space of the operation thread channel, and to initialize a counting signal.

The thread creation unit is configured to create an entry thread and an operation thread in the operation thread channel, where the operation thread channel updates the counting signal whenever the entry thread writes data into the buffer, and the synchronization between the entry thread and the operation thread is determined through the counting signal.

In one embodiment, the data processing module comprises a matrix construction unit, a matrix extraction unit, a matrix calculation unit, and a classification unit, wherein:

The matrix construction unit is configured to acquire the date perspective parameters corresponding to the insurance participation identifiers in the data groups and construct a first matrix corresponding to the date perspective parameters.

The matrix extraction unit is configured to acquire, according to a preset rule, the date perspective parameters of the insurance participation identifiers in a preset sequence to construct a second matrix, where the numbers of rows and columns of the second matrix do not exceed those of the first matrix.

The matrix calculation unit is configured to calculate, in a multi-threaded manner, the product of the transpose of the second matrix and the first matrix to obtain a third matrix, where the third matrix represents the number of encounters between the insured persons in the preset sequence and all insured persons in the medical insurance settlement data.

The classification unit is configured to acquire the insurance participation identifiers of different insured persons whose number of encounters in the third matrix is greater than a preset encounter threshold, to obtain the encounter participation identifiers.

In one embodiment, the data grouping module comprises a data extraction unit, an abnormality checking unit, a missing value supplement unit, and a grouping unit, wherein:

The data extraction unit is configured to parse the medical insurance settlement data and extract data from it to obtain a settlement statement.

The abnormality checking unit is configured to check for abnormal values in the settlement statement and delete the abnormal values found.

The missing value supplement unit is configured to determine the medical insurance settlement data parameter most related to the variable in which a missing value is located in the settlement statement, and to substitute that parameter for the missing value to obtain the cleaned settlement statement.

The grouping unit is configured to group the cleaned settlement statement according to the settlement date to obtain a data group corresponding to each settlement date.

In some embodiments, the data grouping module comprises a matrix generation unit, a correlation coefficient calculation unit, and a grouping classification unit, wherein:

The matrix generation unit is configured to determine a matrix layout according to the insurance participation identifier and the settlement address, and to obtain a correlation coefficient matrix according to the matrix layout and the address perspective parameters of the medical insurance settlement data.

The correlation coefficient calculation unit is configured to calculate the correlation coefficients between insurance participation identifiers according to the correlation coefficient matrix.

The grouping classification unit is configured to classify the different insured persons whose correlation coefficient is greater than a preset threshold to obtain the data groups.

In one embodiment, the violation identification recognition module includes a time axis construction unit, a mapping unit, and a time axis analysis unit, wherein:

The time axis construction unit is configured to construct a time axis according to the medical insurance settlement data and divide the time axis into a plurality of time periods.

The mapping unit is configured to map the settlement times corresponding to the encounter participation identifiers in the classification result onto the time axis.

The time axis analysis unit is configured to analyze the encounter participation identifiers in each time period on the time axis to obtain the violation participation identifiers corresponding to the insured persons in violation.

For specific limitations of the medical insurance data analysis device, reference may be made to the above limitations of the medical insurance data analysis method, which are not repeated here. All or part of the modules in the medical insurance data analysis device may be implemented by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data used in medical insurance data analysis. The network interface of the computer device communicates with an external terminal through a network connection. The computer program, when executed by the processor, implements a medical insurance data analysis method.

Those skilled in the art will appreciate that the architecture shown in fig. 7 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor that implements the following steps when executing the computer program: receiving medical insurance settlement data within a preset period sent by a database, wherein the medical insurance settlement data carries an insurance participation identifier, a settlement date, and a settlement address; grouping the medical insurance settlement data according to the settlement date to obtain data groups; acquiring an operation thread channel, and performing multi-threaded parallel processing on the data groups through the operation thread channel to obtain encounter participation identifiers, wherein the operation thread channel is used to perform concurrent thread analysis on the date data corresponding to the insurance participation identifiers in the data groups and to screen out the encounter participation identifiers; classifying the encounter participation identifiers obtained by the multi-threaded processing according to the settlement address to obtain a classification result; and comparing the classification results according to a time axis of the medical insurance settlement data, identifying the violation participation identifiers, and sending the violation participation identifiers to the medical insurance terminal.

In one embodiment, when the processor executes the computer program, performing multi-threaded parallel processing on the data groups through the operation thread channel to obtain the encounter participation identifiers includes: distributing the data groups to the operation thread channel, and generating a group queue corresponding to the operation thread channel; setting a fixed-size buffer array, according to the medical insurance settlement data stored at the same address, as the shared space of the operation thread channel, and initializing a counting signal; creating an entry thread and an operation thread in the operation thread channel, wherein the operation thread channel updates the counting signal whenever the entry thread writes data into the buffer, and the synchronization between the entry thread and the operation thread is determined through the counting signal; and using the entry thread to store the data groups into the buffer array according to the group queue in a data prefetching manner, wherein the operation thread performs operation processing on the data groups written into the buffer array to obtain the encounter participation identifiers.
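The entry thread, operation thread, shared buffer, and counting signal described in this embodiment resemble a bounded producer-consumer arrangement, which can be sketched as follows. This is only one possible reading: the sketch uses two counting semaphores from the Python standard library as the counting signal, a deque capped at `buffer_size` items as the fixed-size buffer array, and a hypothetical function `run_channel` with a hypothetical `process` callback standing in for the operation processing.

```python
import threading
from collections import deque


def run_channel(groups, process, buffer_size=2):
    """Run one operation thread channel over `groups` (hypothetical helper)."""
    buffer = deque()                         # shared space of the channel
    free = threading.Semaphore(buffer_size)  # counts free buffer slots
    ready = threading.Semaphore(0)           # counts filled buffer slots
    lock = threading.Lock()
    done = object()                          # sentinel marking end of queue
    results = []

    def put(item):
        free.acquire()                       # wait until a slot is free
        with lock:
            buffer.append(item)
        ready.release()                      # signal the operation thread

    def entry():
        # Prefetch: write groups ahead of the operation thread, up to
        # `buffer_size` items, following the order of the group queue.
        for group in groups:
            put(group)
        put(done)

    def operation():
        while True:
            ready.acquire()                  # wait for written data
            with lock:
                item = buffer.popleft()
            free.release()                   # hand the slot back
            if item is done:
                break
            results.append(process(item))

    threads = [threading.Thread(target=entry), threading.Thread(target=operation)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


totals = run_channel([[1, 2], [3], [4, 5, 6]], process=sum, buffer_size=2)
```

Because there is a single operation thread per channel in this sketch, results come back in queue order. Running several such channels side by side, one per group queue, would give the multi-channel parallelism that the text describes.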

In one embodiment, when the processor executes the computer program, performing multi-threaded parallel processing on the data groups through the operation thread channel to obtain the encounter participation identifiers includes: acquiring the date perspective parameters corresponding to the insurance participation identifiers in the data groups, and constructing a first matrix corresponding to the date perspective parameters; acquiring, according to a preset rule, the date perspective parameters of the insurance participation identifiers in a preset sequence to construct a second matrix, wherein the numbers of rows and columns of the second matrix do not exceed those of the first matrix; calculating, in a multi-threaded manner, the product of the transpose of the second matrix and the first matrix to obtain a third matrix, wherein the third matrix represents the number of encounters between the insured persons in the preset sequence and all insured persons in the medical insurance settlement data; and acquiring the insurance participation identifiers of different insured persons whose number of encounters in the third matrix is greater than a preset encounter threshold, to obtain the encounter participation identifiers.

In one embodiment, the grouping of the medical insurance settlement data according to the settlement date implemented when the processor executes the computer program, resulting in a data grouping, comprises: analyzing and extracting the medical insurance settlement data to obtain a settlement statement; checking abnormal values in the settlement statement, and correspondingly deleting the abnormal values in the settlement statement; determining medical insurance settlement data parameters most related to the variables of the missing values in the settlement list, and substituting the medical insurance settlement data parameters into the variables of the missing values to obtain the cleaned settlement list; and grouping the cleaned settlement statement according to the settlement date to obtain a data group corresponding to the settlement date.

In one embodiment, the grouping of the medical insurance settlement data according to the settlement date implemented when the processor executes the computer program, resulting in a data grouping, comprises: determining matrix layout according to the participation identification and the settlement address, and obtaining a correlation coefficient matrix according to the matrix layout and the address perspective parameter of the medical insurance settlement data; calculating the correlation coefficient between the participation identification according to the correlation coefficient matrix; and classifying different insured persons with the correlation coefficient larger than a preset threshold value to obtain a data packet.

In one embodiment, the comparing the classification results according to a time axis of the medical insurance settlement data and identifying the violation participation identification, which is implemented when the processor executes the computer program, includes: constructing a time axis according to the medical insurance settlement data, and dividing the time axis into a plurality of time periods; mapping settlement moments corresponding to the encounter participation identifiers of the classification results on a time axis; and analyzing the meeting participation identification in each time period on the time axis to obtain the violation participation identification corresponding to the violation participant.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon which, when executed by a processor, implements the following steps: receiving medical insurance settlement data within a preset period sent by a database, wherein the medical insurance settlement data carries an insurance participation identifier, a settlement date, and a settlement address; grouping the medical insurance settlement data according to the settlement date to obtain data groups; acquiring an operation thread channel, and performing multi-threaded parallel processing on the data groups through the operation thread channel to obtain encounter participation identifiers, wherein the operation thread channel is used to perform concurrent thread analysis on the date data corresponding to the insurance participation identifiers in the data groups and to screen out the encounter participation identifiers; classifying the encounter participation identifiers obtained by the multi-threaded processing according to the settlement address to obtain a classification result; and comparing the classification results according to a time axis of the medical insurance settlement data, identifying the violation participation identifiers, and sending the violation participation identifiers to the medical insurance terminal.

In one embodiment, when the computer program is executed by the processor, performing multi-threaded parallel processing on the data groups through the operation thread channel to obtain the encounter participation identifiers includes: distributing the data groups to the operation thread channel, and generating a group queue corresponding to the operation thread channel; setting a fixed-size buffer array, according to the medical insurance settlement data stored at the same address, as the shared space of the operation thread channel, and initializing a counting signal; creating an entry thread and an operation thread in the operation thread channel, wherein the operation thread channel updates the counting signal whenever the entry thread writes data into the buffer, and the synchronization between the entry thread and the operation thread is determined through the counting signal; and using the entry thread to store the data groups into the buffer array according to the group queue in a data prefetching manner, wherein the operation thread performs operation processing on the data groups written into the buffer array to obtain the encounter participation identifiers.

In one embodiment, when the computer program is executed by the processor, performing multi-threaded parallel processing on the data groups through the operation thread channel to obtain the encounter participation identifiers includes: acquiring the date perspective parameters corresponding to the insurance participation identifiers in the data groups, and constructing a first matrix corresponding to the date perspective parameters; acquiring, according to a preset rule, the date perspective parameters of the insurance participation identifiers in a preset sequence to construct a second matrix, wherein the numbers of rows and columns of the second matrix do not exceed those of the first matrix; calculating, in a multi-threaded manner, the product of the transpose of the second matrix and the first matrix to obtain a third matrix, wherein the third matrix represents the number of encounters between the insured persons in the preset sequence and all insured persons in the medical insurance settlement data; and acquiring the insurance participation identifiers of different insured persons whose number of encounters in the third matrix is greater than a preset encounter threshold, to obtain the encounter participation identifiers.

In one embodiment, the computer program when executed by a processor implements grouping of medical insurance settlement data according to settlement dates resulting in data groupings, comprising: analyzing and extracting the medical insurance settlement data to obtain a settlement statement; checking abnormal values in the settlement statement, and correspondingly deleting the abnormal values in the settlement statement; determining medical insurance settlement data parameters most related to the variables of the missing values in the settlement list, and substituting the medical insurance settlement data parameters into the variables of the missing values to obtain the cleaned settlement list; and grouping the cleaned settlement detail tables according to the settlement dates to obtain data groups corresponding to the settlement dates.

In one embodiment, the computer program when executed by a processor implements grouping of medical insurance settlement data according to settlement dates resulting in data groupings, comprising: determining matrix layout according to the participation identification and the settlement address, and obtaining a correlation coefficient matrix according to the matrix layout and the address perspective parameter of the medical insurance settlement data; calculating the correlation coefficient between the participation identification according to the correlation coefficient matrix; and classifying different insured persons with the correlation coefficient larger than a preset threshold value to obtain data groups.

In one embodiment, the comparing the classification results according to a time axis of the medical insurance settlement data and identifying the violation participation identification, implemented by the computer program when executed by the processor, includes: constructing a time axis according to the medical insurance settlement data, and dividing the time axis into a plurality of time periods; mapping settlement moments corresponding to the encounter participation identifiers of the classification results on a time axis; and analyzing the meeting participation identification in each time period on the time axis to obtain the violation participation identification corresponding to the violation participation person.

It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of medical insurance data analysis, the method comprising:

acquiring an operation thread channel and the date perspective parameters corresponding to the participation identifiers in the data packets; constructing a first matrix corresponding to the date perspective parameters; acquiring, according to a preset rule, the date perspective parameters of the participation identifiers in a preset sequence to construct a second matrix, wherein the number of rows and the number of columns of the second matrix are not more than the number of rows and the number of columns of the first matrix; performing multi-thread calculation of the product of the transposed matrix of the second matrix and the first matrix to obtain a third matrix, wherein the third matrix represents the number of encounters between the insured persons in the preset sequence and all insured persons in the medical insurance settlement data; and acquiring the participation identifiers of different insured persons whose number of encounters in the third matrix is greater than a preset encounter threshold to obtain the encounter participation identifiers, wherein the operation thread channel is used to perform thread-concurrent analysis on the date data corresponding to the participation identifiers in the data packets, and the encounter participation identifiers are screened out;

classifying the encounter participation identifiers obtained by the multi-thread processing according to the settlement address to obtain a classification result;
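The matrix step recited in claim 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the "date perspective parameters" are assumed to form a 0/1 table with one row per settlement date and one column per insured person, and the helper names `pivot_matrix`, `encounter_counts`, and `encounter_pairs` are hypothetical. The thread pool computes one row of the product per task, which only illustrates how the work can be partitioned; in CPython, pure-Python arithmetic does not run faster in threads.

```python
from concurrent.futures import ThreadPoolExecutor


def pivot_matrix(records, ids, dates):
    """Rows are dates, columns are insured IDs; 1 means the ID settled that day."""
    seen = set(records)  # records: iterable of (insured_id, date)
    return [[1 if (i, d) in seen else 0 for i in ids] for d in dates]


def encounter_counts(first, second, workers=4):
    """Return transpose(second) x first, assuming both matrices are non-empty.

    Entry [k][j] counts the dates on which person k of the preset sequence
    and person j of the full population both settled.
    """
    def product_row(k):
        column_k = [r[k] for r in second]  # column k of second = row k of its transpose
        return [
            sum(a * first_row[j] for a, first_row in zip(column_k, first))
            for j in range(len(first[0]))
        ]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(product_row, range(len(second[0]))))


def encounter_pairs(third, subset_ids, all_ids, threshold):
    """Pairs of different persons whose encounter count exceeds the threshold."""
    return {
        (s, a)
        for s, counts in zip(subset_ids, third)
        for a, n in zip(all_ids, counts)
        if s != a and n > threshold
    }
```

Because the second matrix covers only the persons in the preset sequence, the third matrix has one row per selected person and one column per insured person, which keeps each product smaller than a full all-against-all comparison.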

2. The method of claim 1, wherein the performing multi-thread parallel processing on the data packet through the operation thread channel to obtain an encounter participation identifier comprises:

creating an input thread and an operation thread in the operation thread channel, updating a set counting signal by the operation thread channel when data are written into a buffer area by the input thread, and determining the synchronization between the input thread and the operation thread through the counting signal;
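The synchronization described in this step can be sketched with a counting semaphore. This is a minimal illustration that assumes the "counting signal" behaves like a semaphore counting the items currently in the buffer; the function name `run_channel`, the `process` callback, and the sentinel used to stop the operation thread are hypothetical and not taken from the claim.

```python
import threading
from collections import deque


def run_channel(groups, process):
    """Feed data groups through a buffer from an input thread to an operation thread."""
    buffer = deque()                 # shared buffer area
    filled = threading.Semaphore(0)  # counting signal: items available in the buffer
    results = []
    sentinel = object()              # marks the end of the input (assumption)

    def input_thread():
        for group in groups:
            buffer.append(group)
            filled.release()         # update the counting signal after each write
        buffer.append(sentinel)
        filled.release()

    def operation_thread():
        while True:
            filled.acquire()         # wait until the input thread has written data
            item = buffer.popleft()
            if item is sentinel:
                break
            results.append(process(item))

    threads = [
        threading.Thread(target=input_thread),
        threading.Thread(target=operation_thread),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The operation thread never reads from an empty buffer, because each `acquire` is matched by an earlier `release` from the input thread.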

3. The method of claim 1, wherein said grouping said medical insurance settlement data according to said settlement date into data groups comprises:

determining medical insurance settlement data parameters most related to the variables of the missing values in the settlement detail table, and substituting the medical insurance settlement data parameters into the variables of the missing values to obtain the cleaned settlement detail table;
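One possible reading of this missing-value step is sketched below. The claim does not say how "most related" is measured or how the substitution is made, so both choices here are assumptions: relatedness is taken as the largest absolute Pearson correlation, and a missing value is filled from the complete row whose value in that related column is closest. The function names and the column names in the usage example are hypothetical, and every other column is assumed to be numeric and complete.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def fill_missing(rows, target):
    """rows: list of dicts with identical keys; None marks a missing target value."""
    complete = [r for r in rows if r[target] is not None]
    others = [key for key in rows[0] if key != target]

    # Pick the column most related to the variable that has missing values.
    best = max(
        others,
        key=lambda key: abs(
            pearson([r[key] for r in complete], [r[target] for r in complete])
        ),
    )

    cleaned = []
    for row in rows:
        if row[target] is None:
            # Borrow the target value from the closest complete row.
            donor = min(complete, key=lambda c: abs(c[best] - row[best]))
            row = {**row, target: donor[target]}
        cleaned.append(row)
    return best, cleaned
```

For example, with rows holding `days`, `age`, and `cost`, where `cost` tracks `days` closely, a row with a missing `cost` is filled from the complete row with the nearest `days` value.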

4. The method of claim 1, wherein said grouping said medical insurance settlement data according to said settlement date into data groups comprises:

5. The method of claim 1, wherein the comparing the classification results according to the time axis of the medical insurance settlement data to identify an illegal insurance participation identifier comprises:

6. A medical insurance data analysis apparatus, the apparatus comprising:

the violation identification module is used for comparing the classification results according to the time axis of the medical insurance settlement data, identifying violation participation identification and sending the violation participation identification to the medical insurance terminal;

the data processing module comprises a matrix construction unit, a matrix extraction unit, a matrix calculation unit and a classification unit, wherein:

the matrix construction unit is used for acquiring date perspective parameters corresponding to the participation identifiers in the data packets and constructing a first matrix corresponding to the date perspective parameters;

the matrix extraction unit is used for acquiring date perspective parameters of the participation identification in a preset sequence according to a preset rule to construct a second matrix, and the number of rows and columns of the second matrix is not more than the number of rows and columns of the first matrix;

the matrix calculation unit is used for calculating, through multiple threads, the product of the transposed matrix of the second matrix and the first matrix to obtain a third matrix, wherein the third matrix represents the number of encounters between the insured persons in the preset sequence and all insured persons in the medical insurance settlement data;

and the classification unit is used for acquiring the participation identifiers of different insured persons whose number of encounters in the third matrix is greater than a preset encounter threshold to obtain the encounter participation identifiers.

7. The apparatus of claim 6, wherein the data processing module comprises:

8. The apparatus of claim 6, wherein the data grouping module comprises a data extraction unit, an exception checking unit, a missing supplement unit, and a grouping unit, wherein:

the data extraction unit is used for analyzing and extracting the medical insurance settlement data to obtain a settlement statement;

the exception checking unit is used for checking for abnormal values in the settlement statement and deleting those abnormal values from the settlement statement;

the missing supplement unit is used for determining the medical insurance settlement data parameters most relevant to the variables with missing values in the settlement statement and substituting the medical insurance settlement data parameters into the variables with missing values to obtain the cleaned settlement statement;

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5.